Intelligent Mechatronic Systems course report. Topic: Defect classification of bamboo strips based on convolutional neural network


Defect classification of bamboo strips based on convolutional neural network

Hong-Hai Hoang1,*, Trinh Hoang Hieu1
1 School of Mechanical Engineering, Hanoi University of Science and Technology, Vietnam
* Email: hai.hoanghong@hust.edu.vn

Abstract. As bamboo becomes more popular for manufacturing a wide range of furniture, the demand for bamboo strips goes up. Because inspecting bamboo strips manually with the human visual system is difficult, this paper proposes a new approach to classifying the surface texture of bamboo strips based on our modified neural network architecture. Bottlenecked layers are built by stacking 1x1 and 3x3 Separable Convolution layers, and with skip connections the network is built similarly to the ResNet architecture. Skip connections are also added between blocks in a feed-forward fashion. Comparative experiments between our modified network and several existing networks on a set of bamboo strip images show the advantages of our network: it reduces the vanishing-gradient problem, encourages feature spreading and reuse, and lowers the number of parameters and FLOPs.

Keywords: Classification, Bamboo strip, Convolutional neural network.

1. Introduction

Traditionally, bamboo was used to make primitive and simple products such as chopsticks, chairs and tables. In recent years, great improvements in bamboo strip processing have extended the range and quality of bamboo products, greatly improving the market value of this green material. Thanks to its color, fiber structure and good strength, bamboo is gradually being considered a green alternative to wood for manufacturing furniture and flooring. The bamboo production industry plays an important economic, agricultural and traditional role for the country. Being a natural product, bamboo is heavily affected by termites, weevils and other worms, and by damage or errors during cutting. Surface defects of bamboo strips, such as worm-eaten holes, mildew, cracks and tilted cut sides, are traditionally classified manually by visual judgement based on human experience during defect inspection. Errors in manual visual inspection rise along with the tiredness and boredom of the workers.

Many approaches to automatic online inspection of wood defects by computer vision have been developed. Olli Silvén, Matti Niskanen and Hannu Kauppinen [1] used a non-supervised clustering method to detect wood surface defects. Xingguang Qi, Xiaoting Li and Hailun Zhang [2] introduced a surface defect detection method based on a blob algorithm. Q. Xiansheng, H. Feng, L. Qiong and S. Xin [3] built an online defect inspection algorithm for bamboo strips based on computer vision; mean filtering with edge detection followed by texture filtering is applied to detect large and small defects with good accuracy. Xuanyin Wang, Dongtai Liang and Weiyan Deng [4] introduced a method of surface grading using multi-scale color texture features in eigenspace, which showed quite decent accuracy. With recent research on deep learning and convolutional neural networks (CNNs), a majority of machine vision problems that were difficult for traditional computer vision methods have been solved. Alex Krizhevsky [5] first introduced the CNN AlexNet, trained on the difficult ImageNet dataset, which solved the classification task with top-1 and top-5 error rates of 37.5% and 17.0%, far better than previous works. ResNet [6] and Highway Networks [7] introduced the idea that the vanishing-gradient problem can be tackled by bypassing the signal from layer to layer through connections. Deeper architectures created using skip connections show better accuracy; DenseNet [8] exploits these connections by creating short paths between every layer of its architecture. Xception [9] and MobileNet [10] present the Separable Convolution layer, which significantly reduces computational cost compared to a traditional convolution layer with only a small loss of accuracy.

Our target is a network that classifies good and bad bamboo strips with a lower prediction time while maintaining decent accuracy. As a first contribution, we evaluate and compare several recent networks for classifying bamboo strips. As a second contribution, we propose a modified network based on the original ResNet architecture, named M-ResNet, with the aim of increasing performance. We compare this approach to the former networks to show that the model has a lower prediction time while keeping decent accuracy. The rest of the paper is organized as follows: Section 2 reviews related work; Section 3 describes the pipeline from grabbing images to making predictions with several recent networks; Section 4 presents our modified network; results, discussion and conclusion are detailed in the last section.

2. Related Works

CNNs have proven extremely efficient in a wide range of computer vision areas, and classification is one of the branches that attracts a great deal of scientific effort. Many architectures have been created with the aim of improving performance, increasing accuracy and lowering prediction time. AlexNet and the VGG network [11], with their simple architectures, reinforced the notion that convolutional neural networks need a deep stack of layers for the hierarchical representation of visual data to work. As layers go deeper, a new problem appears: the vanishing gradient. The gradient flowing from layer to layer becomes smaller and smaller, until at some point learning is no longer effective. Highway Networks, inspired by LSTM [12], were the first to introduce gating mechanisms that regulate information flow, reaching more than 100 layers. The gating mechanisms give the network paths for information to follow across different layers ("information highways"), using and spreading knowledge from previous layers. ResNet exploits this idea, creates an architecture with 152 layers and won ILSVRC 2015 with an error rate of 3.6%. To improve prediction time, the Depthwise Separable Convolution was introduced in Xception as an (Inception-like) module and placed throughout the architecture. A Depthwise Separable Convolution consists of a depthwise convolution (a channel-wise DK×DK spatial convolution) followed by a pointwise convolution (a 1x1 convolution) to change the dimension. With DK×DK = 3x3, the Depthwise Separable Convolution lowers the computational effort by roughly 8 to 9 times, with only a small reduction in accuracy. MobileNet adopts this idea and builds a network architecture that even outperforms SqueezeNet and AlexNet while using far fewer multiply-adds and parameters.

3. Material and Methods

3.1 Image acquisition of bamboo strips

To obtain better discrimination effectiveness and classification accuracy, a set of bamboo strip images is used in the experiments, with the ground-truth classes obtained manually from experts. All images were acquired using an area color camera (Basler acA1600-20um, 20 frames per second) with a 50 mm focal length lens (Tamron), a frame grabber and a PC. The camera was fixed above the bamboo strip and focused on its surface. Because it is important to keep the lighting undisturbed, a circular LED light was fixed above the bamboo strips. This type of light is effective against reflection and shadow, as well as against disturbing light from the environment. The acquired images are 250x500 pixels.

Fig. 1. Examples of bamboo strip images.

3.2 Methods

a. Preparing the dataset

The bamboo image dataset is first manually classified into two classes by experts:
- Training set: 4432 images
  + Good (without defect): 2224 images
  + Bad (with defect): 2208 images
- Validation set: 1108 images
- Test set: 516 images
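The paper does not describe how the images are stored or loaded; the sketch below is only an assumption of a typical setup, with the images organized into good/bad subfolders per split and loaded with the Keras ImageDataGenerator cited in [16]. All paths and parameter values here are illustrative, not taken from the paper.

```python
# Minimal data-loading sketch (assumed layout: data/{train,val,test}/{good,bad}/...).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (224, 224)   # images are later cropped/resized to 224x224x3
BATCH_SIZE = 32

# rescale + samplewise_center roughly stand in for the "scaling and zero-centering" step.
datagen = ImageDataGenerator(rescale=1.0 / 255, samplewise_center=True)

train_gen = datagen.flow_from_directory("data/train", target_size=IMG_SIZE,
                                        batch_size=BATCH_SIZE, class_mode="categorical")
val_gen = datagen.flow_from_directory("data/val", target_size=IMG_SIZE,
                                      batch_size=BATCH_SIZE, class_mode="categorical")
test_gen = datagen.flow_from_directory("data/test", target_size=IMG_SIZE,
                                       batch_size=BATCH_SIZE, class_mode=None, shuffle=False)
```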
b. Experiments with several networks

We use four different and popular models for the experiments: VGG16, ResNet50, Xception and DenseNet. The classifiers of these architectures, including the AveragePooling and FC layers, are removed because they were built to classify 1000 classes, which differ from the two classes (good and error) in the bamboo dataset. The dataset images are cropped to 224x224x3, as commonly used in classification papers. The images go through a data pre-processing stage of scaling and zero-centering, which helps the models converge faster and reach better accuracy. For image classification, following the approach of a series of fully connected layers, we fine-tune the models by adding a Global Average Pooling layer, which speeds up training and helps avoid overfitting, followed by two consecutive FC layers: FC(1024) and FC(2) for classification (with ReLU and softmax activation, respectively). The new classifier therefore has the same structure as that of the pre-trained models. According to Rawat and Wang, 'comparing the performance of different classifiers on top of deep convolutional neural networks still requires further investigation and thus makes for an interesting research direction'; research on which classifier architecture should be used is beyond the scope of this report.

We choose a batch size of 32 for 50 epochs and the SGD method, with learning rate 1e-3 (divided by 10 at epochs 30 and 60), weight decay 1e-6, momentum 0.9, and Nesterov momentum applied. Models are trained on a GPU on Google Colab with compute capability 6.0; the average training time is about 1.5 hours. A threshold value (t >= 0.90) is applied to the prediction results to eliminate predictions for background images without bamboo strips, where P is the prediction result in the form of an array and Pe and Pg are the predicted scores for the error and good bamboo strip classes, respectively.

TABLE I. Top-1 and top-5 accuracy of the pre-trained models on the ImageNet validation dataset.

Model      Top-1 Accuracy   Top-5 Accuracy
VGG16      0.713            0.901
ResNet50   0.749            0.921
Xception   0.790            0.945
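The paper does not include implementation code; the following is a minimal Keras sketch of the fine-tuning setup described in this section (backbone without its original classifier, Global Average Pooling, FC(1024) with ReLU, FC(2) with softmax, SGD with Nesterov momentum, and the t >= 0.90 rejection threshold). All variable names are ours, and the weight decay and learning-rate drops are only noted in comments.

```python
# Fine-tuning sketch for one baseline backbone (illustrative, not the authors' code).
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD

base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

x = GlobalAveragePooling2D()(base.output)   # replaces the removed AveragePooling + FC(1000)
x = Dense(1024, activation="relu")(x)       # FC(1024)
out = Dense(2, activation="softmax")(x)     # FC(2): good vs. error

model = Model(base.input, out)
# Weight decay 1e-6 and the /10 drops at epochs 30 and 60 are omitted here for brevity.
opt = SGD(learning_rate=1e-3, momentum=0.9, nesterov=True)
model.compile(optimizer=opt, loss="categorical_crossentropy", metrics=["accuracy"])

# train_gen / val_gen / test_gen: generators such as those in the loading sketch above.
model.fit(train_gen, validation_data=val_gen, epochs=50)

# Reject background frames: keep a prediction only if its best score reaches the threshold.
t = 0.90
probs = model.predict(test_gen)                    # per-image class scores, e.g. [Pe, Pg]
labels = np.where(probs.max(axis=1) >= t,
                  probs.argmax(axis=1), -1)        # -1 marks background / rejected images
```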
4. Our Approach

2D Separable Convolution

[8] and [9] propose the Separable Convolution layer, which performs first a depthwise spatial convolution (acting on each input channel separately) followed by a pointwise convolution that mixes the resulting output channels. A normal convolutional layer takes as input a DF × DF × M feature map F and produces a DF × DF × N feature map G, where DF is the spatial width and height of the square input feature map, M is the number of input channels (input depth), DG is the spatial width and height of the square output feature map and N is the number of output channels (output depth). The normal convolutional layer is parameterized by a convolution kernel K of size DK × DK × M × N, where DK is the spatial dimension of the kernel (assumed square) and M and N are the numbers of input and output channels as defined above. Normal convolutions have the computational cost

  DK × DK × M × N × DF × DF,

which depends multiplicatively on the number of input channels M, the number of output channels N, the kernel size DK × DK and the feature map size DF × DF. A separable convolution replaces this with the channel-wise DK × DK depthwise spatial convolution followed by the 1x1 pointwise convolution that changes the dimension, giving the computational cost

  DK × DK × M × DF × DF + M × N × DF × DF.

The computational reduction is therefore

  (DK × DK × M × DF × DF + M × N × DF × DF) / (DK × DK × M × N × DF × DF) = 1/N + 1/(DK × DK).

When DK × DK is 3x3, roughly 8 to 9 times less computation is needed, with only a small reduction in accuracy, as shown in [10].
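As a worked illustration of this reduction (our example, not from the paper), the parameter counts of a standard convolution and a depthwise-separable one can be compared directly in Keras; the layer sizes below are arbitrary, and the multiply-add counts scale both parameter counts by the same DF × DF factor, so the ratio is unchanged.

```python
# Compare a standard 3x3 convolution with a depthwise-separable one (illustrative sizes).
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, SeparableConv2D

DF, M, N, DK = 56, 128, 256, 3                      # feature-map size, in/out channels, kernel
inp = Input((DF, DF, M))

standard = Model(inp, Conv2D(N, DK, padding="same", use_bias=False)(inp))
separable = Model(inp, SeparableConv2D(N, DK, padding="same", use_bias=False)(inp))

print(standard.count_params())                      # DK*DK*M*N     = 294,912
print(separable.count_params())                     # DK*DK*M + M*N =  33,920
print((DK * DK * M + M * N) / (DK * DK * M * N))    # 1/N + 1/DK**2 ~= 0.115 (about 8.7x fewer)
```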
Bottlenecked layers

It has been noted in [6, 13] that a 1x1 convolution can be introduced as a bottleneck layer before each 3x3 convolution to reduce the number of input feature maps and thus improve computational efficiency.

Skip connection

Traditional convolutional feed-forward networks connect the output of the l-th layer as input to the (l+1)-th layer, which gives rise to the layer transition

  x_l = H_l(x_{l-1}).

ResNets add a skip connection that bypasses the non-linear transformation H_l with an identity function:

  x_l = H_l(x_{l-1}) + x_{l-1}.

An advantage of ResNets is that the gradient can flow directly through the identity function from the early layers, close to the input, to the later layers.

Fig. 2. Identity block, s-Conv block, and block.

DenseNet [8], U-Net [14] and V-Net [15] show that convolutional networks can be substantially deeper, more accurate and more efficient to train if they contain shorter connections between layers close to the input and those close to the output. Inspired by these ideas, we first create a bottlenecked block with a skip connection similar to ResNet, and then connect every block to the others in a feed-forward fashion. Generally, the l-th block receives the feature maps of all earlier blocks,

  x_l = H_l([x_0, x_1, ..., x_{l-1}]),

where H_l denotes the transformation of the l-th block, each block itself being composed of several layers.

Upstream and MaxPooling

Performing skip connections directly between blocks is not feasible because the size of the feature maps changes. To address this problem, we add a 2D upstream (up-sampling) layer sized with respect to the input feature map and a shortcut separable convolution layer whose size matches the target feature maps, followed by max-pooling layers to get back to the target dimension, so the network can go deeper.

Fig. 3. Connection between blocks with upstream and pooling layers.

Fig. 4. Network architecture.
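The exact layer configuration is only given in the figures, so the code below is a minimal sketch of the ideas described in this section: a bottlenecked 1x1/3x3 separable-convolution block with an identity skip, and an upstream-plus-pooling shortcut between blocks of different spatial sizes. All filter counts, scale factors and names are our assumptions, not the authors' exact M-ResNet configuration.

```python
# Sketch of an M-ResNet-style block and cross-block shortcut (assumed sizes).
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Add, BatchNormalization, Concatenate, MaxPooling2D,
                                     ReLU, SeparableConv2D, UpSampling2D)

def bottleneck_sep_block(x, filters):
    """1x1 bottleneck + 3x3 separable conv + 1x1 expansion, with an identity skip."""
    shortcut = x
    y = SeparableConv2D(filters // 4, 1, padding="same")(x)   # 1x1 bottleneck
    y = BatchNormalization()(y); y = ReLU()(y)
    y = SeparableConv2D(filters // 4, 3, padding="same")(y)   # 3x3 separable convolution
    y = BatchNormalization()(y); y = ReLU()(y)
    y = SeparableConv2D(filters, 1, padding="same")(y)        # 1x1 expansion
    y = BatchNormalization()(y)
    return ReLU()(Add()([y, shortcut]))                       # identity skip connection

def block_shortcut(x, filters, up=2, pool=4):
    """Cross-block shortcut: upsample, project with a separable conv, pool to the target size.
    up and pool are chosen so the output spatial size matches the target block's input."""
    s = UpSampling2D(up)(x)                                   # 2D upstream layer
    s = SeparableConv2D(filters, 1, padding="same")(s)        # match the target channel depth
    return MaxPooling2D(pool)(s)                              # back down to the target dimension

# Example wiring with illustrative shapes: two stages plus a feed-forward cross-block shortcut.
inp = Input((224, 224, 32))
b1 = bottleneck_sep_block(inp, 32)
b1_down = MaxPooling2D(2)(b1)                                  # transition to the next stage
b2_in = Concatenate()([b1_down, block_shortcut(inp, 32)])      # receives maps of earlier blocks
b2 = bottleneck_sep_block(b2_in, 64)                           # 64 = concatenated channel depth
model = Model(inp, b2)
```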
5. Experiment results and conclusion

We evaluate our method on our dataset. As described previously, the bamboo dataset contains two classes, without defect and with defect (good or bad bamboo strips), with around 4400 images for training and 1100 images for validation. The model is tested on a test dataset containing about 500 unlabeled images. Images are cropped to 224x224x3, with the per-pixel mean subtracted. With the help of the ImageDataGenerator from Keras [16], flipped and rotated images are generated for the training phase. We choose a batch size of 32 for 50 epochs and stochastic gradient descent (SGD), with learning rate 1e-3 (divided by 10 at epochs 30 and 60), weight decay 1e-6, momentum 0.9, and Nesterov momentum applied. We train our model on a GPU on Google Colab, which takes about an hour of training.

a. Results

TABLE II. Accuracy and FLOPS on the test set, compared with the other models.

Model*        Parameters     FLOPS**        Accuracy (%)
M-ResNet50    16,737,045     36,905,06…     97.33
ResNet50      25,636,712     47,157,64…     98.16
Xception      22,910,480     41,758,89…     99.08
VGG16         138,357,544    268,512,656    98.16

* without transfer learning
** all FC layers after the AveragePooling are removed

Fig. 5. Comparison of accuracy and loss during training and validation between M-ResNet and several models.

In our experiments, M-ResNet starts converging from the very first epochs, and the training accuracy and loss then improve gradually. The accuracy could increase further if the model were trained for more epochs. Overall, it performs as well as ResNet and Xception, and slightly better than VGG, on the bamboo dataset. However, the validation accuracy and loss are unstable; we presume this is caused by labeling errors in the manual classification step, or by a learning rate that is too high.

b. Conclusion

In this paper, we propose a new method for defect classification of bamboo strips based on a convolutional neural network, named M-ResNet. M-ResNet has several compelling advantages: it alleviates the vanishing-gradient problem, strengthens feature propagation, promotes feature reuse, and reduces the number of parameters. While we apply the proposed architecture to bamboo defect classification, it is general and can be applied to other tasks. In future work, we will explore this architecture with more and deeper layers while maintaining its performance.

References

[1] Olli Silvén, Matti Niskanen, Hannu Kauppinen, "Wood inspection with non-supervised clustering," Machine Vision and Applications, 13, 2003, pp. 275-285.
[2] Xingguang Qi, Xiaoting Li, Hailun Zhang, "Research of paper surface defects detection system based on blob algorithm," 2013 IEEE International Conference.
[3] Q. Xiansheng, H. Feng, L. Qiong and S. Xin, "Online defect inspection algorithm of bamboo strip based on computer vision," 2009 IEEE International Conference on Industrial Technology, Gippsland, VIC, 2009, pp. 1-5.
[4] Xuanyin Wang, Dongtai Liang, Weiyan Deng, "Surface grading of bamboo strips using multi-scale color texture features in eigenspace," Computers and Electronics in Agriculture, 73, 2010, pp. 91-98.
[5] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," NIPS 2012.
[6] K. He, X. Zhang, S. Ren and J. Sun, "Deep residual learning for image recognition," CVPR 2016.
[7] Rupesh Kumar Srivastava, Klaus Greff and Jürgen Schmidhuber, "Highway Networks," CoRR, 2015.
[8] Gao Huang et al., "Densely connected convolutional networks," CVPR 2017, pp. 2261-2269.
[9] François Chollet, "Xception: Deep learning with depthwise separable convolutions," CVPR 2017.
[10] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto and Hartwig Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv:1704.04861, 2017.
[11] Karen Simonyan and Andrew Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, 2014.
[12] Sepp Hochreiter and Jürgen Schmidhuber, "Long short-term memory," Neural Computation, 1997.
[13] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna, "Rethinking the inception architecture for computer vision," CVPR 2016.
[14] Olaf Ronneberger, Philipp Fischer and Thomas Brox, "U-Net: Convolutional networks for biomedical image segmentation," arXiv:1505.04597, 2015.
[15] Fausto Milletari et al., "V-Net: Fully convolutional neural networks for volumetric medical image segmentation," 2016 Fourth International Conference on 3D Vision (3DV), 2016, pp. 565-571.
[16] F. Chollet et al., "Keras," https://github.com/fchollet/keras, 2015.
