HANDWRITING RECOGNITION BASED ON CONVOLUTIONAL NEURAL NETWORK

PHAM TUAN DAT, LE THE ANH
Faculty of Information Technology, Vietnam Maritime University

Abstract
In recent years, a new research approach has emerged based on the convolutional neural network, which is known as one of the most advanced deep learning models. Convolutional neural networks have been widely applied to many artificial intelligence problems such as object recognition, feature detection, and text classification. To meet the needs of those problems, applications must have an efficient algorithm. In this paper, the authors present an overview of the convolutional neural network model, including the major function of each layer and the transformations in the network. Based on that theory, the authors have designed an application for English handwriting recognition using a convolutional neural network and have carried out comparative tests between several recognition algorithms.

Keywords: Convolutional, perceptron, nearest neighbor, feature map, pooling, receptive field, backpropagation, cross-entropy

Introduction
The convolutional neural network (CNN) is an advanced deep learning model that can support a variety of artificial intelligence problems. For instance, major problems such as feature detection and object
recognition in digital images can be solved by applying a CNN model. Many applications of face recognition or natural language processing have already been deployed on different computer platforms and mobile devices. Compared with other recognition algorithms, a neural network offers learning ability, fault tolerance, and the ability to classify samples into different classes. The CNN is an improvement on the traditional neural network, so it retains these advantages and also offers high recognition accuracy. Furthermore, the training algorithm of a neural network can generate parameters that serve as input to other models such as the support vector machine. All network models can be applied in artificial intelligence, but only some of them are efficient enough for recognition problems. Therefore, in this paper the authors present an overview of the model and comparative experiments on English handwriting recognition based on the CNN, the multilayer perceptron network (MLP), and a further method, the nearest neighbor algorithm.

Theoretical Background
The CNN differs from the MLP in several points:
Firstly, a CNN consists of hidden layers that are linked together through convolution operations and non-linear functions.
Secondly, in a CNN, each neuron of a hidden layer is connected only to some neurons in a local region of the input layer.
Lastly, a CNN works on basic concepts such as the local receptive field, shared weights, and pooling.

Local receptive field: let the input of the network be a digital image of size 28*28, divided into regions of size 5*5 as depicted in Figure 1. If the window is moved one pixel at a time (from left to right and from top to bottom of the image), then 24*24 regions of the image are generated (28 - 5 + 1 = 24 positions along each axis), corresponding to 24*24 neurons in the convolutional layer. This transformation is known as a feature map.

Journal of Marine Science and Technology, No. 53, January 2018

Figure 1. Neuron is created from the local region

Shared weight: each feature map has 26 weights or parameters, including 25
shared weights and one bias parameter, so 6 feature maps produce 156 weights. In practice, recognition applications usually have dozens of feature maps.

Figure 2. Max-pooling procedure

Pooling: its task is to reduce the number of neurons; each region of 2*2 neurons creates one neuron in the next layer. Two useful pooling procedures are max-pooling and l2-pooling.

Figure 3 describes the overview model of one CNN. The structure of the network consists of an input layer, convolutional layers, pooling layers, a fully connected layer, and an output layer. The input size is 28*28, the first convolutional layer has 6 feature maps, and the output layer recognizes labels from 0 to 9. If the input size or the number of classes is large, the network needs more layers.

Figure 3. The overview model of one CNN

Transformations from local regions to the 1st convolutional layer:

$$C^1_p(i,j) = \sigma\left(\sum_{u=-2}^{2}\sum_{v=-2}^{2} I(i-u,\, j-v)\, k^1_{1,p}(u,v) + b^1_p\right), \quad p = 1,\dots,6;\ i,j = 1,\dots,24$$

$$\sigma(x) = \frac{1}{1+e^{-x}} \tag{1}$$

The activation function in this case is the sigmoid, although a CNN can also use other activation functions.

Transformations from the 1st convolutional layer to the 1st pooling layer:

$$S^1_p(i,j) = \sqrt{\sum_{u=0}^{1}\sum_{v=0}^{1} \left(C^1_p(2i-u,\, 2j-v)\right)^2}, \quad p = 1,\dots,6;\ i,j = 1,\dots,12 \tag{2}$$

Transformations from the 1st pooling layer to the 2nd convolutional layer:

$$C^2_q(i,j) = \sigma\left(\sum_{p=1}^{6}\sum_{u=-2}^{2}\sum_{v=-2}^{2} S^1_p(i-u,\, j-v)\, k^2_{p,q}(u,v) + b^2_q\right), \quad q = 1,\dots,12;\ i,j = 1,\dots,8 \tag{3}$$

Transformations from the 2nd convolutional layer to the 2nd pooling layer:

$$S^2_q(i,j) = \sqrt{\sum_{u=0}^{1}\sum_{v=0}^{1} \left(C^2_q(2i-u,\, 2j-v)\right)^2}, \quad q = 1,\dots,12;\ i,j = 1,\dots,4 \tag{4}$$

In Equations (2) and (4), the procedure used is l2-pooling.

Transformations from the 2nd pooling layer to the fully connected layer, where f is the feature vector formed from the 12 pooled maps:

$$f = F\left(\{S^2_q\}_{q=1,\dots,12}\right), \qquad y = \sigma(W \times f + b) \tag{5}$$

Estimation error between the true label and the prediction: there are two choices, cross-entropy and mean squared error. Recent research shows that the cross-entropy cost works better than the mean squared error cost in neural networks [4]. Furthermore, to increase recognition accuracy, a CNN must use the backpropagation algorithm
described in detail in [1], [3].

Experimental Work
The authors carried out experiments on English handwriting recognition with three recognition algorithms: CNN [2], MLP, and nearest neighbor. The application was built with Python and relevant libraries. Patterns were collected from “www.ee.surrey.ac.uk/CVSSP/demos/chars74k/” and cover 62 classes (0 - 9, a - z, A - Z). Of all the patterns, digits account for 484, while upper-case and lower-case letters account for 1297 and 1139 patterns respectively. The patterns were split into training and testing sets at a ratio of 7:3.

The CNN structure used in the experiments is the same as the model above, with some differences: the size of the input data is 28*40, the number of kernels is 16/32, and the learning rate is 0.01; the procedure used in the transformations from convolutional layers to pooling layers is max-pooling, and the estimation function is cross-entropy. The MLP network has hidden layers with 1024 features each and, like the CNN, uses the cross-entropy function, while the estimation function of the nearest neighbor algorithm is the Manhattan distance (l1 norm) [5].

The process for training and testing upper-case letters on the CNN:

Figure 4. Training and testing upper letters on CNN

The experiment detects handwritten letters in a digital image according to the labels of the above model:

Figure 5. Recognizing lower letters

Figure 6 describes the recognition results of the algorithms. On upper-case letters, the CNN model recognizes well (90.3%), the MLP network also gives fairly good results (87.0%), and nearest neighbor achieves the highest accuracy (94.4%). On lower-case letters, however, both network models fall short of expectations (78.9% for the CNN and 75.1% for the MLP).

                                 Digits   Lower letters   Upper letters
Convolutional neural network     89.6%    78.9%           90.3%
Multilayer perceptron network    89.6%    75.1%           87.0%
Nearest neighbor                 87.6%    83.7%           94.4%

Figure 6. The accuracy of recognition results

Conclusion
Nowadays,
the problem of English text recognition is not a new subject. Indeed, many applications have been implemented with the multilayer perceptron network approach. For a difficult problem such as English handwriting recognition, the authors present an approach based on the CNN model and compare its recognition accuracy with that of other methods. In the experiments, the CNN matched or exceeded the accuracy of the MLP network in training and testing. In addition, one notable advantage of neural networks is the ability to change the number of layers in the network structure to improve accuracy. However, the recognition accuracy of the CNN is not consistently higher than that of the nearest neighbor algorithm, which was more accurate on both lower-case and upper-case letters. Moreover, the limitations of the CNN have not been completely resolved: if the patterns have complex shapes or the quality of the input data is poor, recognition efficiency decreases.

REFERENCES
[1] Gavin Hackeling, Mastering Machine Learning with scikit-learn, Packt Publishing, October 2014.
[2] Rodolfo Bonnin, Building Machine Learning Projects with TensorFlow, Packt Publishing, November 2016.
[3] Zhifei Zhang, “Derivation of Backpropagation in Convolutional Neural Network (CNN)”, October 2016.
[4] Michael Nielsen, Neural Networks and Deep Learning, Determination Press, 2015.
[5] Deepak Sinwar and Rahul Kaushik, “Study of Euclidean and Manhattan Distance Metrics using Simple K-Means Clustering”, International Journal For Research In Applied Science And Engineering Technology, Vol. 2, Issue V, May 2014.

Received: 11 January 2018
Revised: 22 January 2018
Accepted: 26 January 2018
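As a supplementary check of the layer arithmetic in the Theoretical Background section, the following minimal Python sketch recomputes the feature-map sizes (24, 12, 8, 4), the 156 weights of the first convolutional layer, and the two pooling procedures on a single 2*2 region. The function names are illustrative assumptions and are not taken from the authors' application; the sketch assumes a stride of one pixel, no padding, and non-overlapping 2*2 pooling regions, as described in the text.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1):
    """Number of window positions along one axis (no padding)."""
    return (input_size - kernel_size) // stride + 1


def pooled_size(input_size, region=2):
    """Size along one axis after non-overlapping region*region pooling."""
    return input_size // region


def layer_weights(kernel_size, feature_maps):
    """Shared kernel weights plus one bias per feature map (single input map)."""
    return feature_maps * (kernel_size * kernel_size + 1)


def max_pool(region_values):
    """Max-pooling: keep the largest activation in the region."""
    return max(region_values)


def l2_pool(region_values):
    """l2-pooling: square root of the sum of squared activations."""
    return math.sqrt(sum(x * x for x in region_values))


c1 = conv_output_size(28, 5)   # 1st convolutional layer
s1 = pooled_size(c1)           # 1st pooling layer
c2 = conv_output_size(s1, 5)   # 2nd convolutional layer
s2 = pooled_size(c2)           # 2nd pooling layer
weights = layer_weights(5, 6)  # 6 feature maps, 5*5 kernels

print(c1, s1, c2, s2, weights)

# One hypothetical 2*2 region of activations, flattened to a list.
region = [3.0, 4.0, 0.0, 0.0]
print(max_pool(region), l2_pool(region))
```

The same helper reproduces the sizes used in Equations (1) to (4), so changing the input size or kernel size shows directly how the number of neurons in each layer changes.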
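The nearest neighbor classifier with the Manhattan distance (l1 norm) mentioned in the Experimental Work section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-pixel "images", their labels, and the function names are hypothetical toy values, not the Chars74K patterns or the authors' code.

```python
def manhattan(a, b):
    """Manhattan (l1) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbor(train, query):
    """Return the label of the training vector closest to the query.

    train is a list of (vector, label) pairs.
    """
    _, best_label = min(train, key=lambda item: manhattan(item[0], query))
    return best_label


# Hypothetical toy training set: flattened 2*2 binary images with labels.
train = [
    ([0, 0, 1, 1], "A"),
    ([1, 1, 0, 0], "B"),
]

query = [0, 1, 1, 1]
print(nearest_neighbor(train, query))
```

In this toy case the query is at distance 1 from the "A" pattern and distance 3 from the "B" pattern, so it is labeled "A". A real experiment would flatten each 28*40 image into a vector in the same way and compare it against all training patterns.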