Proceedings of the 8th Student Scientific Research Conference, University of Da Nang, 2012
OPTICAL CHARACTER RECOGNITION
FOR VIETNAMESE SCANNED TEXT
Authors: Tran Anh Viet, Le Minh Hoang Hac, Le Tuan Bao Ngoc, Le Anh Duy
Class: 08ECE, Electronic and Communication Engineering Department, Da Nang University of Technology
Advisors: Ph.D. Pham Van Tuan, M.E. Hoang Le Uyen Thuc
Electronic and Communication Engineering Department, Da Nang University of Technology
Abstract
Optical Character Recognition (OCR) is a technology that enables people to digitize scanned
images, converting them into editable text on a computer and increasing the speed at which data
from many kinds of documents can be transferred directly into a computer. It is also useful for
handwriting recognition and for making the text in digital images searchable. In this paper, we
propose an OCR system capable of recognizing Vietnamese characters in typed texts using two
recognition methods: template matching and artificial neural networks. Each method has its own
advantages and weaknesses, which are shown clearly throughout this paper, so that readers can
decide which method to use in a specific Vietnamese typed-text OCR situation.
1. Introduction
In recent years, OCR has become a popular industry around the world, covering a variety
of languages, and Vietnamese is no exception. However, compared with other languages,
Vietnamese OCR technology is still young and needs improvement to achieve higher
efficiency and broader applicability. With this motivation, our group decided to research
OCR to find a simpler but efficient alternative for the Vietnamese language. The process of
performing OCR on printed Vietnamese script is discussed in detail throughout this paper.
2. Procedure
A general approach for any OCR problem [2] contains seven steps, as shown in Figure 1.
Images in some standard format, such as BMP or JPEG, are fed into our system. The
scanned images then pass, in order, through the pre-processing, segmentation, feature
extraction, classification/recognition, and post-processing steps before appearing at the
output as text (Figure 1). We discuss this procedure step by step below.
2.1 Pre-processing
Figure 1: General structure of an OCR process (scanned image input → pre-processing → segmentation → feature extraction → classification and recognition → post-processing → recognized text)
The first step of the pre-processing stage is to convert the color or gray-scale image into a
binary image. To determine a threshold value for binarizing the image we apply the
following formula [4]:
T = T[x, y, p(x, y), f(x, y)] (1)
where T is the threshold function, f(x, y) is the gray-scale level of the point (x, y), and
p(x, y) denotes some local property of this point. The thresholded image g(x, y) is defined as:
g(x, y) = 1 if f(x, y) > T, and g(x, y) = 0 if f(x, y) ≤ T (2)
The second step is to remove background noise using a median filter [4].
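As an illustration, global thresholding and a small median filter can be sketched as follows. This is a minimal Python sketch, not the code used in our system; the 3x3 window size and the fixed threshold of 128 are assumptions for the example.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Global thresholding: g(x, y) = 1 if f(x, y) > T, else 0."""
    return (gray > threshold).astype(np.uint8)

def median_filter3(img):
    """3x3 median filter, useful for removing salt-and-pepper noise."""
    padded = np.pad(img, 1, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + 3, x:x + 3])
    return out

# Toy 5x5 gray patch with one noisy bright pixel.
gray = np.full((5, 5), 40, dtype=np.uint8)
gray[2, 2] = 255                  # salt noise
binary = binarize(gray)           # the noisy pixel survives thresholding...
clean = median_filter3(binary)    # ...but the median filter removes it
```

In practice the threshold T would come from the local-property formula above rather than a fixed constant; the fixed value here only keeps the sketch short.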
2.2 Segmentation
Segmentation process consists of the following steps respectively:
Split the original image into individual lines using horizontal projection profile of
image [6]
Lines are cells which correspond to horizontal projection profile value greater some
minimum value (0 in this case)
Split each word from lines into characters using bounding box and vertical
projection profile [6]
Letters are cells that correspond to vertical projection profile value greater some
minimum value (0 in this case)
After obtaining images of single letters, next is cropping unnecessary region and
reshaping the characters’ image to appropriate size as shown in figure 5 and 6a,b
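The projection-profile splitting described above can be sketched as follows. This is a simplified illustration in Python; the run-finding helper and the toy page are ours, not part of the original system.

```python
import numpy as np

def segment_runs(profile, min_value=0):
    """Return (start, end) index pairs where the projection profile stays
    above min_value -- each run is one text line (or one character)."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v > min_value and start is None:
            start = i
        elif v <= min_value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

# Binary page: two rows of "text" separated by blank rows.
page = np.zeros((8, 6), dtype=np.uint8)
page[1:3, :] = 1   # first text line
page[5:7, :] = 1   # second text line

h_profile = page.sum(axis=1)     # horizontal projection profile
lines = segment_runs(h_profile)  # row ranges of the two text lines
```

The same helper applied to a column-wise sum of a single line image yields the character boundaries from the vertical projection profile.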
Figure 2: Eliminating Gaussian noise with a median filter
Figure 3: Horizontal projection profile diagram of an image
Figure 4: Vertical projection profile diagram for every single character
Figure 5: Cropping the empty region from a letter's image
Figure 6a: Resize height to 16; width depends on the image ratio
Figure 6b: Resize both height and width to 16
Finally, the pixels of each letter matrix are mapped into a cell array for the recognition process.
2.3 Feature extraction
To reduce the matrix size of a letter's image, which greatly helps the training and
recognition process, feature extraction is used. In our project, we employ Hu's seven
moment invariants [5], which are invariant to image transformations including scaling,
translation, and rotation. Computing Hu's moments follows Figure 8 [5].
The letter's image is first converted to binary format. The function that computes the
regular moments has the form [m] = moment(fig, p, q), where fig is the input binary image
and p, q are the predefined moment orders. With these parameters available, we perform
the summation according to the definition of the regular moments [5]:
m_pq = Σ_x Σ_y x^p y^q f(x, y) (3)
The centroid of the binary image is computed according to x̄ = m_10 / m_00 and
ȳ = m_01 / m_00. Based on the centroid of the image, and similarly to the regular moments,
the central moments function has the form [µ] = central_moment(fig, p, q), computed
according to the definition [5]:
µ_pq = Σ_x Σ_y (x − x̄)^p (y − ȳ)^q f(x, y) (4)
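Under the definitions above, the moment and central_moment helpers can be sketched in Python roughly as follows. This is a simplified illustration of the regular and central moments only, not the full seven-invariant computation.

```python
import numpy as np

def moment(fig, p, q):
    """Regular moment m_pq = sum over x, y of x^p * y^q * f(x, y)."""
    ys, xs = np.nonzero(fig)          # coordinates of foreground pixels
    return np.sum((xs ** p) * (ys ** q) * fig[ys, xs])

def central_moment(fig, p, q):
    """Central moment mu_pq, taken about the image centroid."""
    m00 = moment(fig, 0, 0)
    xc = moment(fig, 1, 0) / m00      # x-bar = m_10 / m_00
    yc = moment(fig, 0, 1) / m00      # y-bar = m_01 / m_00
    ys, xs = np.nonzero(fig)
    return np.sum((xs - xc) ** p * (ys - yc) ** q * fig[ys, xs])

# 3x3 solid square: its centroid is its middle pixel, so mu_10 = mu_01 = 0.
fig = np.zeros((5, 5), dtype=float)
fig[1:4, 1:4] = 1.0
```

Because the central moments are taken about the centroid, mu_10 and mu_01 vanish for any shape, which is exactly the translation invariance the Hu invariants build on.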
2.4 Training & Recognition
We used two main methods for the character recognition engine: Artificial Neural
Network (ANN) and Template Matching (TM).
2.4.1 Template Matching Method
The template matching method uses a simple algorithm to measure the difference
between the character samples and the character prototypes in the library. In order to function
Figure 7: Matrix mapping for a letter
Figure 8: Block diagram for computing Hu's seven moment invariants
well enough, TM must have an extensive library of character image prototypes to deal with
the many possible conditions of the input images. The core algorithm of TM is the Euclidean
Distance (ED) [5], which computes the Euclidean distance between the input image and every
prototype in the library. The Euclidean distance is calculated as follows:
ED = sqrt( Σ_i Σ_j (x(i, j) − y(i, j))^2 )
where x(i, j) and y(i, j) are the pixels of the input image and the prototype, respectively.
As mentioned above, the TM recognition engine produces a considerable number of
errors if the number of prototypes is small. Since the TM engine is rather simple, it does not
need feature extraction for a character image; the computation takes the pixels of the
image directly as its input.
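A minimal sketch of the TM core follows. The two 3x3 prototypes and their labels are hypothetical toy data, far smaller than the real prototype library.

```python
import numpy as np

def euclidean_distance(x, y):
    """ED between an input glyph x and a prototype y of the same shape."""
    return np.sqrt(np.sum((x.astype(float) - y) ** 2))

def match(glyph, prototypes):
    """Return the label of the prototype closest to the input glyph."""
    distances = {label: euclidean_distance(glyph, proto)
                 for label, proto in prototypes.items()}
    return min(distances, key=distances.get)

# Hypothetical 3x3 prototypes: a vertical stroke and a horizontal stroke.
protos = {
    "|": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]]),
    "-": np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]]),
}
noisy = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 1]])  # "|" with one bad pixel
```

Even with one corrupted pixel, the noisy glyph stays closer to its own prototype, which is why TM degrades gracefully on lightly noisy input but needs many prototypes to cover real-world variation.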
2.4.2 Artificial Neural Network Method (ANN)
The general process of the ANN recognition method is shown in Figure 9 [2].
Figure 10 shows a training neural network with 256 cells at the input, 14 cells at the
output, and 150 neurons in the hidden layer.
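The forward pass of such a 256-150-14 network can be sketched as follows. This is a toy illustration with random, untrained weights; the sigmoid activation is our assumption, and no training step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# 256 inputs (a 16x16 binary glyph), 150 hidden neurons, 14 outputs
# (one binary Unicode bit per output), mirroring the network above.
W1 = rng.normal(scale=0.1, size=(150, 256))
b1 = np.zeros(150)
W2 = rng.normal(scale=0.1, size=(14, 150))
b2 = np.zeros(14)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(glyph):
    """One forward pass: 256 pixels in, 14 sigmoid activations out.
    Thresholding each output at 0.5 yields the predicted Unicode bits."""
    hidden = sigmoid(W1 @ glyph + b1)
    return sigmoid(W2 @ hidden + b2)

bits = forward(rng.integers(0, 2, size=256).astype(float))
```

Training would adjust W1, b1, W2, b2 (e.g. by backpropagation) so that the thresholded outputs match the 14-bit target codes of the training characters.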
3. Experiments with TM and ANN
3.1 Template matching
With 1,600 prototypes we are able to recognize one page of printed text in the Arial
font with an accuracy of 82% to 90%. The confidence level is much higher for cleanly
printed texts and drops significantly for poorly printed ones.
Table 1: Overall results for the template-matching technique

Font    Accuracy    Prototypes/character    Text condition
Arial   82%-90%     8                       Good
Arial   82%-90%     6                       Good
Figure 9: Training and recognition using an artificial neural network
Figure 10: Training network
First, the image is analyzed for characters, and each symbol is converted to a pixel
matrix. Second, the corresponding desired output character is retrieved and converted to
Unicode; the matrix is reshaped and fed to the network, and the network output is then
computed.
Arial   82%-90%     3                       Good
3.2 Artificial neural network
Input of the network: 256 binary values, corresponding to the 256 pixels of the resized image.
Output of the network: 14 binary values, corresponding to the binary Unicode character.
Training data: 114 lowercase Vietnamese character images in the Arial font. We use five
more collections of the same images, so the images span six different heights (200, 180,
160, 140, 120, and 100 pixels), plus six space-character images. In total, we have 690
noise-free samples for each font.
Target data: the 114 lowercase Vietnamese Arial characters and the space character.
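Since the code point of every lowercase Vietnamese letter fits in 14 bits, the 14 binary target values can be derived from the Unicode code point. The following is a small sketch of one possible encoding; the helper names are ours.

```python
def char_to_bits(ch):
    """Encode a character's Unicode code point as 14 binary values
    (most significant bit first)."""
    code = ord(ch)
    assert code < 2 ** 14, "code point must fit in 14 bits"
    return [(code >> i) & 1 for i in reversed(range(14))]

def bits_to_char(bits):
    """Decode 14 binary values back into the character."""
    code = 0
    for b in bits:
        code = (code << 1) | b
    return chr(code)

bits = char_to_bits("ắ")   # U+1EAF, a lowercase Vietnamese letter
```

After recognition, thresholding the 14 network outputs at 0.5 and decoding them this way recovers the predicted character.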
Table 2: Results for the neural network method

Character image size (height pixels)    76       67       58       49
Result                                  94.7%    92.11%   92.11%   85.6%
Actual image recognition results for the ANN:
o v d đ c e ắ ằ ấ ầ ề ế l 9 w
Khi hoàn thành vào năm 2017, đây sẽ là tòa tháp cao nhất thế giới
hello word ŷ viefnmesecharacferrecogition
4. Conclusion
For both Template matching (TM) and Artificial neural network (ANN) methods,
the recognizing performance for each letters is pretty high (82% - 95%). TM seems to be
better in recognizing script in good condition than ANN. However, TM strongly depends
on font style and condition while ANN is capable of dealing with different fonts and
various testing condition. In future work, from results and our point of view,ANN should
be more focused in recognition of printed Vietnamese script due to its adaptive ability to
multiple text style and condition. To enhance ANN’s performance, besides better
segmentation, multiple feature extraction can be a good way to go.
References
[1] Twan van Laarhoven (11/2010), Text Recognition in Printed Historical Documents, Concordia University, Montreal, Canada.
[2] Raghuraj Singh, C. S. Yadav, Prabhat Verma, Vibhash Yadav (6/2010), Optical Character Recognition (OCR) for Printed Devnagari Script Using Artificial Neural Network.
[3] K. L. Du, M. N. S. Swamy (2006), Neural Networks in a Softcomputing Framework, Concordia University, Montreal, Canada.
[4] Rafael C. Gonzalez, Richard E. Woods (1992), Digital Image Processing, Second Edition.
[5] Qing Chen (2003), Evaluation of OCR Algorithms for Images with Different Spatial Resolutions and Noises, University of Ottawa, Ottawa, Canada.
[6] Yi Lu (5/1994), Machine-printed segmentation, Department of Electrical and Computer Engineering, The University of Michigan-Dearborn, Dearborn.