Automatic text extraction using DWT and Neural Network


Po-Yueh Chen (陳伯岳), Chung-Wei Liang (梁忠瑋)
Department of Computer Science and Information Engineering, Chaoyang University of Technology
(168 Gifeng E. Rd., Wufeng, Taichung County, Taiwan, R.O.C.)
Tel: (04) 23323000 ext. 4420
Email: pychen@mail.cyut.edu.tw

Abstract (translated from the Chinese original)
This paper proposes a method that uses the discrete wavelet transform and a neural network to extract text regions from images. The discrete wavelet transform decomposes the original image into four sub-bands. Because the high-frequency sub-bands of real text regions differ from those of non-text regions, the differences can be used to compute three feature values that serve as the inputs of the neural network. A back-propagation neural network is then trained on the candidate text regions. The network output for text regions differs from the output for non-text regions, so a threshold can decide whether a region is text. Finally, applying a dilation operation to the detected regions yields the correct text regions.

Keywords: text extraction, discrete wavelet transform, neural network

Abstract
In this paper, we present a new text extraction method based on the discrete wavelet transform and a neural network. The method extracts features of candidate text regions using the discrete wavelet transform. This works because the intensity characteristic of each detail component sub-band differs from that of the others, and we employ this difference to extract features of candidate text regions. A neural network is trained on these features with the back-propagation (BP) algorithm. The final network output for real text regions differs from that for non-text regions. Hence, we can apply an appropriate threshold value, followed by dilation operators, to obtain the real text regions.

Keywords: Text extraction, DWT, Neural Network

1. Introduction
Text extraction plays an important role in the analysis of static images and video sequences. Text provides important information about images or video sequences and hence can be used for video browsing and retrieval in a large video database. However, text extraction presents a number of problems because the properties of text vary, including text sizes and fonts. Furthermore, text may appear against a cluttered background. These factors make text extraction a challenging task. Many papers on extracting text from static images or video sequences have been published in recent years.
Those methods are applied either to uncompressed images or to compressed images. Text extraction from uncompressed images can be classified as either component-based or texture-based. Component-based methods detect text regions by analyzing the robust edges or homogeneous color/grayscale components that belong to characters. For example, Cai et al. [1] detect text edges in video sequences using a color edge detector and then apply a low threshold to filter out definite non-edge points. Real text edges are detected using an edge-strength-smoothing operator and an edge-clustering-power operator. Finally, they employ a string-oriented coarse-to-fine detection method to extract the real text regions. Chen et al. [2] detect vertical edges and horizontal edges in an image and dilate these two kinds of edges using different dilation operators. A logical AND is performed on the dilated vertical edges and the dilated horizontal edges to obtain candidate text regions. Real text regions are then identified using a support vector machine.

Text regions usually have special texture features because they consist of components of characters. These components also contrast with the background and exhibit a periodic horizontal intensity variation due to the horizontal alignment of characters. As a result, text can be extracted according to these special texture features. Williams and Alder [3] segmented and classified text in a newspaper by generic texture analysis. Small masks are applied to obtain local textural characteristics.

All the text extraction methods described above are applied to uncompressed images. Today, most digital videos and static images are stored in compressed form. For example, the JPEG2000 image compression standard applies DWT coding to decompose the original image, while the DCT (Discrete Cosine Transform) is employed in the earlier JPEG standard. Zhong et al. [4] extract captions from compressed videos (MPEG video and JPEG images) based on the DCT. The DCT is able to detect edges in different directions in a candidate image. Edge regions containing text are then obtained by thresholding. Chun et al. [5] extract text regions in video sequences using the Fast Fourier Transform (FFT) and a neural network.

In this paper, we propose an efficient method that extracts text regions in video sequences or images using the Discrete Wavelet Transform (DWT) and a neural network. First, the DWT extracts edge features of the original image. Then the text regions are obtained using a neural network trained with those features. The proposed extraction method is described in detail in Section 2. Section 3 presents the experiment results. The sample images are selected from complex images and videos that contain both text and picture regions, so as to demonstrate the efficiency of the proposed method. Finally, we conclude in Section 4.

2. Proposed Method
In this section, we present a method to extract text in static images or video sequences using the DWT and a neural network. The DWT decomposes an original image into four sub-bands. The transformed image includes one average component sub-band and three detail component sub-bands. Each detail component sub-band contains different feature information about the real text regions. Those features are fed to the back-propagation (BP) algorithm to train a neural network, which eventually extracts the text regions.

In a colored image, the color components may differ within a text region. However, color information does not help in extracting text from images. If the input image is a gray-level image, it is processed directly, starting at the discrete wavelet transform.
If the input image is colored, the RGB components are combined to give an intensity image Y as follows:

Y = 0.299R + 0.587G + 0.114B    (1)

Image Y is then processed with the discrete wavelet transform and the rest of the extraction algorithm. If the input image is already stored in DWT-compressed form, the DWT operation can be omitted from the proposed algorithm. The flow chart of the proposed algorithm is shown in Figure 1. We choose the Haar DWT because it is the simplest among all wavelets [6]. The working principle of the Haar DWT is discussed in detail in the next subsection.

Figure 1. Flow chart of the proposed algorithm: Haar DWT -> Feature extraction -> Candidate text extraction using neural network -> Text region extraction results

Figure 2. The result of 2-D DWT decomposition (sub-bands LL, LH, HL, HH)

2.1 Haar Discrete Wavelet Transform
The discrete wavelet transform is a very useful tool for signal analysis and image processing, especially in multi-resolution representation [7]. It decomposes signals into different components in the frequency domain. The one-dimensional discrete wavelet transform (1-D DWT) decomposes an input sequence into two components (the average component and the detail component) with a low-pass filter and a high-pass filter [8]. The two-dimensional discrete wavelet transform (2-D DWT) decomposes an input image into four sub-bands, one average component (LL) and three detail components (LH, HL, HH), as shown in Figure 2. The three detail components capture various edge features of the original image.

(a)
    A  B  C  D
    E  F  G  H
    I  J  K  L
    M  N  O  P

(b)
    (A+B)  (C+D)  (A-B)  (C-D)
    (E+F)  (G+H)  (E-F)  (G-H)
    (I+J)  (K+L)  (I-J)  (K-L)
    (M+N)  (O+P)  (M-N)  (O-P)

(c)
    (A+B)+(E+F)  (C+D)+(G+H)  (A-B)+(E-F)  (C-D)+(G-H)
    (I+J)+(M+N)  (K+L)+(O+P)  (I-J)+(M-N)  (K-L)+(O-P)
    (A+B)-(E+F)  (C+D)-(G+H)  (A-B)-(E-F)  (C-D)-(G-H)
    (I+J)-(M+N)  (K+L)-(O+P)  (I-J)-(M-N)  (K-L)-(O-P)

Figure 3. (a) The original image, (b) the row operation of the 2-D Haar DWT, (c) the column operation of the 2-D Haar DWT

Figure 4. Original gray-level image

We demonstrate the operations of the 2-D Haar DWT with the example shown in Figure 3. Figure 3(a) is a sample 4×4 gray-level image. Only addition and subtraction are involved in the computation. The 2-D DWT is achieved by two ordered 1-D DWT operations (row, then column). First, we perform the row operation to obtain the result shown in Figure 3(b). This result is then transformed by the column operation, and the final 2-D Haar DWT result is shown in Figure 3(c). The 2-D Haar DWT decomposes a gray-level image into one average component sub-band and three detail component sub-bands. From these three detail components, we can obtain important features of candidate text regions.

As a practical example, an original gray-level image is shown in Figure 4. The corresponding DWT sub-bands are shown in Figure 5. We can extract features of candidate text regions from the detail component sub-bands in Figure 5. In the next subsection, a neural network is employed to learn the features of candidate text regions obtained from those detail component sub-bands. Finally, the trained neural network is ready to extract the real text regions.

Figure 5. 2-D Haar discrete wavelet transform image

Figure 6. Proposed architecture of the neural network (features from the LH, HL and HH sub-bands; input nodes, hidden nodes, output node)

2.2 Neural Network
In this subsection, text extraction from static images or video sequences is accomplished using the back-propagation (BP) algorithm on a neural network. The training of the neural network is based on the features obtained from the DWT detail component sub-bands. As shown in Figure 6, the proposed neural network architecture is simpler than previously proposed architectures [9]. It consists of three input nodes, three hidden nodes and one output node.
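The two building blocks described so far, the add/subtract Haar decomposition of Figure 3 and the three-input, three-hidden, one-output network of Figure 6, can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `haar_dwt2` and `forward_3_3_1` are hypothetical, the sigmoid activation is assumed (the paper does not name one), the weights in the usage note are random placeholders rather than trained values, and the quadrant labels in the comments follow the layout of Figure 3(c) rather than a confirmed LH/HL naming convention.

```python
import numpy as np


def haar_dwt2(image):
    """One-level 2-D Haar DWT built from sums and differences only (cf. Figure 3)."""
    x = np.asarray(image, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2-D array with even height and width")
    # Row operation: sums and differences of horizontally adjacent pixel pairs.
    row_sum = x[:, 0::2] + x[:, 1::2]
    row_diff = x[:, 0::2] - x[:, 1::2]
    # Column operation: sums and differences of vertically adjacent row pairs.
    average = row_sum[0::2, :] + row_sum[1::2, :]       # top-left quadrant (LL)
    detail_tr = row_diff[0::2, :] + row_diff[1::2, :]   # top-right quadrant
    detail_bl = row_sum[0::2, :] - row_sum[1::2, :]     # bottom-left quadrant
    detail_br = row_diff[0::2, :] - row_diff[1::2, :]   # bottom-right quadrant (HH)
    return average, detail_tr, detail_bl, detail_br


def sigmoid(z):
    """Logistic activation (an assumption; it keeps outputs strictly between 0 and 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward_3_3_1(features, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a 3-3-1 network.

    features: (n, 3) array, one row of three feature values per pixel.
    w_hidden: (3, 3) weights, b_hidden: (3,) biases of the hidden layer.
    w_out:    (3,) weights, b_out: scalar bias of the single output node.
    """
    features = np.asarray(features, dtype=np.float64)
    hidden = sigmoid(features @ w_hidden + b_hidden)    # shape (n, 3)
    return sigmoid(hidden @ w_out + b_out)              # shape (n,)
```

For the 4×4 image whose entries A..P are 0..15, `haar_dwt2(np.arange(16).reshape(4, 4))` returns `[[10, 18], [42, 50]]` as the average quadrant, which matches (A+B)+(E+F) = 0+1+4+5 = 10 in Figure 3(c). The forward pass can be exercised with placeholder weights, for example `forward_3_3_1(rng.random((5, 3)), rng.normal(size=(3, 3)), np.zeros(3), rng.normal(size=3), 0.0)` with `rng = np.random.default_rng(0)`; in the method itself the weights would come from back-propagation training.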
The features expressed in equations (2), (3) and (4) are computed for every pixel in the detail component sub-bands.

$$\mathrm{feature}_1(i, j) = \frac{\left| LH(i, j)^2 - HL(i, j)^2 \right|}{255} \tag{2}$$

$$\mathrm{feature}_2(i, j) = \frac{\left| LH(i, j)^2 - HH(i, j)^2 \right|}{255} \tag{3}$$

$$\mathrm{feature}_3(i, j) = \frac{\left| HL(i, j)^2 - HH(i, j)^2 \right|}{255} \tag{4}$$

Figure 7. The extracted text region

The sample images chosen for the experiments include some pure text samples and some samples containing non-text regions. Because of the text characteristics of an image, the intensity of the detail component sub-bands is quite different from one sub-band to another. We employ this intensity difference to compute three features of candidate text regions. These features are the inputs of a neural network trained with the back-propagation algorithm. After the neural network is trained, new input data produce an output value between zero and one. The output values of real text regions differ clearly from those of non-text regions. Therefore, we can apply an appropriate threshold to remove the non-text regions. Finally, the remaining real text regions are processed with dilation operations; the result is shown in Figure 7.

3. Experiment Results
Experiments are performed on static images and video sequences. The frame size is 1024×768, in BMP or MPEG format. We convert colored frames to gray level before applying the proposed method. Figure 8 illustrates the results of the proposed algorithm step by step. The original images shown in Figure 8(a) are decomposed into one average component sub-band and three detail component sub-bands, as shown in Figure 8(b). Those detail component sub-bands contain the key features of text regions. According to these features, the text regions are obtained using a neural network. The final results are shown in Figure 8(c).

4. Conclusion
This paper presents a method for extracting text regions from static images or video sequences using DWT and a neural network.
The DWT provides features of text regions for training a neural network with the back-propagation (BP) algorithm. We apply the proposed method to extract text regions from several complicated images. From the experiment results, we find that the proposed scheme is an efficient yet simple way to extract text regions from images or video sequences.

References
[1] Min Cai, Jiqiang Song, M. R. Lyu, "A new approach for video text detection," IEEE International Conference on Image Processing, 2002, Vol. 1, 22-25 September 2002, pp. I-117 to I-120.
[2] Datong Chen, H. Bourlard, J.-P. Thiran, "Text Identification in Complex Background Using SVM," Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Vol. 2, 8-14 Dec. 2001, pp. II-621 to II-626.
[3] P. S. Williams, M. D. Alder, "Generic texture analysis applied to newspaper segmentation," IEEE International Conference on Neural Networks, 1996, Vol. 3, 3-6 June 1996, pp. 1664-1669.
[4] Yu Zhong, Hongjiang Zhang, A. K. Jain, "Automatic caption localization in compressed video," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, Issue 4, April 2000, pp. 385-392.
[5] Byung Tae Chun, Younglae Bae, Tai-Yun Kim, "Automatic Text Extraction in Digital Videos using FFT and Neural Network," IEEE International Conference on Fuzzy Systems (FUZZ-IEEE '99), Vol. 2, 22-25 Aug. 1999, pp. 1112-1115.
[6] K. Gröchenig, W. R. Madych, "Multiresolution Analysis, Haar Bases, and Self-Similar Tilings of R^n," IEEE Transactions on Information Theory, Vol. 38, No. 2, Mar. 1992.
[7] S. G. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Trans. on PAMI, Vol. 11, No. 7, July 1989, pp. 674-693.
[8] Tinku Acharya, Po-Yueh Chen, "VLSI Implementation of a DWT Architecture," ISCAS '98, Proceedings of the 1998 IEEE International Symposium on Circuits and Systems, Vol. 2, pp. 272-275, 1998.
[9] Keechul Jung, "Neural network-based text location in color images," Pattern Recognition Letters, Vol. 22, Issue 14, December 2001, pp. 1503-1515.

Figure 8. Two samples of the experiment results: (a) the original images, (b) the 2-D DWT sub-bands, (c) the extracted text regions

Date posted: 05/11/2012, 14:51
