Pattern Recognition, Vol. 29, No. 11, pp. 1789-1805, 1996
Copyright © 1996 Pattern Recognition Society
Published by Elsevier Science Ltd
Printed in Great Britain
0031-3203/96 $15.00+.00
Pergamon
PII: S0031-3203(96)00039-8

STROKE EXTRACTION FOR CHINESE CHARACTERS USING A TREND-FOLLOWED TRANSCRIBING TECHNIQUE

JI-RONG LIN and CHANG-FUU CHEN
Department of Electrical Engineering, Tatung Institute of Technology, 40, Chung-Shan N. Road, 3rd Sec., Taipei, 10451, Taiwan, R.O.C.

(Received February 1994; in revised form 15 December 1995; received for publication April 1996)

Abstract
The merit of stroke extraction algorithms that use the thinning process is the ease of abstracting features from the skeleton of a character. The two main tasks for this kind of algorithm are to find the adjacent segmental strokes that should be merged into a complete stroke, and to search for the corner points at which a bent segmental stroke is divided into two or more individual strokes. This paper proposes an intuitive and effective stroke extraction method that accomplishes these tasks by applying the trend-followed transcribing technique, which passes through the distorted region and obtains reliable information about the global features. In our experiments, the 1500 most frequently used Chinese characters, printed in both the Ming font and the Fang-Sung font at a size of 64 x 64 points, are tested. The results show that the rate for correctly extracting all strokes of a character is 97.8% for the Ming font and 98.4% for the Fang-Sung font. That is, the proposed stroke extraction algorithm is useful and reliable.

Keywords: Chinese character stroke extraction; Thinning process; Travel algorithm; Trend-followed

INTRODUCTION

The recognition of Chinese characters is considered a very difficult problem. Under the practical conditions of a Chinese character recognition system, the recognition rate of conventional methods, such as shape matching, position matching, projection profiles, and transform recognition, is low (1-5). Several ideas for increasing the recognition rate have been developed, and a well-received one is the structure analysis method (6-10). A Chinese character is constructed from rules that depend on the printed font, the writing style, the size, and the thickness. However, the spatial relations and geometric configurations of the elemental strokes are usually well maintained. Therefore, these properties can be regarded as invariant features and applied to the recognition of Chinese characters.

In the structure analysis methods, the success of recognition is greatly influenced by how correctly the elemental strokes can be extracted. There are two essential rules for stroke extraction:

(i) A representative stroke, that is, a dot (丶), a horizontal stroke (一), a vertical stroke (丨), or a leaning stroke (/ or \), is extracted as an individual stroke. Such strokes should be extracted exactly, without being split or merged.
(ii) A bent stroke should be split into two representative strokes at its corner point.

To follow these rules, stroke extraction has been studied by many authors, and the stroke extraction algorithms can generally be divided into two groups according to whether or not they use the thinning process. The algorithms without the thinning process (11-13) extract strokes using knowledge about the width variation of strokes. In theory a stroke can be extracted by means of its width variation, but measuring the width of a stroke, which depends on the trend of the stroke, is technically difficult. Thus, because the width measurement is ill-behaved, the rules for stroke extraction based on knowledge about width variation are more complex and less predictable. For instance, this knowledge is converted into about 20 parameters in Tseng's algorithm (12).

For the algorithms with the thinning process (14-16), the
two main tasks are to handle fork points and to find corner points. The severe distortion that the thinning process produces at fork points, such as the splitting of one fork point into two or more fork points, makes fork points difficult to handle. The usual approach (14,16) is devoted to finding and merging the split fork points. The problem with these methods is that they do not take both local and global considerations into account, so they are likely to fall into local traps. The same problem arises in the search for corner points.

In this paper, an intuitive and effective stroke extraction method, which uses the thinning process and is called the trend-followed stroke transcribing algorithm, is proposed. To handle the problem mentioned above, the algorithm passes through the distorted region to obtain reliable information about the global features. In the same manner, the corner-searching algorithm is flexible enough to handle both smooth-transition corners and abrupt-transition corners.

This stroke extraction technique consists of three stages: (i) thin the character and label the feature points; (ii) search for and label the corner points; and (iii) transcribe each stroke from one of its end points and record the trace until reaching either the other end point of the stroke or a fork point that is judged, by calculating the trends of the branches, to be the end of the stroke. The next section gives the proposed procedures and algorithms in detail. The experimental results are described afterwards, and the algorithm is then summarized.

TREND-FOLLOWED STROKE EXTRACTION

To describe the system easily, some definitions are given below.

Character point: a character point is a point of the skeleton of a thinned character.

Neighbor: as shown in Fig. 1, point $n_i$, $i = 0, 1, \ldots, 7$, is named a neighbor of character point P. A neighbor that is a character point is named a character neighbor, and a neighbor that is not a character point is
named a non-character neighbor.

[Fig. 1: the eight neighbors $n_0$ to $n_7$ of character point P.]

Arm: an isolated character neighbor, or a group of consecutive character neighbors, of character point P is named an arm of character point P. For example, as shown in Fig. 2, $P_1, P_2, \ldots, P_8$ have 1, 1, 2, 2, 3, 3, 4, and 4 arms, respectively.

End point: an end point is a character point that has only one arm. In Fig. 2, $P_1$ and $P_2$ are end points.

Middle point: a middle point is a character point that has two arms. In Fig. 2, $P_3$ and $P_4$ are middle points.

Fork point: a fork point is a character point that has three or four arms. If it has 3 (4) arms, it is named a 3-fork (4-fork) point. In Fig. 2, $P_5$, $P_6$, $P_7$, and $P_8$ are fork points.

Corner point: a corner point is a connection point of two strokes whose trends form a sharp angle less than C_ANGLE, a constant angle that varies with the font of the input character.

Branch point: if an arm of character point P is an isolated character neighbor $P_n$, then $P_n$ is denoted as the branch point of this arm. If an arm of P is a group of character neighbors $P_n \sim P_{(n+k) \bmod 8}$, and the numbers of arms of $P_n \sim P_{(n+k) \bmod 8}$ are $a_0 \sim a_k$, respectively, then the branch point of this arm is $P_{(n+i) \bmod 8}$, for which $a_i = \max\{a_0 \sim a_k\}$ and $i < j$ for $a_i = a_j$, where O