Application of Bi-gram Driven Chinese Handwritten Character Segmentation for an Address Reading System
In this paper, we describe a bi-gram driven method for the automatic reading of handwritten Chinese mail. To locate the destination address block (DAB), text lines are first extracted by connected component analysis. Each candidate line is segmented and recognized by our holistic method, which incorporates mail layout features, recognition confidence, and context cost. The same factors are used to identify the DAB among the candidate text lines, from which the street address line and the organization name line are determined. In the final step, edit distance based string matching is performed against the given databases. We also discuss the preprocessing applied to Chinese address databases with large vocabularies, which generates keywords for fast indexing during matching. Detailed experimental results on handwritten mail samples are given in the last section.
Keywords: Edit Distance, Text Line, Character Segmentation, Text Block, Arabic Digit
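The final matching step named in the abstract, edit distance based string matching with keywords used for fast indexing, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of adjacent character pairs as index keywords, and the sample street names are all assumptions made for illustration. The abstract does not say how keywords are generated.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def build_keyword_index(entries):
    """Map each adjacent character pair to the entries containing it.

    Assumption: character pairs stand in for the paper's keywords.
    """
    index = defaultdict(set)
    for idx, entry in enumerate(entries):
        for k in range(len(entry) - 1):
            index[entry[k:k + 2]].add(idx)
    return index


def best_match(query, entries, index):
    """Return (distance, entry) for the closest entry sharing a keyword with query."""
    candidates = set()
    for k in range(len(query) - 1):
        candidates |= index.get(query[k:k + 2], set())
    if not candidates:
        # No shared keyword: fall back to scanning the whole database.
        candidates = set(range(len(entries)))
    return min((edit_distance(query, entries[i]), entries[i])
               for i in sorted(candidates))


# Hypothetical street-name database and a recognition result with one wrong character.
streets = ["中关村南大街", "中关村东路", "学院路"]
street_index = build_keyword_index(streets)
print(best_match("中关村南太街", streets, street_index))
```

The index restricts the edit distance computation to entries that share at least one keyword with the recognized string, which is the reason for indexing a large address database before matching.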