Application of Bi-gram Driven Chinese Handwritten Character Segmentation for an Address Reading System

  • Yan Jiang
  • Xiaoqing Ding
  • Qiang Fu
  • Zheng Ren
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3872)

Abstract

In this paper, we describe a bi-gram driven method for the automatic reading of handwritten Chinese mail. To locate the destination address block (DAB), text lines are first extracted by connected component analysis. Each candidate line is then segmented and recognized by our holistic method, which incorporates mail layout features, recognition confidence, and context cost. The same factors are used to identify the DABs among the candidate text lines. Based on the identified DABs, the street address line and the organization name line are determined. In the final step, edit-distance-based string matching is performed against the given databases. We also discuss the preprocessing of Chinese address databases, which contain a large vocabulary, to generate keywords for fast indexing during matching. Detailed experimental results on handwritten mail samples are given in the last section.
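As a rough illustration of the final matching step, the sketch below computes a standard unit-cost Levenshtein edit distance and picks the closest entry from a small list. It is not the authors' implementation: the function names `edit_distance` and `best_match`, the unit costs, and the sample address strings are all assumptions made for this example, and the keyword indexing described in the abstract is not modeled.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (an assumption of this sketch)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_match(recognized: str, entries: list[str]) -> tuple[str, int]:
    """Return the entry closest to the recognized string and its distance."""
    scored = ((entry, edit_distance(recognized, entry)) for entry in entries)
    return min(scored, key=lambda pair: pair[1])


# Hypothetical database entries and a recognition result with one wrong character.
database = ["北京市海淀区清华大学", "北京市朝阳区建国路"]
match, distance = best_match("北京市海淀区清华太学", database)
```

Scanning every entry, as `best_match` does here, is only practical for a tiny list; a large address database would need some form of pre-selection before the distance is computed.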

Keywords

Edit Distance, Text Line, Character Segmentation, Text Block, Arabic Digit
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yan Jiang (1)
  • Xiaoqing Ding (1)
  • Qiang Fu (1)
  • Zheng Ren (2)
  1. Department of Electronic Engineering, Tsinghua University, Beijing, China
  2. Siemens AG, Konstanz, Germany
