Document-Form Identification Using Constellation Matching of Keywords Abstracted by Character Recognition

  • Hiroshi Sako
  • Naohiro Furukawa
  • Masakazu Fujio
  • Shigeru Watanabe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2423)

Abstract

A document-form identification method based on constellation matching of targets is proposed. Mathematical analysis shows that the method achieves a high identification rate when multiple targets are prepared. The method consists of two parts: (i) extraction of targets such as important keywords in a document by template matching between recognised characters and word strings in a keyword dictionary, and (ii) analysis of the positional or semantic relationship between the targets by point-pattern matching between these targets and word location information in the keyword dictionary. All characters in the document are recognised by means of a conventional character-recognition method. An automatic keyword-determination method, which is necessary for making a keyword dictionary beforehand, is also proposed. This method selects the most suitable keywords from a general word dictionary by measuring the uniqueness of keywords and the stability of their recognition. Experiments using 671 sample documents with 107 different forms in total confirmed that (i) the keyword-determination method can determine sets of keywords automatically for 92.5% of the 107 different forms and (ii) the form-identification method can correctly identify 97.1% of the 671 document samples at a rejection rate of 2.9%.
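
The two-part procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: fuzzy string similarity from Python's `difflib` stands in for template matching, a translation-only consistency check stands in for point-pattern matching, and every name, coordinate, and threshold (`FORM_DICTIONARY`, `match_keyword`, `constellation_score`, `identify_form`, `min_ratio`, `tolerance`, `min_support`) is an assumption made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical keyword dictionary: form name -> {keyword: (x, y) on the blank form}.
FORM_DICTIONARY = {
    "invoice": {"INVOICE": (100, 40), "TOTAL": (400, 700), "DATE": (450, 40)},
    "receipt": {"RECEIPT": (100, 40), "TOTAL": (300, 500), "DATE": (100, 90)},
}


def match_keyword(word, keywords, min_ratio=0.75):
    """Part (i): return the dictionary keyword most similar to an OCR word, or None.

    String similarity is used here as a stand-in for template matching.
    """
    best, best_ratio = None, min_ratio
    for kw in keywords:
        ratio = SequenceMatcher(None, word.upper(), kw.upper()).ratio()
        if ratio >= best_ratio:
            best, best_ratio = kw, ratio
    return best


def constellation_score(ocr_words, layout, tolerance=15):
    """Part (ii): largest group of matched keywords that agree on one page shift.

    Assumes the scanned page differs from the dictionary layout by a
    translation only (no rotation or scaling).
    """
    shifts = []
    for text, (x, y) in ocr_words:
        kw = match_keyword(text, layout)
        if kw is not None:
            kx, ky = layout[kw]
            shifts.append((x - kx, y - ky))
    best = 0
    for sx, sy in shifts:
        support = sum(
            1
            for tx, ty in shifts
            if abs(tx - sx) <= tolerance and abs(ty - sy) <= tolerance
        )
        best = max(best, support)
    return best


def identify_form(ocr_words, dictionary, min_support=2):
    """Return the best-supported form, or None to reject the document."""
    scores = {
        form: constellation_score(ocr_words, layout)
        for form, layout in dictionary.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    best_form, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # Reject when too few keywords agree or when two forms tie.
    if best_score < min_support or best_score == runner_up:
        return None
    return best_form


# OCR output with character errors ("0" read for "O") and a small page shift.
scanned = [
    ("INV0ICE", (112, 48)),
    ("T0TAL", (410, 707)),
    ("DATE", (461, 49)),
    ("Acme", (200, 200)),
]
result = identify_form(scanned, FORM_DICTIONARY)
```

In this toy example both forms contain "TOTAL" and "DATE", so the keywords alone are ambiguous; the form is decided by which layout the keyword positions agree with. A single matched keyword is rejected, which mirrors the abstract's point that preparing multiple targets is what makes identification reliable.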

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Hiroshi Sako (1)
  • Naohiro Furukawa (1)
  • Masakazu Fujio (1)
  • Shigeru Watanabe (2)
  1. Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan
  2. Mechatronics Systems Division, Hitachi, Ltd., Aichi, Japan
