Abstract

In this work, we propose a new scheme for the recognition of document images using a syntactic approach. We present a new method that models the layout of a document with a tree-like representation of the form. The syntactic representations of the documents are used to infer a tree automaton for each of the classes involved in the task. Classification is then carried out by an error-correcting analysis of tree languages. The experiments showed that the approach performs well, with an error rate of 1.18%.
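The three stages named in the abstract (tree representation of the layout, one tree automaton per class, error-tolerant classification) can be illustrated with a small toy sketch. It rests on several assumptions that are not taken from the paper: the layout is encoded as a quadtree over a binarised page (quadtrees are one common tree encoding of images, not necessarily the exact one used here); each class is modelled by the root labels and depth-two "forks" (a node label plus its children's labels) seen in its training trees, a simplified locally testable acceptor rather than the authors' tree automaton inference; and classification counts unseen forks as a crude stand-in for error-correcting analysis, which in the actual approach relies on an edit-based distance to the tree language. All names (`quadtree`, `ForkModel`, `page`, `toy_models`) and the 4x4 "pages" are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# A tree is a (label, children) pair; leaves have an empty children tuple.
Tree = Tuple[str, tuple]
Fork = Tuple[str, Tuple[str, ...]]


def quadtree(grid: List[List[int]], r0: int = 0, c0: int = 0,
             size: Optional[int] = None) -> Tree:
    """Encode a square binary image (side a power of two; 1 = ink, 0 = background)
    as a quadtree: uniform blocks become 'b'/'w' leaves, mixed blocks become
    'g' nodes whose four children are the NW, NE, SW and SE quadrants."""
    if size is None:
        size = len(grid)
    cells = [grid[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]
    if all(v == 1 for v in cells):
        return ("b", ())
    if all(v == 0 for v in cells):
        return ("w", ())
    half = size // 2
    quads = ((0, 0), (0, half), (half, 0), (half, half))
    return ("g", tuple(quadtree(grid, r0 + dr, c0 + dc, half) for dr, dc in quads))


def forks(t: Tree) -> List[Fork]:
    """List every node as (label, labels of its children), i.e. depth-2 windows."""
    label, kids = t
    out: List[Fork] = [(label, tuple(k[0] for k in kids))]
    for k in kids:
        out.extend(forks(k))
    return out


class ForkModel:
    """Hypothetical per-class acceptor built from the root labels and forks seen
    in training (a simplified locally testable model, not the paper's inference)."""

    def __init__(self, samples: List[Tree]) -> None:
        self.roots: Set[str] = {t[0] for t in samples}
        self.forks: Set[Fork] = {f for t in samples for f in forks(t)}

    def mismatch(self, t: Tree) -> int:
        """Count unseen forks (plus 1 for an unseen root label): a crude
        stand-in for an error-correcting distance, NOT a tree edit distance."""
        unseen = sum(1 for f in forks(t) if f not in self.forks)
        return unseen + (0 if t[0] in self.roots else 1)

    def accepts(self, t: Tree) -> bool:
        return self.mismatch(t) == 0


def classify(models: Dict[str, ForkModel], t: Tree) -> str:
    """Assign the class whose model needs the fewest corrections."""
    return min(models, key=lambda name: models[name].mismatch(t))


def page(ink: str, noise: Optional[Tuple[int, int]] = None) -> List[List[int]]:
    """Hypothetical 4x4 toy 'layout': ink fills the top or the left half,
    optionally with one flipped pixel at position `noise`."""
    g = [[1 if (r < 2 if ink == "top" else c < 2) else 0 for c in range(4)]
         for r in range(4)]
    if noise is not None:
        r, c = noise
        g[r][c] ^= 1
    return g


def toy_models() -> Dict[str, ForkModel]:
    """Train one model per class from two toy samples each."""
    return {
        "top": ForkModel([quadtree(page("top")), quadtree(page("top", (0, 2)))]),
        "left": ForkModel([quadtree(page("left")), quadtree(page("left", (2, 0)))]),
    }


if __name__ == "__main__":
    models = toy_models()
    # A noisy 'top' page that matches neither training tree exactly.
    print(classify(models, quadtree(page("top", (1, 3)))))  # → top
```

In this toy run the noisy "top" page needs one correction under the "top" model and two under the "left" model, so it is assigned to "top". A faithful implementation would replace `mismatch` with a genuine error-correcting parse against each inferred automaton.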

Keywords

Machine Intelligence · Document Image · Tree Automaton · Tree Language · Syntactic Approach
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Ignacio Perea (1)
  • Damián López (1)
  1. Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Valencia, Spain