Table form document analysis based on the document structure grammar

  • Original Paper
International Journal of Document Analysis and Recognition (IJDAR)

Abstract

Structure analysis of table form documents is an important problem because printed documents, and even electronic documents, provide only geometrical layout and lexical information, not logical structure. To handle such documents automatically, logical structure information is necessary. In this paper, we first analyze the elements of form documents from a communication point of view and extract the grammatical elements that appear in them. We then present a document structure grammar that governs the logical structure of form documents. Finally, we propose a structure analysis system for table form documents based on this grammar. Because the rules are relatively simple, the grammar is easy to modify and to keep consistent. Another advantage of grammar notation is that it can also be used to generate documents from the logical structure alone. In our system, a document is assumed to be composed of a set of boxes, each classified into one of seven box types. The relations between each indication box and its associated entry box are then analyzed using the semantic and geometric knowledge defined in the document structure grammar. Experimental results show that the system successfully analyzed several kinds of table forms.
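To make the idea of relating an indication box to its entry box concrete, the following is a minimal sketch and not the grammar described in the paper. It assumes that boxes are axis-aligned rectangles, models only two box kinds (the labels "indication" and "entry" follow the abstract, but the other box types are not listed there and are omitted), and uses a purely geometric rule chosen for illustration: an entry box belongs to an indication box if it touches that box's right edge or, failing that, its bottom edge. All names, coordinates, and the tolerance value are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box; (x0, y0) is the top-left corner, (x1, y1) the bottom-right."""
    name: str
    kind: str  # hypothetical labels: "indication" or "entry"
    x0: int
    y0: int
    x1: int
    y1: int


def overlap(a0: int, a1: int, b0: int, b1: int) -> int:
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0, min(a1, b1) - max(a0, b0))


def right_neighbor(ind: Box, boxes: List[Box], tol: int = 2) -> Optional[Box]:
    """Entry box whose left edge touches ind's right edge (illustrative rule)."""
    candidates = [
        b for b in boxes
        if b.kind == "entry"
        and abs(b.x0 - ind.x1) <= tol
        and overlap(ind.y0, ind.y1, b.y0, b.y1) > 0
    ]
    return max(candidates,
               key=lambda b: overlap(ind.y0, ind.y1, b.y0, b.y1),
               default=None)


def below_neighbor(ind: Box, boxes: List[Box], tol: int = 2) -> Optional[Box]:
    """Entry box whose top edge touches ind's bottom edge (illustrative rule)."""
    candidates = [
        b for b in boxes
        if b.kind == "entry"
        and abs(b.y0 - ind.y1) <= tol
        and overlap(ind.x0, ind.x1, b.x0, b.x1) > 0
    ]
    return max(candidates,
               key=lambda b: overlap(ind.x0, ind.x1, b.x0, b.x1),
               default=None)


def associate(boxes: List[Box]) -> List[Tuple[str, str]]:
    """Pair each indication box with an adjacent entry box, preferring the right side."""
    pairs = []
    for ind in boxes:
        if ind.kind != "indication":
            continue
        entry = right_neighbor(ind, boxes) or below_neighbor(ind, boxes)
        if entry is not None:
            pairs.append((ind.name, entry.name))
    return pairs


# A made-up two-row form: a label with its field to the right,
# and a full-width label with its field underneath.
FORM = [
    Box("name_label", "indication", 0, 0, 50, 20),
    Box("name_value", "entry", 50, 0, 200, 20),
    Box("addr_label", "indication", 0, 20, 200, 40),
    Box("addr_value", "entry", 0, 40, 200, 80),
]

if __name__ == "__main__":
    print(associate(FORM))
```

A grammar-based system as described in the abstract would replace the two hard-coded adjacency rules with rules drawn from the document structure grammar, which also carries semantic knowledge; the sketch only illustrates the geometric side of the box relation.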

Author information

Correspondence to Akira Amano.

About this article

Cite this article

Amano, A., Asada, N., Mukunoki, M. et al. Table form document analysis based on the document structure grammar. IJDAR 8, 201–213 (2006). https://doi.org/10.1007/s10032-005-0008-3

  • DOI: https://doi.org/10.1007/s10032-005-0008-3
