Abstract
Structure analysis of table-form documents is an important problem because printed documents, and even electronic ones, do not provide logical structure information; they provide only geometric layout and lexical information. To process such documents automatically, logical structure information is necessary. In this paper, we first analyze the elements of form documents from a communication point of view and extract the grammatical elements that appear in them. We then present a document structure grammar that governs the logical structure of form documents. Finally, we propose a structure analysis system for table-form documents based on this grammar. Because the rules are written in grammar notation and are relatively simple, they are easy to modify and to keep consistent. Another advantage of grammar notation is that it can also be used to generate documents from their logical structure alone. In our system, a document is assumed to be composed of a set of boxes, each of which is classified into one of seven box types. The relations between each indication box and its associated entry box are then analyzed using the semantic and geometric knowledge defined in the document structure grammar. Experimental results show that the system successfully analyzed several kinds of table forms.
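As a rough illustration of the box-relation step described in the abstract, the following Python sketch pairs each indication box with an entry box using geometry only. The abstract does not state the grammar rules or list the seven box types, so everything here is an assumption made for illustration: the `Box` class, the `find_entry` and `associate` helpers, the tolerance value, and the rule itself (prefer an entry box touching the right edge of the indication box, otherwise one touching its bottom edge). It is not the authors' grammar and it ignores the semantic knowledge they mention.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned box; y grows downward, as in image coordinates."""
    name: str
    kind: str  # only "indication" and "entry" are modeled in this sketch
    x0: float
    y0: float
    x1: float
    y1: float


def overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def find_entry(ind: Box, boxes: List[Box], tol: float = 1.0) -> Optional[Box]:
    """Assumed rule: take the entry box adjacent on the right, else the one below."""
    entries = [b for b in boxes if b.kind == "entry"]
    # Entry boxes whose left edge touches the right edge of the indication box.
    right = [b for b in entries
             if abs(b.x0 - ind.x1) <= tol
             and overlap(ind.y0, ind.y1, b.y0, b.y1) > 0]
    if right:
        return min(right, key=lambda b: b.y0)
    # Entry boxes whose top edge touches the bottom edge of the indication box.
    below = [b for b in entries
             if abs(b.y0 - ind.y1) <= tol
             and overlap(ind.x0, ind.x1, b.x0, b.x1) > 0]
    if below:
        return min(below, key=lambda b: b.x0)
    return None


def associate(boxes: List[Box]) -> Dict[str, Optional[str]]:
    """Map each indication box name to the name of its entry box, or None."""
    result: Dict[str, Optional[str]] = {}
    for box in boxes:
        if box.kind == "indication":
            entry = find_entry(box, boxes)
            result[box.name] = entry.name if entry else None
    return result


# Hypothetical two-row form: "Name" has its field to the right,
# "Address" spans the full width and has its field underneath.
boxes = [
    Box("Name", "indication", 0, 0, 10, 5),
    Box("name_field", "entry", 10, 0, 40, 5),
    Box("Address", "indication", 0, 5, 40, 10),
    Box("address_field", "entry", 0, 10, 40, 20),
]
pairs = associate(boxes)
```

In this toy layout, `pairs` maps `Name` to `name_field` and `Address` to `address_field`. A grammar-based system such as the one described would replace the hard-coded right-then-below preference with declarative rules, which is what makes the rules easy to modify.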
Cite this article
Amano, A., Asada, N., Mukunoki, M. et al. Table form document analysis based on the document structure grammar. IJDAR 8, 201–213 (2006). https://doi.org/10.1007/s10032-005-0008-3
DOI: https://doi.org/10.1007/s10032-005-0008-3