A simple and effective table detection system from document images

  • S. Mandal
  • S. P. Chowdhury
  • A. K. Das
  • Bhabatosh Chanda
Regular Paper

Abstract

Detecting and identifying tables in document images is crucial to any document image analysis or digital library system. In this paper we report a very simple but powerful approach to detecting the tables present in document pages. The algorithm relies on the observation that tables have distinct columns, which implies that the gaps between fields are substantially larger than the gaps between words in text lines. This deceptively simple observation has led to the design of a table detection system with low computation cost. Moreover, the mathematical foundation of the approach is established, including the formulation of a regular expression for ease of implementation.
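The gap observation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give their regular expression, thresholds, or data structures, so everything here is an assumption made for illustration. Each text line is assumed to be available as a list of word bounding intervals `(x_start, x_end)`, the typical word gap is estimated as the median of all gaps on the page, a line is flagged as a candidate table row when it contains a gap wider than a hypothetical `factor` times that median, and a simple regular expression over the resulting per-line codes picks out runs of consecutive candidate rows.

```python
import re
from statistics import median


def gaps(boxes):
    """Horizontal gaps between consecutive word intervals (x_start, x_end)."""
    boxes = sorted(boxes)
    return [nxt[0] - cur[1] for cur, nxt in zip(boxes, boxes[1:])]


def encode_line(boxes, word_gap, factor):
    """Return 'T' if the line has a gap much wider than a word gap, else 'x'."""
    return "T" if any(g > factor * word_gap for g in gaps(boxes)) else "x"


def find_tables(lines, factor=3.0, min_rows=2):
    """Return (first_line, last_line) index pairs for runs of table-like lines.

    `factor` and `min_rows` are hypothetical tuning parameters, not values
    taken from the paper.
    """
    all_gaps = [g for line in lines for g in gaps(line) if g > 0]
    if not all_gaps:
        return []
    # Assumption: most gaps on the page are ordinary inter-word gaps,
    # so the median approximates the typical word gap.
    word_gap = median(all_gaps)
    code = "".join(encode_line(line, word_gap, factor) for line in lines)
    # A table is taken to be at least `min_rows` consecutive 'T' lines.
    pattern = re.compile("T{%d,}" % min_rows)
    return [(m.start(), m.end() - 1) for m in pattern.finditer(code)]


# Two text lines, three table-like rows, one text line.
text_line = [(0, 30), (35, 70), (75, 100), (105, 140)]   # gaps of 5
table_row = [(0, 30), (70, 100), (140, 170)]             # gaps of 40
page = [text_line, text_line, table_row, table_row, table_row, text_line]
print(find_tables(page))  # → [(2, 4)]
```

The median estimate breaks down on pages made up mostly of tables, where column gaps outnumber word gaps, so a real system would need a more robust estimate of the word gap.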

Keywords

Table detection; Document image segmentation; Digital document library

Copyright information

© Springer-Verlag 2005

Authors and Affiliations

  • S. Mandal (1, corresponding author)
  • S. P. Chowdhury (1)
  • A. K. Das (1)
  • Bhabatosh Chanda (2)
  1. CST Department, Bengal Engineering College (DU), Sibpur, Howrah
  2. ECS Unit, Indian Statistical Institute, Calcutta, India
