A Tabular Survey of Automated Table Processing

  • Daniel Lopresti
  • George Nagy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1941)

Abstract

Tables are the only acceptable means of communicating certain types of structured data. A precise definition of “tabularity” remains elusive because some bureaucratic forms, multicolumn text layouts, and schematic drawings share many characteristics of tables. There are significant differences between typeset tables, electronic files designed for display of tables, and tables in symbolic form intended for information retrieval. Although most research to date has addressed the extraction of low-level geometric information from scanned raster images of paper tables, the recent trend toward the analysis of tables in electronic form may pave the way to a higher level of table understanding. Recent research on table composition and table analysis has improved our understanding of the distinction between the logical and physical structures of tables, and has led to improved formalisms for modeling tables. The present study indicates that progress on half-a-dozen specific research issues would open the door to using existing paper and electronic tables for database update, tabular browsing, structured information retrieval through graphical and audio interfaces, multimedia table editing, and platform-independent display. Although tables are not a conventional format for conveying the primary content of technical papers, here we attempt to subdue our natural garrulity by adopting this genre to communicate what we have to say about tables entirely in tabular form.

Keywords

Document Image; Graphic Recognition; Document Recognition; Relational DBMS; Tsukuba Science City
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Daniel Lopresti (1)
  • George Nagy (2)
  1. Bell Labs, Lucent Technologies Inc., Murray Hill
  2. Department of Electrical, Troy
