Notes on Contemporary Table Recognition

  • David W. Embley
  • Daniel Lopresti
  • George Nagy
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3872)

Abstract

The shift of interest to web tables in HTML and PDF files, coupled with the incorporation of table analysis and conversion routines in commercial desktop document processing software, is likely to turn table recognition into more of a systems issue than an algorithmic one. We illustrate the transition with some actual examples of web table conversion. We then suggest that the appropriate target format for table analysis, whether performed by conventional customized programs or by off-the-shelf software, is a representation based on the abstract table introduced by X. Wang in 1996. We show that the Wang model is adequate for some useful tasks that prove elusive for less explicit representations, and outline our plans to develop a semi-automated table processing system to demonstrate this approach. Screen snapshots of a prototype tool that allows table mark-up in the style of Wang are also presented.
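As a rough illustration of what a Wang-style abstract table might look like in code, the sketch below stores entries under an unordered choice of one label per category, independent of any row or column layout. It is a minimal sketch, not the authors' tool: the class name `AbstractTable`, its methods, and the sample categories and values are all hypothetical, and the label hierarchies (category trees) of the full model are flattened to plain label lists for brevity.

```python
from itertools import product


class AbstractTable:
    """Layout-independent table: entries are keyed by one label per category.

    Hypothetical sketch; category trees are simplified to flat label lists.
    """

    def __init__(self, categories):
        # categories: dict mapping a category name to its list of labels
        self.categories = categories
        self.delta = {}  # maps a frozenset of (category, label) pairs to an entry

    def _key(self, labels):
        # Require exactly one known label from every category.
        if set(labels) != set(self.categories):
            raise KeyError("exactly one label per category is required")
        for name, label in labels.items():
            if label not in self.categories[name]:
                raise KeyError(f"unknown label {label!r} in category {name!r}")
        return frozenset(labels.items())

    def put(self, entry, **labels):
        self.delta[self._key(labels)] = entry

    def get(self, **labels):
        return self.delta[self._key(labels)]

    def is_complete(self):
        # True when every combination of labels has an entry.
        names = list(self.categories)
        return all(
            frozenset(zip(names, combo)) in self.delta
            for combo in product(*(self.categories[n] for n in names))
        )


# Illustrative data only; these values do not come from the paper.
table = AbstractTable({"Year": ["2004", "2005"], "Term": ["Fall", "Spring"]})
table.put(120, Year="2004", Term="Fall")
table.put(95, Year="2004", Term="Spring")
table.put(130, Year="2005", Term="Fall")
table.put(101, Year="2005", Term="Spring")
```

Because the key is an unordered set of labels, `table.get(Term="Fall", Year="2004")` and `table.get(Year="2004", Term="Fall")` address the same entry, which is the kind of layout independence that a grid of rows and columns does not make explicit.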

Keywords

Rensselaer Polytechnic Institute; Prototype Tool; Portable Document Format; Table Processing; Array Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Embley, D.W., Hurst, M., Lopresti, D., Nagy, G.: Table processing paradigms: A research survey (2005) (in submission)
  2. Hurst, M.: The Interpretation of Tables in Texts. PhD thesis, University of Edinburgh (2000)
  3. Lopresti, D., Nagy, G.: Automated table processing: An (opinionated) survey. In: Proceedings of the Third IAPR International Workshop on Graphics Recognition, Jaipur, India, pp. 109–134 (1999)
  4. Lopresti, D., Nagy, G.: A tabular survey of automated table processing. In: Chhabra, A.K., Dori, D. (eds.) GREC 1999. LNCS, vol. 1941, pp. 93–120. Springer, Heidelberg (2000)
  5. Zanibbi, R., Blostein, D., Cordy, J.R.: A survey of table recognition: Models, observations, transformations, and inferences. International Journal on Document Analysis and Recognition 7, 1–16 (2004)
  6. Wang, X.: Tabular abstraction, editing, and formatting. PhD thesis, University of Waterloo (1996)
  7. Douglas, S., Hurst, M., Quinn, D.: Using natural language processing for identifying and interpreting tables in plain text. In: Proceedings of the Symposium on Document Analysis and Information Retrieval (SDAIR 1995), Las Vegas, NV, pp. 535–545 (1995)
  8. Hurst, M., Douglas, S.: Layout and language: Preliminary investigations in recognizing the structure of tables. In: Proceedings of the International Conference on Document Analysis and Recognition (ICDAR 1997), pp. 1043–1047 (1997)
  9. Embley, D., Tao, C., Liddle, S.: Automatically extracting ontologically specified data from HTML tables with unknown structure. In: Spaccapietra, S., March, S.T., Kambayashi, Y. (eds.) ER 2002. LNCS, vol. 2503, pp. 322–327. Springer, Heidelberg (2002)
  10. Embley, D., Tao, C., Liddle, S.: Automating the extraction of data from HTML tables with unknown structure. In: Data and Knowledge Engineering (2005) (in press)
  11. Tijerino, Y.A., Embley, D.W., Lonsdale, D.W., Nagy, G.: Towards ontology generation from tables. World Wide Web Journal 8, 261–285 (2005)
  12. Zou, J.: Computer Assisted Visual InterActive Recognition. PhD thesis, Rensselaer Polytechnic Institute (2004)
  13. Zou, J., Nagy, G.: Evaluation of model-based interactive flower recognition. In: Proceedings of the 17th International Conference on Pattern Recognition, vol. 2, pp. 311–314 (2004)
  14. Pivk, A., Cimiano, P., Sure, Y.: From tables to frames. In: McIlraith, S.A., Plexousakis, D., van Harmelen, F. (eds.) ISWC 2004. LNCS, vol. 3298, pp. 166–181. Springer, Heidelberg (2004)
  15. Zanibbi, R., Blostein, D., Cordy, J.R.: The recognition strategy language. In: Proceedings of the Eighth International Conference on Document Analysis and Recognition, Seoul, South Korea, pp. 565–569 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • David W. Embley (1)
  • Daniel Lopresti (2)
  • George Nagy (3)

  1. Computer Science Department, Brigham Young University, Provo
  2. Department of Computer Science and Engineering, Lehigh University, Bethlehem
  3. Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy