
Interactive Conversion of Web Tables

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6020)

Abstract

Two hundred web tables from ten sites were imported into Excel. The tables were edited as needed and then converted into layout-independent Wang Notation using the Table Abstraction Tool (TAT). TAT outputs XML files intended for constructing narrow-domain ontologies. On average, each table required 104 seconds of editing. Augmentations such as aggregates, footnotes, table titles, captions, units, and notes were also extracted, taking an average of 93 seconds. Every user intervention was logged and audited. The logged interactions were analyzed to determine the relative influence of factors such as table size, the number of categories, and the various types of augmentation on processing time. The analysis suggests which aspects of interactive table processing can be automated in the near term and how much time such automation would save. The correlation coefficient between predicted and actual processing times was 0.66.
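
Wang Notation, after X. Wang's tabular model, describes a table as a set of category trees together with a mapping from one label path per category to each data entry, so the abstract table does not depend on which category is laid out in the rows or the columns. The following is a minimal Python sketch under stated assumptions: categories are given as nested dicts, the grid has exactly one category per axis, and the names leaf_paths, from_grid and is_complete, the example categories and the marks are all hypothetical illustrations. It does not reproduce the XML format that TAT actually emits.

```python
from itertools import product

def leaf_paths(label, children):
    """Return every root-to-leaf path of a category tree as a tuple of labels."""
    if not children:
        return [(label,)]
    return [(label,) + path
            for child, grandchildren in children.items()
            for path in leaf_paths(child, grandchildren)]

def from_grid(row_paths, col_paths, cells):
    """Convert a two-dimensional grid into a Wang-style mapping (hypothetical helper).

    Each data cell is keyed by the unordered set of its category paths
    (one from the row headers, one from the column headers), so the key
    does not record which category was laid out horizontally.
    """
    return {frozenset((rp, cp)): cells[r][c]
            for r, rp in enumerate(row_paths)
            for c, cp in enumerate(col_paths)}

def is_complete(table, categories):
    """Check that the entries cover exactly every combination of leaf paths."""
    combos = product(*(leaf_paths(name, tree) for name, tree in categories.items()))
    return set(table) == {frozenset(combo) for combo in combos}

# Hypothetical categories: "Mark" is a two-level tree, "Year" is flat.
categories = {
    "Year": {"1991": {}, "1992": {}},
    "Mark": {"Assignments": {"Ass1": {}, "Ass2": {}}, "Final": {}},
}
years = leaf_paths("Year", categories["Year"])
marks = leaf_paths("Mark", categories["Mark"])

# The same made-up data in two physical layouts.
layout_a = [[85, 80, 75],   # rows: years, columns: marks
            [87, 84, 79]]
layout_b = [[85, 87],       # rows: marks, columns: years
            [80, 84],
            [75, 79]]

a = from_grid(years, marks, layout_a)
b = from_grid(marks, years, layout_b)

print(a == b)                      # True: the layout is not recorded
print(is_complete(a, categories))  # True: all 2 x 3 combinations are present
print(a[frozenset({("Year", "1992"), ("Mark", "Final")})])  # 79
```

Keying each entry by an unordered set of category paths is what makes the two layouts compare equal; a hierarchical header such as "Assignments" spanning "Ass1" and "Ass2" shows up only as a longer path, not as a spanning cell.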

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Padmanabhan, R.K., Jandhyala, R.C., Krishnamoorthy, M., Nagy, G., Seth, S., Silversmith, W. (2010). Interactive Conversion of Web Tables. In: Ogier, JM., Liu, W., Lladós, J. (eds) Graphics Recognition. Achievements, Challenges, and Evolution. GREC 2009. Lecture Notes in Computer Science, vol 6020. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13728-0_3

  • DOI: https://doi.org/10.1007/978-3-642-13728-0_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13727-3

  • Online ISBN: 978-3-642-13728-0

  • eBook Packages: Computer Science, Computer Science (R0)
