
Locating Candidate Tables in a Spreadsheet Rendered Web Page

Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 199)

Abstract

This paper presents a method for locating tables in a web page. The page is captured with spreadsheet software as a grid of textual elements (the web sheet), with all visual attributes retained. The leaf tables in the page are captured in a separate sheet by DOM analysis (the DOM sheet). Locating a table in the web sheet consists of two subtasks: locating its start point and locating its end point. The start point is found by comparing the text of the table elements in the DOM sheet with the text in the web sheet. The end point is found by navigating through the web sheet from the located start point, using the row and column information taken from the DOM sheet. The method was tested on 60 arbitrarily selected URLs containing 450 leaf tables; in more than 90% of the cases the tables were located correctly.
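The two subtasks can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the web sheet and each DOM-sheet table are available as lists of rows of strings, that every table cell occupies exactly one grid cell (no merged cells), and that the first row of the table appears as a contiguous horizontal run in the web sheet. The names `normalise`, `locate_start` and `locate_end` are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[str]]   # rows of cell texts
Cell = Tuple[int, int]   # (row index, column index), zero-based


def normalise(text: str) -> str:
    """Collapse whitespace and case so small rendering differences still match."""
    return " ".join(text.split()).lower()


def locate_start(web_sheet: Grid, dom_table: Grid) -> Optional[Cell]:
    """Find the start point by text comparison.

    Assumption: the first row of the DOM table appears as a contiguous
    run of cells somewhere in the web sheet. Returns the first match.
    """
    if not dom_table or not dom_table[0]:
        return None
    first_row = [normalise(cell) for cell in dom_table[0]]
    width = len(first_row)
    for r, row in enumerate(web_sheet):
        for c in range(len(row) - width + 1):
            if [normalise(cell) for cell in row[c:c + width]] == first_row:
                return (r, c)
    return None


def locate_end(start: Cell, dom_table: Grid) -> Cell:
    """Find the end point by stepping from the start point.

    Assumption: the row and column counts of the DOM table translate
    one-to-one into grid offsets (no merged or spanning cells).
    """
    n_rows = len(dom_table)
    n_cols = max(len(row) for row in dom_table)
    return (start[0] + n_rows - 1, start[1] + n_cols - 1)


# Hypothetical data: a small web sheet with one embedded table.
web_sheet = [
    ["Site title", "", ""],
    ["", "Name", "Score"],
    ["", "Ann", "91"],
    ["", "Bob", "78"],
    ["Footer", "", ""],
]
dom_table = [["Name", "Score"], ["Ann", "91"], ["Bob", "78"]]

start = locate_start(web_sheet, dom_table)
if start is not None:
    end = locate_end(start, dom_table)
    print("start:", start, "end:", end)
```

A real web sheet would likely need a more tolerant navigation step, since merged cells and spacer columns break the one-to-one offset assumption made here.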

Keywords

information extraction, web mining, web table location, spreadsheet grid

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Computer Centre, Mangalore University, Mangalagangotri, India
  2. Department of Statistics, Mangalore University, Mangalagangotri, India
