Rule-Based Table Analysis and Interpretation

  • Alexey Shigarov
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 538)

Abstract

Today, a huge number of tables are published in web pages, word-processing documents, and spreadsheets. Many of them contain unstructured tabular data: they are intended to be read by humans rather than interpreted by machines. At the same time, we often need that information in a structured form, e.g. in relational databases. We propose a rule-based approach to table analysis and interpretation and demonstrate how it can be applied to transform tabular data from an unstructured form (spreadsheets) to a structured form (relational databases). The paper discusses the representation of tabular data as facts in the working memory of a rule engine, a formal language for defining rules of table analysis and interpretation, and its implementation.
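The idea of holding table cells as facts and letting rules classify and link them can be illustrated with a minimal Python sketch. It is not the paper's rule language or its rule-engine implementation: the `Cell` schema, the three rules, and the assumption that labels sit in the first row and first column are all hypothetical and chosen only to show the fact/rule/tuple flow.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Cell:
    """One table cell held as a fact (hypothetical schema, not the paper's)."""
    row: int
    col: int
    text: str
    role: str = "unknown"            # rules set this to "label" or "entry"
    row_label: Optional[str] = None  # filled in by attach_labels
    col_label: Optional[str] = None


def load_facts(grid: List[List[str]]) -> List[Cell]:
    """Turn a grid of strings into a flat 'working memory' of cell facts."""
    return [Cell(r, c, text)
            for r, line in enumerate(grid)
            for c, text in enumerate(line)]


def mark_labels(facts: List[Cell]) -> bool:
    """Assumed layout rule: cells in the first row or column are labels."""
    fired = False
    for cell in facts:
        if cell.role == "unknown" and (cell.row == 0 or cell.col == 0):
            cell.role = "label"
            fired = True
    return fired


def mark_entries(facts: List[Cell]) -> bool:
    """Assumed layout rule: every other cell is a data entry."""
    fired = False
    for cell in facts:
        if cell.role == "unknown" and cell.row > 0 and cell.col > 0:
            cell.role = "entry"
            fired = True
    return fired


def attach_labels(facts: List[Cell]) -> bool:
    """Link each entry to the label heading its row and its column."""
    labels = {(c.row, c.col): c.text for c in facts if c.role == "label"}
    fired = False
    for cell in facts:
        if cell.role != "entry" or cell.row_label is not None:
            continue
        row_key, col_key = (cell.row, 0), (0, cell.col)
        if row_key in labels and col_key in labels:
            cell.row_label = labels[row_key]
            cell.col_label = labels[col_key]
            fired = True
    return fired


def run_rules(facts: List[Cell],
              rules: List[Callable[[List[Cell]], bool]]) -> None:
    """Fire all rules repeatedly until none of them changes any fact."""
    fired = True
    while fired:
        fired = any([rule(facts) for rule in rules])


def to_tuples(facts: List[Cell]) -> List[Tuple[str, str, str]]:
    """Emit (row label, column label, value) tuples for a relational table."""
    return [(c.row_label, c.col_label, c.text)
            for c in facts
            if c.role == "entry" and c.row_label is not None]


grid = [
    ["",       "2013", "2014"],
    ["Export", "10",   "12"],
    ["Import", "7",    "9"],
]
facts = load_facts(grid)
# The rules are listed out of dependency order on purpose: the loop in
# run_rules reaches the same result regardless of ordering.
run_rules(facts, [attach_labels, mark_entries, mark_labels])
tuples = to_tuples(facts)
```

Because each rule fires only when its condition still holds for some fact, the loop stops once every cell has a role and every entry has its labels. Real tables with merged cells or multi-level headers would need additional rules.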

Keywords

Table analysis and interpretation; Table understanding; Information extraction from tables; Unstructured tabular data integration

Notes

Acknowledgments

The research work was financially supported by the Russian Foundation for Basic Research (Grant No. 15-37-20042) and the Council for grants of the President of the Russian Federation (Grant No. SP-3387.2013.5).

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Matrosov Institute for System Dynamics and Control Theory SB RAS, Irkutsk, Russia
