TabbyLD: A Tool for Semantic Interpretation of Spreadsheets Data

Part of the Communications in Computer and Information Science book series (CCIS, volume 1341)

Abstract

Spreadsheets are one of the most convenient ways to structure and represent statistical and other data. Consequently, automatic processing and semantic interpretation of spreadsheet data have become an active area of research, especially in the context of integrating this data into the Semantic Web. In this paper, we present TabbyLD, a tool for semantic interpretation of data extracted from spreadsheets. The main features of our software are: (1) the use of original metrics for measuring the similarity between cell values and entities of a global knowledge graph: string similarity, NER label similarity, heading similarity, semantic similarity, and context similarity; (2) the use of a unified canonicalized form for representing arbitrary spreadsheets; (3) the integration of TabbyLD with the tools of the TabbyDOC project as part of an overall pipeline. We describe the TabbyLD architecture, its main functions, a method for annotating spreadsheets that includes the original similarity metrics, an illustrative example, and a preliminary experimental evaluation. The evaluation used the T2Dv2 Gold Standard dataset. The experiments show that TabbyLD is applicable to the semantic interpretation of spreadsheet data. We also identified some issues in this process.
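To make the first of the listed metrics concrete, the following is a minimal sketch of how a string similarity between a cell value and candidate entity labels could be computed. It assumes a length-normalized Levenshtein edit distance; the abstract does not give the formula TabbyLD actually uses, so the normalization and the names `levenshtein`, `string_similarity`, and `rank_candidates` are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def string_similarity(cell_value: str, entity_label: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    a = cell_value.strip().lower()
    b = entity_label.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


def rank_candidates(cell_value: str, labels: List[str]) -> List[Tuple[str, float]]:
    """Sort candidate entity labels by descending similarity to the cell value."""
    scored = [(label, string_similarity(cell_value, label)) for label in labels]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical candidate labels for a cell containing "Paris".
    for label, score in rank_candidates("Paris", ["Paris, Texas", "Paris", "Parma"]):
        print(f"{label}: {score:.2f}")
```

A score of this kind would presumably be only one signal among the five listed metrics; how the tool combines them is not stated in the abstract.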

Keywords

  • Semantic table interpretation
  • Annotation
  • Spreadsheet data
  • Entity linking
  • Linked data
  • Knowledge graph
  • DBpedia

Acknowledgement

This work was supported by the Russian Science Foundation, grant number 18-71-10001.

Author information

Corresponding author

Correspondence to Aleksandr Yu. Yurin.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Dorodnykh, N.O., Yurin, A.Y. (2021). TabbyLD: A Tool for Semantic Interpretation of Spreadsheets Data. In: Simian, D., Stoica, L.F. (eds) Modelling and Development of Intelligent Systems. MDIS 2020. Communications in Computer and Information Science, vol 1341. Springer, Cham. https://doi.org/10.1007/978-3-030-68527-0_20

  • DOI: https://doi.org/10.1007/978-3-030-68527-0_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68526-3

  • Online ISBN: 978-3-030-68527-0

  • eBook Packages: Computer Science, Computer Science (R0)