Skip to main content

TAIPAN: Automatic Property Mapping for Tabular Data

  • Conference paper
  • First Online:
Knowledge Engineering and Knowledge Management (EKAW 2016)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10024))

Included in the following conference series:

Abstract

The Web encompasses a significant amount of knowledge hidden in entity-attributes tables. Bridging the gap between these tables and the Web of Data thus has the potential to facilitate a large number of applications, including the augmentation of knowledge bases from tables, the search for related tables and the completion of tables using knowledge bases. Computing such bridges is impeded by the poor accuracy of automatic property mapping, the lack of approaches for the discovery of subject columns and the mere size of table corpora. We propose Taipan, a novel approach for recovering the semantics of tables. Our approach begins by identifying subject columns using a combination of structural and semantic features. It then maps binary relations inside a table to predicates from a given knowledge base. Therewith, our solution supports both the tasks of table expansion and knowledge base augmentation. We evaluate our approach on a table dataset generated from real RDF data and a manually curated version of the T2D gold standard. Our results suggest that we outperform the state of the art by up to 85 % F-measure.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    http://lodstats.aksw.org.

  2. 2.

    http://lov.okfn.org/.

  3. 3.

    https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html.

  4. 4.

    https://github.com/aksw/taipan.

  5. 5.

    http://webdatacommons.org/webtables/goldstandard.html.

  6. 6.

    For a complete analysis, see https://github.com/AKSW/TAIPAN-Datasets/tree/master/T2D.

  7. 7.

    https://www.w3.org/Submission/CBD/.

  8. 8.

    https://github.com/aksw/TAIPAN-DBD-Datagen.

  9. 9.

    https://github.com/AKSW/TAIPAN-Synth-Datagen/tree/master/DBpediaTableDataset/tables.

  10. 10.

    Accuracy is defined as a ratio of correctly guessed subject columns to a number of overall guessed subject columns.

  11. 11.

    We contacted the authors to obtain their corpus but were not provided access to it. Still, we followed the specification of the SVM in their paper exactly.

  12. 12.

    We used the classifier implementations from scikit-learn python library at http://scikit-learn.org/. For more information on the implementation, please refer to the Taipan Github repository at https://github.com/AKSW/TAIPAN.

  13. 13.

    http://dws.informatik.uni-mannheim.de/en/research/T2K.

  14. 14.

    https://commoncrawl.org/.

  15. 15.

    http://webdatacommons.org/.

  16. 16.

    http://webdatacommons.org/webtables/.

References

  1. Balakrishnan, S., Halevy, A., Harb, B., Lee, H., Madhavan, J., Rostamizadeh, A., Shen, W., Wilder, K., Wu, F., Yu, C.: Applying webtables in practice

    Google Scholar 

  2. Carmel, D., Chang, M.-W., Gabrilovich, E., Hsu, B.-J.P., Wang, K.: ERD’14: entity recognition and disambiguation challenge. In: ACM SIGIR Forum, vol. 48, pp. 63–77. ACM (2014)

    Google Scholar 

  3. Ermilov, I., Auer, S., Stadler, C.: CSV2RDF: user-driven CSV to RDF mass conversion framework. In: Proceedings of the ISEM 2013, Graz, Austria, 04–06 September 2013

    Google Scholar 

  4. Ermilov, I., Auer, S., Stadler, C.: User-driven semantic mapping of tabular data. In: Proceedings of 9th International Conference on Semantic Systems, I-SEMANTICS 2013, pp. 105–112. ACM, New York (2013)

    Google Scholar 

  5. Etzioni, O., Cafarella, M., Downey, D., Popescu, A.-M., Shaked, T., Soderland, S., Weld, D.S., Yates, A.: Unsupervised named-entity extraction from the web: an experimental study. Artif. Intell. 165(1), 91–134 (2005)

    Article  Google Scholar 

  6. Gerber, D., Ngomo, A.-C.N.: Extracting multilingual natural-language patterns for RDF predicates. In: ten Teije, A., Völker, J., Handschuh, S., Stuckenschmidt, H., d’Acquin, M., Nikolov, A., Aussenac-Gilles, N., Hernandez, N. (eds.) EKAW 2012. LNCS, vol. 7603, pp. 87–96. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  7. Hripcsak, G., Rothschild, A.S.: Agreement, the F-measure, and reliability in information retrieval. J. Am. Med. Inform. Assoc. 12(3), 296–298 (2005)

    Article  Google Scholar 

  8. Knoblock, C.A., et al.: Semi-automatically mapping structured sources into the semantic web. In: Simperl, E., Cimiano, P., Polleres, A., Corcho, O., Presutti, V. (eds.) ESWC 2012. LNCS, vol. 7295, pp. 375–390. Springer, Heidelberg (2012). doi:10.1007/978-3-642-30284-8_32

    Chapter  Google Scholar 

  9. Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics 33, 159–174 (1977)

    Article  MathSciNet  MATH  Google Scholar 

  10. Lehmberg, O., Ritze, D., Ristoski, P., Meusel, R., Paulheim, H., Bizer, C.: The Mannheim search join engine. Web Semant.: Sci. Serv. Agents World Wide Web 35, 159–166 (2015)

    Article  Google Scholar 

  11. Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. Proc. VLDB Endow. 3(1–2), 1338–1347 (2010)

    Article  Google Scholar 

  12. Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. In: Proceedings of Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, vol. 2, pp. 1003–1011. Association for Computational Linguistics (2009)

    Google Scholar 

  13. Mulwad, V., Finin, T., Syed, Z., Joshi, A.: Using linked data to interpret tables. In: COLD, vol. 665 (2010)

    Google Scholar 

  14. Nadeau, D., Sekine, S.: A survey of named entity recognition and classification. Lingvisticae Investigationes 30(1), 3–26 (2007)

    Article  Google Scholar 

  15. Nakashole, N., Weikum, G., Suchanek, F.: Patty: a taxonomy of relational patterns with semantic types. In: Proceedings of 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 1135–1145. Association for Computational Linguistics (2012)

    Google Scholar 

  16. Ritze, D., Lehmberg, O., Bizer, C.: Matching HTML tables to DBpedia. In: Proceedings of 5th International Conference on Web Intelligence, Mining and Semantics, p. 10. ACM (2015)

    Google Scholar 

  17. Ritze, D., Lehmberg, O., Oulabi, Y., Bizer, C.: Profiling the potential of web tables for augmenting cross-domain knowledge bases. In: Proceedings of 25th International Conference on World Wide Web, pp. 251–261. International World Wide Web Conferences Steering Committee (2016)

    Google Scholar 

  18. Snow, R., Jurafsky, D., Ng, A.Y.: Learning syntactic patterns for automatic hypernym discovery. In: Advances in Neural Information Processing Systems, vol. 17 (2004)

    Google Scholar 

  19. Speck, R., Ngonga Ngomo, A.-C.: Ensemble learning for named entity recognition. In: Mika, P., et al. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 519–534. Springer, Heidelberg (2014)

    Google Scholar 

  20. Suchanek, F.M., Abiteboul, S., Senellart, P.: Paris: probabilistic alignment of relations, instances, and schema. Proc. VLDB Endow. 5(3), 157–168 (2011)

    Article  Google Scholar 

  21. Usbeck, R., Ngonga Ngomo, A.-C., Röder, M., Gerber, D., Coelho, S.A., Auer, S., Both, A.: AGDISTIS - graph-based disambiguation of named entities using linked data. In: Mika, P., et al. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 457–471. Springer, Heidelberg (2014)

    Google Scholar 

  22. Usbeck, R., Röder, M., Ngonga Ngomo, A.-C., Baron, C., Both, A., Brümmer, M., Ceccarelli, D., Cornolti, M., Cherix, D., Eickmann, B., et al.: Gerbil: general entity annotator benchmarking framework. In: Proceedings of 24th International Conference on World Wide Web, pp. 1133–1143. International World Wide Web Conferences Steering Committee (2015)

    Google Scholar 

  23. Venetis, P., Halevy, A., Madhavan, J., Pasca, M., Shen, W., Wu, F., Miao, G., Wu, C.: Table search using recovered semantics (2010)

    Google Scholar 

  24. Venetis, P., Halevy, A., Madhavan, J., Paşca, M., Shen, W., Wu, F., Miao, G., Wu, C.: Recovering semantics of tables on the web. Proc. VLDB Endow. 4(9), 528–538 (2011)

    Article  Google Scholar 

  25. Wang, C., Chakrabarti, K., He, Y., Ganjam, K., Chen, Z., Bernstein, P.A.: Concept expansion using web tables. In: Proceedings of 24th International Conference on World Wide Web, pp. 1198–1208. International World Wide Web Conferences Steering Committee (2015)

    Google Scholar 

  26. Wang, J., Wang, H., Wang, Z., Zhu, K.Q.: Understanding tables on the web. In: Atzeni, P., Cheung, D., Ram, S. (eds.) ER 2012 Main Conference 2012. LNCS, vol. 7532, pp. 141–155. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  27. Zhang, Z.: Towards efficient and effective semantic table interpretation. In: Mika, P., et al. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 487–502. Springer, Heidelberg (2014)

    Google Scholar 

Download references

Acknowledgments

This work has been supported by Eurostars projects DIESEL (project no. 01QE1512C), the BMWI Project GEISER (project no. 01MD16014) as well as the European Union’s H2020 research and innovation action HOBBIT (GA no. 688227).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ivan Ermilov .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Ermilov, I., Ngomo, AC.N. (2016). TAIPAN: Automatic Property Mapping for Tabular Data. In: Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (eds) Knowledge Engineering and Knowledge Management. EKAW 2016. Lecture Notes in Computer Science(), vol 10024. Springer, Cham. https://doi.org/10.1007/978-3-319-49004-5_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-49004-5_11

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49003-8

  • Online ISBN: 978-3-319-49004-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics