Making Sense of Numerical Data - Semantic Labelling of Web Tables

  • Emilia Kacprzak
  • José M. Giménez-García
  • Alessandro Piscopo
  • Laura Koesten
  • Luis-Daniel Ibáñez
  • Jeni Tennison
  • Elena Simperl
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11313)

Abstract

With the increasing amount of structured data on the web, the need to understand and support search over this emerging data space is growing. Adding semantics to structured data can help address existing challenges in data discovery, as it facilitates understanding the values in their context. While approaches exist for lifting structured data to semantic web formats to enrich it and facilitate discovery, most work to date focuses on textual fields rather than numerical data. In this paper, we propose NUMER, a two-level (row- and column-based) approach to add semantic meaning to numerical values in tables. We evaluate our approach using a benchmark (NumDB) generated for the purpose of this work. We show the influence of the different levels of analysis on the success of assigning semantic labels to numerical values in tables. Our approach outperforms the state of the art and is less affected by data structure and quality issues such as a small number of entities or deviations in the data.

Keywords

Semantic labelling, Numerical values, Linked data

Notes

Acknowledgements

This project is supported by the European Union Horizon 2020 program under the Marie Skłodowska-Curie grant agreement No. 642795.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Emilia Kacprzak (1, 2)
  • José M. Giménez-García (3)
  • Alessandro Piscopo (1, 2)
  • Laura Koesten (1, 2)
  • Luis-Daniel Ibáñez (1)
  • Jeni Tennison (2)
  • Elena Simperl (1)

  1. Electronics and Computer Science, University of Southampton, Southampton, UK
  2. The Open Data Institute, London, UK
  3. Laboratoire Hubert Curien, University of Lyon, UJM-Saint-Étienne, CNRS, Lyon, France
