Measuring Accuracy of Triples in Knowledge Graphs

Liu, Shuangyan; d’Aquin, Mathieu; Motta, Enrico

doi:10.1007/978-3-319-59888-8_29

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10318)

Included in the following conference series:

International Conference on Language, Data and Knowledge

Abstract

An increasing number of large-scale knowledge graphs have been constructed in recent years. These graphs are often created by text-based extraction, which can be very noisy. So far, knowledge graphs have usually been cleaned by human experts, which is very inefficient. It is therefore necessary to explore automatic methods for identifying and eliminating erroneous information. To achieve this, previous approaches have relied primarily on internal information, i.e. the knowledge graph itself. In this paper, we introduce an automatic approach, Triples Accuracy Assessment (TAA), for validating RDF triples (source triples) in a knowledge graph by finding consensus among matched triples (drawn from target triples) in other knowledge graphs. TAA uses knowledge graph interlinks to find identical resources and applies different matching methods between the predicates of source triples and target triples. Then, based on the matched triples, TAA calculates a confidence score to indicate the correctness of a source triple. In addition, we present an evaluation of our approach using the FactBench dataset for fact validation. Our findings show promising results for distinguishing between correct and wrong triples.
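The pipeline outlined in the abstract (resolve identical resources through interlinks, match predicates, then score by consensus) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give TAA's matching methods or its confidence formula. Here the predicate matcher is a plain string-similarity stand-in from Python's difflib, the score is simply the fraction of matched target triples whose object agrees with the source triple, and the names `Triple`, `predicate_similarity`, `confidence`, the `same_as` mapping and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def predicate_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1].

    Assumption: a stand-in for the paper's predicate matching methods,
    which the abstract does not specify.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def confidence(
    source: Triple,
    targets: List[Triple],
    same_as: Dict[str, Set[str]],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the paper
) -> Optional[float]:
    """Share of matched target triples that agree with the source object.

    Returns None when no target triple matches, i.e. there is no evidence.
    """
    # Step 1: resources declared identical to the source subject via interlinks.
    equivalents = same_as.get(source.subject, set())

    # Step 2: keep target triples about an equivalent resource whose
    # predicate is similar enough to the source predicate.
    matched = [
        t for t in targets
        if t.subject in equivalents
        and predicate_similarity(source.predicate, t.predicate) >= threshold
    ]
    if not matched:
        return None

    # Step 3: consensus, the fraction of matched triples with the same object.
    agreeing = sum(1 for t in matched if t.obj == source.obj)
    return agreeing / len(matched)


# Illustrative, made-up identifiers and values (not real knowledge graph data).
source = Triple("src:A", "birthPlace", "X")
targets = [
    Triple("kg1:A", "birthPlace", "X"),    # matched, agrees
    Triple("kg2:A", "birthplace", "Y"),    # matched, disagrees
    Triple("kg2:A", "placeOfBirth", "X"),  # predicate too dissimilar as a string
    Triple("kg3:B", "birthPlace", "X"),    # not linked to the source subject
]
same_as = {"src:A": {"kg1:A", "kg2:A"}}
score = confidence(source, targets, same_as)
```

In this toy input two target triples survive matching and one of them agrees, so the score is 0.5. The excluded `placeOfBirth` triple shows why a purely string-based matcher is too weak on its own, and why combining different matching methods, as the abstract describes, is useful.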

Notes

1. dbr refers to http://dbpedia.org/resource, dbo points to http://dbpedia.org/ontology, xsd refers to http://www.w3.org/2001/XMLSchema#.
2. http://wiki.dbpedia.org/.
3. http://yago-knowledge.org/.
4. https://developers.google.com/freebase/.
5. http://www.wikidata.org.
6. sameAs service, http://sameas.org.
7. SameAs4J API, http://99soft.github.io/sameas4j/.
8. owl is a namespace prefix referring to http://www.w3.org/2002/07/owl#.
9. http://www.geonames.org/.
10. http://linkedgeodata.org/.
11. geodata and geonames refer to http://sws.geonames.org/ and http://www.geonames.org/ontology# respectively.
12. https://github.com/SmartDataAnalytics/FactBench.
13. http://iridia.ulb.ac.be/irace/.
14. https://wordnet.princeton.edu/.
15. http://www.nltk.org/.
16. https://github.com/gsi-upm/sematch.
17. https://github.com/TriplesAccuracyAssessment.
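
The namespace prefixes defined in notes 1, 8 and 11 can be applied mechanically to turn a prefixed name into a full IRI. The sketch below is only an illustration: the `PREFIXES` table and the `expand` helper are hypothetical names, and the trailing slashes on the dbr and dbo namespaces are an assumption added so that concatenation yields well-formed DBpedia IRIs (the notes print those two namespaces without them).

```python
# Prefix table taken from notes 1, 8 and 11.
# Assumption: trailing "/" appended to the dbr and dbo namespaces.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "geodata": "http://sws.geonames.org/",
    "geonames": "http://www.geonames.org/ontology#",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'owl:sameAs' into a full IRI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local


same_as_iri = expand("owl:sameAs")
```

Here `expand("owl:sameAs")` yields http://www.w3.org/2002/07/owl#sameAs, the interlinking property that note 8's prefix is introduced for.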

Author information

Authors and Affiliations

Knowledge Media Institute, The Open University, Milton Keynes, UK
Shuangyan Liu, Mathieu d’Aquin & Enrico Motta

Corresponding author

Correspondence to Shuangyan Liu.

Editor information

Editors and Affiliations

Universidad Politécnica de Madrid, Madrid, Spain
Jorge Gracia
Nanyang Technological University, Singapore, Singapore
Francis Bond
Insight Centre for Data Analytics, National University of Ireland, Galway, Galway, Ireland
John P. McCrae
Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
Paul Buitelaar
Goethe-University Frankfurt, Frankfurt, Germany
Christian Chiarcos
University of Leipzig, Leipzig, Germany
Sebastian Hellmann

About this paper

Cite this paper

Liu, S., d’Aquin, M., Motta, E. (2017). Measuring Accuracy of Triples in Knowledge Graphs. In: Gracia, J., Bond, F., McCrae, J., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds) Language, Data, and Knowledge. LDK 2017. Lecture Notes in Computer Science, vol 10318. Springer, Cham. https://doi.org/10.1007/978-3-319-59888-8_29

DOI: https://doi.org/10.1007/978-3-319-59888-8_29
Published: 27 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59887-1
Online ISBN: 978-3-319-59888-8
eBook Packages: Computer Science, Computer Science (R0)
