Measuring Accuracy of Triples in Knowledge Graphs

  • Conference paper
Language, Data, and Knowledge (LDK 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10318)

Abstract

An increasing number of large-scale knowledge graphs have been constructed in recent years. These graphs are often created through text-based extraction, which can be very noisy. So far, cleaning knowledge graphs has largely been carried out by human experts and is therefore very inefficient. It is necessary to explore automatic methods for identifying and eliminating erroneous information. To achieve this, previous approaches rely primarily on internal information, i.e. the knowledge graph itself. In this paper, we introduce an automatic approach, Triples Accuracy Assessment (TAA), for validating RDF triples (source triples) in a knowledge graph by finding consensus among matched triples (target triples) from other knowledge graphs. TAA uses knowledge graph interlinks to find identical resources and applies different matching methods between the predicates of source triples and target triples. Based on the matched triples, TAA then calculates a confidence score that indicates the correctness of a source triple. In addition, we present an evaluation of our approach using the FactBench dataset for fact validation. Our findings show promising results for distinguishing between correct and incorrect triples.
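
To make the pipeline sketched in the abstract concrete, the following is a minimal Python sketch of the consensus idea. It is not the authors' implementation: the helper names (find_equivalent_resources, fetch_target_triples, match_score) are hypothetical stubs, and the aggregation of matched triples into a single score is a simplifying assumption.

    # Minimal sketch of the consensus idea described above; NOT the authors'
    # implementation.  Helper names and the aggregation scheme are
    # illustrative assumptions only.
    from typing import List, Tuple

    Triple = Tuple[str, str, str]  # (subject URI, predicate URI, object value)

    def find_equivalent_resources(subject_uri: str) -> List[str]:
        """Resolve owl:sameAs interlinks for the source subject, e.g. via a
        sameAs lookup service.  Stubbed here with a fixed, hypothetical result."""
        return ["http://www.wikidata.org/entity/Q42"]

    def fetch_target_triples(resource_uri: str) -> List[Triple]:
        """Retrieve candidate triples about the equivalent resource from a
        target knowledge graph (e.g. via a SPARQL query).  Stubbed."""
        return []

    def match_score(source: Triple, target: Triple) -> float:
        """Similarity between source and target predicate/object pairs,
        e.g. string- or WordNet-based matching.  Stubbed with exact match."""
        return 1.0 if source[1:] == target[1:] else 0.0

    def confidence(source: Triple) -> float:
        """Aggregate the agreement of matched target triples into a single
        confidence score for the source triple (a simple maximum here)."""
        scores = [
            match_score(source, target)
            for resource in find_equivalent_resources(source[0])
            for target in fetch_target_triples(resource)
        ]
        return max(scores, default=0.0)

A real implementation would replace the stubs with the owl:sameAs lookup, the retrieval of target triples, and the paper's own predicate-matching and scoring steps, and then threshold the resulting confidence score to accept or reject the source triple.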

Notes

  1. dbr refers to http://dbpedia.org/resource, dbo points to http://dbpedia.org/ontology, xsd refers to http://www.w3.org/2001/XMLSchema#.

  2. http://wiki.dbpedia.org/.

  3. http://yago-knowledge.org/.

  4. https://developers.google.com/freebase/.

  5. http://www.wikidata.org.

  6. sameAs service, http://sameas.org.

  7. SameAs4J API, http://99soft.github.io/sameas4j/.

  8. owl is a namespace prefix referring to http://www.w3.org/2002/07/owl#.

  9. http://www.geonames.org/.

  10. http://linkedgeodata.org/.

  11. geodata and geonames refer to http://sws.geonames.org/ and http://www.geonames.org/ontology# respectively.

  12. https://github.com/SmartDataAnalytics/FactBench.

  13. http://iridia.ulb.ac.be/irace/.

  14. https://wordnet.princeton.edu/.

  15. http://www.nltk.org/ (see the predicate-matching sketch after this list).

  16. https://github.com/gsi-upm/sematch.

  17. https://github.com/TriplesAccuracyAssessment.
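
Notes 14-16 point to WordNet, NLTK and sematch, the kind of tooling available for semantic matching between predicate labels. The sketch below, assuming NLTK's WordNet interface and the Wu-Palmer measure (one of several possible measures), illustrates this style of matching; it is not a reproduction of the paper's method, and the function name is hypothetical.

    # Illustrative WordNet-based matching between two predicate labels using
    # NLTK; the Wu-Palmer measure is chosen only as an example of the kind of
    # similarity such tools provide.
    # Requires: pip install nltk, then nltk.download('wordnet').
    from itertools import product

    from nltk.corpus import wordnet as wn

    def predicate_label_similarity(label_a: str, label_b: str) -> float:
        """Best Wu-Palmer similarity over all synset pairs of the two labels
        (0.0 if either label has no WordNet synsets)."""
        scores = [
            synset_a.wup_similarity(synset_b) or 0.0  # None for cross-POS pairs
            for synset_a, synset_b in product(wn.synsets(label_a), wn.synsets(label_b))
        ]
        return max(scores, default=0.0)

    if __name__ == "__main__":
        # e.g. comparing the labels of a source and a target predicate
        print(predicate_label_similarity("author", "writer"))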

Author information

Corresponding author

Correspondence to Shuangyan Liu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Liu, S., d’Aquin, M., Motta, E. (2017). Measuring Accuracy of Triples in Knowledge Graphs. In: Gracia, J., Bond, F., McCrae, J., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds.) Language, Data, and Knowledge. LDK 2017. Lecture Notes in Computer Science (LNAI), vol. 10318. Springer, Cham. https://doi.org/10.1007/978-3-319-59888-8_29

  • DOI: https://doi.org/10.1007/978-3-319-59888-8_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59887-1

  • Online ISBN: 978-3-319-59888-8

  • eBook Packages: Computer Science (R0)
