Mining Significant Maximum Cardinalities in Knowledge Bases

  • Conference paper
The Semantic Web – ISWC 2019 (ISWC 2019)

Abstract

The Semantic Web connects huge knowledge bases whose content has been generated from collaborative platforms and by integrating heterogeneous databases. Naturally, these knowledge bases are incomplete and contain erroneous data. Knowing their data quality is an essential long-term goal to guarantee that querying them returns reliable results. Cardinality constraints for roles would be an important advance, because they distinguish correctly and completely described individuals from those whose data is either incorrect or incomplete. In this paper, we propose a method for automatically discovering, from the content of the knowledge base, the maximum cardinality of roles for each concept, when it exists. This method is robust thanks to the use of Hoeffding’s inequality. We also design an algorithm, named C3M, for an exhaustive search of such constraints in a knowledge base; it benefits from pruning properties that drastically reduce the search space. Experiments conducted on DBpedia demonstrate that C3M scales, and also highlight the robustness of our method, with a precision higher than 95%.
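To make the role of Hoeffding’s inequality concrete, the sketch below shows one plausible way to turn an observed proportion into a pessimistic estimate before accepting a maximum cardinality. It is an illustration under stated assumptions, not the C3M algorithm: the scoring rule (share of individuals with exactly M facts among those with at least M), the function names, and the confidence parameter `delta` are hypothetical choices made here. Only the threshold name `min_tau` and the value 0.97 appear in the paper’s notes.

```python
import math
from collections import Counter
from typing import Iterable, Optional


def hoeffding_lower_bound(p_hat: float, n: int, delta: float) -> float:
    """One-sided Hoeffding lower bound on a true proportion.

    With probability at least 1 - delta, the true proportion is no smaller
    than p_hat - sqrt(ln(1/delta) / (2 n)) for n independent observations.
    """
    if n <= 0:
        return 0.0
    epsilon = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return max(0.0, p_hat - epsilon)


def significant_max_cardinality(
    counts: Iterable[int], min_tau: float = 0.97, delta: float = 0.01
) -> Optional[int]:
    """Return a candidate maximum cardinality M, or None if none is significant.

    `counts` holds, for each individual of one concept, how many facts it has
    for one role. Assumed scoring rule (hypothetical, for illustration): the
    share of individuals with exactly M facts among those with at least M,
    corrected downwards by the Hoeffding bound.
    """
    # Individuals with no fact are ignored: a missing fact may simply be unknown.
    histogram = Counter(c for c in counts if c >= 1)
    at_least = 0
    best = None  # (pessimistic score, cardinality)
    for m in sorted(histogram, reverse=True):
        at_least += histogram[m]  # number of individuals with >= m facts
        score = hoeffding_lower_bound(histogram[m] / at_least, at_least, delta)
        if best is None or score > best[0]:
            best = (score, m)
    if best is None or best[0] < min_tau:
        return None
    return best[1]


if __name__ == "__main__":
    # Many individuals with one fact and a few with two: M = 1 is accepted.
    print(significant_max_cardinality([1] * 99000 + [2] * 1000))
    # Five individuals are too few for the bound to reach min_tau.
    print(significant_max_cardinality([1] * 5))
```

The correction term shrinks as the number of supporting individuals grows, so a constraint observed on only a handful of individuals is rejected even when every one of them agrees with it. This matches the intuition behind the robustness claim, under the assumptions listed above.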

Notes

  1. We use the Description Logics (DL) [2] terminology, as DL are the theoretical foundations of OWL, so we use the terms concept (i.e. class), role (i.e. property), individual and fact (i.e. instances).

  2. The prototype and the results are available at https://github.com/asoulet/c3m, both in CSV and in RDF (Turtle); we also provide the schema of our constraints expressed in RDF.

  3. DL formal semantics are given in terms of interpretations, see [2].

  4. We denote \(C \sqsubset C'\) when \(C \sqsubseteq C'\) and \(C' \not \sqsubseteq C\).

  5. http://jena.apache.org and https://dbpedia.org.

  6. The results for \(min_\tau = 0.97\) and the ground truth used to evaluate the precision are available at https://github.com/asoulet/c3m.

  7. https://wiki.dbpedia.org/services-resources/datasets/dbpedia-datasets.

  8. We do not compare our method with [15] because in the case of DBpedia, this method systematically returns a wrong maximum cardinality for all constraints.

References

  1. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data. In: Aberer, K., et al. (eds.) ASWC/ISWC 2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76298-0_52

  2. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F. (eds.): The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, New York (2003)

  3. Darari, F., Nutt, W., Pirrò, G., Razniewski, S.: Completeness statements about RDF data sources and their use for query answering. In: Alani, H., et al. (eds.) ISWC 2013. LNCS, vol. 8218, pp. 66–83. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41335-3_5

  4. Darari, F., Razniewski, S., Prasojo, R.E., Nutt, W.: Enabling fine-grained RDF data completeness assessment. In: Bozzon, A., Cudre-Maroux, P., Pautasso, C. (eds.) ICWE 2016. LNCS, vol. 9671, pp. 170–187. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-38791-8_10

  5. Debattista, J., Lange, C., Auer, S., Cortis, D.: Evaluating the quality of the LOD cloud: an empirical investigation. Semant. Web 9(6), 859–901 (2018)

  6. Erxleben, F., Günther, M., Krötzsch, M., Mendez, J., Vrandečić, D.: Introducing Wikidata to the linked data web. In: Mika, P., et al. (eds.) ISWC 2014. LNCS, vol. 8796, pp. 50–65. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11964-9_4

  7. Färber, M., Bartscherer, F., Menne, C., Rettinger, A.: Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Semant. Web 9(1), 77–129 (2018)

  8. Galárraga, L., Razniewski, S., Amarilli, A., Suchanek, F.M.: Predicting completeness in knowledge bases. In: Proceedings of the 10th ACM International Conference on Web Search and Data Mining, pp. 375–383. ACM (2017)

  9. Galárraga, L.A., Teflioudi, C., Hose, K., Suchanek, F.: AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In: Proceedings of World Wide Web Conference, pp. 413–422. ACM (2013)

  10. Galárraga, L., Hose, K., Razniewski, S.: Enabling completeness-aware querying in SPARQL. In: Proceedings of the 21st Workshop on the Web and Databases, pp. 19–22. ACM (2017)

  11. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(310), 13–20 (1963)

  12. Lajus, J., Suchanek, F.M.: Are all people married? Determining obligatory attributes in knowledge bases. In: Proceedings of World Wide Web Conference, pp. 1115–1124 (2018)

  13. Mirza, P., Razniewski, S., Darari, F., Weikum, G.: Enriching knowledge bases with counting quantifiers. In: Vrandečić, D., et al. (eds.) ISWC 2018. LNCS, vol. 11136, pp. 179–197. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00671-6_11

  14. Motro, A.: Integrity = validity + completeness. ACM Trans. Database Syst. 14(4), 480–502 (1989)

  15. Muñoz, E., Nickles, M.: Mining cardinalities from knowledge bases. In: Benslimane, D., Damiani, E., Grosky, W.I., Hameurlain, A., Sheth, A., Wagner, R.R. (eds.) DEXA 2017. LNCS, vol. 10438, pp. 447–462. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64468-4_34

  16. Pernelle, N., Saïs, F., Symeonidou, D.: An automatic key discovery approach for data linking. Web Semant.: Sci. Serv. Agents World Wide Web 23, 16–30 (2013)

  17. Razniewski, S., Korn, F., Nutt, W., Srivastava, D.: Identifying the extent of completeness of query answers over partially complete databases. In: Proceedings of the ACM SIGMOD, pp. 561–576. ACM (2015)

  18. Soulet, A., Giacometti, A., Markhoff, B., Suchanek, F.M.: Representativeness of knowledge bases with the generalized Benford’s law. In: Vrandečić, D., et al. (eds.) ISWC 2018. LNCS, vol. 11136, pp. 374–390. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00671-6_22

  19. Symeonidou, D., Galárraga, L., Pernelle, N., Saïs, F., Suchanek, F.: VICKEY: mining conditional keys on knowledge bases. In: d’Amato, C., et al. (eds.) ISWC 2017. LNCS, vol. 10587, pp. 661–677. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68288-4_39

  20. Pellissier Tanon, T., Stepanova, D., Razniewski, S., Mirza, P., Weikum, G.: Completeness-aware rule learning from knowledge graphs. In: d’Amato, C., et al. (eds.) ISWC 2017. LNCS, vol. 10587, pp. 507–525. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68288-4_30

  21. Weikum, G., Hoffart, J., Suchanek, F.M.: Ten years of knowledge harvesting: lessons and challenges. IEEE Data Eng. Bull. 39(3), 41–50 (2016)

Acknowledgements

This work was partially supported by the grant ANR-18-CE38-0009 (“SESAME”).

Author information

Corresponding author

Correspondence to Arnaud Soulet.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Giacometti, A., Markhoff, B., Soulet, A. (2019). Mining Significant Maximum Cardinalities in Knowledge Bases. In: Ghidini, C., et al. The Semantic Web – ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol 11778. Springer, Cham. https://doi.org/10.1007/978-3-030-30793-6_11

  • DOI: https://doi.org/10.1007/978-3-030-30793-6_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30792-9

  • Online ISBN: 978-3-030-30793-6

  • eBook Packages: Computer Science (R0)
