On Learnability of Constraints from RDF Data

  • Emir Muñoz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9678)


RDF data is structured, dynamic, and schemaless, which gives Linked Data a great deal of flexibility to be made available in an open environment such as the Web. However, for RDF data, this flexibility turns out to be the source of many data quality and knowledge representation issues. Tasks such as assessing data quality in RDF require a different set of techniques and tools compared to other data models. Furthermore, since the use of existing schema, ontology, and constraint languages is not mandatory, there is always room for misunderstanding the structure of the data. Neglecting this problem can threaten the widespread use and adoption of RDF and Linked Data. Users should be able to learn the characteristics of RDF data in order to determine, for example, its fitness for a given use case. For that purpose, in this doctoral research, we propose the use of constraints to inform users about characteristics that RDF data naturally exhibits, in cases where ontologies (or any other form of explicitly given constraints or schemata) are not present or not expressive enough. We aim to address the problems of defining and discovering classes of constraints to help users in data analysis and in the assessment of RDF and Linked Data quality.


RDF constraints, Linked data mining, Data quality, Data semantics



This thesis is supervised by Dr. Matthias Nickles. The author would like to thank Prof. Dr. Heiko Paulheim for his valuable comments and suggestions. The work presented in this paper has been supported by the TOMOE project, funded by Fujitsu Laboratories Limited and the Insight Centre for Data Analytics at NUI Galway.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Fujitsu Ireland Limited, Dublin, Ireland
  2. Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
