
A survey on semantic schema discovery

  • Regular Paper
  • The VLDB Journal

Abstract

More and more weakly structured and irregular data sources are becoming available every day. The schema of these sources is useful for a number of tasks, such as query answering, exploration and summarization. However, although semantic web data might contain schema information, in many cases it is completely missing or only partially defined. In this paper, we present a survey of the state of the art in schema information extraction approaches. We analyze and classify these approaches into three families: (1) approaches that exploit the implicit structure of the data, without assuming that explicit schema statements are provided in the dataset; (2) approaches that use the explicit schema statements contained in the dataset to complement and enrich the schema; and (3) approaches that discover structural patterns contained in a dataset. We compare these studies in terms of their approach, advantages and limitations. Finally, we discuss the problems that remain open.


Notes

  1. Resource Description Framework: http://www.w3.org/RDF/.

  2. Protégé: https://protege.stanford.edu/.

  3. Knofuss: technologies.kmi.open.ac.uk/knofuss.

  4. Silk: wifo5-03.informatik.uni-mannheim.de/bizer/silk.

  5. W3C Shape Expressions (ShEx) 2.1 Primer: https://shex.io/shex-primer/.

  6. StaTIX: https://github.com/eXascaleInfolab/TInfES.

  7. Hadoop: http://hadoop.apache.org/.

  8. https://github.com/dfleischhacker/goldminer.

  9. http://www.ke.tu-darmstadt.de/resources/mob4lod/.

  10. http://dl-learner.org.

  11. https://github.com/SmartDataAnalytics/DL-Learner.

  12. https://github.com/HeikoPaulheim/sd-type-validate.

  13. SPARQL Query Language queries: http://www.w3.org/TR/rdf-sparql-query/.

  14. SchemaDecrypt(++): https://github.com/Kenza-Kellou-Menouer/SchemaDecrypt.


Acknowledgements

This research project was partially supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “2nd Call for H.F.R.I. Research Projects to support Post-Doctoral Researchers” (iQARuS Project No 1147).


Cite this article

Kellou-Menouer, K., Kardoulakis, N., Troullinou, G. et al. A survey on semantic schema discovery. The VLDB Journal 31, 675–710 (2022). https://doi.org/10.1007/s00778-021-00717-x


  • DOI: https://doi.org/10.1007/s00778-021-00717-x
