Compacting frequent star patterns in RDF graphs


Knowledge graphs have become a popular formalism for representing entities and their properties using a graph data model, e.g., the Resource Description Framework (RDF). An RDF graph comprises entities of the same type connected to objects or other entities using labeled edges annotated with properties. RDF graphs usually contain entities that share the same objects in a certain group of properties, i.e., they match star patterns composed of these properties and objects. When the number of entities or properties in these star patterns is large, both the size of the RDF graph and query processing are negatively impacted; we refer to these star patterns as frequent star patterns. We address the problem of identifying frequent star patterns in RDF graphs and devise the concept of factorized RDF graphs, which denote compact representations of RDF graphs where the number of frequent star patterns is minimized. We also develop computational methods to identify frequent star patterns and generate a factorized RDF graph, where compact RDF molecules replace frequent star patterns. A compact RDF molecule of a frequent star pattern denotes an RDF subgraph that instantiates the corresponding star pattern. Instead of having all the entities match the original frequent star pattern, a surrogate entity is added and related to the properties and objects of the frequent star pattern; it is linked to the entities that originally matched the frequent star pattern. Since the edges between the entities and the objects in the frequent star pattern are replaced by edges between these entities and the surrogate entity of the compact RDF molecule, the size of the RDF graph is reduced. We evaluate the performance of our factorization techniques on several RDF graph benchmarks and compare them with a baseline built on top of gSpan, a state-of-the-art algorithm to detect frequent patterns. The outcomes evidence the efficiency of the proposed approach and show that our techniques are able to reduce the execution time of the baseline approach by at least three orders of magnitude. Additionally, RDF graph size can be reduced by up to 66.56% while the data represented in the original RDF graph is preserved.
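The surrogate-entity replacement described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `factorize`, the `ex:` identifiers, and the linking property `ex:hasMolecule` are hypothetical, and the properties of the star pattern are assumed to be given as input rather than discovered by a mining step.

```python
from collections import defaultdict


def factorize(triples, properties, link="ex:hasMolecule"):
    """Replace star patterns shared by several entities with surrogate entities.

    `triples` is a set of (subject, property, object) tuples and `properties`
    is the group of properties that defines the star pattern (an assumption of
    this sketch: the group is supplied by the caller).
    """
    properties = set(properties)

    # Collect, per entity, its (property, object) pairs over the chosen properties.
    stars = defaultdict(set)
    for s, p, o in triples:
        if p in properties:
            stars[s].add((p, o))

    # Group entities whose stars are identical and cover every chosen property.
    groups = defaultdict(list)
    for entity, star in stars.items():
        if {p for p, _ in star} == properties:
            groups[frozenset(star)].append(entity)

    result = set(triples)
    ordered = sorted(groups.items(), key=lambda g: sorted(g[0]))
    for i, (star, entities) in enumerate(ordered):
        n, k = len(entities), len(star)
        # n entities with k shared edges cost n * k edges originally and
        # n + k edges after factorization; skip groups that would not shrink.
        if n * k <= n + k:
            continue
        surrogate = f"ex:molecule{i}"  # hypothetical surrogate identifier
        for entity in entities:
            result -= {(entity, p, o) for p, o in star}
            result.add((entity, link, surrogate))
        result |= {(surrogate, p, o) for p, o in star}
    return result


# Three observations share type, unit, and value; only the timestamp differs.
triples = set()
for idx, t in enumerate(["t1", "t2", "t3"], start=1):
    obs = f"ex:obs{idx}"
    triples |= {
        (obs, "rdf:type", "ex:Observation"),
        (obs, "ex:unit", "ex:Celsius"),
        (obs, "ex:value", "20"),
        (obs, "ex:time", t),
    }

compact = factorize(triples, ["rdf:type", "ex:unit", "ex:value"])
print(len(triples), len(compact))  # 12 9
```

In this toy graph the nine edges of the shared star pattern are replaced by three links to the surrogate plus three edges on the surrogate itself, while the per-entity `ex:time` edges are untouched. The original triples remain recoverable by following `ex:hasMolecule` to the surrogate.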




  1.

  2. The property type refers to rdf:type.

  3. Available at:




Farah Karim is supported by the German Academic Exchange Service (DAAD); this work is partially funded by the EU H2020 project IASiS (GA No. 727658).

Author information



Corresponding author

Correspondence to Farah Karim.


Cite this article

Karim, F., Vidal, M. & Auer, S. Compacting frequent star patterns in RDF graphs. J Intell Inf Syst (2020).



  • Semantic Web
  • RDF compaction
  • Linked data
  • Knowledge graph