Entity Comparison in RDF Graphs

  • Alina Petrova
  • Evgeny Sherkhonov
  • Bernardo Cuenca Grau
  • Ian Horrocks
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10587)


In many applications, there is an increasing need for new types of RDF data analysis that are not covered by standard reasoning tasks such as SPARQL query answering. One important such task is entity comparison, i.e., determining the similarities and differences between two given entities in an RDF graph. For instance, in an RDF graph about drugs, we may want to compare Metamizole and Ibuprofen and automatically find out that they are similar in that both are analgesics but that, in contrast to Metamizole, Ibuprofen also has a considerable anti-inflammatory effect. Entity comparison is a widely used functionality available in many information systems, such as university or product comparison websites. However, comparison is typically domain-specific and depends on a fixed set of aspects to compare. In this paper, we propose a formal framework for domain-independent entity comparison over RDF graphs. We model similarities and differences between entities as SPARQL queries satisfying certain additional properties, and propose algorithms for computing them.
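To make the idea of queries as similarities and differences concrete, the following is a minimal sketch in Python rather than SPARQL. It assumes a toy graph stored as a set of triples and a query given as a list of triple patterns with one answer variable `?x`. The triples, the predicate names `type` and `hasEffect`, and the helpers `answers`, `is_similarity` and `is_difference` are all hypothetical. The sketch checks only the basic condition (a similarity query returns both entities, a difference query returns one but not the other) and does not capture the additional properties the paper requires.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy graph; these triples are illustrative, not data from the paper.
GRAPH: Set[Triple] = {
    ("Metamizole", "type", "Analgesic"),
    ("Ibuprofen", "type", "Analgesic"),
    ("Ibuprofen", "hasEffect", "AntiInflammatory"),
}


def match(pattern: Triple, triple: Triple,
          binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):          # variable: bind it or check consistency
            if new.setdefault(p, t) != t:
                return None
        elif p != t:                   # constant: must match exactly
            return None
    return new


def answers(patterns: List[Triple], graph: Set[Triple],
            var: str = "?x") -> Set[str]:
    """Evaluate a conjunction of triple patterns; return the values of `var`."""
    bindings: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in graph:
                b2 = match(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return {b[var] for b in bindings if var in b}


def is_similarity(patterns: List[Triple], a: str, b: str) -> bool:
    """Assumed basic condition: both entities are answers to the query."""
    return {a, b} <= answers(patterns, GRAPH)


def is_difference(patterns: List[Triple], a: str, b: str) -> bool:
    """Assumed basic condition: `a` is an answer but `b` is not."""
    result = answers(patterns, GRAPH)
    return a in result and b not in result


both_analgesics = [("?x", "type", "Analgesic")]
anti_inflammatory = [("?x", "hasEffect", "AntiInflammatory")]
```

Under these assumptions, `is_similarity(both_analgesics, "Metamizole", "Ibuprofen")` holds, and `is_difference(anti_inflammatory, "Ibuprofen", "Metamizole")` holds because only Ibuprofen matches the second pattern.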



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Alina Petrova (1)
  • Evgeny Sherkhonov (1)
  • Bernardo Cuenca Grau (1)
  • Ian Horrocks (1)
  1. University of Oxford, Oxford, UK
