Efficient Graph-Based Document Similarity

Conference paper

DOI: 10.1007/978-3-319-34129-3_21

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9678)
Cite this paper as:
Paul C., Rettinger A., Mogadala A., Knoblock C.A., Szekely P. (2016) Efficient Graph-Based Document Similarity. In: Sack H., Blomqvist E., d'Aquin M., Ghidini C., Ponzetto S., Lange C. (eds) The Semantic Web. Latest Advances and New Domains. ESWC 2016. Lecture Notes in Computer Science, vol 9678. Springer, Cham

Abstract

Assessing the relatedness of documents is at the core of many applications such as document retrieval and recommendation. Most similarity approaches operate on word-distribution-based document representations. These are fast to compute, but they are problematic when documents differ in language, vocabulary or type, and they neglect the rich relational knowledge available in Knowledge Graphs. In contrast, graph-based document models can leverage valuable knowledge about relations between entities; however, due to expensive graph operations, similarity assessments tend to become infeasible in many applications. This paper presents an efficient semantic similarity approach that exploits explicit hierarchical and transversal relations. We show in our experiments that (i) our similarity measure provides a significantly higher correlation with human notions of document similarity than comparable measures, (ii) this also holds for short documents with few annotations, and (iii) document similarity can be calculated efficiently compared to other graph-traversal based approaches.
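The contrast drawn in the abstract can be illustrated with a small toy sketch. It is not the authors' measure: the miniature knowledge graph, the Wu-Palmer-style depth score standing in for hierarchical relatedness, the 1/(1 + path length) score standing in for transversal relatedness, the max combination of the two, and the symmetric best-match averaging over entity annotations are all assumptions chosen for illustration. The point it shows is that two documents with no shared words get a bag-of-words cosine of 0, while an entity-based view over hierarchical and transversal relations still finds them related.

```python
import math
from collections import Counter, deque

# Hypothetical toy knowledge graph (assumption, not from the paper).
# PARENT encodes a hierarchy (entity -> category -> ... -> root "Thing").
PARENT = {
    "BMW": "Car_manufacturer", "Volkswagen": "Car_manufacturer",
    "Car_manufacturer": "Company", "Company": "Organisation", "Organisation": "Thing",
    "Munich": "City", "Wolfsburg": "City", "City": "Place", "Place": "Thing",
}
# EDGES encodes transversal (non-hierarchical) relations, e.g. "headquartered in".
EDGES = [("BMW", "Munich"), ("Volkswagen", "Wolfsburg"),
         ("Munich", "Germany"), ("Wolfsburg", "Germany")]

# Undirected adjacency over the transversal relations, built once.
ADJ: dict[str, set[str]] = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)


def bow_cosine(text_a: str, text_b: str) -> float:
    """Word-distribution baseline: cosine of raw term-frequency vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def ancestors(node: str) -> list[str]:
    """Path from a node up to the hierarchy root, starting with the node itself."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def hierarchical_sim(x: str, y: str) -> float:
    """Wu-Palmer-style score from the deepest shared ancestor (root depth = 1)."""
    ax, ay = ancestors(x), ancestors(y)
    shared = set(ay)
    lca = next((n for n in ax if n in shared), None)
    if lca is None:
        return 0.0
    return 2 * len(ancestors(lca)) / (len(ax) + len(ay))


def transversal_sim(x: str, y: str, max_hops: int = 3) -> float:
    """1 / (1 + shortest path length) over transversal edges; 0 beyond max_hops."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        node = queue.popleft()
        if node == y:
            return 1.0 / (1 + dist[node])
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in ADJ.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return 0.0


def entity_sim(x: str, y: str) -> float:
    """Assumed combination: take the stronger of the two relatedness signals."""
    return max(hierarchical_sim(x, y), transversal_sim(x, y))


def document_sim(ents_a: list[str], ents_b: list[str]) -> float:
    """Symmetric best-match average over the documents' entity annotations."""
    if not ents_a or not ents_b:
        return 0.0
    a_to_b = sum(max(entity_sim(x, y) for y in ents_b) for x in ents_a) / len(ents_a)
    b_to_a = sum(max(entity_sim(y, x) for x in ents_a) for y in ents_b) / len(ents_b)
    return (a_to_b + b_to_a) / 2


doc_a, ents_a = "BMW builds cars in Munich", ["BMW", "Munich"]
doc_b, ents_b = "Volkswagen assembles vehicles at Wolfsburg", ["Volkswagen", "Wolfsburg"]
print(bow_cosine(doc_a, doc_b))      # 0.0: the texts share no words
print(document_sim(ents_a, ents_b))  # about 0.775 via shared categories and paths
```

Even in this toy setting, the per-pair graph searches are what make graph-based similarity costly as documents and graphs grow, which is the efficiency concern the abstract raises.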

Keywords

Semantic document similarity; Knowledge graph based document models; Efficient similarity calculation

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Institute of Applied Informatics and Formal Description Methods (AIFB), Karlsruhe Institute of Technology, Karlsruhe, Germany
  2. Information Sciences Institute, University of Southern California, Marina Del Rey, USA
