Knowledge Extraction for Information Retrieval

  • Francesco Corcoglioniti
  • Mauro Dragoni
  • Marco Rospocher
  • Alessio Palmero Aprosio
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9678)


Document retrieval is the task of returning relevant textual resources for a given user query. In this paper, we investigate whether the semantic analysis of the query and the documents, obtained by exploiting state-of-the-art Natural Language Processing techniques (e.g., Entity Linking, Frame Detection) and Semantic Web resources (e.g., YAGO, DBpedia), can improve the performance of the traditional term-based similarity approach. Our experiments, conducted on a recently released document collection, show that Mean Average Precision (MAP) increases by 3.5 percentage points when combining textual and semantic analysis, suggesting that semantic content can effectively improve the performance of Information Retrieval systems.
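
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how Mean Average Precision is commonly computed from ranked result lists and binary relevance judgments. It follows the standard textbook definition and is not the authors' evaluation code; the function names and the toy document identifiers are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k over the ranks k where a relevant document appears.

    Relevant documents that are never retrieved contribute zero, because the
    sum is divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision over all (ranking, relevant) pairs."""
    scores = [average_precision(ranking, relevant) for ranking, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with two queries (hypothetical document ids).
toy_runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d2", "d1"], {"d1"}),                    # AP = (1/2) / 1
]
print(round(mean_average_precision(toy_runs), 4))
```

Under this definition, a gain of 3.5 percentage points corresponds to an absolute increase of 0.035 in the MAP score averaged over all queries.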


Keywords: DBpedia; Entity Linking (EL); Named Entity Recognition and Classification (NERC); Normalized Discounted Cumulative Gain (NDCG)



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Francesco Corcoglioniti (1)
  • Mauro Dragoni (1), email author
  • Marco Rospocher (1)
  • Alessio Palmero Aprosio (1)
  1. Fondazione Bruno Kessler, Trento, Italy
