Identifying and exploiting target entity type information for ad hoc entity retrieval

  • Darío Garigliotti
  • Faegheh Hasibi
  • Krisztian Balog
Knowledge Graphs and Semantics in Text Analysis and Retrieval

Abstract

Today, the practice of returning entities from a knowledge base in response to search queries has become widespread. One of the distinctive characteristics of entities is that they are typed, i.e., assigned to types from a hierarchically organized type system (type taxonomy). The primary objective of this paper is to gain a better understanding of how entity type information can be utilized in entity retrieval. We perform this investigation in two settings: firstly, in an idealized “oracle” setting, assuming that we know the distribution of target types of the relevant entities for a given query; and secondly, in a realistic scenario, where target entity types are identified automatically based on the keyword query. We perform a thorough analysis of three main aspects: (i) the choice of type taxonomy, (ii) the representation of hierarchical type information, and (iii) the combination of type-based and term-based similarity in the retrieval model. Using a standard entity search test collection based on DBpedia, we show that type information can significantly and substantially improve retrieval performance, yielding up to 67% relative improvement in terms of NDCG@10 over a strong text-only baseline in an oracle setting. We further show that, using automatic target type detection, we can outperform the text-only baseline by 44% in terms of NDCG@10. This is as good as, and sometimes even better than, what is attainable by using explicit target type information provided by humans. These results indicate that identifying the target entity types of queries is challenging even for humans, and they attest to the effectiveness of our proposed automatic approach.
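The abstract names three ingredients: a target type distribution for the query, a representation of hierarchical type information, and a combination of type-based and term-based similarity. The sketch below illustrates one plausible way these pieces could fit together; it is not taken from the paper. Assumptions, all labeled as such: a toy taxonomy fragment using DBpedia-style class names, an entity type representation that spreads probability uniformly over the entity's types and all their ancestors, a type similarity computed as exp(-KL), and linear interpolation with a term-based score assumed to be already normalised to [0, 1] using a hypothetical weight `lam`. All function names are illustrative.

```python
import math

# Hypothetical toy taxonomy fragment: each type maps to its parent (None for the root).
PARENT = {
    "owl:Thing": None,
    "Agent": "owl:Thing",
    "Person": "Agent",
    "Scientist": "Person",
    "Organisation": "Agent",
    "Place": "owl:Thing",
}


def ancestors(t):
    """Return t together with all of its ancestors, walking up to the root."""
    path = []
    while t is not None:
        path.append(t)
        t = PARENT[t]
    return path


def entity_type_distribution(types):
    """Spread probability mass uniformly over an entity's types and their ancestors
    (an assumed way of representing hierarchical type information)."""
    expanded = {a for t in types for a in ancestors(t)}
    return {t: 1.0 / len(expanded) for t in expanded}


def kl_divergence(p, q, eps=1e-6):
    """KL(p || q); eps is an assumed smoothing value for types that q does not cover."""
    return sum(pv * math.log(pv / q.get(t, eps)) for t, pv in p.items() if pv > 0)


def type_score(query_types, entity_types):
    """Type-based similarity exp(-KL(query || entity)); larger means a closer match."""
    return math.exp(-kl_divergence(query_types, entity_type_distribution(entity_types)))


def combined_score(term_score, query_types, entity_types, lam=0.5):
    """Linear interpolation of a normalised term-based score and the type-based score."""
    return (1 - lam) * term_score + lam * type_score(query_types, entity_types)


# Usage with a hypothetical target type distribution (oracle or automatically predicted).
query_types = {"Scientist": 0.6, "Person": 0.3, "Agent": 0.1}
curie = combined_score(0.4, query_types, ["Scientist"])
stavanger = combined_score(0.9, query_types, ["Place"])
print(curie > stavanger)  # True
```

In this toy case an entity typed `Scientist` gets a type score of roughly 0.61 against the query distribution, while a `Place` entity scores close to zero, so with `lam=0.5` the type-matching entity outranks the `Place` entity even though the latter has the higher term score (0.9 vs. 0.4). The smoothing value `eps` and the weight `lam` are free parameters of this sketch and would need tuning on real data.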

Keywords

Entity retrieval · Entity types · Semantic search · Query understanding

Copyright information

© Springer Nature B.V. 2018

Authors and Affiliations

  1. University of Stavanger, Stavanger, Norway
  2. Norwegian University of Science and Technology, Trondheim, Norway