Information Retrieval Journal, Volume 18, Issue 6, pp 473–503

Latent entity space: a novel retrieval approach for entity-bearing queries

  • Xitong Liu
  • Hui Fang


Analysis of Web search query logs has revealed that a large portion of queries are entity-bearing, reflecting users' increasing demand for relevant information about entities such as persons, organizations, and products. Meanwhile, significant progress has been made in Web-scale information extraction, which enables efficient entity extraction from free text. Since an entity is expected to capture the semantic content of documents and queries more accurately than a term, it is worth studying whether leveraging information about entities can improve retrieval accuracy for entity-bearing queries. In this paper, we propose a novel retrieval approach, latent entity space (LES), which models relevance by leveraging entity profiles to represent the semantic content of documents and queries. In the LES, each entity corresponds to one dimension, representing one semantic relevance aspect. We propose a formal probabilistic framework to model relevance in this high-dimensional entity space. Experimental results over TREC collections show that the proposed LES approach is effective in capturing latent semantic content and can significantly improve the search accuracy of several state-of-the-art retrieval models for entity-bearing queries.
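To make the idea of "one dimension per entity" concrete, the following is a minimal Python sketch and not the paper's actual estimators, which this abstract does not specify. It assumes that each entity profile is a short text turned into an additively smoothed unigram model, that a document is projected onto the entity dimensions as P(e | d) under a uniform entity prior, and that a query-document score is the sum over entities of P(q | e) · P(e | d). The function names, the smoothing constant, and the toy profiles and documents are all hypothetical.

```python
import math
from collections import Counter

def unigram_model(profile_text, vocab, mu=0.1):
    """Additively smoothed unigram model of one entity profile (assumed estimator)."""
    counts = Counter(profile_text.lower().split())
    total = sum(counts[w] for w in vocab) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}

def log_likelihood(text, model):
    """log P(text | entity); terms outside the profile vocabulary are skipped."""
    return sum(math.log(model[w]) for w in text.lower().split() if w in model)

def entity_weights(text, entity_models):
    """Project text onto entity dimensions: P(e | text) under a uniform entity prior."""
    logs = {e: log_likelihood(text, m) for e, m in entity_models.items()}
    top = max(logs.values())  # subtract the max for numerical stability
    exp = {e: math.exp(v - top) for e, v in logs.items()}
    z = sum(exp.values())
    return {e: v / z for e, v in exp.items()}

def les_score(query, doc, entity_models):
    """Illustrative score: sum over entities of P(q | e) * P(e | d)."""
    weights = entity_weights(doc, entity_models)
    return sum(
        math.exp(log_likelihood(query, model)) * weights[e]
        for e, model in entity_models.items()
    )

# Hypothetical toy data: two entities sharing the surface form "apple".
profiles = {
    "apple_inc": "apple iphone mac company cupertino technology",
    "apple_fruit": "apple fruit tree orchard juice sweet",
}
vocab = sorted({w for text in profiles.values() for w in text.split()})
models = {e: unigram_model(text, vocab) for e, text in profiles.items()}

docs = {
    "d1": "the company released a new iphone",
    "d2": "fresh juice from the orchard fruit",
}
query = "apple iphone"
ranking = sorted(docs, key=lambda d: les_score(query, docs[d], models), reverse=True)
print(ranking)
```

In this toy setup neither document contains the term "apple", yet the document about the company is favored for the query because both it and the query load on the same entity dimension. That is the kind of latent semantic match the abstract attributes to the entity space, shown here only under the assumptions stated above.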


Keywords: Latent entity space, Entity profile, Document retrieval



This material is based upon work supported by the National Science Foundation under Grant Number IIS-1423002. We thank the anonymous reviewers for their useful comments.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, University of Delaware, Newark, USA
