Information Retrieval, Volume 17, Issue 3, pp 265–294

Exploiting entity relationship for query expansion in enterprise search

  • Xitong Liu
  • Fei Chen
  • Hui Fang
  • Min Wang


Enterprise search matters because search quality directly affects the productivity of an enterprise. Enterprise data contain both structured and unstructured information. Because these two types of information are complementary, and because structured information such as relational databases is designed around entity-relationship (ER) models, enterprise data contain a rich body of information about entities. As a result, many information needs in enterprise search center on entities. For example, a user may formulate a query describing a problem that she encounters with an entity, e.g., the web browser, and want to retrieve relevant documents that solve the problem. Intuitively, information related to the entities mentioned in the query, such as related entities and their relations, would be useful for reformulating the query and improving retrieval performance. However, most existing studies on query expansion are term-centric. In this paper, we propose a novel entity-centric query expansion framework for enterprise search. Specifically, given a query containing entities, we first use both unstructured and structured information to find entities that are related to the ones in the query. We then discuss how to adapt existing feedback methods to use the related entities and their relations to improve search quality. Experimental results over two real-world enterprise collections show that, for long natural-language-like queries containing entities, the proposed entity-centric query expansion strategies improve search performance more effectively and more robustly than state-of-the-art pseudo feedback methods. Moreover, results over a TREC ad hoc retrieval collection show that the proposed methods also work well for short keyword queries in the general search domain.
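The two-step idea in the abstract (find entities related to those in the query, then use them to expand the query) can be illustrated with a toy sketch. This is not the authors' method: the relation triples, the documents, the exact-substring entity matching, the co-occurrence counting, and the names `RELATIONS`, `DOCUMENTS`, `expand_query`, `top_k`, and `expansion_weight` are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical structured source: (entity, relation, entity) triples,
# standing in for rows derived from an ER-modeled database.
RELATIONS = [
    ("web browser", "runs_on", "operating system"),
    ("web browser", "uses", "proxy server"),
    ("mail client", "uses", "mail server"),
]

# Hypothetical unstructured source: a tiny document collection.
DOCUMENTS = [
    "the web browser crashes when the proxy server times out",
    "restart the proxy server and clear the web browser cache",
    "the mail client cannot reach the mail server",
]

# All entity names known from the structured source.
ENTITIES = sorted({e for head, _, tail in RELATIONS for e in (head, tail)})


def find_entities(text):
    """Return known entity names that appear verbatim in the text."""
    return [e for e in ENTITIES if e in text]


def related_from_structured(entity):
    """Count entities linked to `entity` by a triple, in either direction."""
    related = Counter()
    for head, _, tail in RELATIONS:
        if head == entity:
            related[tail] += 1
        elif tail == entity:
            related[head] += 1
    return related


def related_from_unstructured(entity):
    """Count entities that co-occur with `entity` in the same document."""
    related = Counter()
    for doc in DOCUMENTS:
        mentioned = find_entities(doc)
        if entity in mentioned:
            for other in mentioned:
                if other != entity:
                    related[other] += 1
    return related


def expand_query(query, top_k=2, expansion_weight=0.5):
    """Return {text: weight}: the original query at 1.0, related entities lower."""
    scores = Counter()
    for entity in find_entities(query):
        scores.update(related_from_structured(entity))
        scores.update(related_from_unstructured(entity))
    expanded = {query: 1.0}
    total = sum(scores.values())
    for entity, count in scores.most_common(top_k):
        # Share of the evidence for this entity, scaled by the expansion weight.
        expanded[entity] = expansion_weight * count / total
    return expanded


if __name__ == "__main__":
    print(expand_query("my web browser keeps crashing"))
```

In this toy setup, "proxy server" receives the largest expansion weight for the example query because it is supported by both the structured triples and document co-occurrence, while "operating system" is supported only by the triples.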


Keywords: Entity centric; Enterprise search; Retrieval; Query expansion; Combining structured and unstructured data



This material is based upon work supported by the HP Labs Innovation Research Program. We thank the reviewers for their useful comments.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, University of Delaware, Newark, USA
  2. HP Labs, Palo Alto, USA
  3. Google Research, Mountain View, USA
