Category-Based Query Modeling for Entity Search

  • Krisztian Balog
  • Marc Bron
  • Maarten de Rijke
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5993)


Users often search for entities rather than documents and, in this setting, are willing to provide extra input in addition to a query, such as category information and example entities. We propose a general probabilistic framework for entity search that lets us evaluate, and gain insight into, the many ways these types of input can be used for query modeling. We focus on category information: we show the advantage of a category-based representation over a term-based one, and demonstrate the effectiveness of category-based expansion using example entities. Our best-performing model is very competitive on the INEX-XER entity ranking and list completion tasks.
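The abstract does not spell out the model, so the following is only an illustrative sketch of the general idea of category-based query modeling with expansion from example entities, not the authors' formulation. It assumes that a query's categories and each entity's categories can be treated as probability distributions, that expansion is a linear mixture controlled by a hypothetical weight `lam`, and that entities are scored by the overlap of the two distributions. All function names and the data layout are assumptions made for illustration.

```python
from collections import Counter


def normalize(counts):
    """Turn raw counts into a probability distribution (empty if no counts)."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def category_query_model(query_categories, example_entities, entity_categories, lam=0.5):
    """Mix user-supplied categories with the categories of example entities.

    `lam` is a hypothetical interpolation weight: 0 ignores the examples,
    1 relies on the examples alone.
    """
    p_query = normalize(Counter(query_categories))
    example_counts = Counter()
    for entity in example_entities:
        example_counts.update(entity_categories.get(entity, []))
    p_examples = normalize(example_counts)
    # Fall back to whichever source is available if the other is empty.
    if not p_examples:
        return p_query
    if not p_query:
        return p_examples
    categories = set(p_query) | set(p_examples)
    return {
        c: (1 - lam) * p_query.get(c, 0.0) + lam * p_examples.get(c, 0.0)
        for c in categories
    }


def score(entity, query_model, entity_categories):
    """Score an entity by the overlap of its category distribution with the query model."""
    p_entity = normalize(Counter(entity_categories.get(entity, [])))
    return sum(p * p_entity.get(c, 0.0) for c, p in query_model.items())


def rank(candidates, query_model, entity_categories):
    """Return candidates sorted by descending category-based score."""
    return sorted(
        candidates,
        key=lambda e: score(e, query_model, entity_categories),
        reverse=True,
    )
```

In this sketch, an entity that shares categories with both the query and the example entities ranks above one that matches the query categories alone. A full system would presumably combine such a category-based score with a term-based one; how the paper does so is not stated in the abstract.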





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Krisztian Balog (1)
  • Marc Bron (1)
  • Maarten de Rijke (1)
  1. ISLA, University of Amsterdam, Amsterdam, The Netherlands
