Collection Ranking and Selection for Federated Entity Search

  • Krisztian Balog
  • Robert Neumayer
  • Kjetil Nørvåg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7608)


Entity search has emerged as an important research topic in recent years, but so far it has only been addressed in a centralized setting. In this paper we present an attempt to solve the task of ad-hoc entity retrieval in a cooperative distributed environment. We propose a new collection ranking and selection method for entity search, called AENN. The key underlying idea is that a lean, name-based representation of entities can be stored efficiently at the central broker, which therefore does not have to rely on sampling. This representation can then be used for collection ranking and selection such that the number of collections selected and the number of results requested from each collection are dynamically adjusted on a per-query basis. Using a collection of structured datasets in RDF and a sample of real web search queries targeting entities, we demonstrate that our approach outperforms state-of-the-art distributed document retrieval methods in terms of both effectiveness and efficiency.
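The abstract does not spell out how AENN scores names or selects collections, so the following is only a minimal sketch of the general idea it describes: a broker that holds entity names centrally, ranks them against the query, and derives both the set of collections and the number of results to request from each one. The term-overlap scoring, the `top_n` cutoff, and all function and variable names are assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def name_score(query, name):
    """Toy score (assumption): fraction of query terms found in the entity name."""
    q_terms = set(query.lower().split())
    n_terms = set(name.lower().split())
    return len(q_terms & n_terms) / len(q_terms) if q_terms else 0.0


def rank_and_select(query, broker_index, top_n=5):
    """Rank collections using only the names stored at the broker.

    broker_index is a list of (entity_name, collection_id) pairs.
    The top_n best-matching names are counted per collection; a collection
    without any match is not selected, and the count is used as the number
    of results to request from that collection for this query.
    """
    scored = [(name_score(query, name), name, coll) for name, coll in broker_index]
    scored = [s for s in scored if s[0] > 0]
    # Highest score first; ties broken by name for a deterministic order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    per_collection = Counter(coll for _, _, coll in scored[:top_n])
    return sorted(per_collection.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical broker index with two collections.
index = [
    ("Trondheim", "geo"),
    ("Trondheim Airport", "geo"),
    ("Trondheim Symphony Orchestra", "music"),
    ("Oslo", "geo"),
    ("Bergen Philharmonic Orchestra", "music"),
]

print(rank_and_select("trondheim orchestra", index, top_n=3))
# → [('music', 2), ('geo', 1)]
```

In this sketch both quantities change with the query: a query that matches no stored name selects no collection at all, while a query concentrated in one collection requests most of its results from there.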


Keywords (added by machine, not by the authors): Mean Average Precision; Relevance Judgment; Mean Reciprocal Rank; Relevant Entity; Federate Entity





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Krisztian Balog (1)
  • Robert Neumayer (1)
  • Kjetil Nørvåg (1)
  1. Norwegian University of Science and Technology, Trondheim, Norway
