Query Hardness Estimation Using Jensen-Shannon Divergence Among Multiple Scoring Functions

  • Javed A. Aslam
  • Virgil Pavlu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4425)

Abstract

We consider the issue of query performance, and we propose a novel method for automatically predicting the difficulty of a query. Unlike a number of existing techniques which are based on examining the ranked lists returned in response to perturbed versions of the query with respect to the given collection or perturbed versions of the collection with respect to the given query, our technique is based on examining the ranked lists returned by multiple scoring functions (retrieval engines) with respect to the given query and collection. In essence, we propose that the results returned by multiple retrieval engines will be relatively similar for “easy” queries but more diverse for “difficult” queries. By appropriately employing Jensen-Shannon divergence to measure the “diversity” of the returned results, we demonstrate a methodology for predicting query difficulty whose performance exceeds existing state-of-the-art techniques on TREC collections, often remarkably so.
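
To make the measurement concrete: the generalized Jensen-Shannon divergence among n distributions P_1, ..., P_n with uniform weights is JS(P_1, ..., P_n) = H((1/n) * sum_i P_i) - (1/n) * sum_i H(P_i), where H denotes Shannon entropy. It is zero when all engines induce the same distribution over documents and grows as their results diverge. The Python sketch below illustrates the idea under one flagged assumption: each ranked list is mapped to a distribution over documents by weighting the document at rank r proportionally to 1/r. This rank-based mapping is a hypothetical stand-in for illustration, not necessarily the list-to-distribution construction used in the paper.

    import math
    from collections import defaultdict

    def list_to_distribution(ranked_docs):
        # Hypothetical mapping: weight the document at rank r by 1/r, then
        # normalize; the paper's exact list-to-distribution mapping may differ.
        weights = {doc: 1.0 / rank for rank, doc in enumerate(ranked_docs, start=1)}
        total = sum(weights.values())
        return {doc: w / total for doc, w in weights.items()}

    def entropy(dist):
        # Shannon entropy H(P) in bits.
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    def js_divergence(dists):
        # Generalized JS divergence with uniform weights:
        # JS(P_1, ..., P_n) = H(mean of the P_i) - mean of the H(P_i).
        n = len(dists)
        mixture = defaultdict(float)
        for dist in dists:
            for doc, p in dist.items():
                mixture[doc] += p / n
        return entropy(mixture) - sum(entropy(d) for d in dists) / n

    def predicted_hardness(ranked_lists):
        # Higher divergence among the engines' lists -> predicted harder query.
        return js_divergence([list_to_distribution(docs) for docs in ranked_lists])

    # Three engines largely agree on an "easy" query but scatter on a "hard" one.
    easy = [["d1", "d2", "d3"], ["d1", "d3", "d2"], ["d2", "d1", "d3"]]
    hard = [["d1", "d2", "d3"], ["d4", "d5", "d6"], ["d7", "d8", "d9"]]
    assert predicted_hardness(hard) > predicted_hardness(easy)

In a full experiment, one would compute this score per query over the top-k lists of several retrieval runs and correlate it against a per-query effectiveness measure such as average precision.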

Keywords

Information Retrieval · Average Precision · Query Expansion · Query Performance · Clarity Score

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Javed A. Aslam, Northeastern University
  • Virgil Pavlu, Northeastern University
