Exploring the Space of IR Functions

  • Parantapa Goswami
  • Simon Moura
  • Eric Gaussier
  • Massih-Reza Amini
  • Francis Maes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8416)

Abstract

In this paper we propose an approach to discovering IR ranking functions within a space of simple closed-form mathematical functions. In general, IR ranking models are built on two basic variables, namely term frequency and document frequency. We define a grammar that generates all possible functions of these two variables using basic mathematical operations: addition, subtraction, multiplication, division, logarithm, exponential and square root. The large set of functions produced by this grammar is filtered by checking mathematical feasibility and satisfaction of the heuristic constraints on IR scoring functions proposed by the community. The resulting candidate functions are tested on various standard IR collections, and several simple but highly effective scoring functions are identified. Through extensive experiments on several IR collections, we show that these newly discovered functions outperform state-of-the-art IR scoring models. We also compare the performance of functions that satisfy the IR constraints to those that do not, and show that the former set clearly outperforms the latter.
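The generate-and-filter procedure summarised in the abstract lends itself to a compact illustration. The Python sketch below is an illustrative assumption, not the authors' implementation: it enumerates closed-form expressions over term frequency (tf), document frequency (df) and a collection size N using the grammar's operations, then discards expressions that are mathematically infeasible or that fail two crude numerical proxies for common IR heuristic constraints (score non-decreasing in tf, non-increasing in df). The depth bound, the inclusion of N, and the exact constraint checks are all assumptions made for the example.

```python
# Hypothetical sketch of grammar-based enumeration and filtering of IR scoring
# functions, following the idea described in the abstract (not the authors' code).
import itertools
import math

# Terminal symbols of the grammar: term frequency, document frequency,
# and the collection size N (assumed here for normalisation purposes).
TERMINALS = ["tf", "df", "N"]

# Unary and binary operations allowed by the grammar.
UNARY = {"log": lambda x: math.log(1 + x),          # shifted log, defined at 0
         "sqrt": lambda x: math.sqrt(x),
         "exp": lambda x: math.exp(min(x, 50.0))}   # clipped to avoid overflow
BINARY = {"add": lambda a, b: a + b,
          "sub": lambda a, b: a - b,
          "mul": lambda a, b: a * b,
          "div": lambda a, b: a / b if b != 0 else float("inf")}


def enumerate_functions(depth):
    """Yield (description, callable) pairs for expressions up to the given depth."""
    if depth == 0:
        for t in TERMINALS:
            yield t, (lambda env, t=t: env[t])
        return
    for desc, f in enumerate_functions(depth - 1):
        yield desc, f  # keep shallower expressions as well
        for name, op in UNARY.items():
            yield f"{name}({desc})", (lambda env, f=f, op=op: op(f(env)))
    for (d1, f1), (d2, f2) in itertools.product(enumerate_functions(depth - 1), repeat=2):
        for name, op in BINARY.items():
            yield f"{name}({d1},{d2})", (lambda env, f1=f1, f2=f2, op=op: op(f1(env), f2(env)))


def satisfies_heuristics(f, N=10000):
    """Crude numerical proxy for two classic IR heuristics (assumed for this sketch):
    the score should not decrease with tf and should not increase with df."""
    try:
        tf_ok = all(f({"tf": t + 1, "df": 50, "N": N}) >= f({"tf": t, "df": 50, "N": N})
                    for t in range(1, 10))
        df_ok = all(f({"tf": 5, "df": d, "N": N}) >= f({"tf": 5, "df": d + 10, "N": N})
                    for d in range(10, 100, 10))
        return tf_ok and df_ok
    except (ValueError, OverflowError, ZeroDivisionError):
        return False  # mathematically infeasible on the tested range


if __name__ == "__main__":
    candidates = [(d, f) for d, f in enumerate_functions(depth=2) if satisfies_heuristics(f)]
    print(f"{len(candidates)} candidate functions pass the heuristic filter")
    for desc, _ in candidates[:10]:
        print(desc)
```

In practice the surviving candidates would then be evaluated on standard IR test collections, which is where the paper's empirical comparison against state-of-the-art models comes in; the numerical constraint checks above are only a stand-in for the formal constraints discussed in the paper.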

Keywords

IR Theory, Function Generation, Automatic Discovery

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Parantapa Goswami (1)
  • Simon Moura (1)
  • Eric Gaussier (1)
  • Massih-Reza Amini (1)
  • Francis Maes (2)
  1. CNRS - LIG/AMA, Université Grenoble Alps, Grenoble, France
  2. D-Labs, Paris, France