Generating Pseudo Test Collections for Learning to Rank Scientific Articles

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7488)


Pseudo test collections are automatically generated to provide training material for learning-to-rank methods. We propose a method for generating pseudo test collections in the domain of digital libraries, where data is relatively sparse but comes with rich annotations. Our intuition is that documents are annotated to make them easier to find for certain information needs. We use these annotations and the associated documents as a source of pairs of queries and relevant documents. We investigate how learning-to-rank performance varies when we use different methods for sampling annotations, and show how our pseudo test collection ranks systems compared to editorial topics with editorial judgments. Our results demonstrate that it is possible to train a learning-to-rank algorithm on generated pseudo judgments; in some cases, performance is on par with learning on manually obtained ground truth.
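The core idea described above, treating each annotation as a pseudo query whose annotated documents serve as pseudo-relevant results, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the corpus, the frequency-based sampling thresholds, and all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy corpus: each document carries human-assigned annotations,
# standing in for the rich metadata found in digital libraries.
docs = {
    "d1": {"annotations": {"query expansion", "digital libraries"}},
    "d2": {"annotations": {"query expansion"}},
    "d3": {"annotations": {"learning to rank"}},
}

def build_pseudo_collection(docs, min_docs=1, max_docs=10):
    """Turn annotations into pseudo queries paired with pseudo-relevant docs.

    Annotation sampling is simulated by a document-frequency filter:
    annotations attached to fewer than min_docs or more than max_docs
    documents are discarded, since very rare or very broad annotations
    make poor pseudo queries. The thresholds are illustrative only.
    """
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for annotation in doc["annotations"]:
            index[annotation].add(doc_id)
    return {ann: rel_docs for ann, rel_docs in index.items()
            if min_docs <= len(rel_docs) <= max_docs}

pseudo = build_pseudo_collection(docs)
# Each (annotation, document) pair can now serve as a training instance
# for a learning-to-rank method, in place of editorial judgments.
```

The resulting dictionary maps each surviving pseudo query to its set of pseudo-relevant documents; a pairwise ranker can then be trained by contrasting these documents with sampled non-relevant ones.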


Keywords: Digital Library · Query Expansion · System Ranking · Test Collection · Annotation Dimension
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

1. Asadi, N., Metzler, D., Elsayed, T., Lin, J.: Pseudo test collections for learning web search ranking functions. In: SIGIR 2011, pp. 1073–1082. ACM (2011)
2. Azzopardi, L., de Rijke, M., Balog, K.: Building simulated queries for known-item topics: an analysis using six European languages. In: SIGIR 2007, pp. 455–462. ACM (2007)
3. Beitzel, S., Jensen, E., Chowdhury, A., Grossman, D.: Using titles and category names from editor-driven taxonomies for automatic evaluation. In: CIKM 2003, pp. 17–23. ACM (2003)
4. Cronen-Townsend, S., Croft, W.: Quantifying query ambiguity. In: HLT 2002, pp. 104–109. Morgan Kaufmann Publishers Inc. (2002)
5. Di Nunzio, G.M.: Appendix C: Results of the Domain Specific Track. In: Working Notes CLEF 2007 (2007)
6. Di Nunzio, G.M.: Appendix D: Results of the Domain Specific Track. In: Working Notes CLEF 2008 (2008)
7. Easley, D., Kleinberg, J.: Networks, Crowds, and Markets. Cambridge University Press (2010)
8. Huurnink, B., Hofmann, K., de Rijke, M.: Simulating searches from transaction logs. In: SIGIR 2010 Workshop on the Simulation of Interaction (2010)
9. Huurnink, B., Hofmann, K., de Rijke, M., Bron, M.: Validating query simulators: an experiment using commercial searches and purchases. In: Agosti, M., Ferro, N., Peters, C., de Rijke, M., Smeaton, A. (eds.) CLEF 2010. LNCS, vol. 6360, pp. 40–51. Springer, Heidelberg (2010)
10. Kim, J., Croft, W.B.: Retrieval experiments using pseudo-desktop collections. In: CIKM 2009, pp. 1297–1306. ACM (2009)
11. Kluck, M., Gey, F.C.: The domain-specific task of CLEF - specific evaluation strategies in cross-language information retrieval. In: Peters, C. (ed.) CLEF 2000. LNCS, vol. 2069, pp. 48–56. Springer, Heidelberg (2001)
12. Kluck, M., Stempfhuber, M.: Domain-specific track CLEF 2005: overview of results and approaches, remarks on the assessment analysis. In: Peters, C., Gey, F.C., Gonzalo, J., Müller, H., Jones, G.J.F., Kluck, M., Magnini, B., de Rijke, M., Giampiccolo, D. (eds.) CLEF 2005. LNCS, vol. 4022, pp. 212–221. Springer, Heidelberg (2006)
13. Lavrenko, V., Croft, W.B.: Relevance based language models. In: SIGIR 2001, pp. 120–127. ACM (2001)
14. Liu, T.-Y.: Learning to Rank for Information Retrieval. Springer (2011). ISBN 978-3-642-14266-6
15. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press (1999)
16. Meij, E., de Rijke, M.: The University of Amsterdam at the CLEF 2008 Domain Specific Track - parsimonious relevance and concept models. In: Working Notes CLEF 2008 (2008)
17. Petras, V.: How one word can make all the difference - using subject metadata for automatic query expansion and reformulation. In: Working Notes CLEF 2005 (2005)
18. Petras, V.: The domain-specific track at CLEF 2008. In: Working Notes CLEF 2008 (2008)
19. Sculley, D.: Combined regression and ranking. In: KDD 2010, pp. 979–988. ACM (2010)
20. Shalev-Shwartz, S., Singer, Y., Srebro, N.: Pegasos: primal estimated sub-gradient solver for SVM. In: 24th International Conference on Machine Learning, pp. 807–814. ACM (2007)
21. Smucker, M., Allan, J., Carterette, B.: A comparison of statistical significance tests for information retrieval evaluation. In: CIKM 2007, pp. 623–632. ACM (2007)
22. Tague, J., Nelson, M.: Simulation of user judgments in bibliographic retrieval systems. In: SIGIR 1981, pp. 66–71 (1981)
23. Tague, J., Nelson, M., Wu, H.: Problems in the simulation of bibliographic retrieval systems. In: SIGIR 1980, pp. 236–255 (1980)
24. Voorhees, E.M.: Variations in relevance judgments and the measurement of retrieval effectiveness. Information Processing & Management 36(5), 697–716 (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

ISLA, University of Amsterdam, Amsterdam, The Netherlands
