A Component-Level Analysis of an Academic Search Test Collection.

Part I: System and Collection Configurations
  • Florian Dietz
  • Vivien Petras
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10456)


This study analyzes search performance in an academic search test collection. In a component-level evaluation setting, 3,276 configurations were tested over 100 topics, varying queries, documents, and system components, which resulted in 327,600 data points. Additional analyses of the recall base and the semantic heterogeneity of queries and documents are presented in a parallel paper. The study finds that the structure of the documents and topics, as well as the IR components, significantly affects general performance, while more content in either documents or topics does not necessarily improve a search. Although overall performance improved, the component-level analysis did not find a component that would identify or improve badly performing queries.


Keywords: Academic search, Component-level evaluation, GIRT04



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Berlin School of Library and Information Science, Humboldt-Universität zu Berlin, Berlin, Germany
