Evaluating Research Dataset Recommendations in a Living Lab

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13390)


The search for research datasets is as important as it is laborious. Because the choice of research data shapes all subsequent research, this decision must be made carefully. Moreover, with the amount of data growing in almost all areas, research data has become a central artifact in the empirical sciences. Consequently, research dataset recommendations can usefully supplement scientific publication search. We formulated the recommendation task as a retrieval problem by focusing on broad similarities between research datasets and scientific publications. In a multistage approach, initial recommendations were retrieved with the BM25 ranking function and dynamic queries. The initial ranking was then re-ranked using click feedback and document embeddings. The proposed system was evaluated live on real user interaction data using the STELLA infrastructure in the LiLAS lab at CLEF 2021. Before the live evaluation, the experimental system could be fine-tuned efficiently by pre-testing it with a pseudo test collection built from prior user interaction data from the live system. The results indicate that the experimental system outperforms the other participating systems.
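The multistage approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how queries are built or how click feedback and embeddings are combined, so the additive score, the weights `w_emb` and `w_click`, the toy two-dimensional embeddings, and the click counts below are all assumptions made for illustration. The first stage scores datasets against a query built from a publication with a standard Okapi BM25 formula; the second stage re-ranks the top candidates.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query_tokens, docs_tokens, k=2):
    """Stage 1: return the top-k (doc_id, bm25_score) pairs."""
    scores = bm25_scores(query_tokens, docs_tokens)
    ranked = sorted(range(len(docs_tokens)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


def rerank(candidates, query_vec, doc_vecs, clicks, w_emb=1.0, w_click=0.5):
    """Stage 2: re-rank candidates by an assumed additive combination of
    BM25 score, embedding similarity, and log-damped click counts."""
    def final_score(item):
        doc_id, bm25 = item
        return (bm25
                + w_emb * cosine(query_vec, doc_vecs[doc_id])
                + w_click * math.log1p(clicks.get(doc_id, 0)))
    return [doc_id for doc_id, _ in sorted(candidates, key=final_score, reverse=True)]


# Hypothetical toy data: three dataset descriptions and one seed publication.
datasets = [
    "survey on political attitudes in germany".split(),
    "election study voting behaviour germany".split(),
    "climate measurements arctic ice".split(),
]
query = "voting behaviour election germany".split()  # assumed query from a title
query_vec = [1.0, 0.0]                                # assumed embeddings
doc_vecs = {0: [0.6, 0.8], 1: [0.9, 0.1], 2: [0.0, 1.0]}
clicks = {0: 40, 1: 3}                                # assumed click counts

initial = retrieve(query, datasets, k=2)
reranked = rerank(initial, query_vec, doc_vecs, clicks)
print("initial:", [doc_id for doc_id, _ in initial])
print("reranked:", reranked)
```

With the default weights the BM25 evidence dominates and the order of the two candidates is unchanged; raising `w_click` lets a heavily clicked dataset overtake a better lexical match, which is the kind of trade-off a pseudo test collection built from logged interactions could be used to tune before going live.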


  • Living Labs
  • (Online) Evaluation in IR
  • Recommender System
  • Research Dataset Retrieval


Author information


Corresponding author

Correspondence to Jüri Keller.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Keller, J., Munz, L.P.M. (2022). Evaluating Research Dataset Recommendations in a Living Lab. In: Barrón-Cedeño, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2022. Lecture Notes in Computer Science, vol 13390. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-13642-9

  • Online ISBN: 978-3-031-13643-6

  • eBook Packages: Computer Science, Computer Science (R0)