Benchmarking News Recommendations in a Living Lab

  • Frank Hopfgartner
  • Benjamin Kille
  • Andreas Lommatzsch
  • Till Plumbaum
  • Torben Brodt
  • Tobias Heintz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8685)


Most user-centric studies of information access systems in the literature suffer from unrealistic settings or from a limited number of participants. To address this issue, the idea of a living lab has been promoted. Living labs allow us to evaluate research hypotheses with a large number of users who satisfy their information needs in a real context. In this paper, we introduce a living lab for real-time news recommendation. The living lab was first organized as the News Recommendation Challenge at ACM RecSys’13 and then as the campaign-style evaluation lab NEWSREEL at CLEF’14. Within this lab, researchers were asked to provide news article recommendations to millions of users in real time. Unlike participants in laboratory-based user studies, these users follow their own agenda. Consequently, laboratory bias in their behavior can be neglected. We outline the living lab scenario and the experimental setup of the two benchmarking events. We argue that this living lab can serve as a reference point for implementing living labs for the evaluation of information access systems.


Information Retrieval; Recommender System; News Article; Recommendation Algorithm; Information Retrieval Evaluation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Frank Hopfgartner (1)
  • Benjamin Kille (1)
  • Andreas Lommatzsch (1)
  • Till Plumbaum (1)
  • Torben Brodt (2)
  • Tobias Heintz (2)

  1. Technische Universität Berlin, Berlin, Germany
  2. plista GmbH, Berlin, Germany
