Evaluating Recommender Systems

Chapter in: Recommender Systems Handbook

Abstract

Recommender systems are now popular both commercially and in the research community, where many approaches have been suggested for providing recommendations. In many cases a system designer who wishes to employ a recommender system must choose between a set of candidate approaches. A first step towards selecting an appropriate algorithm is to decide which properties of the application to focus upon when making this choice. Indeed, recommender systems have a variety of properties that may affect user experience, such as accuracy, robustness, scalability, and so forth. In this chapter we discuss how to compare recommenders based on a set of properties that are relevant for the application. We focus on comparative studies, where a few algorithms are compared using some evaluation metric, rather than on absolute benchmarking of algorithms. We describe experimental settings appropriate for making choices between algorithms. We review three types of experiments: offline settings, where recommendation approaches are compared without user interaction; user studies, where a small group of subjects experiments with the system and reports on the experience; and large-scale online experiments, where real user populations interact with the system. In each of these cases we describe the types of questions that can be answered and suggest protocols for experimentation. We also discuss how to draw trustworthy conclusions from the conducted experiments. We then review a large set of properties and explain how to evaluate systems given the relevant properties. Finally, we survey a large set of evaluation metrics in the context of the property that they evaluate.
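
To make the offline comparative setting concrete, here is a minimal sketch (not taken from the chapter) of how two candidate recommenders might be compared on the same held-out ratings: per-user RMSE is computed for each system, and a paired t-test is applied to judge whether the observed difference is trustworthy. The predictor callables `predict_baseline` and `predict_candidate` mentioned in the usage comment are hypothetical placeholders.

```python
"""Minimal sketch of an offline comparative evaluation (illustrative only)."""

from collections import defaultdict

import numpy as np
from scipy import stats


def rmse(predicted, actual):
    """Root mean squared error over two equal-length rating sequences."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def per_user_rmse(test_ratings, predict_fn):
    """Compute RMSE separately for each user in the held-out set.

    test_ratings: iterable of (user_id, item_id, true_rating)
    predict_fn:   callable (user_id, item_id) -> predicted rating
    """
    by_user = defaultdict(lambda: ([], []))
    for user, item, true_rating in test_ratings:
        preds, actuals = by_user[user]
        preds.append(predict_fn(user, item))
        actuals.append(true_rating)
    return {user: rmse(p, a) for user, (p, a) in by_user.items()}


def compare_offline(test_ratings, predict_a, predict_b, alpha=0.05):
    """Paired comparison of two recommenders on the same held-out users."""
    scores_a = per_user_rmse(test_ratings, predict_a)
    scores_b = per_user_rmse(test_ratings, predict_b)
    users = sorted(set(scores_a) & set(scores_b))
    a = [scores_a[u] for u in users]
    b = [scores_b[u] for u in users]
    # The same users are scored under both systems, so a paired test is
    # appropriate; an unpaired test would ignore the pairing and lose power.
    t_stat, p_value = stats.ttest_rel(a, b)
    return {
        "mean_rmse_a": float(np.mean(a)),
        "mean_rmse_b": float(np.mean(b)),
        "t_statistic": float(t_stat),
        "p_value": float(p_value),
        "significant_at_alpha": bool(p_value < alpha),
    }


# Usage with hypothetical predictors:
# report = compare_offline(test_ratings, predict_baseline, predict_candidate)
```

The same skeleton applies to ranking measures such as precision at N; only the per-user scoring function changes.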

Notes

  1. www.Netflix.com.

  2. www.amazon.com.

  3. A reference to their origins in signal detection theory.

  4. Not to be confused with trust in social network research, where it measures how much one user believes another. Some literature on recommender systems uses such trust measurements to filter similar users [48].

References

  1. Bailey, R.: Design of comparative experiments, vol. 25. Cambridge University Press, Cambridge (2008)

  2. Bamber, D.: The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology 12, 387–415 (1975)

  3. Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological) 57(1), 289–300 (1995)

  4. Bickel, P.J., Doksum, K.A.: Mathematical Statistics: Ideas and Concepts. Holden-Day (1977)

  5. Bonhard, P., Harries, C., McCarthy, J., Sasse, M.A.: Accounting for taste: using profile similarity to improve recommender systems. In: CHI ’06: Proceedings of the SIGCHI conference on Human Factors in computing systems, pp. 1057–1066. ACM, New York, NY, USA (2006)

  6. Boutilier, C., Zemel, R.S.: Online queries for collaborative filtering. In: Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (2002)

  7. Box, G.E.P., Hunter, W.G., Hunter, J.S.: Statistics for Experimenters. Wiley, New York (1978)

  8. Bradley, K., Smyth, B.: Improving recommendation diversity. In: Twelfth Irish Conference on Artificial Intelligence and Cognitive Science, pp. 85–94 (2001)

  9. Braziunas, D., Boutilier, C.: Local utility elicitation in GAI models. In: Proceedings of the Twenty-first Conference on Uncertainty in Artificial Intelligence, pp. 42–49. Edinburgh (2005)

  10. Breese, J.S., Heckerman, D., Kadie, C.M.: Empirical analysis of predictive algorithms for collaborative filtering. In: UAI, pp. 43–52 (1998)

  11. Burke, R.: Evaluating the dynamic properties of recommendation algorithms. In: Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys ’10, pp. 225–228. ACM, New York, NY, USA (2010)

  12. Celma, O., Herrera, P.: A new approach to evaluating novel recommendations. In: RecSys ’08: Proceedings of the 2008 ACM conference on Recommender systems, pp. 179–186. ACM, New York, NY, USA (2008)

  13. Chirita, P.A., Nejdl, W., Zamfir, C.: Preventing shilling attacks in online recommender systems. In: WIDM ’05: Proceedings of the 7th annual ACM international workshop on Web information and data management, pp. 67–74. ACM, New York, NY, USA (2005)

  14. Cramer, H., Evers, V., Ramlal, S., Someren, M., Rutledge, L., Stash, N., Aroyo, L., Wielinga, B.: The effects of transparency on trust in and acceptance of a content-based art recommender. User Modeling and User-Adapted Interaction 18(5), 455–496 (2008)

  15. Das, A.S., Datar, M., Garg, A., Rajaram, S.: Google news personalization: scalable online collaborative filtering. In: WWW ’07: Proceedings of the 16th international conference on World Wide Web, pp. 271–280. ACM, New York, NY, USA (2007)

  16. Dekel, O., Manning, C.D., Singer, Y.: Log-linear models for label ranking. In: NIPS ’03 (2003)

  17. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)

  18. Deshpande, M., Karypis, G.: Item-based top-N recommendation algorithms. ACM Transactions on Information Systems 22(1), 143–177 (2004)

  19. Fischer, G.: User modeling in human-computer interaction. User Model. User-Adapt. Interact. 11(1–2), 65–86 (2001)

  20. Fleder, D.M., Hosanagar, K.: Recommender systems and their impact on sales diversity. In: EC ’07: Proceedings of the 8th ACM conference on Electronic commerce, pp. 192–199. ACM, New York, NY, USA (2007)

  21. Frankowski, D., Cosley, D., Sen, S., Terveen, L., Riedl, J.: You are what you say: privacy risks of public mentions. In: SIGIR ’06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 565–572. ACM, New York, NY, USA (2006)

  22. Fredricks, G.A., Nelsen, R.B.: On the relationship between Spearman’s rho and Kendall’s tau for pairs of continuous random variables. Journal of Statistical Planning and Inference 137(7), 2143–2150 (2007)

  23. George, T.: A scalable collaborative filtering framework based on co-clustering. In: Fifth IEEE International Conference on Data Mining, pp. 625–628 (2005)

  24. Greenwald, A.G.: Within-subjects designs: To use or not to use? Psychological Bulletin 83, 216–229 (1976)

  25. Haddawy, P., Ha, V., Restificar, A., Geisler, B., Miyamoto, J.: Preference elicitation via theory refinement. Journal of Machine Learning Research 4 (2003)

  26. Herlocker, J.L., Konstan, J.A., Riedl, J.T.: Explaining collaborative filtering recommendations. In: CSCW ’00: Proceedings of the 2000 ACM conference on Computer supported cooperative work, pp. 241–250. ACM, New York, NY, USA (2000)

  27. Herlocker, J.L., Konstan, J.A., Riedl, J.T.: An empirical analysis of design choices in neighborhood-based collaborative filtering algorithms. Inf. Retr. 5(4), 287–310 (2002). DOI http://dx.doi.org/10.1023/A:1020443909834

  28. Herlocker, J.L., Konstan, J.A., Terveen, L.G., Riedl, J.T.: Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. 22(1), 5–53 (2004). DOI http://doi.acm.org/10.1145/963770.963772

  29. Hijikata, Y., Shimizu, T., Nishida, S.: Discovery-oriented collaborative filtering for improving user satisfaction. In: IUI ’09: Proceedings of the 13th international conference on Intelligent user interfaces, pp. 67–76. ACM, New York, NY, USA (2009)

  30. Hu, R., Pu, P.: A comparative user study on rating vs. personality quiz based preference elicitation methods. In: IUI, pp. 367–372 (2009)

  31. Hu, R., Pu, P.: A comparative user study on rating vs. personality quiz based preference elicitation methods. In: IUI ’09: Proceedings of the 13th international conference on Intelligent user interfaces, pp. 367–372. ACM, New York, NY, USA (2009)

  32. Hu, R., Pu, P.: A study on user perception of personality-based recommender systems. In: UMAP, pp. 291–302 (2010)

  33. Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20(4), 422–446 (2002). DOI http://doi.acm.org/10.1145/582415.582418

  34. Jones, N., Pu, P.: User technology adoption issues in recommender systems. In: Networking and Electronic Conference (2007)

  35. Jung, S., Herlocker, J.L., Webster, J.: Click data as implicit relevance feedback in web search. Inf. Process. Manage. 43(3), 791–807 (2007)

  36. Karypis, G.: Evaluation of item-based top-N recommendation algorithms. In: CIKM ’01: Proceedings of the tenth international conference on Information and knowledge management, pp. 247–254. ACM, New York, NY, USA (2001)

  37. Kendall, M.G.: A new measure of rank correlation. Biometrika 30(1–2), 81–93 (1938)

  38. Kendall, M.G.: The treatment of ties in ranking problems. Biometrika 33(3), 239–251 (1945)

  39. Kohavi, R., Deng, A., Frasca, B., Walker, T., Xu, Y., Pohlmann, N.: Online controlled experiments at large scale. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’13, pp. 1168–1176. ACM, New York, NY, USA (2013)

  40. Kohavi, R., Longbotham, R., Sommerfield, D., Henne, R.M.: Controlled experiments on the web: survey and practical guide. Data Min. Knowl. Discov. 18(1), 140–181 (2009)

  41. Konstan, J.A., McNee, S.M., Ziegler, C.N., Torres, R., Kapoor, N., Riedl, J.: Lessons on applying automated recommender systems to information-seeking tasks. In: AAAI (2006)

  42. Koychev, I., Schwab, I.: Adaptation to drifting user’s interests. In: Proceedings of the ECML2000 Workshop: Machine Learning in New Information Age, pp. 39–46 (2000)

  43. Lam, S.K., Frankowski, D., Riedl, J.: Do you trust your recommendations? An exploration of security and privacy issues in recommender systems. In: Proceedings of the 2006 International Conference on Emerging Trends in Information and Communication Security (ETRICS) (2006)

  44. Lam, S.K., Riedl, J.: Shilling recommender systems for fun and profit. In: WWW ’04: Proceedings of the 13th international conference on World Wide Web, pp. 393–402. ACM, New York, NY, USA (2004)

  45. Lehmann, E.L., Romano, J.P.: Testing statistical hypotheses, third edn. Springer Texts in Statistics. Springer, New York (2005)

  46. Mahmood, T., Ricci, F.: Learning and adaptivity in interactive recommender systems. In: ICEC ’07: Proceedings of the ninth international conference on Electronic commerce, pp. 75–84. ACM, New York, NY, USA (2007)

  47. Marlin, B.M., Zemel, R.S.: Collaborative prediction and ranking with non-random missing data. In: Proceedings of the 2009 ACM Conference on Recommender Systems, RecSys 2009, New York, NY, USA, October 23–25, 2009, pp. 5–12 (2009)

  48. Massa, P., Bhattacharjee, B.: Using trust in recommender systems: An experimental analysis. In: Proceedings of the iTrust2004 International Conference, pp. 221–235 (2004)

  49. McLaughlin, M.R., Herlocker, J.L.: A collaborative filtering algorithm and evaluation metric that accurately model the user experience. In: SIGIR ’04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 329–336. ACM, New York, NY, USA (2004)

  50. McNee, S.M., Riedl, J., Konstan, J.A.: Making recommendations better: an analytic model for human-recommender interaction. In: CHI ’06: CHI ’06 extended abstracts on Human factors in computing systems, pp. 1103–1108. ACM, New York, NY, USA (2006)

  51. McSherry, F., Mironov, I.: Differentially private recommender systems: building privacy into the Netflix Prize contenders. In: KDD ’09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 627–636. ACM, New York, NY, USA (2009)

  52. Mobasher, B., Burke, R., Bhaumik, R., Williams, C.: Toward trustworthy recommender systems: An analysis of attack models and algorithm robustness. ACM Trans. Internet Technol. 7(4), 23 (2007)

  53. Murakami, T., Mori, K., Orihara, R.: Metrics for evaluating the serendipity of recommendation lists. New Frontiers in Artificial Intelligence 4914, 40–46 (2008)

  54. Nguyen, T.T., Kluver, D., Wang, T.Y., Hui, P.M., Ekstrand, M.D., Willemsen, M.C., Riedl, J.: Rating support interfaces to improve user experience and recommender accuracy. In: Proceedings of the 7th ACM Conference on Recommender Systems, RecSys ’13, pp. 149–156. ACM, New York, NY, USA (2013)

  55. O’Mahony, M., Hurley, N., Kushmerick, N., Silvestre, G.: Collaborative recommendation: A robustness analysis. ACM Trans. Internet Technol. 4(4), 344–377 (2004)

  56. Pfleeger, S.L., Kitchenham, B.A.: Principles of survey research. SIGSOFT Softw. Eng. Notes 26(6), 16–18 (2001)

  57. Pu, P., Chen, L.: Trust building with explanation interfaces. In: IUI ’06: Proceedings of the 11th international conference on Intelligent user interfaces, pp. 93–100. ACM, New York, NY, USA (2006)

  58. Pu, P., Chen, L., Hu, R.: A user-centric evaluation framework for recommender systems. In: Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys ’11, pp. 157–164. ACM, New York, NY, USA (2011)

  59. Queiroz, S.: Adaptive preference elicitation for top-k recommendation tasks using GAI-networks. In: AIAP’07: Proceedings of the 25th IASTED International Multi-Conference, pp. 579–584. ACTA Press, Anaheim, CA, USA (2007)

  60. Russell, M.L., Moralejo, D.G., Burgess, E.D.: Paying research subjects: participants’ perspectives. Journal of Medical Ethics 26(2), 126–130 (2000)

  61. Salzberg, S.L.: On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Min. Knowl. Discov. 1(3), 317–328 (1997)

  62. Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Item-based collaborative filtering recommendation algorithms. In: WWW ’01: Proceedings of the 10th international conference on World Wide Web, pp. 285–295. ACM, New York, NY, USA (2001)

  63. Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Analysis of recommendation algorithms for e-commerce. In: EC ’00: Proceedings of the 2nd ACM conference on Electronic commerce, pp. 158–167. ACM, New York, NY, USA (2000)

  64. Schein, A.I., Popescul, A., Ungar, L.H., Pennock, D.M.: Methods and metrics for cold-start recommendations. In: SIGIR ’02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 253–260. ACM, New York, NY, USA (2002)

  65. Shani, G., Chickering, D.M., Meek, C.: Mining recommendations from the web. In: RecSys ’08: Proceedings of the 2008 ACM Conference on Recommender Systems, pp. 35–42 (2008)

  66. Shani, G., Heckerman, D., Brafman, R.I.: An MDP-based recommender system. Journal of Machine Learning Research 6, 1265–1295 (2005)

  67. Shani, G., Rokach, L., Shapira, B., Hadash, S., Tangi, M.: Investigating confidence displays for top-N recommendations. JASIST 64(12), 2548–2563 (2013)

  68. Smyth, B., McClave, P.: Similarity vs. diversity. In: ICCBR, pp. 347–361 (2001)

  69. Spillman, W., Lang, E.: The Law of Diminishing Returns. World Book Company (1924)

  70. Steck, H.: Item popularity and recommendation accuracy. In: Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys ’11, pp. 125–132. ACM, New York, NY, USA (2011)

  71. Steck, H.: Evaluation of recommendations: rating-prediction and ranking. In: Seventh ACM Conference on Recommender Systems, RecSys ’13, Hong Kong, China, October 12–16, 2013, pp. 213–220 (2013)

  72. Swearingen, K., Sinha, R.: Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems (2001)

  73. Van Rijsbergen, C.J.: Information Retrieval. Butterworth-Heinemann, Newton, MA, USA (1979)

  74. Voorhees, E.M.: Overview of TREC 2002. In: Proceedings of the 11th Text Retrieval Conference (TREC 2002), NIST Special Publication 500-251, pp. 1–15 (2002)

  75. Voorhees, E.M.: The philosophy of information retrieval evaluation. In: CLEF ’01: Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems, pp. 355–370. Springer-Verlag, London, UK (2002)

  76. Yao, Y.Y.: Measuring retrieval effectiveness based on user preference of documents. J. Amer. Soc. Inf. Sys. 46(2), 133–145 (1995)

  77. Yilmaz, E., Aslam, J.A., Robertson, S.: A new rank correlation coefficient for information retrieval. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’08, pp. 587–594. ACM, New York, NY, USA (2008)

  78. Zhang, M., Hurley, N.: Avoiding monotony: improving the diversity of recommendation lists. In: RecSys ’08: Proceedings of the 2008 ACM conference on Recommender systems, pp. 123–130. ACM, New York, NY, USA (2008)

  79. Zhang, Y., Callan, J., Minka, T.: Novelty and redundancy detection in adaptive filtering. In: SIGIR ’02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 81–88. ACM, New York, NY, USA (2002)

  80. Ziegler, C.N., McNee, S.M., Konstan, J.A., Lausen, G.: Improving recommendation lists through topic diversification. In: WWW ’05: Proceedings of the 14th international conference on World Wide Web, pp. 22–32. ACM, New York, NY, USA (2005)

Author information

Correspondence to Guy Shani.

Copyright information

© 2015 Springer Science+Business Media New York

About this chapter

Cite this chapter

Gunawardana, A., Shani, G. (2015). Evaluating Recommender Systems. In: Ricci, F., Rokach, L., Shapira, B. (eds) Recommender Systems Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7637-6_8

  • DOI: https://doi.org/10.1007/978-1-4899-7637-6_8

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4899-7636-9

  • Online ISBN: 978-1-4899-7637-6
