Evaluating Recommendation Systems

Shani, Guy; Gunawardana, Asela

doi:10.1007/978-0-387-85820-3_8

Abstract

Recommender systems are now popular both commercially and in the research community, where many approaches have been suggested for providing recommendations. In many cases, a system designer who wishes to employ a recommendation system must choose among a set of candidate approaches. A first step towards selecting an appropriate algorithm is to decide which properties of the application to focus upon when making this choice. Indeed, recommendation systems have a variety of properties that may affect user experience, such as accuracy, robustness, and scalability. In this paper we discuss how to compare recommenders based on a set of properties that are relevant for the application. We focus on comparative studies, where a few algorithms are compared using some evaluation metric, rather than on absolute benchmarking of algorithms. We describe experimental settings appropriate for making choices between algorithms. We review three types of experiments: offline settings, where recommendation approaches are compared without user interaction; user studies, where a small group of subjects experiment with the system and report on the experience; and large-scale online experiments, where real user populations interact with the system. For each of these settings we describe the types of questions that can be answered and suggest protocols for experimentation. We also discuss how to draw trustworthy conclusions from the conducted experiments. We then review a large set of properties and explain how to evaluate systems with respect to the properties relevant to an application. Finally, we survey a large set of evaluation metrics in the context of the properties that they evaluate.
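To make the offline comparative setting concrete, here is a minimal Python sketch. It assumes each user's data has been split so that some items are hidden from training, and it assumes precision at k as the evaluation metric and a two-sided sign test as the check for a trustworthy difference between two algorithms. The abstract does not say which metric or test the chapter recommends, and the function names (`precision_at_k`, `sign_test`, `compare_offline`) and the toy data are hypothetical.

```python
from math import comb

# Sketch under assumptions: precision@k as the offline metric and a sign test
# for significance. Neither choice is taken from the chapter itself.


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that the user actually held out.

    The denominator is k even if fewer than k items were recommended.
    """
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def sign_test(scores_a, scores_b):
    """Two-sided sign test on paired per-user scores.

    Returns (wins_a, wins_b, p_value). Users on whom both algorithms score
    the same are treated as ties and discarded.
    """
    pairs = list(zip(scores_a, scores_b))
    wins_a = sum(1 for a, b in pairs if a > b)
    wins_b = sum(1 for a, b in pairs if b > a)
    n = wins_a + wins_b
    if n == 0:
        return wins_a, wins_b, 1.0  # every user tied: no evidence either way
    smaller = min(wins_a, wins_b)
    # P(X <= smaller) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * tail)


def compare_offline(recs_a, recs_b, held_out, k=5):
    """Score two recommenders on the same held-out data and test the difference.

    recs_a / recs_b map user -> ranked list of item ids;
    held_out maps user -> set of items hidden from training.
    """
    users = sorted(held_out)
    scores_a = [precision_at_k(recs_a[u], held_out[u], k) for u in users]
    scores_b = [precision_at_k(recs_b[u], held_out[u], k) for u in users]
    return sign_test(scores_a, scores_b)


if __name__ == "__main__":
    # Hypothetical toy data: six users, one hidden item each.
    held_out = {f"u{i}": {f"item{i}"} for i in range(6)}
    recs_a = {u: [next(iter(items)), "popular"] for u, items in held_out.items()}
    recs_b = {u: ["popular", "other"] for u in held_out}
    print(compare_offline(recs_a, recs_b, held_out, k=2))  # prints (6, 0, 0.03125)
```

In the toy run, algorithm A finds the hidden item for all six users and B for none, so A wins 6 of 6 non-tied users and the two-sided p-value is 2/64. Because the sign test only counts which algorithm wins for each user, it ignores the size of the differences; a paired t-test or a Wilcoxon signed-rank test would use that information at the cost of extra assumptions.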

Author information

Authors and Affiliations

Microsoft Research, One Microsoft Way, Redmond, WA, USA
Guy Shani & Asela Gunawardana

Corresponding author

Correspondence to Guy Shani.

Editor information

Editors and Affiliations

Faculty of Computer Science, Free University of Bozen-Bolzano, Piazza Domenicani 3, Bolzano, 39100, Italy
Francesco Ricci
Dept. Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
Lior Rokach
Dept. Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
Bracha Shapira
School of Communication, Information & Library Studies, Rutgers University, Huntington Street 4, New Brunswick, 08901-1071, New Jersey, USA
Paul B. Kantor

About this chapter

Cite this chapter

Shani, G., Gunawardana, A. (2011). Evaluating Recommendation Systems. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P. (eds) Recommender Systems Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-85820-3_8

DOI: https://doi.org/10.1007/978-0-387-85820-3_8
Published: 05 October 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-85819-7
Online ISBN: 978-0-387-85820-3
eBook Packages: Computer Science, Computer Science (R0)