Evaluating Recommender Systems

Abstract

The evaluation of collaborative filtering shares a number of similarities with that of classification, because collaborative filtering can be viewed as a generalization of the classification and regression modeling problem (cf. Section 1.3.1.3 of Chapter 1).

Keywords

  • Root Mean Square Error
  • Receiver Operating Characteristic Curve
  • Recommender System
  • Rating Matrix
  • Recommendation Algorithm

Notes

  1. The actual design in methods such as cross-validation is slightly more complex because the data are segmented in multiple ways, even though they are always divided into two parts during a particular execution phase of training.

  2. A related effect is that observed ratings are likely to be specified by users who are frequent raters. Frequent raters may show different patterns of rating values compared to infrequent raters.
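
The segmentation described in the first note can be illustrated with a minimal sketch of k-fold cross-validation over observed ratings. This is not taken from the chapter: the function name `k_fold_splits`, the `(user, item, rating)` triple layout, and the toy data are assumptions made for illustration only. The point it shows is that the data are cut into k folds overall, yet every individual run still sees exactly two parts, a training part and a test part.

```python
import random


def k_fold_splits(ratings, k=5, seed=0):
    """Yield (train, test) pairs over the observed ratings.

    Hypothetical helper: the entries are shuffled once and cut into k folds.
    In each of the k runs, one fold is the test part and the remaining
    k - 1 folds together form the training part.
    """
    entries = list(ratings)
    random.Random(seed).shuffle(entries)
    # Fold i takes every k-th entry starting at position i.
    folds = [entries[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, test


# Toy (user, item, rating) triples, invented for illustration.
ratings = [(u, i, 1 + (u + i) % 5) for u in range(4) for i in range(5)]
splits = list(k_fold_splits(ratings, k=5))

for run, (train, test) in enumerate(splits):
    print(f"run {run}: {len(train)} training ratings, {len(test)} test ratings")
```

Each observed rating lands in the test part of exactly one run, so averaging an error measure over the k runs uses every rating for testing once.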

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Aggarwal, C.C. (2016). Evaluating Recommender Systems. In: Recommender Systems. Springer, Cham. https://doi.org/10.1007/978-3-319-29659-3_7

  • DOI: https://doi.org/10.1007/978-3-319-29659-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29657-9

  • Online ISBN: 978-3-319-29659-3

  • eBook Packages: Computer Science (R0)