Abstract
Pearson product-moment correlation coefficients are a widely used measure of linear dependence across many fields. The accuracy of a sample-based correlation coefficient depends on the quality and quantity of the sample. Like all statistical models, these correlation coefficients can suffer from overfitting, in which the estimate represents random error instead of an underlying trend.
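As a rough illustration of how sample quantity affects the estimate, the sketch below computes a sample-based Pearson coefficient using only the positions where both items have an observation. The function name `pearson` and the use of `None` for a missing observation are assumptions made for this example, not notation from the paper.

```python
import math

def pearson(xs, ys):
    """Sample Pearson r over positions where both values are observed.

    Missing observations are marked with None (an assumption of this sketch).
    Returns None when fewer than two shared observations exist or when
    either item has zero variance over the shared observations.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

# Only two positions are shared here, so the estimate is forced to +1 or -1
# regardless of any underlying trend.
r_sparse = pearson([1, None, 3], [5, 2, 4])
```

With exactly two shared observations that differ in both items, the coefficient can only be +1 or -1, which is an extreme case of the overfitting described above.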
In this paper, we discuss how Pearson product-moment correlation coefficients can utilize information outside of the two items for which the correlation is being computed. By introducing a transitive relationship with one or more additional items that meet specified criteria, our Transitive Pearson product-moment correlation coefficient can significantly reduce the error of sparse, sample-based estimations, with reductions exceeding 50% in the best cases. Finally, we demonstrate that if the data is too dense or too sparse, transitivity is detrimental rather than helpful in reducing correlation estimation error.
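The abstract does not state how the transitive coefficient is computed or which criteria an intermediary item must meet, so the following is not the authors' method. It is only a sketch of a standard fact that suggests why a third item can carry information: for any valid correlation matrix over items x, y, and z, the value of r_xy is confined to the interval r_xz·r_zy ± sqrt((1 − r_xz²)(1 − r_zy²)). The function name `transitive_bounds` is hypothetical. Note that sample coefficients computed over different subsets of observations need not form a valid correlation matrix, so the interval should be read as a guide rather than a guarantee in the sparse setting.

```python
import math

def transitive_bounds(r_xz, r_zy):
    """Interval that r_xy must lie in, given r_xz and r_zy.

    Follows from the positive semidefiniteness of a 3x3 correlation matrix.
    Illustrative only; this is not the estimator described in the paper.
    """
    # Clamp at zero to guard against inputs that round slightly past +/-1.
    slack = math.sqrt(max(0.0, (1 - r_xz ** 2) * (1 - r_zy ** 2)))
    center = r_xz * r_zy
    return center - slack, center + slack

# A strongly correlated intermediary narrows the interval considerably,
# while an uncorrelated one leaves r_xy unconstrained.
tight = transitive_bounds(0.9, 0.9)
loose = transitive_bounds(0.0, 0.0)
```

The interval narrows as the intermediary item becomes more strongly correlated with both endpoints, which is consistent with the abstract's requirement that additional items meet specified criteria before they are used.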
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Phillips, T., GauthierDickey, C., Thurimella, R. (2010). Using Transitivity to Increase the Accuracy of Sample-Based Pearson Correlation Coefficients. In: Bach Pedersen, T., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2010. Lecture Notes in Computer Science, vol 6263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15105-7_13
DOI: https://doi.org/10.1007/978-3-642-15105-7_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15104-0
Online ISBN: 978-3-642-15105-7
eBook Packages: Computer Science, Computer Science (R0)