
World Wide Web, Volume 18, Issue 3, pp 709–729

Correlation range query for effective recommendations

  • Wenjun Zhou
  • Hao Zhang

Abstract

Efficient correlation computation has been an active research area in data mining. Given a large dataset and a specified query item, we are interested in finding the items in the dataset whose correlation with the query item falls within a certain range. This problem, known as the correlation range query, is a common task in many application domains. In this paper, we identify piecewise monotone properties of the upper and lower bounds of the ϕ coefficient, and propose an efficient correlation range query algorithm called CORAQ. The CORAQ algorithm prunes many items without computing their actual correlation coefficients with the query item, while guaranteeing that the query results are complete and correct. Experiments with large benchmark datasets show that this algorithm is much faster than its brute-force alternative and scales well with large datasets. As case studies, real-world datasets from recommendation applications are analyzed to demonstrate that CORAQ can effectively identify interesting items to recommend to users.
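
To make the idea of bound-based pruning concrete, the following is a minimal sketch of a correlation range query over binary transaction data. It is not the CORAQ algorithm itself, whose details are not given in this abstract. It assumes only the standard count-based form of the ϕ coefficient and the elementary fact that, for fixed item supports, ϕ increases with the joint support, which in turn lies between max(0, n_a + n_b - n) and min(n_a, n_b). The function names (`phi`, `phi_bounds`, `range_query`) and the toy data are hypothetical and chosen for illustration.

```python
from math import sqrt


def phi(n, n_a, n_b, n_ab):
    """phi coefficient of two binary items, computed from counts.

    n: number of transactions; n_a, n_b: transactions containing each item;
    n_ab: transactions containing both items.
    """
    denom = sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    if denom == 0:
        # Assumption of this sketch: an item present in none or all of the
        # transactions is treated as uncorrelated with everything.
        return 0.0
    return (n * n_ab - n_a * n_b) / denom


def phi_bounds(n, n_a, n_b):
    """Lower and upper bounds of phi that need only the individual supports."""
    lowest_joint = max(0, n_a + n_b - n)   # least possible co-occurrence
    highest_joint = min(n_a, n_b)          # greatest possible co-occurrence
    return phi(n, n_a, n_b, lowest_joint), phi(n, n_a, n_b, highest_joint)


def range_query(transactions, query, low, high):
    """Return {item: phi} for items whose phi with `query` lies in [low, high]."""
    n = len(transactions)
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    query_tids = tidsets[query]
    result = {}
    for item, tids in tidsets.items():
        if item == query:
            continue
        lower, upper = phi_bounds(n, len(query_tids), len(tids))
        if upper < low or lower > high:
            continue  # pruned: the exact coefficient cannot fall in the range
        value = phi(n, len(query_tids), len(tids), len(query_tids & tids))
        if low <= value <= high:
            result[item] = value
    return result


# Toy data (hypothetical): four transactions over items a, b, c.
transactions = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
matches = range_query(transactions, "a", 0.5, 1.0)
print(matches)
```

In this toy run, item `c` is discarded from its bounds alone, so the set intersection needed for its exact coefficient is never computed. Because the bounds are valid for every possible joint support, pruning on them cannot drop an item that belongs in the result.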

Keywords

Association mining, Correlation computing, ϕ coefficient, Range query



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Statistics, Operations, and Management Science Department, University of Tennessee, Knoxville, USA
  2. Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, USA
