Correlation Range Query
Efficient correlation computation has been an active research area in data mining. Given a large dataset and a specified query item, we are interested in finding the items in the dataset whose correlation with the query item falls within a given range. This problem, known as the correlation range query (CRQ), is a common task in many application domains. In this paper, we identify piecewise monotone properties of the upper and lower bounds of the φ coefficient and propose an efficient correlation range query algorithm called CORAQ. CORAQ prunes many items without computing their actual correlation coefficients with the query item, while still guaranteeing that the query results are complete and correct. Experiments on large benchmark datasets show that the algorithm is much faster than its brute-force alternative and scales well with dataset size.
Keywords: Association Mining, Correlation Computing, φ Coefficient
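To make the problem statement concrete, the following is a minimal sketch of the brute-force baseline that the abstract compares against, not of CORAQ itself. It assumes each item is a binary column (1 if the item occurs in a transaction, 0 otherwise) and uses the standard 2×2 contingency-table form of the φ coefficient. The function names `phi` and `brute_force_crq` and the dictionary layout are hypothetical choices for illustration.

```python
from math import sqrt


def phi(col_a, col_b):
    """Phi coefficient of two equal-length binary columns.

    Returns None when either column is constant, because the
    coefficient is undefined in that case (zero denominator).
    """
    n = len(col_a)
    n_a = sum(col_a)                     # transactions containing A
    n_b = sum(col_b)                     # transactions containing B
    n_ab = sum(1 for a, b in zip(col_a, col_b) if a and b)  # both A and B
    denom = sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    if denom == 0:
        return None
    return (n * n_ab - n_a * n_b) / denom


def brute_force_crq(items, query, low, high):
    """Return every item whose phi with `query` lies in [low, high].

    This computes the exact coefficient for every item, which is the
    cost a pruning algorithm tries to avoid.
    """
    q = items[query]
    result = {}
    for name, col in items.items():
        if name == query:
            continue
        r = phi(q, col)
        if r is not None and low <= r <= high:
            result[name] = r
    return result


# Hypothetical toy dataset: four transactions, one column per item.
items = {
    "q": [1, 1, 0, 0],
    "a": [1, 1, 0, 0],   # identical to q
    "b": [0, 0, 1, 1],   # complement of q
    "c": [1, 0, 1, 0],   # independent of q
    "d": [1, 1, 1, 1],   # constant, phi undefined
}
matches = brute_force_crq(items, "q", 0.5, 1.0)
```

The baseline touches every item once per query, so its cost grows linearly with the number of items. A bound-based method, as described in the abstract, would aim to skip items whose upper or lower bound already places them outside the requested range.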