Abstract
This article applies stochastic ideas to reasoning about association rule mining and provides a formal statistical view of the discipline. A simple stochastic model is proposed under which support and confidence are reasonable estimates of certain model probabilities. Statistical properties of the corresponding estimators, such as moments and confidence intervals, are derived, and items and itemsets are examined for correlations.
After a brief review of interestingness measures for association rules, focusing on those motivated by statistical principles, two new measures are described. These measures, called α-precision and σ-precision, rely on statistical properties of the estimators discussed above. Experimental results demonstrate the effectiveness of both measures.
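To make the idea of support and confidence as estimators concrete, the following minimal Python sketch computes both as relative frequencies and attaches a normal-approximation (Wald) interval to the support. It assumes transactions are independent draws, so that the count of transactions containing an itemset is binomial. The function names, the toy transactions, and the choice of the Wald interval are illustrative assumptions, not the model or the intervals derived in the article, and the sketch does not implement α-precision or σ-precision.

```python
import math


def support(transactions, itemset):
    """Relative frequency of transactions that contain every item of itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) as a ratio of two supports."""
    supp_a = support(transactions, antecedent)
    if supp_a == 0:
        raise ValueError("antecedent never occurs in the data")
    return support(transactions, set(antecedent) | set(consequent)) / supp_a


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation interval for a binomial proportion, clipped to [0, 1].

    Assumption: a simple textbook interval, chosen only for illustration.
    """
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)


# Hypothetical toy data, not taken from the article.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

supp = support(transactions, {"bread", "milk"})      # estimate of P(bread and milk)
conf = confidence(transactions, {"bread"}, {"milk"})  # estimate of P(milk | bread)
low, high = wald_interval(supp, len(transactions))
```

With only five transactions the interval is very wide, which illustrates why the sampling variability of these estimators matters when rules are ranked by support or confidence.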
Cite this article
Weiß, C.H. Statistical mining of interesting association rules. Stat Comput 18, 185–194 (2008). https://doi.org/10.1007/s11222-007-9047-6