Statistical mining of interesting association rules

Weiß, Christian H.

doi:10.1007/s11222-007-9047-6


Published: 21 December 2007

Volume 18, pages 185–194, (2008)

Statistics and Computing

Christian H. Weiß¹


Abstract

This article uses stochastic ideas to reason about association rule mining and provides a formal statistical view of this discipline. A simple stochastic model is proposed, under which support and confidence are reasonable estimates of certain probabilities of the model. Statistical properties of the corresponding estimators, such as moments and confidence intervals, are derived, and items and itemsets are examined for correlations.

After a brief review of measures of interest for association rules, focusing mainly on interestingness measures motivated by statistical principles, two new measures are described. These measures, called α-precision and σ-precision, rely on the statistical properties of the estimators discussed before. Experimental results demonstrate the effectiveness of both measures.
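As a rough illustration of the idea that support and confidence act as estimates of probabilities, the following sketch computes both as relative frequencies over a small set of transactions and attaches an approximate interval to the support. It is not the article's model or its α- and σ-precision measures, which are not reproduced on this page. The function names, the toy transactions, and the use of a normal-approximation (Wald) interval are all assumptions made for illustration.

```python
import math

def count_containing(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))

def support(transactions, itemset):
    """Relative frequency of `itemset`: an estimate of P(itemset occurs)."""
    return count_containing(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) as a ratio of counts."""
    n_a = count_containing(transactions, antecedent)
    if n_a == 0:
        raise ValueError("antecedent never occurs in the transactions")
    n_ab = count_containing(transactions, set(antecedent) | set(consequent))
    return n_ab / n_a

def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation interval for a proportion, clipped to [0, 1].

    Assumption: independent transactions and z = 1.96 (roughly 95% coverage);
    the approximation is crude for small n.
    """
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

# Hypothetical toy data, not taken from the article.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

supp = support(transactions, {"bread", "milk"})
conf = confidence(transactions, {"bread"}, {"milk"})
lo, hi = wald_interval(supp, len(transactions))
print(f"support={supp:.2f}, confidence={conf:.2f}, interval=({lo:.2f}, {hi:.2f})")
```

With only five transactions the interval is very wide, which is the practical point of treating support as an estimator: a rule that looks strong on few observations may carry little statistical evidence.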



Author information

Authors and Affiliations

Institute of Mathematics, Department of Statistics, University of Würzburg, Am Hubland, Würzburg, 97074, Germany
Christian H. Weiß


Corresponding author

Correspondence to Christian H. Weiß.


About this article

Cite this article

Weiß, C.H. Statistical mining of interesting association rules. Stat Comput 18, 185–194 (2008). https://doi.org/10.1007/s11222-007-9047-6


Received: 14 February 2007
Accepted: 30 November 2007
Published: 21 December 2007
Issue Date: June 2008
DOI: https://doi.org/10.1007/s11222-007-9047-6
