Statistical mining of interesting association rules

Statistics and Computing

Abstract

This article applies stochastic ideas to reasoning about association rule mining and provides a formal statistical view of the discipline. A simple stochastic model is proposed under which support and confidence are reasonable estimates of certain model probabilities. Statistical properties of the corresponding estimators, such as moments and confidence intervals, are derived, and items and itemsets are examined for correlations.
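To make the estimator view concrete, the following is a minimal Python sketch of support and confidence computed as relative frequencies, together with a standard Wald (normal approximation) interval for the support. It is an illustration only and not the article's derivation: the toy transactions, the function names, and the assumption that transactions are independent draws (so that an itemset count is binomial) are all chosen here for the example.

```python
import math


def support(transactions, itemset):
    """Relative frequency of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) as a ratio of two counts."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_a if n_a else float("nan")


def wald_interval(p_hat, n, z=1.96):
    """Wald interval for a binomial proportion, clipped to [0, 1].

    Assumption for this sketch: the n transactions are independent, so the
    itemset count is binomial. z=1.96 gives an approximate 95% interval.
    """
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)


# Hypothetical toy data, not taken from the article.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
)]

supp = support(transactions, {"bread", "milk"})      # 3 of 5 transactions
conf = confidence(transactions, {"bread"}, {"milk"})  # 3 of the 4 with bread
lo, hi = wald_interval(supp, len(transactions))
print(f"support={supp:.2f} confidence={conf:.2f} interval=({lo:.3f}, {hi:.3f})")
```

With only five transactions the interval is very wide, which illustrates why the sampling variability of support and confidence matters when rules are ranked by these quantities.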

After a brief review of interest measures for association rules, focusing on interestingness measures motivated by statistical principles, two new measures are described. These measures, called α-precision and σ-precision, rely on the statistical properties of the estimators discussed above. Experimental results demonstrate the effectiveness of both measures.

Author information

Corresponding author

Correspondence to Christian H. Weiß.

About this article

Cite this article

Weiß, C.H. Statistical mining of interesting association rules. Stat Comput 18, 185–194 (2008). https://doi.org/10.1007/s11222-007-9047-6

  • DOI: https://doi.org/10.1007/s11222-007-9047-6
