Scoring the Data Using Association Rules

Abstract

In many data mining applications, the objective is to select data cases of a target class. For example, in direct marketing, marketers want to select likely buyers of a particular product for promotion. In such applications, it is often too difficult to predict who will definitely belong to the target class (e.g., the buyer class) because the data used for modeling is typically very noisy and has a highly imbalanced class distribution. Traditionally, classification systems are used to solve this problem. Instead of classifying each data case into a definite class (e.g., buyer or non-buyer), a classification system is modified to produce a class probability estimate (or score) for each data case, indicating the likelihood that it belongs to the target class. However, existing classification systems aim only to find a subset of the regularities or rules that exist in the data, and this subset gives only a partial picture of the domain. In this paper, we show that the target selection problem can be mapped to association rule mining to provide a more powerful solution. Since association rule mining aims to find all rules in the data, it gives a complete picture of the underlying relationships in the domain, and this complete set of rules enables us to assign a more accurate class probability estimate to each data case. This paper proposes an effective and efficient technique for computing class probability estimates using association rules. Experimental results on public domain data and real-life application data show that the new technique generally performs markedly better than the state-of-the-art classification systems C4.5, boosted C4.5, and naïve Bayes.
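
The abstract does not reproduce the paper's scoring formula, so the sketch below only illustrates the general idea under explicit assumptions: every mined association rule that predicts the target class and whose antecedent a data case satisfies contributes its confidence to that case's score, and the contributions are combined (here by a support-weighted average, one plausible combiner, not necessarily the paper's) into a single class probability estimate. The Rule structure, the score function, and all item names are hypothetical illustrations.

```python
from dataclasses import dataclass

# Hypothetical representation of a mined class association rule:
# antecedent -> target_class, with confidence and support estimated
# from the training data.
@dataclass
class Rule:
    antecedent: frozenset  # attribute-value items, e.g. {"age=young", "income=high"}
    target_class: str      # class the rule predicts, e.g. "buyer"
    confidence: float      # estimated P(target_class | antecedent)
    support: float         # fraction of training cases matching the antecedent

def score(case_items: set, rules: list, target: str, prior: float) -> float:
    """Combine all rules that cover the case into one class probability
    estimate. A support-weighted average of rule confidences is used
    here as one plausible combiner; it is not the paper's exact formula."""
    matched = [r for r in rules
               if r.target_class == target and r.antecedent <= case_items]
    if not matched:
        return prior  # no covering rule: fall back to the class prior
    total = sum(r.support for r in matched)
    return sum(r.confidence * r.support for r in matched) / total

# Example: score one prospect.
rules = [
    Rule(frozenset({"age=young"}), "buyer", 0.30, 0.10),
    Rule(frozenset({"age=young", "income=high"}), "buyer", 0.65, 0.04),
]
case = {"age=young", "income=high", "region=west"}
print(score(case, rules, target="buyer", prior=0.05))  # ~0.40
```

In the target selection setting the abstract describes, data cases would then be ranked by this score and the top-ranked cases (e.g., likely buyers) selected for the promotion.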

References

  1. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1992.

  2. L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, Wadsworth & Brooks: Pacific Grove, CA, 1984.

  3. R. Kohavi, G. John, R. Long, D. Manley, and K. Pfleger, “MLC++: A machine learning library in C++,” in Proceedings of the International Conference on Tools with Artificial Intelligence (TAI-94), 1994, pp. 740–743.

  4. P. Domingos and M. Pazzani, “On the optimality of the simple Bayesian classifier under zero-one loss,” Machine Learning, vol. 29, nos. 2/3, pp. 103–130, 1997.

  5. R. Quinlan, “Bagging, Boosting, and C4.5,” in Proceedings of the National Conference on Artificial Intelligence (AAAI-96), 1996, pp. 725–730.

  6. P. Langley, W. Iba, and K. Thompson, “An analysis of Bayesian classifiers,” in Proceedings of the National Conference on Artificial Intelligence (AAAI-92), 1992, pp. 223–228.

  7. B. Liu, W. Hsu, and Y. Ma, “Integrating classification and association rule mining,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-98), 1998, pp. 80–86.

  8. A.M. Hughes, The Complete Database Marketer, Chicago, Ill.: Irwin Professional, 1996.

  9. C. Ling and C. Li, “Data mining for direct marketing: Problems and solutions,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-98), 1998, pp. 73–79.

  10. G. Piatetsky-Shapiro and B. Masand, “Estimating campaign benefits and modeling lift,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999, pp. 185–193.

  11. R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of International Conference on Very Large Databases (VLDB-94), 1994, pp. 487–499.

  12. Y. Freund and R.E. Schapire, “Experiments with a new boosting algorithm,” in Proceedings of the International Conference on Machine Learning (ICML-96), 1996, pp. 148–156.

  13. Z. Zheng, G. Webb, and K.M. Ting, “Lazy Bayesian rules: A lazy semi-naive Bayesian learning technique competitive to boosting decision trees,” in Proceedings of the International Conference on Machine Learning (ICML-99), 1999, pp. 493–502.

  14. M. Kubat and S. Matwin, “Addressing the curse of imbalanced training sets,” in Proceedings of the International Conference on Machine Learning (ICML-97), 1997, pp. 179–186.

  15. P.K. Chan and S.J. Stolfo, “Toward scalable learning with non-uniform class and cost distributions: A case study in credit card fraud detection,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-98), 1998, pp. 164–168.

  16. T. Dietterich and G. Bakiri, “Solving multiclass learning problems via error-correcting output codes,” Journal of Artificial Intelligence Research, vol. 2, pp. 263–286, 1995.

  17. M. Pazzani, C. Merz, P. Murphy, K. Ali, T. Hume, and C. Brunk, “Reducing misclassification costs,” in Proceedings of the International Conference on Machine Learning (ICML-97), 1997.

  18. M. Kubat, R. Holte, and S. Matwin, “Learning when negative examples abound,” in Proceedings of the European Conference on Machine Learning (ECML-97), 1997, pp. 146–153.

  19. F. Provost and T. Fawcett, “Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-97), 1997, pp. 43–48.

  20. G. Dong, X. Zhang, L. Wong, and J. Li, “CAEP: Classification by aggregating emerging patterns,” in Proceedings of the International Conference on Discovery Science (DS-99), 1999, pp. 30–42.

  21. D. Meretakis and B. Wuthrich, “Extending naïve Bayes classifiers using long itemsets,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999, pp. 165–174.

  22. H. Mannila, D. Pavlov, and P. Smyth, “Prediction with local patterns using cross-entropy,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999, pp. 357–361.

  23. C. Aggarwal and P. Yu, “Online generation of association rules,” in Proceedings of IEEE International Conference on Data Engineering (ICDE-98), 1998, pp. 402–411.

  24. R. Bayardo, R. Agrawal, and D. Gunopulos, “Constraint-based rule mining in large, dense databases,” in Proceedings of IEEE International Conference on Data Engineering (ICDE-99), 1999, pp. 188–197.

  25. S. Brin, R. Motwani, J. Ullman, and S. Tsur, “Dynamic itemset counting and implication rules for market basket data,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD-97), 1997, pp. 255–264.

  26. T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama, “Data mining using two-dimensional optimized association rules: Scheme, algorithms and visualization,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD-96), 1996, pp. 13–23.

  27. J. Han and Y. Fu, “Discovery of multiple-level association rules from large databases,” in Proceedings of International Conference on Very Large Databases (VLDB-95), 1995, pp. 420–431.

  28. R.T. Ng, L. Lakshmanan, and J. Han, “Exploratory mining and pruning optimizations of constrained association rules,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD-98), 1998, pp. 13–24.

  29. R. Rastogi and K. Shim, “Mining optimized association rules with categorical and numeric attributes,” in Proceedings of IEEE International Conference on Data Engineering (ICDE-98), 1998, pp. 503–512.

  30. H. Toivonen, “Sampling large databases for association rules,” in Proceedings of International Conference on Very Large Databases (VLDB-96), 1996, pp. 134–145.

  31. A.K.H. Tung, H. Lu, J. Han, and L. Feng, “Breaking the barrier of transactions: Mining inter-transaction association rules,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999, pp. 297–301.

  32. U.M. Fayyad and K.B. Irani, “Multi-interval discretization of continuous-valued attributes for classification learning,” in Proceedings of International Joint Conference on Artificial Intelligence (IJCAI-93), 1993, pp. 1022–1029.

  33. Z. Zheng and G. Webb, “Stochastic attribute selection committees with multiple boosting: Learning more accurate and more stable classifier committees,” in Proceedings of Pacific Asia International Conference on Knowledge Discovery and Data Mining (PAKDD-99), 1999, pp. 123–132.

  34. C.J. Merz and P. Murphy, UCI repository of machine learning databases [http://www.cs.uci.edu/~mlearn/MLRepository.html], 1996.

  35. B. Liu, W. Hsu, and Y. Ma, “Mining association rules with multiple minimum supports,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999, pp. 337–341.

Cite this article

Liu, B., Ma, Y., Wong, C.K. et al. Scoring the Data Using Association Rules. Applied Intelligence 18, 119–135 (2003). https://doi.org/10.1023/A:1021931008240
