Scoring the Data Using Association Rules

Abstract

In many data mining applications, the objective is to select data cases of a target class. For example, in direct marketing, marketers want to select likely buyers of a particular product for promotion. In such applications, it is often too difficult to predict who will definitely belong to the target class (e.g., the buyer class) because the data used for modeling is typically very noisy and has a highly imbalanced class distribution. Traditionally, classification systems are used to solve this problem. Instead of classifying each data case into a definite class (e.g., buyer or non-buyer), a classification system is modified to produce a class probability estimate (or score) for each data case, indicating the likelihood that it belongs to the target class. However, existing classification systems aim only to find a subset of the regularities or rules that exist in the data, and this subset gives only a partial picture of the domain. In this paper, we show that the target selection problem can be mapped to association rule mining to provide a more powerful solution. Since association rule mining aims to find all rules in the data, it gives a complete picture of the underlying relationships in the domain, and this complete set of rules enables us to assign a more accurate class probability estimate to each data case. This paper proposes an effective and efficient technique for computing class probability estimates using association rules. Experimental results on public domain data and real-life application data show that the new technique generally performs markedly better than the state-of-the-art classification systems C4.5, boosted C4.5, and naïve Bayes.
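
The abstract does not reproduce the paper's scoring formula, so the sketch below only illustrates the general idea under explicit assumptions: every mined association rule that predicts the target class and whose antecedent a data case satisfies contributes its confidence to that case's score, and the contributions are combined (here by a support-weighted average, one plausible combiner, not necessarily the paper's) into a single class probability estimate. The Rule structure, the score function, and all item names are hypothetical illustrations.

```python
from dataclasses import dataclass

# Hypothetical representation of a mined class association rule:
# antecedent -> target_class, with confidence and support estimated
# from the training data.
@dataclass
class Rule:
    antecedent: frozenset  # attribute-value items, e.g. {"age=young", "income=high"}
    target_class: str      # class the rule predicts, e.g. "buyer"
    confidence: float      # estimated P(target_class | antecedent)
    support: float         # fraction of training cases matching the antecedent

def score(case_items: set, rules: list, target: str, prior: float) -> float:
    """Combine all rules that cover the case into one class probability
    estimate. A support-weighted average of rule confidences is used
    here as one plausible combiner; it is not the paper's exact formula."""
    matched = [r for r in rules
               if r.target_class == target and r.antecedent <= case_items]
    if not matched:
        return prior  # no covering rule: fall back to the class prior
    total = sum(r.support for r in matched)
    return sum(r.confidence * r.support for r in matched) / total

# Example: score one prospect.
rules = [
    Rule(frozenset({"age=young"}), "buyer", 0.30, 0.10),
    Rule(frozenset({"age=young", "income=high"}), "buyer", 0.65, 0.04),
]
case = {"age=young", "income=high", "region=west"}
print(score(case, rules, target="buyer", prior=0.05))  # ~0.40
```

In the target selection setting the abstract describes, data cases would then be ranked by this score and the top-ranked cases (e.g., likely buyers) selected for the promotion.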

References

  1. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1992.

  2. L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, Wadsworth & Brooks: Pacific Grove, CA, 1984.

  3. R. Kohavi, G. John, R. Long, D. Manley, and K. Pfleger, “MLC++: A machine learning library in C++,” in Proceedings of the International Conference on Tools with Artificial Intelligence (TAI-94), 1994, pp. 740–743.

  4. P. Domingos and M. Pazzani, “On the optimality of the simple Bayesian classifier under zero-one loss,” Machine Learning, vol. 29, nos. 2/3, pp. 103–130, 1997.

  5. R. Quinlan, “Bagging, Boosting, and C4.5,” in Proceedings of the National Conference on Artificial Intelligence (AAAI-96), 1996, pp. 725–730.

  6. P. Langley, W. Iba, and K. Thompson, “An analysis of Bayesian classifiers,” in Proceedings of the National Conference on Artificial Intelligence (AAAI-92), 1992, pp. 223–228.

  7. B. Liu, W. Hsu, and Y. Ma, “Integrating classification and association rule mining,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-98), 1998, pp. 80–86.

  8. A.M. Hughes, The Complete Database Marketer, Chicago, Ill.: Irwin Professional, 1996.

  9. C. Ling and C. Li, “Data mining for direct marketing: Problems and solutions,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-98), 1998, pp. 73–79.

  10. G. Piatetsky-Shapiro and B. Masand, “Estimating campaign benefits and modeling lift,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999, pp. 185–193.

  11. R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of International Conference on Very Large Databases (VLDB-94), 1994, pp. 487–499.

  12. Y. Freund and R.E. Schapire, “Experiments with a new boosting algorithm,” in Proceedings of the International Conference on Machine Learning (ICML-96), 1996, pp. 148–156.

  13. Z. Zheng, G. Webb, and K.M. Ting, “Lazy Bayesian rules: A lazy semi-naive Bayesian learning technique competitive to boosting decision trees,” in Proceedings of the International Conference on Machine Learning (ICML-99), 1999, pp. 493–502.

  14. M. Kubat and S. Matwin, “Addressing the curse of imbalanced training sets,” in Proceedings of the International Conference on Machine Learning (ICML-97), 1997, pp. 179–186.

  15. P.K. Chan and S.J. Stolfo, “Toward scalable learning with non-uniform class and cost distributions: A case study in credit card fraud detection,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-98), 1998, pp. 164–168.

  16. T. Dietterich and G. Bakiri, “Solving multiclass learning problems via error-correcting output codes,” Journal of Artificial Intelligence Research, vol. 2, pp. 263–286, 1995.

  17. M. Pazzani, C. Merz, P. Murphy, K. Ali, T. Hume, and C. Brunk, “Reducing misclassification costs,” in Proceedings of the International Conference on Machine Learning (ICML-97), 1997.

  18. M. Kubat, R. Holte, and S. Matwin, “Learning when negative examples abound,” in Proceedings of the European Conference on Machine Learning (ECML-97), 1997, pp. 146–153.

  19. F. Provost and T. Fawcett, “Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-97), 1997, pp. 43–48.

  20. G. Dong, X. Zhang, L. Wong, and J. Li, “CAEP: Classification by aggregating emerging patterns,” in Proceedings of the International Conference on Discovery Science (DS-99), 1999, pp. 30–42.

  21. D. Meretakis and B. Wuthrich, “Extending naïve Bayes classifiers using long itemsets,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999, pp. 165–174.

  22. H. Mannila, D. Pavlov, and P. Smyth, “Prediction with local patterns using cross-entropy,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999, pp. 357–361.

  23. C. Aggarwal and P. Yu, “Online generation of association rules,” in Proceedings of IEEE International Conference on Data Engineering (ICDE-98), 1998, pp. 402–411.

  24. R. Bayardo, R. Agrawal, and D. Gunopulos, “Constraint-based rule mining in large, dense databases,” in Proceedings of IEEE International Conference on Data Engineering (ICDE-99), 1999, pp. 188–197.

  25. S. Brin, R. Motwani, J. Ullman, and S. Tsur, “Dynamic itemset counting and implication rules for market basket data,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD-97), 1997, pp. 255–264.

  26. T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama, “Data mining using two-dimensional optimized association rules: Scheme, algorithms and visualization,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD-96), 1996, pp. 13–23.

  27. J. Han and Y. Fu, “Discovery of multiple-level association rules from large databases,” in Proceedings of International Conference on Very Large Databases (VLDB-95), 1995, pp. 420–431.

  28. R.T. Ng, L. Lakshmanan, and J. Han, “Exploratory mining and pruning optimizations of constrained association rules,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD-98), 1998, pp. 13–24.

  29. R. Rastogi and K. Shim, “Mining optimized association rules with categorical and numeric attributes,” in Proceedings of IEEE International Conference on Data Engineering (ICDE-98), 1998, pp. 503–512.

  30. H. Toivonen, “Sampling large databases for association rules,” in Proceedings of International Conference on Very Large Databases (VLDB-96), 1996, pp. 134–145.

  31. A.K.H. Tung, H. Lu, J. Han, and L. Feng, “Breaking the barrier of transactions: Mining inter-transaction association rules,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999, pp. 297–301.

  32. U.M. Fayyad and K.B. Irani, “Multi-interval discretization of continuous-valued attributes for classification learning,” in Proceedings of International Joint Conference on Artificial Intelligence (IJCAI-93), 1993, pp. 1022–1029.

  33. Z. Zheng and G. Webb, “Stochastic attribute selection committees with multiple boosting: Learning more accurate and more stable classifier committees,” in Proceedings of Pacific Asia International Conference on Knowledge Discovery and Data Mining (PAKDD-99), 1999, pp. 123–132.

  34. C.J. Merz and P. Murphy, UCI repository of machine learning databases [http://www.cs.uci.edu/~mlearn/MLRepository.html], 1996.

  35. B. Liu, W. Hsu, and Y. Ma, “Mining association rules with multiple minimum supports,” in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999, pp. 337–341.

Cite this article

Liu, B., Ma, Y., Wong, C.K. et al. Scoring the Data Using Association Rules. Applied Intelligence 18, 119–135 (2003). https://doi.org/10.1023/A:1021931008240
