Abstract
Association rules are a data mining technique used to discover frequent patterns in a data set. In this work, association rules are used in the medical domain, where data sets are generally high-dimensional and small. The chief disadvantage of mining association rules in a high-dimensional data set is the huge number of patterns that are discovered, most of which are irrelevant or redundant. Several constraints are proposed for filtering purposes, since our aim is to discover only significant association rules and to accelerate the search process. A greedy algorithm is introduced to compute rule covers in order to summarize rules having the same consequent. The significance of association rules is evaluated using three metrics: support, confidence and lift. Experiments focus on discovering association rules on a real data set to predict the absence or existence of heart disease. Constraints are shown to significantly reduce the number of discovered rules and improve running time. Rule covers summarize a large number of rules by producing a succinct set of rules with high-quality metrics.
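To make the three metrics named in the abstract concrete, the following minimal sketch computes support, confidence and lift for a single rule using their standard textbook definitions. The function names, the item labels and the five toy records are hypothetical illustrations; they are not the paper's data set or implementation.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Estimate of P(consequent | antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent: Iterable[str], consequent: Iterable[str],
         transactions: List[Transaction]) -> float:
    """Confidence divided by the support of the consequent.

    A value above 1 suggests the antecedent and consequent co-occur
    more often than they would if they were independent.
    """
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


# Hypothetical toy records (assumption: items are already binned labels).
transactions = [
    frozenset({"age>=60", "chol_high", "disease"}),
    frozenset({"age>=60", "disease"}),
    frozenset({"age>=60", "chol_high"}),
    frozenset({"chol_high", "disease"}),
    frozenset({"age<60"}),
]

rule_if, rule_then = {"age>=60"}, {"disease"}
rule_support = support(rule_if | rule_then, transactions)
rule_confidence = confidence(rule_if, rule_then, transactions)
rule_lift = lift(rule_if, rule_then, transactions)
print(rule_support, rule_confidence, rule_lift)
```

In this toy example the rule holds in two of five records, so its support is 0.4; its confidence is about 0.67 and its lift about 1.11, which indicates only a weak positive association.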
Author information
Carlos Ordonez received a degree in applied mathematics (actuarial sciences) and an MS degree in computer science, both from the UNAM University, Mexico, in 1992 and 1996, respectively. He received a PhD degree in computer science from the Georgia Institute of Technology, USA, in 2000. Dr. Ordonez currently works for Teradata (NCR), conducting research on database and data mining technology. He has published more than 20 research articles and holds three patents.
Norberto Ezquerra obtained his undergraduate degree in mathematics and physics from the University of South Florida, and his doctoral degree from Florida State University, USA. He is an associate professor at the College of Computing at the Georgia Institute of Technology and an adjunct faculty member in the Emory University School of Medicine. His research interests include computer graphics, computer vision in medicine, AI in medicine, modeling of physically based systems, medical informatics and telemedicine. He is an associate editor of the IEEE Transactions on Medical Imaging journal, and a member of the American Medical Informatics Association and the IEEE Engineering in Medicine and Biology Society.
Cesar A. Santana received his MD degree in 1984 from the Institute of Medical Science in Havana, Cuba. In 1988, he finished his residency training in internal medicine, and in 1991, he completed a fellowship in nuclear medicine in Havana, Cuba. Dr. Santana received a PhD in nuclear cardiology in 1996 from the Department of Cardiology of the Vall d'Hebron University Hospital in Barcelona, Spain. Dr. Santana is an assistant professor at the Emory University School of Medicine and conducts research in the Radiology Department at the Emory University Hospital.
About this article
Cite this article
Ordonez, C., Ezquerra, N. & Santana, C.A. Constraining and summarizing association rules in medical data. Knowl Inf Syst 9, 1–2 (2006). https://doi.org/10.1007/s10115-005-0226-5
DOI: https://doi.org/10.1007/s10115-005-0226-5