Constraining and summarizing association rules in medical data

Ordonez, Carlos; Ezquerra, Norberto; Santana, Cesar A.

doi:10.1007/s10115-005-0226-5

Regular Paper
Published: 09 September 2005

Volume 9, pages 1–2 (2006)

Knowledge and Information Systems

Carlos Ordonez¹, Norberto Ezquerra² & Cesar A. Santana³

496 Accesses
83 Citations

Abstract

Association rules are a data mining technique used to discover frequent patterns in a data set. In this work, association rules are applied in the medical domain, where data sets are generally high-dimensional and small. The chief disadvantage of mining association rules in a high-dimensional data set is the huge number of patterns discovered, most of which are irrelevant or redundant. Several constraints are proposed to filter rules, since our aim is to discover only significant association rules and to accelerate the search process. A greedy algorithm is introduced to compute rule covers that summarize rules having the same consequent. The significance of association rules is evaluated using three metrics: support, confidence and lift. Experiments focus on discovering association rules on a real data set to predict the absence or presence of heart disease. Constraints are shown to significantly reduce the number of discovered rules and to improve running time. Rule covers summarize a large number of rules by producing a succinct set of rules with high-quality metrics.
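
The abstract names the three metrics, the filtering constraints and the greedy cover step without defining them. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It uses the standard definitions support(X => Y) = P(X and Y), confidence = support(X => Y) / support(X) and lift = confidence / support(Y). The specific constraints (a single target item as consequent, a maximum antecedent size, minimum support/confidence/lift), the brute-force enumeration in place of an Apriori-style search, the toy records and every function name are hypothetical. The cover step treats summarization as greedy set cover over the records each rule matches, which is one plausible reading of "rule cover" rather than the paper's exact definition.

```python
from itertools import combinations

# Hypothetical toy records (NOT the paper's data): each patient is a set of
# "attribute=value" items, with heart=sick / heart=healthy as the target.
records = [
    frozenset({"age=old", "chol=high", "heart=sick"}),
    frozenset({"age=old", "chol=high", "heart=sick"}),
    frozenset({"age=old", "chol=low", "heart=healthy"}),
    frozenset({"age=young", "chol=high", "heart=sick"}),
    frozenset({"age=young", "chol=low", "heart=healthy"}),
    frozenset({"age=young", "chol=low", "heart=healthy"}),
]
TARGETS = ["heart=sick", "heart=healthy"]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(1 for r in records if itemset <= r) / len(records)


def rule_metrics(x, y, records):
    """Support, confidence and lift of the rule X => Y."""
    s_xy, s_x, s_y = support(x | y, records), support(x, records), support(y, records)
    conf = s_xy / s_x if s_x else 0.0
    lift = conf / s_y if s_y else 0.0
    return s_xy, conf, lift


def mine_constrained_rules(records, targets, max_size, min_sup, min_conf, min_lift):
    """Brute-force enumeration under assumed constraints: the consequent is a
    single target item, the antecedent has at most `max_size` items and no
    target item, and all three metrics must clear their thresholds."""
    items = sorted(set().union(*records) - set(targets))
    rules = []
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            # Prune early: support(X => Y) can never exceed support(X).
            if support(x, records) < min_sup:
                continue
            for t in targets:
                y = frozenset([t])
                sup, conf, lift = rule_metrics(x, y, records)
                if sup >= min_sup and conf >= min_conf and lift >= min_lift:
                    rules.append((x, y, sup, conf, lift))
    return rules


def greedy_rule_cover(rules, records):
    """Greedy set cover over the records each rule matches (records containing
    both antecedent and consequent): repeatedly keep the rule that matches the
    most not-yet-covered records, stopping when no rule adds coverage."""
    matched = {i: {j for j, r in enumerate(records) if (rule[0] | rule[1]) <= r}
               for i, rule in enumerate(rules)}
    covered, cover, remaining = set(), [], set(range(len(rules)))
    while remaining:
        best = max(sorted(remaining), key=lambda i: len(matched[i] - covered))
        gain = matched[best] - covered
        if not gain:
            break
        cover.append(rules[best])
        covered |= gain
        remaining.discard(best)
    return cover


rules = mine_constrained_rules(records, TARGETS, max_size=2,
                               min_sup=0.2, min_conf=0.8, min_lift=1.2)
covers = {}
for t in TARGETS:
    # Summarize each group of rules that share the same consequent.
    same_consequent = [r for r in rules if r[1] == frozenset([t])]
    covers[t] = greedy_rule_cover(same_consequent, records)
    for x, y, sup, conf, lift in covers[t]:
        print(sorted(x), "=>", t, f"sup={sup:.2f} conf={conf:.2f} lift={lift:.2f}")
```

On these toy records the constraints keep 4 of the candidate rules, and the greedy cover reduces them to one rule per consequent (chol=high => heart=sick and chol=low => heart=healthy), because the longer antecedents only match patients that the shorter rules already cover.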

Author information

Authors and Affiliations

Teradata, NCR, 17095 Via del Campo, San Diego, CA, 92127, USA
Carlos Ordonez
Georgia Institute of Technology, Atlanta, GA, USA
Norberto Ezquerra
Emory University Hospital, GA, USA
Cesar A. Santana

Corresponding author

Correspondence to Carlos Ordonez.

Additional information

Carlos Ordonez received a degree in applied mathematics (actuarial sciences) and an MS degree in computer science, both from UNAM, Mexico, in 1992 and 1996, respectively. He received a PhD degree in computer science from the Georgia Institute of Technology, USA, in 2000. Dr. Ordonez currently works for Teradata (NCR), conducting research on database and data mining technology. He has published more than 20 research articles and holds three patents.

Norberto Ezquerra obtained his undergraduate degree in mathematics and physics from the University of South Florida, and his doctoral degree from Florida State University, USA. He is an associate professor at the College of Computing at the Georgia Institute of Technology and an adjunct faculty member in the Emory University School of Medicine. His research interests include computer graphics, computer vision in medicine, AI in medicine, modeling of physically based systems, medical informatics and telemedicine. He is an associate editor of IEEE Transactions on Medical Imaging, and a member of the American Medical Informatics Association and the IEEE Engineering in Medicine and Biology Society.

Cesar A. Santana received his MD degree in 1984 from the Institute of Medical Science in Havana, Cuba. In 1988 he finished his residency training in internal medicine, and in 1991 he completed a fellowship in nuclear medicine in Havana, Cuba. Dr. Santana received a PhD in nuclear cardiology in 1996 from the Department of Cardiology of the Vall d'Hebron University Hospital in Barcelona, Spain. Dr. Santana is an assistant professor at the Emory University School of Medicine and conducts research in the Radiology Department at the Emory University Hospital.

About this article

Cite this article

Ordonez, C., Ezquerra, N. & Santana, C.A. Constraining and summarizing association rules in medical data. Knowl Inf Syst 9, 1–2 (2006). https://doi.org/10.1007/s10115-005-0226-5

Received: 30 January 2004
Revised: 31 January 2005
Accepted: 15 March 2005
Published: 09 September 2005
Issue Date: March 2006
DOI: https://doi.org/10.1007/s10115-005-0226-5
