Constraining and summarizing association rules in medical data

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Association rules are a data mining technique used to discover frequent patterns in a data set. In this work, association rules are applied in the medical domain, where data sets are generally high dimensional and small. The chief disadvantage of mining association rules in a high-dimensional data set is the huge number of patterns discovered, most of which are irrelevant or redundant. Several constraints are proposed to filter these patterns, since our aim is to discover only significant association rules and to accelerate the search process. A greedy algorithm is introduced to compute rule covers, which summarize rules having the same consequent. The significance of association rules is evaluated using three metrics: support, confidence and lift. Experiments focus on discovering association rules in a real data set to predict the absence or existence of heart disease. Constraints are shown to significantly reduce the number of discovered rules and improve running time. Rule covers summarize a large number of rules by producing a succinct set of rules with high-quality metrics.
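As an illustration of the three metrics named in the abstract, the following minimal sketch computes support, confidence and lift for a rule X ⇒ Y using their standard textbook definitions. The function names, the item labels and the four toy records are hypothetical and are not taken from the article or its data set; the article's own formulation may differ in detail.

```python
from typing import FrozenSet, Iterable, List, Tuple


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def rule_metrics(
    antecedent: Iterable[str],
    consequent: Iterable[str],
    transactions: List[FrozenSet[str]],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    x, y = frozenset(antecedent), frozenset(consequent)
    s_xy = support(x | y, transactions)  # support of the whole rule
    s_x = support(x, transactions)
    s_y = support(y, transactions)
    # confidence = s(X u Y) / s(X); lift = confidence / s(Y)
    confidence = s_xy / s_x if s_x > 0 else 0.0
    lift = confidence / s_y if s_y > 0 else 0.0
    return s_xy, confidence, lift


# Hypothetical toy records (assumed for illustration only).
transactions = [
    frozenset({"age>=60", "chol_high", "disease"}),
    frozenset({"age>=60", "disease"}),
    frozenset({"age>=60", "chol_high"}),
    frozenset({"chol_high"}),
]

sup, conf, lift = rule_metrics({"age>=60"}, {"disease"}, transactions)
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the antecedent and consequent occur together more often than would be expected if they were independent, which is why lift is commonly used alongside support and confidence to filter rules.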

Author information

Corresponding author

Correspondence to Carlos Ordonez.

Additional information

Carlos Ordonez received a degree in applied mathematics (actuarial sciences) and an MS degree in computer science, both from the UNAM University, Mexico, in 1992 and 1996, respectively. He received a PhD degree in computer science from the Georgia Institute of Technology, USA, in 2000. Dr. Ordonez currently works for Teradata (NCR) conducting research on database and data mining technology. He has published more than 20 research articles and holds three patents.

Norberto Ezquerra obtained his undergraduate degree in mathematics and physics from the University of South Florida, and his doctoral degree from Florida State University, USA. He is an associate professor at the College of Computing at the Georgia Institute of Technology and an adjunct faculty member in the Emory University School of Medicine. His research interests include computer graphics, computer vision in medicine, AI in medicine, modeling of physically based systems, medical informatics and telemedicine. He is associate editor of the journal IEEE Transactions on Medical Imaging, and a member of the American Medical Informatics Association and the IEEE Engineering in Medicine and Biology Society.

Cesar A. Santana received his MD degree in 1984 from the Institute of Medical Science, in Havana, Cuba. In 1988, he finished his residency training in internal medicine, and in 1991, completed a fellowship in nuclear medicine in Havana, Cuba. Dr. Santana received a PhD in nuclear cardiology in 1996 from the Department of Cardiology of the Vall d'Hebron University Hospital in Barcelona, Spain. Dr. Santana is an assistant professor at the Emory University School of Medicine and conducts research in the Radiology Department at the Emory University Hospital.

About this article

Cite this article

Ordonez, C., Ezquerra, N. & Santana, C.A. Constraining and summarizing association rules in medical data. Knowl Inf Syst 9, 1–2 (2006). https://doi.org/10.1007/s10115-005-0226-5

  • DOI: https://doi.org/10.1007/s10115-005-0226-5