
An improved predictive association rule based classifier using gain ratio and T-test for health care data diagnosis

Sadhana

Abstract

Health care data diagnosis is a significant task that must be carried out precisely and requires considerable experience and domain knowledge. Traditional symptom-based disease diagnosis may lead to false presumptions. In recent times, Associative Classification (AC), the combination of association rule mining and classification, has received attention in health care applications, which demand maximum accuracy. Although several AC techniques exist, they fall short in generating quality rules for building an efficient associative classifier. This paper aims to enhance the accuracy of the existing CPAR (Classification based on Predictive Association Rules) algorithm by generating quality rules using Gain Ratio. Health care applications mostly deal with high-dimensional datasets, and high dimensionality leads to biased estimates in disease diagnosis. Dimensionality reduction is commonly applied as a preprocessing step before classification to improve classifier accuracy: it eliminates redundant and insignificant dimensions while keeping the informative ones, without information loss. In this work, dimensionality reduction by T-test and by reduct sets (or simply reducts) is performed as a preprocessing step before the CPAR and CPAR using Gain Ratio (CPAR-GR) algorithms. An investigation was also performed to determine the impact of T-test and reducts on CPAR and CPAR-GR. This paper synthesizes existing work in AC and discusses the factors that influence the performance of CPAR and CPAR-GR. Experiments were conducted using six health care datasets from the UCI machine learning repository. Based on the experiments, CPAR-GR with T-test yields better classification accuracy than CPAR.
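
The two measures named in the abstract can be illustrated with a minimal sketch. The preview does not show the paper's exact formulation, so the code below assumes the standard definitions: gain ratio as information gain divided by split information, and a two-sample Welch t-statistic for ranking numeric features between two classes. The function names (entropy, gain_ratio, welch_t, top_k_by_t) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of a categorical attribute divided by its split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected class entropy after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    # An attribute with a single value carries no split information.
    return info_gain / split_info if split_info > 0 else 0.0


def welch_t(a, b):
    """Absolute Welch t-statistic between two samples of one numeric feature."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return abs(mean_a - mean_b) / std_err if std_err > 0 else 0.0


def top_k_by_t(columns, labels, positive, k):
    """Keep the k feature names with the largest t-statistic (assumed selection rule)."""
    scores = {}
    for name, col in columns.items():
        pos = [x for x, y in zip(col, labels) if y == positive]
        neg = [x for x, y in zip(col, labels) if y != positive]
        scores[name] = welch_t(pos, neg)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: "f1" separates the two classes, "f2" does not.
toy_columns = {"f1": [1, 2, 3, 7, 8, 9], "f2": [5, 4, 6, 5, 6, 4]}
toy_labels = [0, 0, 0, 1, 1, 1]
selected = top_k_by_t(toy_columns, toy_labels, positive=1, k=1)
```

On this toy data the selection keeps only "f1". In a pipeline of the kind the abstract describes, a filter like this would run before rule generation, and a gain-ratio score would then be used to evaluate candidate attribute-value tests; how CPAR-GR applies the score during rule growth is not specified in this preview.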



Author information

Corresponding author

Correspondence to M NANDHINI.

About this article

Cite this article

NANDHINI, M., SIVANANDAM, S.N. An improved predictive association rule based classifier using gain ratio and T-test for health care data diagnosis. Sadhana 40, 1683–1699 (2015). https://doi.org/10.1007/s12046-015-0410-6

  • DOI: https://doi.org/10.1007/s12046-015-0410-6
