Fuzzy clustering-based discretization for gene expression classification

  • Regular Paper
Knowledge and Information Systems

Abstract

This paper presents a novel classification approach that integrates fuzzy class association rules and support vector machines. A fuzzy discretization technique based on the fuzzy c-means clustering algorithm is employed to transform the training set, particularly its quantitative attributes, into a format appropriate for association rule mining. A hill-climbing procedure is adapted for automatic threshold adjustment, and fuzzy class association rules are mined accordingly. The compatibility between the generated rules and fuzzy patterns is used to construct a set of feature vectors, from which a classifier is generated. The reported test results show that compatibility rule-based feature vectors are a high-quality source of discrimination knowledge that can substantially impact the prediction power of the final classifier. To evaluate the applicability of the proposed method to a variety of domains, it is also applied to the popular task of gene expression classification. Further, we show how this method provides biologists with an accurate and more understandable classifier model compared to other machine learning techniques.
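
To make the discretization step concrete, the following is a minimal sketch of fuzzy c-means applied to a single quantitative attribute. It is an illustration under stated assumptions, not the authors' implementation: the function names `fuzzy_c_means_1d` and `membership`, the evenly spaced initialization, the fuzzifier `m = 2`, the choice of three fuzzy sets, and the sample values are all hypothetical. Each value receives a degree of membership in every cluster, so an attribute can be mapped to fuzzy items such as "low", "medium", and "high" instead of crisp intervals.

```python
def membership(value, centers, m=2.0):
    """Return the fuzzy c-means membership of one value in each cluster."""
    dists = [abs(value - c) for c in centers]
    # A value that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exp = 2.0 / (m - 1.0)
    # Standard update: u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)).
    return [1.0 / sum((dj / dk) ** exp for dk in dists) for dj in dists]


def fuzzy_c_means_1d(values, c=3, m=2.0, max_iter=100, tol=1e-6):
    """Cluster scalar values into c fuzzy sets; return (centers, memberships)."""
    lo, hi = min(values), max(values)
    # Assumed initialization: centers evenly spaced over the value range.
    centers = [lo + (j + 0.5) * (hi - lo) / c for j in range(c)]
    for _ in range(max_iter):
        rows = [membership(v, centers, m) for v in values]
        new_centers = []
        for j in range(c):
            # Each center is the mean of the values weighted by u^m.
            weights = [row[j] ** m for row in rows]
            weighted_sum = sum(w * v for w, v in zip(weights, values))
            new_centers.append(weighted_sum / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    centers.sort()
    return centers, [membership(v, centers, m) for v in values]


# Hypothetical expression levels for one gene across nine samples.
expression = [0.1, 0.2, 0.15, 1.0, 1.1, 0.95, 2.0, 2.1, 1.9]
centers, memberships = fuzzy_c_means_1d(expression, c=3)
for value, row in zip(expression, memberships):
    print(value, [round(u, 3) for u in row])
```

Each printed row sums to 1, and the largest entry indicates the fuzzy set that best describes the value. In a pipeline like the one the abstract describes, such membership degrees, rather than hard interval labels, would be passed on to the rule mining step.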



Author information


Corresponding author

Correspondence to Reda Alhajj.


About this article

Cite this article

Kianmehr, K., Alshalalfa, M. & Alhajj, R. Fuzzy clustering-based discretization for gene expression classification. Knowl Inf Syst 24, 441–465 (2010). https://doi.org/10.1007/s10115-009-0214-2


  • DOI: https://doi.org/10.1007/s10115-009-0214-2
