Abstract
This paper presents a novel classification approach that integrates fuzzy class association rules and support vector machines. A fuzzy discretization technique based on the fuzzy c-means clustering algorithm is employed to transform the training set, particularly its quantitative attributes, into a format appropriate for association rule mining. A hill-climbing procedure is adapted for automatic threshold adjustment, and fuzzy class association rules are mined accordingly. The compatibility between the generated rules and fuzzy patterns is used to construct a set of feature vectors, from which a classifier is generated. The reported test results show that compatibility rule-based feature vectors are a high-quality source of discriminative knowledge that can substantially impact the prediction power of the final classifier. To evaluate the applicability of the proposed method to a variety of domains, it is also applied to the popular task of gene expression classification. Further, we show how this method provides biologists with an accurate and more understandable classifier model compared to other machine learning techniques.
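To make the fuzzy discretization step concrete, the following is a minimal sketch of fuzzy c-means applied to a single quantitative attribute, so that each value receives a degree of membership in every fuzzy interval instead of a single crisp bin. It is an illustration under stated assumptions, not the authors' implementation: the function name `fuzzy_c_means_1d`, the number of clusters, the fuzzifier `m = 2`, the random initialization, the interval labels, and the sample values are all hypothetical choices that the abstract does not specify.

```python
import numpy as np

def fuzzy_c_means_1d(values, n_clusters=3, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Cluster a 1-D attribute with fuzzy c-means.

    Returns (centers, memberships), with centers sorted in ascending order
    and memberships[i, j] the degree to which values[i] belongs to cluster j.
    All parameter defaults are assumptions made for this sketch.
    """
    x = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalised to sum to 1.
    u = rng.random((x.size, n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers: membership-weighted means of the data.
        w = u ** m
        centers = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
        # Distance of every value to every center (epsilon avoids 0 division).
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12
        # Standard membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    order = np.argsort(centers)
    return centers[order], u[:, order]

# Hypothetical expression levels for one gene across nine samples.
expression = [0.1, 0.2, 0.15, 1.0, 1.1, 0.95, 2.0, 2.1, 1.9]
centers, memberships = fuzzy_c_means_1d(expression, n_clusters=3)

# Each value becomes a fuzzy pattern over the (assumed) interval labels.
labels = ["low", "medium", "high"]
for value, row in zip(expression, memberships):
    print(value, dict(zip(labels, row.round(2))))
```

In this sketch each attribute value maps to a row of memberships that sums to 1, which is the kind of fuzzy pattern a rule miner could consume in place of a crisp interval label. Choosing the number of clusters per attribute is left open here.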
Cite this article
Kianmehr, K., Alshalalfa, M. & Alhajj, R. Fuzzy clustering-based discretization for gene expression classification. Knowl Inf Syst 24, 441–465 (2010). https://doi.org/10.1007/s10115-009-0214-2
DOI: https://doi.org/10.1007/s10115-009-0214-2