Abstract
Decision tree algorithms are popular in data mining because they have a simple inference mechanism and represent the model in a comprehensible way. In recent years, fuzzy decision tree algorithms have been proposed to handle uncertainty in the data, and they have been shown to outperform classical decision tree algorithms. This chapter investigates a fuzzy decision tree algorithm applied to the classification of gene expression data. The fuzzy decision tree algorithm is compared with a classical decision tree algorithm as well as with other well-known data mining algorithms commonly applied to classification tasks. On the five data sets analyzed, the fuzzy decision tree algorithm outperforms the classical decision tree algorithm. Compared with other commonly used classification algorithms, both decision tree algorithms are competitive but do not reach the accuracy of the best-performing classifier. An advantage of decision tree models, including the fuzzy decision tree, is nevertheless the simplicity and comprehensibility of the model, as demonstrated in the chapter.
Acknowledgements
The authors would like to thank Gongyi Xia for drawing the decision tree figures.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Ludwig, S.A., Picek, S., Jakobovic, D. (2018). Classification of Cancer Data: Analyzing Gene Expression Data Using a Fuzzy Decision Tree Algorithm. In: Kahraman, C., Topcu, Y. (eds) Operations Research Applications in Health Care Management. International Series in Operations Research & Management Science, vol 262. Springer, Cham. https://doi.org/10.1007/978-3-319-65455-3_13
DOI: https://doi.org/10.1007/978-3-319-65455-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65453-9
Online ISBN: 978-3-319-65455-3
eBook Packages: Business and Management (R0)