Classification of Cancer Data: Analyzing Gene Expression Data Using a Fuzzy Decision Tree Algorithm

Operations Research Applications in Health Care Management

Abstract

Decision tree algorithms are popular in data mining because they have a simple inference mechanism and represent the model in a comprehensible way. In recent years, fuzzy decision tree algorithms have been proposed to handle uncertainty in the data, and they have been shown to outperform classical decision tree algorithms. This chapter investigates a fuzzy decision tree algorithm applied to the classification of gene expression data. The fuzzy decision tree algorithm is compared with a classical decision tree algorithm as well as other well-known data mining algorithms commonly applied to classification tasks. On the five data sets analyzed, the fuzzy decision tree algorithm outperforms the classical decision tree algorithm. Compared with the other commonly used classification algorithms, both decision tree algorithms are competitive but do not reach the accuracy of the best-performing classifier. An advantage of decision tree models, including the fuzzy decision tree, is nevertheless the simplicity and comprehensibility of the model, as demonstrated in the chapter.

Notes

  1. http://datam.i2r.a-star.edu.sg/datasets/krbd/

Acknowledgements

The authors would like to thank Gongyi Xia for drawing the figures of the decision trees.

Corresponding author

Correspondence to Simone A. Ludwig.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Ludwig, S.A., Picek, S., Jakobovic, D. (2018). Classification of Cancer Data: Analyzing Gene Expression Data Using a Fuzzy Decision Tree Algorithm. In: Kahraman, C., Topcu, Y. (eds) Operations Research Applications in Health Care Management. International Series in Operations Research & Management Science, vol 262. Springer, Cham. https://doi.org/10.1007/978-3-319-65455-3_13
