Multi-Test Decision Trees for Gene Expression Data Analysis

  • Conference paper
Security and Intelligent Information Systems (SIIS 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7053)

Abstract

This paper introduces a new type of decision tree that is better suited to gene expression data. The main motivation for this work was to improve the performance of decision trees at the cost of as small an increase in their complexity as possible. Our approach is therefore based on univariate tests, and the main contribution of this paper is the application of several univariate tests in each non-terminal node of the tree. The resulting trees remain relatively easy to analyze and understand, but they become more powerful in modelling high-dimensional microarray data. Experimental validation was performed on publicly available gene expression datasets. The proposed method displayed competitive accuracy compared to commonly applied decision tree methods.
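
The idea of placing several univariate tests in one non-terminal node can be illustrated with a minimal sketch. The abstract does not say how the individual test outcomes are combined, so the majority vote used here is an assumption made only for illustration, as are the names UnivariateTest and MultiTestNode, the thresholds, and the sample values.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class UnivariateTest:
    """One threshold test on a single feature (e.g. one gene's expression level)."""
    feature: int      # index of the feature in the sample vector
    threshold: float  # samples with value <= threshold vote for the left branch

    def goes_left(self, sample: Sequence[float]) -> bool:
        return sample[self.feature] <= self.threshold


@dataclass(frozen=True)
class MultiTestNode:
    """A non-terminal node holding several univariate tests (hypothetical design)."""
    tests: Tuple[UnivariateTest, ...]

    def goes_left(self, sample: Sequence[float]) -> bool:
        # Assumption: the branch is chosen by a strict majority vote of the tests;
        # a tie sends the sample to the right branch.
        votes_left = sum(test.goes_left(sample) for test in self.tests)
        return 2 * votes_left > len(self.tests)


# Illustrative node with three tests on features 0, 2 and 5 (made-up values).
node = MultiTestNode((
    UnivariateTest(feature=0, threshold=1.5),
    UnivariateTest(feature=2, threshold=0.3),
    UnivariateTest(feature=5, threshold=7.0),
))
sample = [1.0, 9.9, 0.8, 0.0, 0.0, 6.2]
print("left" if node.goes_left(sample) else "right")
```

Each test in such a node still inspects a single feature, so the node can be read as a short list of threshold rules, which is consistent with the abstract's claim that the trees stay relatively easy to analyze.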

Editor information

Pascal Bouvry, Mieczysław A. Kłopotek, Franck Leprévost, Małgorzata Marciniak, Agnieszka Mykowiecka, Henryk Rybiński

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Czajkowski, M., Grześ, M., Kretowski, M. (2012). Multi-Test Decision Trees for Gene Expression Data Analysis. In: Bouvry, P., Kłopotek, M.A., Leprévost, F., Marciniak, M., Mykowiecka, A., Rybiński, H. (eds) Security and Intelligent Information Systems. SIIS 2011. Lecture Notes in Computer Science, vol 7053. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25261-7_12

  • DOI: https://doi.org/10.1007/978-3-642-25261-7_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25260-0

  • Online ISBN: 978-3-642-25261-7

  • eBook Packages: Computer Science, Computer Science (R0)
