
Global Top-Scoring Pair Decision Tree for Gene Expression Data Analysis

  • Conference paper
Genetic Programming (EuroGP 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7831)


Abstract

Extracting knowledge from gene expression data is still a major challenge. Relative expression algorithms exploit the ordering relationships among a small collection of genes and have been applied successfully to microarray classification. However, searching all possible gene subsets requires a significant amount of computation and forces existing methods to adopt various assumptions and limitations. In this paper we propose an evolutionary algorithm for the global induction of top-scoring pair decision trees. We have designed several specialized genetic operators that search for the best tree structure and for the splits in internal nodes, which involve pairwise comparisons of gene expression values. Preliminary validation on real-life microarray datasets is promising: the proposed solution is highly competitive with other relative expression algorithms and can explore a much larger solution space.
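To make the split mechanism concrete, here is a minimal Python sketch of the classic top-scoring pair (TSP) building block from Geman et al. (2004), on which relative expression methods are based. The score of a gene pair (i, j) is |P(X_i < X_j | class 0) − P(X_i < X_j | class 1)|. A tree node routes a sample to the left when X_i < X_j.

This is an illustration under stated assumptions, not the authors' implementation:

  • labels are binary and coded 0/1;
  • samples are plain lists of expression values;
  • the names `tsp_score`, `best_pair`, `Node` and `predict` are hypothetical;
  • the paper's evolutionary operators and fitness function are not modeled.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


def tsp_score(X: List[List[float]], y: List[int], i: int, j: int) -> float:
    """Classic TSP score for gene pair (i, j), assuming labels 0/1:
    |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|."""
    n = [0, 0]       # samples per class
    hits = [0, 0]    # samples per class where X_i < X_j
    for row, label in zip(X, y):
        n[label] += 1
        hits[label] += row[i] < row[j]
    return abs(hits[0] / n[0] - hits[1] / n[1])


def best_pair(X: List[List[float]], y: List[int]) -> Tuple[int, int]:
    """Exhaustive search over all gene pairs (O(P^2 * N) comparisons)."""
    n_genes = len(X[0])
    return max(combinations(range(n_genes), 2),
               key=lambda p: tsp_score(X, y, p[0], p[1]))


@dataclass
class Node:
    """Hypothetical tree node: internal nodes test X[gene_i] < X[gene_j]."""
    gene_i: int = -1
    gene_j: int = -1
    left: Optional["Node"] = None    # followed when X[gene_i] < X[gene_j]
    right: Optional["Node"] = None   # followed otherwise
    label: Optional[int] = None      # set only on leaves


def predict(node: Node, x: List[float]) -> int:
    """Walk from the root to a leaf using pairwise comparisons."""
    while node.label is None:
        node = node.left if x[node.gene_i] < x[node.gene_j] else node.right
    return node.label


# Toy data: 4 samples x 3 genes (made-up values, for illustration only).
X = [[1, 5, 3], [2, 6, 1], [7, 3, 2], [8, 2, 9]]
y = [0, 0, 1, 1]

i, j = best_pair(X, y)   # pair (0, 1) perfectly separates the toy classes
root = Node(i, j, left=Node(label=0), right=Node(label=1))
print(i, j, tsp_score(X, y, i, j), predict(root, [1, 5, 3]))
```

The exhaustive `best_pair` search grows quadratically with the number of genes, and examining larger gene subsets grows even faster. This illustrates the computational burden the abstract mentions. In the global approach, an evolutionary search over whole trees takes the place of such enumeration.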




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Czajkowski, M., Kretowski, M. (2013). Global Top-Scoring Pair Decision Tree for Gene Expression Data Analysis. In: Krawiec, K., Moraglio, A., Hu, T., Etaner-Uyar, A.Ş., Hu, B. (eds) Genetic Programming. EuroGP 2013. Lecture Notes in Computer Science, vol 7831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37207-0_20


  • DOI: https://doi.org/10.1007/978-3-642-37207-0_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37206-3

  • Online ISBN: 978-3-642-37207-0

  • eBook Packages: Computer Science, Computer Science (R0)
