
Top Scoring Pair Decision Tree for Gene Expression Data Analysis

Software Tools and Algorithms for Biological Systems

Part of the book series: Advances in Experimental Medicine and Biology (AEMB, volume 696)

Abstract

Classification of microarray data can be performed successfully with approaches that human experts find easy to understand and interpret, such as decision trees or Top Scoring Pair (TSP) algorithms. In this chapter, we propose a hybrid solution that combines these two methods. The presented decision trees, which split instances based on pairwise comparisons of gene expression values, may have considerable potential for genomic research and for the scientific modeling of the underlying processes. We compared the proposed solution with TSP-family methods and with decision trees on 11 public-domain microarray datasets, and the results are promising.
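The idea of a decision tree whose nodes compare the expression values of two genes can be illustrated with a minimal Python sketch. This is not the chapter's algorithm; it assumes two classes coded 0 and 1, scores each gene pair (i, j) with the usual TSP score |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|, and at each node splits on the single highest-scoring pair computed from the samples that reach that node. The function names `tsp_score`, `build_tree` and `predict`, the dictionary-based tree representation, and the stopping parameters `max_depth` and `min_size` are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def tsp_score(X, y, i, j):
    """TSP score of gene pair (i, j): |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|.

    Assumes labels are coded 0 and 1 and that both classes occur in y.
    """
    less = {0: 0, 1: 0}   # per class: number of samples with X_i < X_j
    total = {0: 0, 1: 0}  # per class: number of samples
    for row, label in zip(X, y):
        total[label] += 1
        if row[i] < row[j]:
            less[label] += 1
    return abs(less[0] / total[0] - less[1] / total[1])


def build_tree(X, y, depth=0, max_depth=3, min_size=2):
    """Grow a tree whose internal nodes test X_i < X_j for the top-scoring pair."""
    majority = Counter(y).most_common(1)[0][0]
    # Hypothetical stopping rules: pure node, depth limit, or too few samples.
    if len(set(y)) == 1 or depth >= max_depth or len(y) < min_size:
        return {"leaf": majority}
    # Pick the gene pair with the highest TSP score on the samples at this node.
    pairs = combinations(range(len(X[0])), 2)
    i, j = max(pairs, key=lambda p: tsp_score(X, y, p[0], p[1]))
    left = [k for k, row in enumerate(X) if row[i] < row[j]]
    right = [k for k, row in enumerate(X) if not row[i] < row[j]]
    if not left or not right:  # the chosen pair does not separate this node
        return {"leaf": majority}

    def grow(idx):
        return build_tree([X[k] for k in idx], [y[k] for k in idx],
                          depth + 1, max_depth, min_size)

    return {"pair": (i, j), "left": grow(left), "right": grow(right)}


def predict(tree, row):
    """Follow the pairwise comparisons from the root down to a leaf."""
    while "leaf" not in tree:
        i, j = tree["pair"]
        tree = tree["left"] if row[i] < row[j] else tree["right"]
    return tree["leaf"]


# Toy data: 3 genes; gene 0 is below gene 1 in class 0 and above it in class 1.
X = [[1.0, 5.0, 3.0], [2.0, 6.0, 1.0], [1.5, 4.0, 2.0],
     [6.0, 2.0, 3.0], [5.0, 1.0, 2.5], [7.0, 3.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]
tree = build_tree(X, y)
print(tree)  # {'pair': (0, 1), 'left': {'leaf': 0}, 'right': {'leaf': 1}}
print(predict(tree, [0.5, 9.0, 0.0]))  # 0
```

Because every test compares two genes within the same sample, the splits are unchanged by any strictly increasing transformation of that sample's expression values, which is part of what makes pairwise rules easy to interpret. The exhaustive search over all gene pairs in this sketch is quadratic in the number of genes, so applying the idea to real microarray data would likely require gene pre-filtering or a more efficient search.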

Acknowledgements

This work was supported by the grant W/WI/5/08 from Białystok Technical University.

Author information

Correspondence to Marcin Czajkowski.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Czajkowski, M., Krętowski, M. (2011). Top Scoring Pair Decision Tree for Gene Expression Data Analysis. In: Arabnia, H., Tran, QN. (eds) Software Tools and Algorithms for Biological Systems. Advances in Experimental Medicine and Biology, vol 696. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7046-6_3
