Top Scoring Pair Decision Tree for Gene Expression Data Analysis

Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 696)


Classification of microarray data can be performed successfully with approaches that human experts find easy to understand and interpret, such as decision trees or Top Scoring Pair (TSP) algorithms. In this chapter, we propose a hybrid solution that combines these two methods. The presented decision trees, which split instances based on pairwise comparisons of gene expression values, may have considerable potential for genomic research and scientific modeling of underlying processes. We compared the proposed solution with the TSP-family methods and decision trees on 11 public-domain microarray datasets, and the results are promising.
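To make the idea of splitting on pairwise comparisons concrete, the following is a minimal sketch, not the chapter's implementation. It assumes the standard TSP score (the absolute difference between two classes in the frequency of one gene's expression being lower than another's) and uses it to pick a single tree-node test. The function names, the exhaustive pair search, and the toy data are illustrative assumptions; how the proposed method actually selects and ranks splits is not stated in the abstract.

```python
from itertools import combinations


def tsp_score(samples, labels, i, j):
    """Absolute difference between classes 0 and 1 in the frequency of x[i] < x[j]."""
    freq = {}
    for cls in (0, 1):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        freq[cls] = sum(x[i] < x[j] for x in rows) / len(rows)
    return abs(freq[0] - freq[1])


def best_pair(samples, labels):
    """Exhaustively score all gene pairs and return (score, i, j) for the top one."""
    n_genes = len(samples[0])
    return max((tsp_score(samples, labels, i, j), i, j)
               for i, j in combinations(range(n_genes), 2))


def split(samples, labels, i, j):
    """Partition instances by the pairwise test x[i] < x[j], as one tree node would."""
    left = [(x, y) for x, y in zip(samples, labels) if x[i] < x[j]]
    right = [(x, y) for x, y in zip(samples, labels) if not x[i] < x[j]]
    return left, right


# Toy data invented for illustration: 4 samples, 3 genes, binary class labels.
samples = [[1.0, 5.0, 3.0], [2.0, 6.0, 1.5], [7.0, 2.0, 1.0], [8.0, 1.0, 9.0]]
labels = [0, 0, 1, 1]

score, i, j = best_pair(samples, labels)
left, right = split(samples, labels, i, j)
```

A full tree would apply `best_pair` and `split` recursively to each branch until a stopping rule is met. Because the test depends only on the relative ordering of two expression values within a sample, it is unaffected by any monotone rescaling of that sample.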


Keywords (machine-generated, not added by the authors): Decision Tree; Microarray Dataset; Decision Tree Model; Boost Decision Tree; Splitting Rule



This work was supported by the grant W/WI/5/08 from Białystok Technical University.



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Faculty of Computer Science, Bialystok University of Technology, Bialystok, Poland
