Abstract
Classification of microarray data can be performed successfully with approaches that human experts find easy to understand and interpret, such as decision trees or Top Scoring Pairs (TSP) algorithms. In this chapter, we propose a hybrid solution that combines these two methods. The presented decision trees, which split instances based on pairwise comparisons of gene expression values, may have considerable potential for genomic research and scientific modeling of underlying processes. We compared the proposed solution with the TSP-family methods and decision trees on 11 public-domain microarray datasets, and the results are promising.
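To make the idea of splitting on pairwise comparisons concrete, the following is a minimal sketch and not the authors' implementation. It assumes binary class labels 0 and 1 and uses the standard TSP score, |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|, to pick a gene pair, then uses the test x_i < x_j as a single tree split. The function names and the toy expression values are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def tsp_score(samples, labels, i, j):
    """Return |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|.

    Assumes binary labels 0/1 and at least one sample per class.
    """
    hits = {0: 0, 1: 0}    # samples per class where gene i < gene j
    totals = {0: 0, 1: 0}  # samples per class
    for x, y in zip(samples, labels):
        totals[y] += 1
        if x[i] < x[j]:
            hits[y] += 1
    return abs(hits[0] / totals[0] - hits[1] / totals[1])


def top_scoring_pair(samples, labels):
    """Exhaustively search all gene pairs and return the best-scoring one."""
    n_genes = len(samples[0])
    return max(
        combinations(range(n_genes), 2),
        key=lambda pair: tsp_score(samples, labels, *pair),
    )


def split_by_pair(samples, labels, pair):
    """Split instances with the test x_i < x_j, as one tree node would."""
    i, j = pair
    left = [(x, y) for x, y in zip(samples, labels) if x[i] < x[j]]
    right = [(x, y) for x, y in zip(samples, labels) if not x[i] < x[j]]
    return left, right


# Hypothetical toy data: 6 samples, 3 genes (values are made up).
samples = [
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 2.5],
    [1.5, 4.0, 9.0],
    [7.0, 3.0, 4.0],
    [8.0, 2.0, 1.0],
    [6.0, 1.0, 7.0],
]
labels = [0, 0, 0, 1, 1, 1]

best = top_scoring_pair(samples, labels)
left, right = split_by_pair(samples, labels, best)
print("best pair:", best, "score:", tsp_score(samples, labels, *best))
print("left labels:", [y for _, y in left])
print("right labels:", [y for _, y in right])
```

In a tree built this way, the same search would presumably be repeated on the instances reaching each child node. The exhaustive pair search is quadratic in the number of genes, so a real microarray dataset would likely need gene filtering or a faster ranking scheme first.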
Acknowledgements
This work was supported by the grant W/WI/5/08 from Białystok Technical University.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Czajkowski, M., Krȩtowski, M. (2011). Top Scoring Pair Decision Tree for Gene Expression Data Analysis. In: Arabnia, H., Tran, QN. (eds) Software Tools and Algorithms for Biological Systems. Advances in Experimental Medicine and Biology, vol 696. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7046-6_3
DOI: https://doi.org/10.1007/978-1-4419-7046-6_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7045-9
Online ISBN: 978-1-4419-7046-6
eBook Packages: Biomedical and Life Sciences (R0)