Rank-based classifiers for extremely high-dimensional gene expression data

Lausser, Ludwig; Schmid, Florian; Schirra, Lyn-Rouven; Wilhelm, Adalbert F. X.; Kestler, Hans A.

doi:10.1007/s11634-016-0277-3

Regular Article
Published: 19 December 2016

Volume 12, pages 917–936, (2018)
Advances in Data Analysis and Classification

Ludwig Lausser¹,
Florian Schmid¹,
Lyn-Rouven Schirra¹,³,
Adalbert F. X. Wilhelm⁴ &
…
Hans A. Kestler¹,²

718 Accesses
2 Citations

Abstract

Predicting phenotypes on the basis of gene expression profiles is a classification task that is becoming increasingly important in the field of precision medicine. Although these expression signals are real-valued, it is questionable whether they can be analyzed on an interval scale. As with many biological signals, their influence on, e.g., protein levels is usually non-linear and can thus be misinterpreted. In this article, we study gene expression profiles with up to 54,000 dimensions. We analyze these measurements on an ordinal scale by replacing the real-valued profiles with their ranks. This type of rank transformation can be used to construct invariant classifiers that are not affected by noise induced by data transformations, which can occur in the measurement setup. Our 10 \(\times\) 10-fold cross-validation experiments on 86 different data sets and 19 different classification models indicate that classifiers largely benefit from this transformation. In particular, random forests and support vector machines achieve improved classification results on a significant majority of data sets.
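The rank transformation described in the abstract can be illustrated with a minimal sketch. It assumes that ranks are computed within each profile (one row per sample) and that ties are broken by position; the function name `rank_transform` and the toy values are hypothetical and are not taken from the article. The example checks that strictly increasing per-sample transformations, such as a log transform or a positive rescaling with an offset, leave the ranks unchanged.

```python
import numpy as np

def rank_transform(profiles):
    """Replace each row (one expression profile) by its within-row ranks 1..d.

    Assumption: ties are broken by position (stable sort); other tie rules
    such as average ranks are equally plausible.
    """
    profiles = np.asarray(profiles, dtype=float)
    # order[i, k] is the column index of the k-th smallest value in row i
    order = np.argsort(profiles, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(profiles.shape[0])[:, None]
    # Invert the permutation: the k-th smallest entry receives rank k + 1
    ranks[rows, order] = np.arange(1, profiles.shape[1] + 1)
    return ranks

# Toy data: 2 samples, 4 genes (hypothetical values)
profiles = np.array([[0.5, 2.0, 1.2, 8.1],
                     [3.3, 0.1, 4.7, 2.9]])

ranks = rank_transform(profiles)

# Strictly increasing transformations applied per sample
log_ranks = rank_transform(np.log2(profiles + 1.0))
scaled_ranks = rank_transform(profiles * np.array([[2.0], [0.5]]) + 3.0)

print(ranks)
print(np.array_equal(ranks, log_ranks), np.array_equal(ranks, scaled_ranks))
```

Any classifier trained on `ranks` instead of `profiles` inherits this invariance, because its input is identical before and after such a transformation.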

Acknowledgements

The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007–2013) under Grant Agreement No. 602783, the German Research Foundation (DFG, SFB 1074 project Z1 to HAK), and the Federal Ministry of Education and Research (BMBF, Gerontosys II, Forschungskern SyStaR, project ID 0315894A and e:Med, SYMBOL-HF, Grant ID 01ZX1407A) all to HAK.

Author information

Authors and Affiliations

Institute of Medical Systems Biology, Ulm University, 89069, Ulm, Germany
Ludwig Lausser, Florian Schmid, Lyn-Rouven Schirra & Hans A. Kestler
Leibniz Institute on Aging–Fritz Lipmann Institute, 07745, Jena, Germany
Hans A. Kestler
Institute of Number Theory and Probability Theory, Ulm University, 89069, Ulm, Germany
Lyn-Rouven Schirra
Department of Psychology and Methods, Jacobs University, 28759, Bremen, Germany
Adalbert F. X. Wilhelm

Corresponding author

Correspondence to Hans A. Kestler.

Additional information

L. Lausser, F. Schmid, and L.-R. Schirra contributed equally.

Electronic supplementary material

Supplementary material 1 (pdf 230 KB)

About this article

Cite this article

Lausser, L., Schmid, F., Schirra, LR. et al. Rank-based classifiers for extremely high-dimensional gene expression data. Adv Data Anal Classif 12, 917–936 (2018). https://doi.org/10.1007/s11634-016-0277-3

Received: 15 December 2014
Revised: 21 November 2016
Accepted: 28 November 2016
Published: 19 December 2016
Issue Date: December 2018
DOI: https://doi.org/10.1007/s11634-016-0277-3
