Abstract
Visualization techniques for high-dimensional data sets play a pivotal role in exploratory analysis in a wide range of disciplines. Gene expression data based on microarray technology pose a particularly challenging problem: the number of features (genes) typically exceeds 20,000, whereas the number of samples is frequently below 200. We investigated class-specific discrimination coefficients, computed for each feature and each pair of classes, as the basis for an effective nonlinear mapping to a lower-dimensional space. We applied the technique to three microarray data sets and compared the resulting two-dimensional projections with those from a conventional multidimensional scaling method, a score plot from principal component analysis, and projections from linear discriminant analysis. In these experiments, we observed that the discrimination coefficients allowed for an improved visualization of high-dimensional genomic data.
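The general idea of a supervised, feature-weighted projection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not define the discrimination coefficients, so the sketch substitutes a Fisher-like ratio per feature and class pair, averages it over all class pairs, and feeds the weighted distances into classical (Torgerson) multidimensional scaling rather than the nonlinear mapping used in the article. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def pairwise_weights(X, y):
    """Stand-in feature weights (NOT the article's coefficients):
    for each pair of classes, a Fisher-like ratio of squared mean
    difference to summed variances, averaged over all pairs."""
    classes = np.unique(y)
    scores = []
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            Xa, Xb = X[y == a], X[y == b]
            num = (Xa.mean(axis=0) - Xb.mean(axis=0)) ** 2
            den = Xa.var(axis=0) + Xb.var(axis=0) + 1e-12  # avoid 0/0
            scores.append(num / den)
    w = np.mean(scores, axis=0)
    return w / w.sum()  # normalize so the weights sum to 1

def weighted_distances(X, w):
    """Weighted Euclidean distances between all pairs of samples."""
    Z = X * np.sqrt(w)
    sq = (Z ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    D = np.sqrt(np.clip(D2, 0.0, None))
    np.fill_diagonal(D, 0.0)  # remove rounding noise on the diagonal
    return D

def classical_mds(D, k=2):
    """Classical MDS: eigendecomposition of the double-centered
    squared-distance matrix, keeping the k largest eigenvalues."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))

# Synthetic data: 30 samples, 500 features, 3 classes; only a few
# features carry class information.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 10)
X = rng.normal(size=(30, 500))
X[y == 1, :5] += 3.0
X[y == 2, 5:10] -= 3.0

w = pairwise_weights(X, y)
D = weighted_distances(X, w)
Y = classical_mds(D, k=2)  # one 2-D coordinate pair per sample
```

Because the weights are derived from the class labels, features that separate at least one pair of classes dominate the distances, which is what makes the projection supervised. The same labels therefore should not be used to judge how well the plot separates the classes on the data that produced the weights.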
Acknowledgments
This work was supported in part by the Japan Society for the Promotion of Science. We thank H. Kitano for his support and the anonymous reviewers for their valuable comments.
Cite this article
Berrar, D., Ohmayer, G. Multidimensional scaling with discrimination coefficients for supervised visualization of high-dimensional data. Neural Comput & Applic 20, 1211–1218 (2011). https://doi.org/10.1007/s00521-010-0478-1
DOI: https://doi.org/10.1007/s00521-010-0478-1