Multidimensional scaling with discrimination coefficients for supervised visualization of high-dimensional data
Visualization techniques for high-dimensional data sets play a pivotal role in exploratory analysis in a wide range of disciplines. Gene expression data based on microarray technology pose a particularly challenging problem: the number of features (genes) typically exceeds 20,000, whereas the number of samples is frequently below 200. We investigated class-specific discrimination coefficients, computed for each feature and each pair of classes, as a basis for an effective nonlinear mapping to a lower-dimensional space. We applied the technique to three microarray data sets and compared the resulting two-dimensional projections with those from a conventional multidimensional scaling method, a score plot from principal component analysis, and projections from linear discriminant analysis. In these experiments, we observed that the discrimination coefficients allowed for an improved visualization of high-dimensional genomic data.
Keywords: Visualization; Discrimination coefficients; Multidimensional scaling; Principal component analysis; Linear discriminant analysis; Microarrays
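As a point of reference for the conventional multidimensional scaling baseline mentioned in the abstract, the following is a minimal sketch of classical (Torgerson) MDS in Python with NumPy. It is not the discrimination-coefficient method itself, whose definition is not given here. The function name `classical_mds`, the choice of unweighted Euclidean distances, and the synthetic 20 × 500 data matrix are assumptions made purely for illustration.

```python
import numpy as np


def classical_mds(X, n_components=2):
    """Classical (Torgerson) MDS on Euclidean distances between the rows of X.

    Illustrative baseline only; this is an assumed, unweighted variant and
    does not use class labels or discrimination coefficients.
    """
    # Squared Euclidean distances between all pairs of samples (rows).
    sq_norms = np.sum(X ** 2, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    D2 = np.maximum(D2, 0.0)  # clip tiny negatives caused by rounding

    # Double centering turns squared distances into a Gram matrix.
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J

    # Keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.maximum(eigvals[order], 0.0)

    # Coordinates are eigenvectors scaled by the square roots of eigenvalues.
    return eigvecs[:, order] * np.sqrt(top_vals)


# Synthetic stand-in for a "few samples, many features" data set (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 500))
Y = classical_mds(X)  # 20 samples projected to two dimensions
```

With unweighted Euclidean distances, this projection coincides with the principal component score plot up to the sign of each axis, which is why a supervised feature weighting is needed before the two views can differ.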
This work was supported in part by the Japan Society for the Promotion of Science. We thank H. Kitano for his support and the anonymous reviewers for their valuable comments.