Neural Computing and Applications, Volume 20, Issue 8, pp 1211–1218

Multidimensional scaling with discrimination coefficients for supervised visualization of high-dimensional data

ISNN 2010

Abstract

Visualization techniques for high-dimensional data sets play a pivotal role in exploratory analysis in a wide range of disciplines. Gene expression data from microarray technology pose a particularly challenging problem: the number of features (genes) typically exceeds 20,000, whereas the number of samples is frequently below 200. We investigated class-specific discrimination coefficients, computed for each feature and each pair of classes, as the basis for an effective nonlinear mapping to a lower-dimensional space. We applied the technique to three microarray data sets and compared the resulting two-dimensional projections with those from a conventional multidimensional scaling method, a score plot from principal component analysis, and projections from linear discriminant analysis. In the experiments, we observed that the discrimination coefficients yielded an improved visualization of high-dimensional genomic data.
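The two unsupervised baselines named in the abstract can be illustrated with a minimal sketch. It assumes classical (Torgerson) multidimensional scaling as one common form of conventional MDS; the abstract does not state which variant the paper used. The synthetic data, the function names, and the optional per-feature weight vector `w` are illustrative assumptions. In particular, `w` only marks where feature-wise coefficients could enter a distance computation and is not the paper's definition of discrimination coefficients, which this abstract does not give.

```python
import numpy as np

def pairwise_sq_dists(X, w=None):
    """Squared Euclidean distances between rows of X.

    w is a hypothetical vector of non-negative per-feature weights
    (an assumption for illustration, not the paper's coefficients).
    """
    if w is not None:
        X = X * np.sqrt(w)  # scales each squared feature difference by w
    sq = np.sum(X * X, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D2, 0.0)  # clip tiny negatives caused by rounding

def classical_mds(D2, k=2):
    """Classical (Torgerson) MDS from a matrix of squared distances."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ D2 @ J                 # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]      # indices of the k largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def pca_scores(X, k=2):
    """First k principal component scores of the rows of X."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

# Synthetic stand-in for "many features, few samples" (not real microarray data)
rng = np.random.default_rng(0)
n_per_class, p = 20, 500
X = rng.normal(size=(2 * n_per_class, p))
X[n_per_class:, :10] += 2.0               # only 10 features separate the classes
y = np.repeat([0, 1], n_per_class)        # class labels, unused by these baselines

Y_mds = classical_mds(pairwise_sq_dists(X))  # 2-D MDS projection
Y_pca = pca_scores(X)                        # 2-D PCA score plot coordinates
```

With unweighted Euclidean distances, classical MDS and the PCA score plot coincide up to the sign of each axis, so these two baselines differ only when another MDS variant or another dissimilarity is used. Neither uses the labels `y`, which is the gap a supervised weighting aims to fill.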

Keywords

Visualization; Discrimination coefficients; Multidimensional scaling; Principal component analysis; Linear discriminant analysis; Microarrays

Notes

Acknowledgments

This work was supported in part by the Japan Society for the Promotion of Science. We thank H. Kitano for his support and the anonymous reviewers for their valuable comments.

Copyright information

© Springer-Verlag London Limited 2010

Authors and Affiliations

  1. Systems Biology Research Group, School of Biomedical Sciences, University of Ulster, Coleraine, UK
  2. Department of Cancer Systems Biology, Cancer Institute, Japan Foundation for Cancer Research, Tokyo, Japan
  3. University of Applied Sciences, Weihenstephan, Germany
