Multidimensional scaling with discrimination coefficients for supervised visualization of high-dimensional data

  • ISNN 2010
  • Neural Computing and Applications

Abstract

Visualization techniques for high-dimensional data sets play a pivotal role in exploratory analysis in a wide range of disciplines. Gene expression data based on microarray technology pose a particularly challenging problem: the number of features (genes) typically exceeds 20,000, whereas the number of samples is frequently below 200. We investigated class-specific discrimination coefficients, computed for each feature and each pair of classes, as the basis for an effective nonlinear mapping to a lower-dimensional space. We applied the technique to three microarray data sets and compared the resulting two-dimensional projections with those from a conventional multidimensional scaling method, a score plot from principal component analysis, and projections from linear discriminant analysis. In these experiments, the discrimination coefficients improved the visualization of high-dimensional genomic data.
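The abstract does not define the discrimination coefficients or the mapping, so the following Python sketch only illustrates the general idea of weighting features per class pair before a multidimensional scaling step. Everything specific in it is an assumption rather than the authors' method: the weight formula (a signal-to-noise style score), the uniform weights for samples of the same class, the use of classical (Torgerson) MDS in place of a nonlinear mapping, the function names `weighted_distances` and `classical_mds`, and the synthetic data.

```python
import numpy as np


def weighted_distances(X, y):
    """Pairwise distances with feature weights that depend on the class pair.

    Assumption (not from the paper): the weight of a feature for classes a, b
    is |mean_a - mean_b| / (std_a + std_b), normalized to sum to 1.
    Samples of the same class use uniform weights.
    """
    classes, idx = np.unique(y, return_inverse=True)
    n, p = X.shape
    k = len(classes)
    W = np.full((k, k, p), 1.0 / p)  # uniform weights by default
    for a in range(k):
        for b in range(a + 1, k):
            Xa, Xb = X[idx == a], X[idx == b]
            score = np.abs(Xa.mean(axis=0) - Xb.mean(axis=0)) / (
                Xa.std(axis=0) + Xb.std(axis=0) + 1e-12
            )
            W[a, b] = W[b, a] = score / score.sum()
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w = W[idx[i], idx[j]]
            D[i, j] = D[j, i] = np.sqrt(np.sum(w * (X[i] - X[j]) ** 2))
    return D


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS via eigendecomposition of the centered matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    # The weighted distances need not be Euclidean, so clip negative eigenvalues.
    lam = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(lam)


# Synthetic stand-in for expression data: 3 classes, 10 samples each, 50 features.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 10)
X = rng.normal(size=(30, 50))
X[y == 1, :5] += 3.0     # features 0-4 separate class 1
X[y == 2, 5:10] += 3.0   # features 5-9 separate class 2

D = weighted_distances(X, y)
Y = classical_mds(D)     # one 2-D coordinate pair per sample
```

Because the weights are derived from class labels, a projection of this kind is supervised and applies only to labeled samples; a stress-minimizing mapping such as Sammon's could be substituted for the classical MDS step.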

Acknowledgments

This work was supported in part by the Japan Society for the Promotion of Science. We thank H. Kitano for his support and the anonymous reviewers for their valuable comments.

Author information

Corresponding author

Correspondence to Daniel Berrar.

About this article

Cite this article

Berrar, D., Ohmayer, G. Multidimensional scaling with discrimination coefficients for supervised visualization of high-dimensional data. Neural Comput & Applic 20, 1211–1218 (2011). https://doi.org/10.1007/s00521-010-0478-1

  • DOI: https://doi.org/10.1007/s00521-010-0478-1
