Dimension Reduction for High-Dimensional Data

Statistical Methods in Molecular Biology

Part of the book series: Methods in Molecular Biology (MIMB, volume 620)

Abstract

With the advance of modern technologies, high-dimensional data have become prevalent in computational biology. The number of variables p is very large, and in many applications p exceeds the number of observational units n. Such high dimensionality, and the unconventional small-n-large-p setting in particular, pose new challenges to statistical analysis methods. Dimension reduction, which aims to reduce the predictor dimension prior to any modeling, offers a potentially useful avenue for tackling such high-dimensional regression. In this chapter, we review a number of commonly used dimension reduction approaches, including principal component analysis, partial least squares, and sliced inverse regression. For each method, we review its background and its applications in computational biology, discuss both its advantages and limitations, and offer enough operational details for implementation. A numerical example analyzing microarray survival data illustrates the application of the reviewed reduction methods.
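The three reductions named above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the chapter's own implementation: the function names (`pca_directions`, `pls_weights`, `sir_directions`), the NIPALS form of PLS for a single continuous response, the number of slices, the simulated data, and the two-stage "PCA first, then SIR on the component scores" step (used here because SIR needs an invertible sample covariance, which fails when p > n) are all assumptions. The censored survival response analyzed in the chapter is not handled.

```python
import numpy as np


def pca_directions(X, d):
    """Top-d principal component directions (p x d) of the centered predictors.

    Uses the thin SVD of the n x p centered matrix, so the p x p sample
    covariance is never formed; this works when p > n.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are eigenvectors of Xc^T Xc, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T


def pls_weights(X, y, d):
    """First d PLS weight vectors (p x d) for a univariate response, NIPALS-style.

    Each weight is the unit-length covariance vector between the current
    (deflated) predictors and the response; X is deflated after every component.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    W = np.zeros((X.shape[1], d))
    for k in range(d):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)
        t = Xc @ w                          # latent score for component k
        loading = Xc.T @ t / (t @ t)
        Xc = Xc - np.outer(t, loading)      # remove the part of X explained by t
        W[:, k] = w
    return W


def sir_directions(X, y, d, n_slices=5):
    """Sliced inverse regression directions (p x d); needs n > p so Cov(X) is invertible."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt                       # standardized predictors
    # Slice observations by the order of y and average Z within each slice.
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of the weighted slice-mean covariance, mapped back to X scale.
    _, vecs = np.linalg.eigh(M)
    return inv_sqrt @ vecs[:, ::-1][:, :d]


# Hypothetical small-n-large-p example: n = 50 samples, p = 500 "genes".
rng = np.random.default_rng(0)
n, p = 50, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0                              # response depends on one linear combination
y = np.exp(X @ beta / 3) + 0.1 * rng.standard_normal(n)

V_pca = pca_directions(X, d=10)
W_pls = pls_weights(X, y, d=3)
# SIR cannot be applied to all p > n predictors directly; here it is run on
# the 10 PC scores (an assumed two-stage PCA-then-SIR strategy).
scores = (X - X.mean(axis=0)) @ V_pca
B_sir = V_pca @ sir_directions(scores, y, d=1)
print(V_pca.shape, W_pls.shape, B_sir.shape)  # (500, 10) (500, 3) (500, 1)
```

PCA chooses directions from X alone, whereas PLS and SIR use the response y to choose them; that difference is the main point the sketch is meant to make concrete.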

Acknowledgments

This work was supported in part by National Science Foundation grant DMS 0706919.

Copyright information

© 2010 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Li, L. (2010). Dimension Reduction for High-Dimensional Data. In: Bang, H., Zhou, X., van Epps, H., Mazumdar, M. (eds) Statistical Methods in Molecular Biology. Methods in Molecular Biology, vol 620. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-580-4_14

  • DOI: https://doi.org/10.1007/978-1-60761-580-4_14

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-60761-578-1

  • Online ISBN: 978-1-60761-580-4

  • eBook Packages: Springer Protocols