Missing Values Estimation in Microarray Data with Partial Least Squares Regression

  • Kun Yang
  • Jianzhong Li
  • Chaokun Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3992)


Microarray data usually contain missing values, so estimating them is an important preprocessing step. This paper proposes a method for estimating missing values based on Partial Least Squares (PLS) regression. The characteristics of PLS regression make the method feasible for microarray data. We compared our method with three existing methods, ROWaverage, KNNimpute and LLSimpute, on different data sets and at various missing probabilities. The experimental results show that the proposed method is accurate and robust for estimating missing values.
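The abstract does not spell out the algorithm, so the following is only a minimal sketch of how PLS regression can be used to fill in one missing entry. It is not the authors' implementation. It assumes genes are rows and experiments are columns, uses every complete gene as a predictor ("donor"), and fits a single-response PLS model with the standard NIPALS iteration. The function names `pls1_predict` and `impute_entry` and the toy matrix are hypothetical, and the selection of similar genes that a practical method would need is omitted.

```python
import math


def pls1_predict(X, y, x_new, n_components):
    """Fit single-response PLS (NIPALS) on rows of X against y, predict for x_new."""
    n, p = len(X), len(X[0])
    # Centre predictors and response on the training means.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    x = [x_new[j] - x_mean[j] for j in range(p)]
    y_hat = y_mean
    for _ in range(n_components):
        # Weight vector: predictor direction with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # residual response is already explained
        w = [v / norm for v in w]
        # Scores of the training samples along w.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        if tt < 1e-12:
            break
        # Loadings for the predictors and for the response.
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate the training data before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        # Apply the same projection and deflation to the new sample.
        t_new = sum(x[j] * w[j] for j in range(p))
        y_hat += q * t_new
        x = [x[j] - t_new * load[j] for j in range(p)]
    return y_hat


def impute_entry(data, gene, col, n_components=2):
    """Estimate the missing data[gene][col] from the complete rows of data."""
    target = data[gene]
    observed = [j for j, v in enumerate(target) if v is not None]
    # Assumption: every gene without missing values serves as a predictor.
    donors = [r for i, r in enumerate(data) if i != gene and None not in r]
    # One training sample per experiment in which the target gene is observed.
    X = [[r[j] for r in donors] for j in observed]
    y = [target[j] for j in observed]
    x_new = [r[col] for r in donors]
    return pls1_predict(X, y, x_new, n_components)


# Toy matrix (hypothetical): row 2 equals 2 * row 0 - row 1 + 1, last entry missing.
data = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 3.0, 6.0],
    [1.0, 4.0, 3.0, 6.0, None],
]
estimate = impute_entry(data, gene=2, col=4)
```

Because the toy target row is an exact linear combination of the two donor rows, two PLS components recover the held-out value of 5 up to rounding. On real microarray data the number of components would have to be chosen, for example by cross-validation, and using fewer components than predictors is what lets PLS cope with many correlated genes and few experiments.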


Microarray Data; Partial Least Squares; Similar Gene; Normalized Root Mean Square Error



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Kun Yang (1)
  • Jianzhong Li (1)
  • Chaokun Wang (1, 2)
  1. Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, China
  2. School of Software, Tsinghua University, Beijing, China
