Independent Component Analysis to Remove Batch Effects from Merged Microarray Datasets

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9838)


Merging gene expression datasets is a simple way to increase the number of samples in an analysis. However, experimental and data processing conditions, which are specific to each dataset, generally influence the expression values and can hide the biological effect of interest. It is therefore important to normalize the larger merged dataset with respect to these batch effects, as failing to adjust for them may adversely affect statistical inference. In this context, we propose to use a “spatiotemporal” independent component analysis to model the influence of those unwanted effects and remove them from the data. We show on a real dataset that our method improves this modeling and helps improve sample classification.
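The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of modeling batch effects with independent components and subtracting them. It is not the authors' method: it uses ordinary FastICA from scikit-learn as a stand-in for the spatiotemporal variant, and the helper names (`batch_r2`, `remove_batch_components`, `mean_batch_gap`), the R² threshold of 0.5, and the synthetic two-batch data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA


def batch_r2(loadings, batch):
    """Fraction of a loading vector's variance explained by batch membership."""
    grand_mean = loadings.mean()
    total = ((loadings - grand_mean) ** 2).sum()
    between = sum(
        (batch == b).sum() * (loadings[batch == b].mean() - grand_mean) ** 2
        for b in np.unique(batch)
    )
    return between / total


def remove_batch_components(X, batch, n_components=5, r2_threshold=0.5, seed=0):
    """X is a genes x samples matrix; batch holds one label per sample.

    Plain FastICA is used here (an assumption), not spatiotemporal ICA.
    """
    ica = FastICA(n_components=n_components, whiten="unit-variance",
                  max_iter=1000, random_state=seed)
    S = ica.fit_transform(X)   # genes x components (gene-side sources)
    A = ica.mixing_            # samples x components (sample-side loadings)
    # Flag components whose sample-side loadings are mostly explained by batch.
    r2 = np.array([batch_r2(A[:, j], batch) for j in range(A.shape[1])])
    flagged = np.flatnonzero(r2 > r2_threshold)
    # Subtract the contribution of the flagged components from the data.
    X_clean = X - S[:, flagged] @ A[:, flagged].T
    return X_clean, flagged


def mean_batch_gap(X, batch):
    """Average over genes of the absolute difference between the two batch means."""
    gap = X[:, batch == 1].mean(axis=1) - X[:, batch == 0].mean(axis=1)
    return np.abs(gap).mean()


# Synthetic example (hypothetical data): two batches of 20 samples, with a
# batch-specific offset added to the first 100 of 500 genes in batch 1.
rng = np.random.default_rng(0)
n_genes, n_samples = 500, 40
batch = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_genes, n_samples))
offsets = np.zeros(n_genes)
offsets[:100] = rng.choice([-4.0, 4.0], size=100)
X[:, batch == 1] += offsets[:, None]

X_clean, flagged = remove_batch_components(X, batch)
before = mean_batch_gap(X, batch)
after = mean_batch_gap(X_clean, batch)
print(f"flagged components: {flagged}, batch gap before/after: {before:.3f}/{after:.3f}")
```

In this sketch, a component is treated as a batch effect purely because its sample-side loadings track the batch label. On real data, a component correlated with batch may also carry biological signal when batch and phenotype are confounded, so the threshold and the choice of components to remove would need to be checked against the variable of interest.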


Keywords: Batch effect removal · Expression data · Spatio-temporal independent component analysis



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. ICTEAM Institute, Université catholique de Louvain, Louvain-la-Neuve, Belgium
  2. Tools4Patient, Gosselies, Belgium
