Abstract
The sources of inter-sample variability in omic studies are not only biological but often also technical. Assessment of the relative magnitude of biological and technical sources of variation is therefore of paramount importance, especially in the context of epigenome-wide association studies (EWAS) where biological signals are quantitative and may be of a relatively small magnitude. This chapter introduces the reader to a general strategy for determining the number and nature of the sources of variation in an omic data set. It further presents guidelines for inter-sample normalisation. Techniques and tools are illustrated throughout with examples from cancer epigenome and EWAS studies.
Keywords
- Singular Value Decomposition
- Random Matrix Theory
- Retinoic Acid Receptor Alpha
- Surrogate Variable Analysis
- Differential Methylation Locus
Acknowledgements
AET is supported by the Chinese Academy of Sciences and the Max Planck Society.
Copyright information
© 2015 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Yang, Z., Teschendorff, A.E. (2015). A General Strategy for Inter-sample Variability Assessment and Normalisation. In: Teschendorff, A. (eds) Computational and Statistical Epigenomics. Translational Bioinformatics, vol 7. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-9927-0_3
DOI: https://doi.org/10.1007/978-94-017-9927-0_3
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-017-9926-3
Online ISBN: 978-94-017-9927-0
eBook Packages: Biomedical and Life Sciences (R0)