Abstract
High-throughput gene expression profiling methods suffer from various sources of measurement bias inherent to the experimental procedures used. Most commonly used data standardization methods, which are designed to reduce sample-to-sample variability of technical origin, do not account for probe- or transcript-specific effects. However, the efficiency of RNA isolation, cDNA synthesis and amplification depends on the percentage of GC nucleotides in the transcript sequences, and therefore introduces a strong bias into the analysis of gene expression data. This work analyses how, and to what extent, the GC-content bias of oligonucleotide microarray probes affects the measurement data. We propose a mechanism explaining this phenomenon, describe the implications of GC-content bias for the detection of differentially expressed genes (DEGs), and propose a new data standardization method which, by using sample-specific background intensity estimation and LOESS regression, makes it possible to counteract the described effects.
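To make the idea of a LOESS-based GC-content correction concrete, the following is a minimal sketch and not the method proposed in the chapter. It assumes log-scale probe intensities and a per-probe GC count, fits a tricube-weighted local linear trend of intensity against GC count, and subtracts that trend while keeping the overall median level. The function names (`tricube`, `local_linear`, `gc_correct`, `ols_slope`), the `frac` parameter and the synthetic data are all illustrative assumptions. The sample-specific background intensity estimation mentioned in the abstract is not shown, because the abstract does not describe how it is performed.

```python
import math
import random
from statistics import median


def tricube(u):
    """Tricube kernel used by LOESS: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear(x, y, x0, frac=0.5):
    """Tricube-weighted local linear fit of y on x, evaluated at x0."""
    k = max(2, math.ceil(frac * len(x)))
    # Bandwidth: distance to the k-th nearest point, slightly inflated so
    # that this point keeps a small non-zero weight.
    h = sorted(abs(xi - x0) for xi in x)[k - 1] * 1.001 + 1e-9
    w = [tricube((xi - x0) / h) for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    if sxx < 1e-12:  # all weighted points share one GC value
        return my
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return my + (sxy / sxx) * (x0 - mx)


def gc_correct(log_signal, gc_count, frac=0.5):
    """Subtract the GC-dependent trend, keeping the overall median level."""
    # GC counts are discrete, so the trend is only fitted once per value.
    trend = {g: local_linear(gc_count, log_signal, g, frac)
             for g in set(gc_count)}
    centre = median(log_signal)
    return [s - trend[g] + centre for s, g in zip(log_signal, gc_count)]


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (used only as a diagnostic)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


# Synthetic data (an assumption for illustration): 400 probes whose
# log-intensity rises with GC count, plus Gaussian noise.
rng = random.Random(0)
gc = [rng.randint(5, 20) for _ in range(400)]
signal = [7.0 + 0.15 * g + rng.gauss(0.0, 0.3) for g in gc]
corrected = gc_correct(signal, gc)

print("slope before:", round(ols_slope(gc, signal), 3))
print("slope after: ", round(ols_slope(gc, corrected), 3))
```

In this sketch the trend is fitted within a single sample, so applying it separately to each array would make the correction sample-specific. A production analysis would more likely rely on an established LOESS implementation than on this hand-written smoother.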
Acknowledgments
This work was supported by the Polish National Centre for Research and Development grant number POIG.02.03.01-00-040/13. Calculations were carried out using the computer cluster Ziemowit (http://ziemowit.hpc.polsl.pl) funded by the Silesian BIO-FARMA project No. POIG.02.01.00-00-166/08 in the Computational Biology and Bioinformatics Laboratory of the Biotechnology Centre in the Silesian University of Technology.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Jaksik, R., Bensz, W., Smieja, J. (2016). Nucleotide Composition Based Measurement Bias in High Throughput Gene Expression Studies. In: Gruca, A., Brachman, A., Kozielski, S., Czachórski, T. (eds) Man–Machine Interactions 4. Advances in Intelligent Systems and Computing, vol 391. Springer, Cham. https://doi.org/10.1007/978-3-319-23437-3_17
DOI: https://doi.org/10.1007/978-3-319-23437-3_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23436-6
Online ISBN: 978-3-319-23437-3
eBook Packages: Engineering, Engineering (R0)