Abstract
The concordance of RNA-sequencing (RNA-seq) with microarrays for genome-wide analysis of differential gene expression has not been rigorously assessed using a range of chemical treatment conditions. Here we use a comprehensive study design to generate Illumina RNA-seq and Affymetrix microarray data from the same liver samples of rats exposed in triplicate to varying degrees of perturbation by 27 chemicals representing multiple modes of action (MOAs). The cross-platform concordance in terms of differentially expressed genes (DEGs) or enriched pathways is linearly correlated with treatment effect size (R² ≈ 0.8). Furthermore, the concordance is also affected by transcript abundance and biological complexity of the MOA. RNA-seq outperforms microarray (93% versus 75%) in DEG verification as assessed by quantitative PCR, with the gain mainly due to its improved accuracy for low-abundance transcripts. Nonetheless, classifiers to predict MOAs perform similarly when developed using data from either platform. Therefore, the endpoint studied and its biological complexity, transcript abundance and the genomic application are important factors in transcriptomic research and for clinical and regulatory decision making.
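The relationship between cross-platform concordance and treatment effect size can be illustrated with a minimal sketch. It is not the authors' pipeline: the overlap definition (shared DEGs divided by the union of DEGs from both platforms), the function names `deg_concordance` and `r_squared`, and the toy gene lists are all assumptions made here for illustration, since the abstract does not specify how concordance or effect size were computed.

```python
from statistics import mean


def deg_concordance(rnaseq_degs, array_degs):
    """Percent overlap of two DEG lists (assumed definition: shared / union)."""
    a, b = set(rnaseq_degs), set(array_degs)
    union = a | b
    if not union:
        return 0.0  # no DEGs on either platform, so nothing to compare
    return 100.0 * len(a & b) / len(union)


def r_squared(x, y):
    """R^2 of a simple least-squares line fitted to y as a function of x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    return sxy ** 2 / (sxx * syy)


# Hypothetical DEG calls for one chemical (illustrative only, not study data).
rnaseq = ["Cyp1a1", "Cyp1a2", "Nqo1", "Gsta2"]
array = ["Cyp1a1", "Nqo1", "Gsta2", "Abcc3"]
print(deg_concordance(rnaseq, array))  # 3 shared of 5 total

# Hypothetical per-chemical values: effect size versus concordance (%).
effect_size = [1, 2, 3, 4]
concordance = [1, 3, 2, 4]
print(r_squared(effect_size, concordance))
```

With real data, each chemical would contribute one point (its effect size and its concordance), and `r_squared` would summarize how well a straight line describes the trend across all chemicals.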
Acknowledgements
We thank M. Arana and D. Mendrick for their critical review of the manuscript. This research was supported, in part, by the Intramural Research Program of the National Institutes of Health (NIH), National Institute of Environmental Health Sciences (NIEHS) (ES102345-04 and ES023026) and National Library of Medicine. P.P.Ł. and D.P.K. acknowledge support by the Vienna Scientific Cluster (VSC), the Vienna Science and Technology Fund (WWTF), Baxter AG, Austrian Research Centres (ARC) Seibersdorf and the Austrian Centre of Biopharmaceutical Technology (ACBT).
Author information
Contributions
W.T. coordinated the consortium study and manuscript preparation. W.T., S.S.A. and C.W. designed the study. C.W. conducted sequencing and qPCR experiments. S.S.A. provided rat tissue samples, gene expression data and contributed to the data analysis. P.R.B. was involved heavily in manuscript preparation and data analysis. B.G. and J.X. conducted the majority of data analysis and prepared various figures and supplementary materials. J.T.M. and D.T.M. constructed the mapping table between microarray and RNA-seq along with other data analysis and interpretation. All the co-authors contributed to various components of the study, including data analysis and preparation of text, figures, tables and supplementary materials.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–8, Supplementary Tables 1, 2, 4–9, 12, 13 and Supplementary Notes 1–6 (PDF 3022 kb)
Supplementary Table 3
RNA-seq data and mapping status summary based on data analysis pipeline P1 (XLSX 31 kb)
Supplementary Table 10
List of transcripts with shortened 3' UTRs detected from the samples treated by chemicals PHE and PIR (XLSX 156 kb)
Supplementary Table 11
List of differentially spliced isoforms detected in samples treated by chemicals PHE and PIR (XLSX 205 kb)
Supplementary Table 14
Master table for mapping Affymetrix probesets to RNA-seq gene annotations (XLS 8113 kb)
About this article
Cite this article
Wang, C., Gong, B., Bushel, P. et al. The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance. Nat Biotechnol 32, 926–932 (2014). https://doi.org/10.1038/nbt.3001
DOI: https://doi.org/10.1038/nbt.3001
- Springer Nature America, Inc.