Measurement, Summary, and Methodological Variation in RNA-sequencing

Part of the Frontiers in Probability and the Statistical Sciences book series (FROPROSTAS)

Abstract

There has been a major shift from microarrays to RNA-sequencing (RNA-seq) for measuring gene expression as the price per measurement of the two technologies has become comparable. The main advantage of RNA-seq is increased measurement flexibility: it can detect alternative transcription, allele-specific transcription, and transcription outside of known coding regions. The price of this flexibility is (a) an increase in raw data size and (b) more decisions that must be made by the data analyst. Here we provide a selective review and extension of our previous work on measuring the variability in results that arises from different choices about how to summarize and analyze RNA-seq data. We discuss a standard model for gene expression measurements that breaks variability down into variation due to technology, biology, and measurement error. Finally, we show how gene model selection, normalization, and the choice of statistical model affect the ultimate results of an RNA-seq experiment.
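
The variance decomposition mentioned in the abstract can be written out as a rough sketch. The additive form, the independence assumption, and all symbols used here ($y_{gi}$, $\mu_g$, $\tau_{gi}$, $\beta_{gi}$, $\varepsilon_{gi}$ and the three variance terms) are illustrative assumptions, not notation taken from the chapter.

```latex
% Illustrative sketch only: the symbols and the additive form are assumptions.
% y_{gi}: measured expression of gene g in sample i
% \mu_g: mean expression; \tau_{gi}: technology effect;
% \beta_{gi}: biological effect; \varepsilon_{gi}: measurement error
y_{gi} = \mu_g + \tau_{gi} + \beta_{gi} + \varepsilon_{gi}

% If the three random terms are assumed mutually independent,
% the total variance splits into one component per source:
\operatorname{Var}(y_{gi})
  = \sigma^2_{\mathrm{tech},g} + \sigma^2_{\mathrm{bio},g} + \sigma^2_{\mathrm{err},g}
```

Under this reading, analysis choices such as the gene model, normalization, and statistical model would mainly change how the observed variance is attributed to each term.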

Keywords

  • Gene Expression Measurement
  • Differential Expression Signal
  • Library Size
  • Summarization Method
  • Allele Specific Transcription

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Author information

Correspondence to Ben Langmead or Jeffrey T. Leek.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Frazee, A.C., Torres, L.C., Jaffe, A.E., Langmead, B., Leek, J.T. (2014). Measurement, Summary, and Methodological Variation in RNA-sequencing. In: Datta, S., Nettleton, D. (eds) Statistical Analysis of Next Generation Sequencing Data. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-07212-8_6
