RNA Isoform Discovery Through Goodness of Fit Diagnostics

Part of the book series: Frontiers in Probability and the Statistical Sciences (FROPROSTAS)

Abstract

There is great interest across the biological community, from basic scientists to clinicians, in determining which RNA isoforms are expressed in cells. Determining the extent of RNA expression has potential implications for basic scientific models in biology and for diagnosing and treating diseases such as cancer. Next-generation sequencing provides an opportunity to discover expressed RNA isoforms that have not previously been detected. Algorithms for detecting these isoforms from RNA-seq data have attracted great interest and have been quite successful. However, even the most widely used algorithms generally do not assess goodness of fit, even when they are based on statistical models. This leads to high false positive rates in algorithm output and makes real biological signal more difficult to detect. The goal of this chapter is to present a simple statistical method for isoform discovery that assesses the goodness of fit of a statistical model for mismatches between aligned reads and putative isoforms in RNA-seq data.

Acknowledgements

I thank the editors for helpful comments that improved the exposition of this chapter.

Corresponding author

Correspondence to Julia Salzman.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Salzman, J. (2014). RNA Isoform Discovery Through Goodness of Fit Diagnostics. In: Datta, S., Nettleton, D. (eds) Statistical Analysis of Next Generation Sequencing Data. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-07212-8_13
