RNA Isoform Discovery Through Goodness of Fit Diagnostics

Salzman, Julia

doi:10.1007/978-3-319-07212-8_13

Chapter
First Online: 01 January 2014

Part of the book series: Frontiers in Probability and the Statistical Sciences (FROPROSTAS)

Abstract

There is great interest across the biological community, from basic scientists to clinicians, in determining which RNA isoforms are expressed in cells. Determining the extent of RNA expression has potential implications for basic scientific models in biology and for diagnosing and treating diseases such as cancer. Next-generation sequencing provides an opportunity to discover expressed RNA isoforms that have not previously been detected. Algorithms for detecting these isoforms from RNA-seq data have attracted great interest and have been quite successful. However, even the most widely used algorithms generally do not assess goodness of fit, even when they are based on statistical models. This leads to high rates of false positives in algorithm output and makes real biological signal more difficult to detect. The goal of this chapter is to present a simple statistical method for isoform discovery based on assessing the goodness of fit of a statistical model for mismatches between aligned reads and putative isoforms in RNA-seq data.
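The abstract does not specify the model, so the following is only a minimal sketch under an assumed model, not the chapter's method. It assumes that every read of fixed length L aligned to a correct putative isoform has a mismatch count following Binomial(L, p), with a single per-base error rate p. It then checks that assumption with a Pearson chi-square goodness-of-fit test. The function names (`mismatch_gof`, `simulate_reads`), the `min_expected` binning rule, and the Wilson-Hilferty approximation used for the p-value are all hypothetical, illustrative choices.

```python
import math
import random
from collections import Counter


def binom_pmf(k, n, p):
    """Probability of exactly k mismatches among n bases under Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def chi2_sf_approx(x, df):
    """Chi-square upper-tail probability, Wilson-Hilferty normal approximation."""
    z = ((x / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    return 0.5 * math.erfc(z / math.sqrt(2))


def mismatch_gof(mismatches, read_len, min_expected=5.0):
    """Pearson chi-square test of an ASSUMED Binomial(read_len, p) mismatch model.

    `mismatches` holds one mismatch count per read aligned to a putative isoform.
    Returns (statistic, degrees_of_freedom, approximate_p_value).
    """
    n_reads = len(mismatches)
    p_hat = sum(mismatches) / (n_reads * read_len)  # MLE of the per-base mismatch rate
    counts = Counter(mismatches)
    observed, expected = [], []
    k, cdf = 0, 0.0
    # Keep one bin per mismatch count while both that bin and the remaining
    # tail have expected count >= min_expected.
    while k < read_len:
        pk = binom_pmf(k, read_len, p_hat)
        tail_after = 1.0 - cdf - pk  # P(X > k)
        if n_reads * pk < min_expected or n_reads * tail_after < min_expected:
            break
        observed.append(counts[k])
        expected.append(n_reads * pk)
        cdf += pk
        k += 1
    # Pool the remaining upper tail: reads with k or more mismatches.
    observed.append(sum(c for m, c in counts.items() if m >= k))
    expected.append(n_reads * (1.0 - cdf))
    df = len(observed) - 2  # bins - 1, minus one for the estimated p
    if df < 1:
        raise ValueError("too few bins for a chi-square test")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, df, chi2_sf_approx(stat, df)


def simulate_reads(n, read_len, rate, rng):
    """Hypothetical simulator: independent per-base mismatches at the given rate."""
    return [sum(rng.random() < rate for _ in range(read_len)) for _ in range(n)]


rng = random.Random(0)
# All reads truly come from the putative isoform, so the model should fit.
good = simulate_reads(5000, 100, 0.01, rng)
# 20% of reads come from some other transcript and carry extra mismatches
# against the putative isoform, so the model should be rejected.
mixed = simulate_reads(4000, 100, 0.01, rng) + simulate_reads(1000, 100, 0.08, rng)
print(mismatch_gof(good, 100))
print(mismatch_gof(mixed, 100))
```

A small p-value only indicates that the assumed model does not describe the reads assigned to the isoform. That misfit is the kind of signal a goodness-of-fit diagnostic is meant to surface. The sketch ignores position- and quality-dependent error rates, which real reads have.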

Acknowledgements

I thank the editors for helpful comments that improved the exposition of this chapter.

Author information

Authors and Affiliations

Stanford University, Stanford, CA, USA
Julia Salzman

Corresponding author

Correspondence to Julia Salzman.

Editor information

Editors and Affiliations

Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, Kentucky, USA
Somnath Datta
Department of Statistics, Iowa State University, Ames, Iowa, USA
Dan Nettleton

About this chapter

Cite this chapter

Salzman, J. (2014). RNA Isoform Discovery Through Goodness of Fit Diagnostics. In: Datta, S., Nettleton, D. (eds) Statistical Analysis of Next Generation Sequencing Data. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-07212-8_13

DOI: https://doi.org/10.1007/978-3-319-07212-8_13
Published: 17 June 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07211-1
Online ISBN: 978-3-319-07212-8
eBook Packages: Mathematics and Statistics (R0)
