Abstract
The biological community, from basic scientists to clinicians, has great interest in determining which RNA isoforms are expressed in cells. Determining the extent of RNA expression has potential implications for basic scientific models in biology and for diagnosing and treating diseases such as cancer. Next-generation sequencing provides an opportunity to discover expressed RNA isoforms that have not previously been detected. Algorithms for detecting these isoforms from RNA-seq data have attracted great interest and have been quite successful. However, even the most widely used algorithms generally do not assess goodness of fit, even when they are based on statistical models. This omission leads to high false-positive rates in algorithm output and makes real biological signal more difficult to detect. The goal of this chapter is to present a simple statistical method for isoform discovery that assesses the goodness of fit of a statistical model for mismatches between aligned reads and putative isoforms in RNA-seq data.
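As a rough illustration of what a goodness-of-fit check on mismatches might look like, the sketch below compares the observed number of mismatches per aligned read with a binomial error model using a Pearson chi-square statistic. This is not the chapter's procedure: the binomial model, the function name `mismatch_gof`, the per-base `error_rate`, and the pooling of counts at `max_bin` are all assumptions made for the example.

```python
from collections import Counter
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k mismatches in n bases with per-base error rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mismatch_gof(mismatches, read_length, error_rate, max_bin=3):
    """Pearson chi-square statistic for per-read mismatch counts.

    Hypothetical model (an assumption for illustration): the number of
    mismatches in a read aligned to a true isoform follows
    Binomial(read_length, error_rate). Counts >= max_bin are pooled into
    one bin. Returns (statistic, degrees_of_freedom); no parameters are
    estimated from the data, so df = number of bins - 1 = max_bin.
    """
    n_reads = len(mismatches)
    observed = Counter(min(m, max_bin) for m in mismatches)
    probs = [binomial_pmf(k, read_length, error_rate) for k in range(max_bin)]
    probs.append(1.0 - sum(probs))  # pooled tail probability
    stat = sum(
        (observed.get(k, 0) - n_reads * pk) ** 2 / (n_reads * pk)
        for k, pk in enumerate(probs)
    )
    return stat, max_bin


# Made-up mismatch counts for reads aligned to two putative isoforms.
well_fitting = [0] * 61 + [1] * 30 + [2] * 8 + [3] * 1
poorly_fitting = [0] * 10 + [1] * 20 + [2] * 30 + [5] * 40

CRITICAL_DF3 = 7.815  # chi-square 0.05 critical value for 3 degrees of freedom

for name, data in [("well_fitting", well_fitting), ("poorly_fitting", poorly_fitting)]:
    stat, df = mismatch_gof(data, read_length=50, error_rate=0.01)
    verdict = "reject" if stat > CRITICAL_DF3 else "retain"
    print(f"{name}: chi-square = {stat:.2f} on {df} df, {verdict} putative isoform")
```

Under these assumptions, a putative isoform whose aligned reads show far more mismatches than the error model predicts yields a large statistic and would be flagged as a likely false positive, while one whose mismatch counts agree with the model is retained.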
Acknowledgements
I thank the editors for helpful comments that improved the exposition of this chapter.
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Salzman, J. (2014). RNA Isoform Discovery Through Goodness of Fit Diagnostics. In: Datta, S., Nettleton, D. (eds) Statistical Analysis of Next Generation Sequencing Data. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-07212-8_13
DOI: https://doi.org/10.1007/978-3-319-07212-8_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07211-1
Online ISBN: 978-3-319-07212-8
eBook Packages: Mathematics and Statistics (R0)