Simultaneous Isoform Discovery and Quantification from RNA-Seq

Hiller, David; Wong, Wing Hung

doi:10.1007/s12561-012-9069-2


Published: 14 June 2012

Volume 5, pages 100–118 (2013)
Statistics in Biosciences


Abstract

RNA sequencing is a recent technology that has seen an explosion of methods addressing all levels of analysis, from read mapping to transcript assembly to differential expression modeling. In particular, the discovery of isoforms at the transcript assembly stage is a complex problem, and current approaches suffer from various limitations. For instance, many approaches use graphs to construct a minimal set of isoforms that covers the observed reads and then run a separate algorithm to quantify those isoforms, which can result in a loss of power. Current methods also use ad hoc solutions to deal with the vast number of possible isoforms that can be constructed from a given set of reads. Finally, while the need to take into account features such as read pairing and the sampling rate of reads has been acknowledged, most existing methods do not seamlessly integrate these features into the model. We present Montebello, an integrated statistical approach that performs simultaneous isoform discovery and quantification by using a Monte Carlo simulation to find the most likely isoform composition leading to a set of observed reads. We compare Montebello to Cufflinks, a popular isoform discovery approach, on a simulated data set and on 46.3 million brain reads from an Illumina tissue panel. On this data set, Montebello appears to offer a modest improvement over Cufflinks when considering discovery and parsimony metrics. In addition, Montebello mitigates specific difficulties inherent in the Cufflinks approach. Finally, Montebello can be fine-tuned depending on the type of solution desired.
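The abstract describes Montebello only at a high level, so the sketch below is not the authors' method. It is a toy illustration, under stated assumptions, of the general idea of searching jointly over isoform composition and abundance with a Monte Carlo procedure. Assumptions (all hypothetical, none taken from the paper): a single gene with a few exons; every nonempty exon subset is a candidate isoform (2^n − 1 of them, which shows why the candidate space grows so quickly); reads are summarized as Poisson counts per exon body and per splice junction, with a fixed, made-up junction effective length; abundances for a given isoform set are fitted by the standard multiplicative ML-EM update for Poisson linear models; a BIC penalty stands in for parsimony; and the Monte Carlo search is a plain Metropolis walk that toggles one candidate isoform in or out per step. Read pairing and sequencing biases are ignored. All function names are illustrative.

```python
import math
import random
from itertools import combinations


def all_candidates(n_exons):
    """Hypothetical candidate space: every nonempty exon subset, 2**n_exons - 1 isoforms."""
    return [frozenset(s) for k in range(1, n_exons + 1)
            for s in combinations(range(n_exons), k)]


def categories(n_exons):
    """Read categories: exon bodies ("e", j) and splice junctions ("j", a, b) with a < b."""
    cats = [("e", j) for j in range(n_exons)]
    cats += [("j", a, b) for a, b in combinations(range(n_exons), 2)]
    return cats


def design(isoform, exon_len, junc_len, cats):
    """Effective length of each read category within one isoform (0 if absent)."""
    exons = sorted(isoform)
    junctions = set(zip(exons, exons[1:]))  # consecutive exons are spliced together
    col = []
    for c in cats:
        if c[0] == "e":
            col.append(exon_len[c[1]] if c[1] in isoform else 0.0)
        else:
            col.append(junc_len if (c[1], c[2]) in junctions else 0.0)
    return col


def fit_loglik(cols, counts, iters=300):
    """Fit abundances by ML-EM; return the Poisson log-likelihood (up to a constant)."""
    m = len(counts)
    # An observed category that no included isoform can produce makes the set infeasible.
    for c in range(m):
        if counts[c] > 0 and all(col[c] == 0 for col in cols):
            return float("-inf")
    theta = [1.0] * len(cols)
    col_sums = [sum(col) for col in cols]

    def rates():
        return [sum(t * col[c] for t, col in zip(theta, cols)) for c in range(m)]

    for _ in range(iters):
        lam = rates()
        ratio = [counts[c] / lam[c] if lam[c] > 0 else 0.0 for c in range(m)]
        # Multiplicative EM update for a Poisson linear model; never lowers the likelihood.
        theta = [t * sum(r * a for r, a in zip(ratio, col)) / s
                 for t, col, s in zip(theta, cols, col_sums)]
    lam = rates()
    return sum((n * math.log(rate) if n > 0 else 0.0) - rate
               for n, rate in zip(counts, lam))


def bic_score(iso_set, candidates, exon_len, junc_len, counts, cats):
    """Log-likelihood minus a BIC penalty of 0.5 * log(N) per included isoform."""
    cols = [design(candidates[i], exon_len, junc_len, cats) for i in sorted(iso_set)]
    return fit_loglik(cols, counts) - 0.5 * len(iso_set) * math.log(sum(counts))


def monte_carlo_search(candidates, exon_len, junc_len, counts,
                       n_iter=2000, temp=1.0, seed=0):
    """Metropolis walk over isoform sets; returns the best-scoring set visited."""
    rng = random.Random(seed)
    cats = categories(len(exon_len))
    cache = {}

    def score(s):
        if s not in cache:
            cache[s] = bic_score(s, candidates, exon_len, junc_len, counts, cats)
        return cache[s]

    current = frozenset(range(len(candidates)))  # start from every candidate
    cur_score = score(current)
    best, best_score = current, cur_score
    for _ in range(n_iter):
        proposal = current ^ {rng.randrange(len(candidates))}  # toggle one isoform
        new_score = score(proposal)
        delta = new_score - cur_score
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = proposal, new_score
            if cur_score > best_score:
                best, best_score = current, cur_score
    return best, best_score
```

Because discovery (which isoforms are in the set) and quantification (the EM abundances) are scored by one penalized likelihood, the two steps are not run as separate algorithms, which is the contrast the abstract draws with graph-cover approaches. With three exons, for example, there are 7 candidates and 128 possible sets; the score cache keeps the walk cheap at that size, while a realistic gene would need smarter proposals than single toggles.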



Acknowledgements

We thank Hui Jiang and Nicholas Johnson for useful discussions. D.H. developed and tested the model. W.H.W. initiated and supervised the project. D.H. drafted and W.H.W. revised the paper. D.H. was funded by a Ric Weiland Graduate Fellowship (Stanford University) and by NIH grants R01 HG004634 and R01 HG005220. W.H.W. was supported by NIH grants R01 HG004634 and R01 HG005717.

Author information

Authors and Affiliations

Center for Epigenetics, Johns Hopkins School of Medicine, 855 N. Wolfe St., Rangos 570, Baltimore, MD, 21205, USA
David Hiller
Department of Statistics, Stanford University, Sequoia Hall, 390 Serra Mall, Stanford, CA, 94305, USA
Wing Hung Wong

Corresponding author

Correspondence to David Hiller.

About this article

Cite this article

Hiller, D., Wong, W.H. Simultaneous Isoform Discovery and Quantification from RNA-Seq. Stat Biosci 5, 100–118 (2013). https://doi.org/10.1007/s12561-012-9069-2

Received: 06 December 2011
Accepted: 04 June 2012
Published: 14 June 2012
Issue Date: May 2013
DOI: https://doi.org/10.1007/s12561-012-9069-2
