Abstract
RNA-Seq has drastically changed how transcriptomes are studied by providing more precise estimates of gene expression, including isoform-specific expression. Most available methods for RNA-Seq data analyze one sample at a time. In this paper we present a Poisson-Gamma hierarchical model for multi-sample RNA-Seq data that simultaneously estimates isoform-specific expression and identifies differentially expressed isoforms. The model borrows information across all samples when estimating expression levels, which can improve the estimates drastically, particularly for low-abundance isoforms. The hierarchical model also accounts for overdispersion in the data and can incorporate sample-specific covariates, which facilitates isoform-specific differential expression analysis. Simulation studies demonstrated that, compared with single-sample analysis, this Bayesian multi-sample approach can yield more precise estimates of isoform-specific expression and higher power to detect differential expression, especially for isoforms of low abundance. We further illustrate the method using RNA-Seq data from 10 Yoruban and 10 Caucasian individuals.
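The two properties the abstract attributes to a Poisson-Gamma hierarchy, shrinkage toward a shared prior and overdispersion, can be illustrated with a minimal conjugate sketch. This is not the model from the paper, which also handles isoform structure and sample-specific covariates. It assumes a single expression rate theta per sample with theta ~ Gamma(shape, rate) and count ~ Poisson(exposure * theta). The function names, the `exposure` term (standing in for sequencing depth times effective length), and all numeric values are hypothetical choices made for illustration.

```python
def posterior_mean(count, exposure, shape, rate):
    """Posterior mean of theta given one observed count.

    Assumed toy model: theta ~ Gamma(shape, rate) and
    count ~ Poisson(exposure * theta). By conjugacy the posterior is
    Gamma(shape + count, rate + exposure).
    """
    return (shape + count) / (rate + exposure)


def marginal_mean_var(exposure, shape, rate):
    """Mean and variance of the count after integrating out theta.

    The marginal is negative binomial, so the variance exceeds the mean
    (overdispersion relative to a plain Poisson).
    """
    mean = exposure * shape / rate
    var = mean + exposure ** 2 * shape / rate ** 2
    return mean, var


# Hypothetical prior shared across samples (prior mean = shape / rate = 2.0).
SHAPE, RATE = 2.0, 1.0
EXPOSURE = 1.0

for count in (0, 3, 10):
    raw = count / EXPOSURE  # single-sample estimate
    shrunk = posterior_mean(count, EXPOSURE, SHAPE, RATE)
    print(f"count={count:2d}  single-sample={raw:5.2f}  shrunk={shrunk:5.2f}")

mean, var = marginal_mean_var(EXPOSURE, SHAPE, RATE)
print(f"marginal mean={mean:.2f}, variance={var:.2f}")
```

In this sketch the posterior mean always lies between the prior mean and the single-sample estimate, and the pull toward the prior is strongest when the exposure (and hence the information in the count) is small. That is the sense in which low-abundance isoforms benefit most from pooling across samples.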
Cite this article
Vardhanabhuti, S., Li, M. & Li, H. A Hierarchical Bayesian Model for Estimating and Inferring Differential Isoform Expression for Multi-sample RNA-Seq Data. Stat Biosci 5, 119–137 (2013). https://doi.org/10.1007/s12561-011-9052-3
DOI: https://doi.org/10.1007/s12561-011-9052-3