PBSeq: Modeling base-level bias to estimate gene and isoform expression for RNA-seq data
Owing to its unprecedented throughput and resolution, RNA-seq has rapidly become a powerful technology for transcriptome analysis. However, RNA-seq library preparation produces a non-uniform distribution of reads across the represented genes. When estimating gene and isoform expression levels, this non-uniformity must be accounted for and corrected to improve estimation accuracy. In this paper, we propose PBSeq, a Poisson model that uses a base-level bias correction strategy to estimate gene and isoform expression. The strategy simultaneously considers the positional and sequence-specific biases at the starting positions of reads mapped to the genes of interest. PBSeq not only provides expression values but also estimates the uncertainty associated with each expression estimate, which represents the variation across replicates and is useful for downstream analysis. We use one simulated dataset and three real RNA-seq datasets to validate the PBSeq model. The results show that PBSeq can accurately estimate gene and isoform expression levels and is computationally efficient compared with other state-of-the-art methods.
Keywords: RNA-seq; Base-level bias; Gene and isoform expression level; Expression of uncertainty
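To make the idea of a Poisson model with base-level bias weights concrete, the following is a minimal sketch and not the PBSeq implementation. It assumes that the number of reads starting at base i of a single-isoform gene follows a Poisson distribution with mean w_i * mu, where w_i is a bias weight that has already been estimated and mu is the expression level. The function name `estimate_expression`, the example counts and weights, and the use of the Fisher information as a stand-in for the uncertainty are all illustrative assumptions; the uncertainty described in the abstract reflects variation across replicates, which this sketch does not model.

```python
import math


def estimate_expression(counts, weights):
    """Maximum-likelihood estimate of mu for counts[i] ~ Poisson(weights[i] * mu).

    Hypothetical helper for illustration only; not part of PBSeq.
    Returns (mu_hat, standard_error).
    """
    if len(counts) != len(weights):
        raise ValueError("counts and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("bias weights must sum to a positive value")

    # Setting the derivative of the Poisson log-likelihood to zero gives
    # mu_hat = sum(counts) / sum(weights).
    mu_hat = sum(counts) / total_weight

    # Expected Fisher information for mu is sum(weights) / mu, so the
    # asymptotic variance of mu_hat is mu / sum(weights).
    standard_error = math.sqrt(mu_hat / total_weight)
    return mu_hat, standard_error


# Assumed toy data: read-start counts at five bases and their bias weights.
counts = [4, 0, 9, 2, 5]
weights = [1.0, 0.5, 2.0, 0.5, 1.0]
mu_hat, standard_error = estimate_expression(counts, weights)
print(mu_hat, standard_error)
```

With uniform weights the estimate reduces to the mean read-start count per base; non-uniform weights let over-represented positions contribute proportionally less to the estimate.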
This work is supported by the NSFC Grants (61170152), Jiangsu Provincial Qinglan Project, the Fundamental Research Funds for the Central Universities (CXZZ11_0217) and the Natural Science Foundation of Hebei Province (F2013201064).