This work presents two methods, a least-squares method and a Bayesian method, for solving the multiple mapping problem that arises when gene expression profiles are extracted from next-generation sequencing data. We map the tag sequences to the genome in parallel and partition them to improve the efficiency of both methods. The essential feature of these methods is that they resolve short reads that map to more than one gene, while in the single-mapping case they produce almost the same estimates as traditional approaches. The two methods are compared in a simulation study and on a real data set, generated from radiation-induced lung cancer cells (A549), by mapping short reads to a human ncRNA database. The results show that the Bayesian method, implemented with a Gibbs sampler, is more efficient and robust than the least-squares method.
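To illustrate how the two estimators differ, here is a minimal Python sketch under a simplified model that is an assumption, not necessarily the paper's formulation: each gene emits reads at an unknown rate λ_g at each of its sequence tags, a tag shared by several genes collects the sum of their rates, and a 0/1 tag-by-gene matrix A is known from mapping. The least-squares estimate solves y ≈ Aλ; the Gibbs sampler treats tag counts as Poisson, splits each shared tag's count among its genes in proportion to the current rates, and then redraws the rates from their Gamma posteriors. The function names, the Gamma(a, b) prior, and all defaults are hypothetical.

```python
import numpy as np


def least_squares_expression(A, y):
    """Least-squares rates under the assumed linear model y ~ A @ lam.

    A[t, g] = 1 if tag t occurs in gene g (hypothetical design matrix);
    y[t] = number of reads observed for tag t.
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    lam, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Rates cannot be negative; clipping is a crude fix, not a true NNLS solve.
    return np.clip(lam, 0.0, None)


def gibbs_expression(A, y, n_iter=2000, burn_in=500, a=1.0, b=0.01, seed=0):
    """Gibbs sampler for y[t] ~ Poisson(sum_g A[t, g] * lam[g]).

    Latent step: split each tag count among the genes containing that tag,
    multinomially, in proportion to the current rates.
    Rate step: with a Gamma(shape=a, rate=b) prior, lam[g] has a Gamma
    posterior with shape a + (reads assigned to g) and rate b + (tags of g).
    Returns the posterior mean of lam over the post-burn-in draws.
    Assumes every tag maps to at least one gene.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=int)
    n_tags, n_genes = A.shape
    tags_per_gene = A.sum(axis=0)
    lam = np.ones(n_genes)
    draws = []
    for it in range(n_iter):
        assigned = np.zeros(n_genes)
        for t in range(n_tags):
            if y[t] == 0:
                continue
            w = A[t] * lam
            assigned += rng.multinomial(y[t], w / w.sum())
        # numpy's gamma takes scale = 1 / rate
        lam = rng.gamma(a + assigned, 1.0 / (b + tags_per_gene))
        if it >= burn_in:
            draws.append(lam)
    return np.mean(draws, axis=0)


if __name__ == "__main__":
    # Toy data: gene 0 has two unique tags; tag 3 is shared by genes 1 and 2.
    A = np.array([[1, 0, 0],
                  [1, 0, 0],
                  [0, 1, 0],
                  [0, 1, 1],
                  [0, 0, 1]])
    y = np.array([10, 12, 20, 27, 6])
    print(least_squares_expression(A, y))  # approx. [11.0, 20.333, 6.333]
    print(gibbs_expression(A, y))
```

In the toy data, gene 0 has only unique tags, so both estimators stay close to its per-tag average of 11 reads (the Gibbs posterior mean is slightly higher because of the prior), in line with the single-mapping behavior described above; only the shared tag has to be divided between genes 1 and 2. Genes that share no tags decouple in both estimators, so A could be partitioned into independent blocks and each solved separately, which is one plausible reading of the partitioning step mentioned in the abstract.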
About this article
Cite this article
Liu, H., Jiang, Z., Fang, X. et al. Generate gene expression profile from high-throughput sequencing data. Front. Math. China 6, 1131–1145 (2011). https://doi.org/10.1007/s11464-011-0123-z
Keywords
- Next-generation sequencing
- multiple mapping
- Gibbs sampler