Frontiers of Mathematics in China, Volume 6, Issue 6, pp 1131–1145

Generate gene expression profile from high-throughput sequencing data

  • Hui Liu
  • Zhichao Jiang
  • Xiangzhong Fang (corresponding author)
  • Hanjiang Fu
  • Xiaofei Zheng
  • Lei Cha
  • Wuju Li (corresponding author)
Research Article


This work presents two methods, a least-squares method and a Bayesian method, for solving the multiple-mapping problem that arises when gene expression profiles are extracted from next-generation sequencing data. We map the tag sequences to the genome in parallel and partition them to improve the efficiency of the methods. The essential feature of both methods is that they resolve short reads that map to multiple genes, while producing almost the same estimates as the traditional approaches when each read maps to a single gene. The two methods are compared on simulated data and on a real dataset, generated from radiation-induced lung cancer cells (A549), by mapping short reads to a human ncRNA database. The results show that the Bayesian method, implemented with a Gibbs sampler, is more efficient and robust than the least-squares method.
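The abstract does not state the statistical model behind either method, so the following is only a minimal sketch under assumed simplifications, not the authors' implementation. It assumes a hypothetical 0/1 mapping matrix `A`, where `A[t][g] = 1` when tag sequence `t` maps to gene `g`, and assumes that the read count `y[t]` of each tag is Poisson with mean `sum_g A[t][g] * theta[g]`, where `theta[g]` is the expression level of gene `g`. The least-squares estimate solves the normal equations of this linear model. The Bayesian estimate uses a Gibbs sampler with data augmentation: it alternately splits each tag's reads among the genes the tag maps to, in proportion to the current `theta`, and draws each `theta[g]` from its conjugate Gamma posterior under an assumed Gamma(`a`, `b`) prior. All function names, priors, and the toy data are illustrative.

```python
import random


def least_squares(A, y):
    """Estimate theta by minimising ||y - A theta||^2 (illustrative sketch).

    Solves the normal equations (A^T A) theta = A^T y by Gaussian elimination
    with partial pivoting. Assumes A has full column rank. Negative estimates
    are clipped to zero afterwards (a crude stand-in for a constrained solver).
    """
    T, G = len(A), len(A[0])
    M = [[float(sum(A[t][i] * A[t][j] for t in range(T))) for j in range(G)] for i in range(G)]
    b = [float(sum(A[t][i] * y[t] for t in range(T))) for i in range(G)]
    # Forward elimination to upper-triangular form.
    for col in range(G):
        piv = max(range(col, G), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, G):
            f = M[r][col] / M[col][col]
            for c in range(col, G):
                M[r][c] -= f * M[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    theta = [0.0] * G
    for i in reversed(range(G)):
        theta[i] = (b[i] - sum(M[i][c] * theta[c] for c in range(i + 1, G))) / M[i][i]
    return [max(0.0, v) for v in theta]


def gibbs_poisson(A, y, a=1.0, b=0.01, n_iter=2000, burn_in=500, seed=0):
    """Posterior-mean estimate of theta under the assumed model
    y[t] ~ Poisson(sum_g A[t][g] * theta[g]), theta[g] ~ Gamma(shape=a, rate=b)."""
    rng = random.Random(seed)
    T, G = len(A), len(A[0])
    genes_of = [[g for g in range(G) if A[t][g]] for t in range(T)]
    exposure = [sum(A[t][g] for t in range(T)) for g in range(G)]  # sum_t A[t][g]
    theta = [1.0] * G
    totals = [0.0] * G
    for it in range(n_iter):
        # Step 1: split each tag's reads among its genes, proportional to A[t][g] * theta[g].
        alloc = [0] * G
        for t in range(T):
            if y[t] == 0 or not genes_of[t]:
                continue
            weights = [A[t][g] * theta[g] for g in genes_of[t]]
            for g in rng.choices(genes_of[t], weights=weights, k=y[t]):
                alloc[g] += 1
        # Step 2: conjugate Gamma update; gammavariate takes (shape, scale = 1 / rate).
        theta = [rng.gammavariate(a + alloc[g], 1.0 / (b + exposure[g])) for g in range(G)]
        if it >= burn_in:
            for g in range(G):
                totals[g] += theta[g]
    return [s / (n_iter - burn_in) for s in totals]


# Hypothetical toy data: genes 0 and 1 share three tags; genes 1 and 2 share one tag.
A = [[1, 0, 0]] * 3 + [[0, 1, 0]] * 2 + [[1, 1, 0]] * 3 + [[0, 0, 1]] * 3 + [[0, 1, 1]]
true_theta = [10.0, 40.0, 5.0]
# Noise-free counts (each equals its expected value) so the toy run is reproducible.
y = [int(sum(x * th for x, th in zip(row, true_theta))) for row in A]

ls_est = least_squares(A, y)
bayes_est = gibbs_poisson(A, y)
print("least squares:       ", [round(v, 2) for v in ls_est])
print("Gibbs posterior mean:", [round(v, 2) for v in bayes_est])
```

When every tag maps to exactly one gene, `A^T A` is diagonal, so the least-squares estimate reduces to each gene's mean count per tag, and the Gibbs allocation step becomes deterministic; under these assumptions both sketches then agree with simple per-gene counting, in line with the abstract's remark about the single-mapping case.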


Keywords: next-generation sequencing, multiple mapping, Gibbs sampler, least-square, Bayesian


Mathematics Subject Classification: 62F15, 62J05, 62P10





Copyright information

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. School of Mathematical Sciences, Statistical Center, LMAM, Peking University, Beijing, China
  2. Beijing Institute of Radiation Medicine, Beijing, China
  3. Center of Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
