Generate gene expression profile from high-throughput sequencing data

Abstract

This work presents two methods, a least-squares method and a Bayesian method, for solving the multiple-mapping problem that arises when gene expression profiles are extracted from next-generation sequencing data. We map the tag sequences to the genome in parallel and partition them to improve the efficiency of the methods. The essential feature of these methods is that they resolve multiple mappings between genes and short reads while producing almost the same estimates as traditional approaches when every read maps to a single gene. The two methods are compared on simulated data and on a real data set, generated from radiation-induced lung cancer cells (A549), by mapping short reads to a human ncRNA database. The results show that the Bayesian method, implemented with a Gibbs sampler, is more efficient and robust than the least-squares method.
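The abstract does not specify the authors' model, so the following is only a minimal sketch of how a Gibbs sampler can handle multiply mapped reads in general. It assumes a simple formulation that is not taken from the paper: each read has a hidden gene of origin, gene proportions have a symmetric Dirichlet prior, and the sampler alternates between assigning reads and redrawing the proportions. The function name `gibbs_expression` and all of its parameters are hypothetical.

```python
import random


def gibbs_expression(read_hits, n_genes, n_iter=2000, burn_in=500,
                     alpha=1.0, seed=0):
    """Estimate relative expression when reads may map to several genes.

    read_hits[i] lists the gene indices that read i maps to.
    Returns the posterior-mean proportion of reads per gene.
    This is an illustrative model, not the paper's algorithm.
    """
    rng = random.Random(seed)
    theta = [1.0 / n_genes] * n_genes      # current gene proportions
    mean_theta = [0.0] * n_genes           # running posterior mean
    kept = n_iter - burn_in

    for it in range(n_iter):
        # Step 1: assign every read to one of its candidate genes,
        # with probability proportional to the current proportions.
        counts = [0] * n_genes
        for hits in read_hits:
            if len(hits) == 1:
                gene = hits[0]             # uniquely mapped read
            else:
                weights = [theta[h] for h in hits]
                gene = rng.choices(hits, weights=weights)[0]
            counts[gene] += 1

        # Step 2: draw proportions from Dirichlet(alpha + counts),
        # built from independent gamma variates.
        draws = [rng.gammavariate(alpha + c, 1.0) for c in counts]
        total = sum(draws)
        theta = [d / total for d in draws]

        # Accumulate the mean over post-burn-in iterations only.
        if it >= burn_in:
            for g in range(n_genes):
                mean_theta[g] += theta[g] / kept

    return mean_theta


# 60 reads unique to gene 0, 20 unique to gene 1, 20 mapping to both.
reads = [[0]] * 60 + [[1]] * 20 + [[0, 1]] * 20
estimate = gibbs_expression(reads, n_genes=2)
```

In this sketch the ambiguous reads are shared out according to the evidence from the uniquely mapped reads, so gene 0 receives the larger share. When every read maps to a single gene, the estimate reduces to the smoothed read-count proportions, which mirrors the single-mapping behaviour the abstract describes.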

Author information

Corresponding authors

Correspondence to Xiangzhong Fang or Wuju Li.

About this article

Cite this article

Liu, H., Jiang, Z., Fang, X. et al. Generate gene expression profile from high-throughput sequencing data. Front. Math. China 6, 1131–1145 (2011). https://doi.org/10.1007/s11464-011-0123-z

Keywords

  • Next-generation sequencing
  • multiple mapping
  • Gibbs sampler
  • least-square
  • Bayesian

MSC

  • 62F15
  • 62J05
  • 62P10