Accurate Estimation of Gene Expression Levels from DGE Sequencing Data

Conference paper
Bioinformatics Research and Applications (ISBRA 2011)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6674)

Abstract

Two main transcriptome sequencing protocols have been proposed in the literature: the most commonly used shotgun sequencing of full-length mRNAs (RNA-Seq) and 3’-tag digital gene expression (DGE). In this paper we present a novel expectation-maximization algorithm, called DGE-EM, for inference of gene-specific expression levels from DGE tags. Unlike previous methods, our algorithm takes into account alternative splicing isoforms and tags that map at multiple locations in the genome, and corrects for incomplete digestion and sequencing errors. The open-source Java/Scala implementation of the DGE-EM algorithm is freely available at http://dna.engr.uconn.edu/software/DGE-EM/.
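The abstract does not spell out the DGE-EM model, so the sketch below illustrates only the generic expectation-maximization idea behind it: a tag compatible with several isoforms (or genomic locations) is split among them in proportion to their current frequencies, and the frequencies are then re-estimated from those fractional counts. The function name `em_isoform_frequencies` is hypothetical, the per-tag weights (the assumed probability that an isoform generates that tag) are taken as given, and the paper's corrections for incomplete digestion and sequencing errors are not modeled here.

```python
from collections import defaultdict


def em_isoform_frequencies(tags, n_iter=200, tol=1e-10):
    """Hypothetical sketch: EM estimate of relative isoform frequencies.

    tags: list of dicts {isoform_id: weight}, where weight is an assumed
    probability P(tag | isoform) supplied by some upstream tag model.
    Returns {isoform_id: frequency}, with frequencies summing to 1.
    """
    isoforms = sorted({iso for tag in tags for iso in tag})
    freq = {iso: 1.0 / len(isoforms) for iso in isoforms}  # uniform start
    for _ in range(n_iter):
        expected = defaultdict(float)
        for tag in tags:
            # E-step: posterior share of this tag assigned to each isoform.
            total = sum(freq[iso] * w for iso, w in tag.items())
            if total == 0:
                continue
            for iso, w in tag.items():
                expected[iso] += freq[iso] * w / total
        n = sum(expected.values())
        if n == 0:
            break
        # M-step: new frequencies are the normalized expected tag counts.
        new_freq = {iso: expected[iso] / n for iso in isoforms}
        delta = max(abs(new_freq[iso] - freq[iso]) for iso in isoforms)
        freq = new_freq
        if delta < tol:
            break
    return freq


# Toy input: 6 tags unique to A, 2 unique to B, 4 shared by A and B.
tags = [{"A": 1.0}] * 6 + [{"B": 1.0}] * 2 + [{"A": 1.0, "B": 1.0}] * 4
freq = em_isoform_frequencies(tags)
print(round(freq["A"], 3), round(freq["B"], 3))  # 0.75 0.25
```

With equal weights on the shared tags, the fixed point reproduces the unique-tag ratio; the benefit of EM over unique tag counting appears once weights differ between isoforms or many tags are ambiguous, because no tag is discarded.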

Experimental results on real DGE data generated from reference RNA samples show that our algorithm outperforms commonly used estimation methods based on unique tag counting. Furthermore, the accuracy of DGE-EM estimates is comparable to that obtained by state-of-the-art estimation algorithms from RNA-Seq data for the same samples. Results of a comprehensive simulation study assessing the effect of various experimental parameters suggest that further improvements in estimation accuracy could be achieved by optimizing DGE protocol parameters such as the anchoring enzymes and digestion time.
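To make the role of the anchoring enzyme and digestion time concrete, the sketch below enumerates the tags one transcript could produce under a simple incomplete-digestion model. It assumes the anchoring enzyme is NlaIII (recognition site `CATG`), that each site is cut independently with probability `p_cut` (a stand-in for digestion time), and that the tag comes from the 3'-most site that was cut, so the k-th site counted from the 3' end (k = 0, 1, ...) yields the tag with probability `p_cut * (1 - p_cut) ** k`. The default tag length, the function names, and the choice to skip sites too close to the 3' end are illustrative assumptions, not the paper's specification.

```python
def find_sites(seq, site):
    """Return every start position of `site` in `seq` (overlaps allowed)."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def dge_tag_probabilities(transcript, site="CATG", tag_len=21, p_cut=0.9):
    """Hypothetical sketch: candidate DGE tags and their probabilities.

    transcript: sense-strand sequence written 5' -> 3'.
    Assumed model: each anchoring site is cut independently with
    probability p_cut and the tag is taken from the 3'-most cut site, as
    the site plus downstream bases (tag_len bases in total). Sites with
    fewer than tag_len bases remaining are skipped. The leftover mass,
    (1 - p_cut) ** n_sites, corresponds to producing no tag.
    """
    sites = [i for i in find_sites(transcript, site)
             if i + tag_len <= len(transcript)]
    result = []
    # Walk sites from the 3' end; k = 0 is the 3'-most site.
    for k, pos in enumerate(reversed(sites)):
        prob = p_cut * (1 - p_cut) ** k
        result.append((transcript[pos:pos + tag_len], prob))
    return result


for tag, prob in dge_tag_probabilities("GGCATGAAAATTCATGCCCCGGGG",
                                       tag_len=8, p_cut=0.9):
    print(tag, round(prob, 3))
# CATGCCCC 0.9
# CATGAAAA 0.09
```

Lowering `p_cut` spreads probability onto sites farther from the 3' end, which is one way to see why digestion time and the choice of anchoring enzyme could affect how ambiguous the resulting tags are.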




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nicolae, M., Măndoiu, I. (2011). Accurate Estimation of Gene Expression Levels from DGE Sequencing Data. In: Chen, J., Wang, J., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2011. Lecture Notes in Computer Science, vol 6674. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21260-4_37

  • DOI: https://doi.org/10.1007/978-3-642-21260-4_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21259-8

  • Online ISBN: 978-3-642-21260-4

  • eBook Packages: Computer Science, Computer Science (R0)
