Accurate Estimation of Gene Expression Levels from DGE Sequencing Data

  • Marius Nicolae
  • Ion Măndoiu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6674)


Two main transcriptome sequencing protocols have been proposed in the literature: the more commonly used shotgun sequencing of full-length mRNAs (RNA-Seq) and 3’-tag digital gene expression (DGE). In this paper we present a novel expectation-maximization algorithm, called DGE-EM, for inference of gene-specific expression levels from DGE tags. Unlike previous methods, our algorithm takes into account alternative splicing isoforms and tags that map to multiple locations in the genome, and corrects for incomplete digestion and sequencing errors. The open source Java/Scala implementation of the DGE-EM algorithm is freely available at .

Experimental results on real DGE data generated from reference RNA samples show that our algorithm outperforms commonly used estimation methods based on unique tag counting. Furthermore, the accuracy of DGE-EM estimates is comparable to that obtained by state-of-the-art estimation algorithms from RNA-Seq data for the same samples. Results of a comprehensive simulation study assessing the effect of various experimental parameters suggest that further improvements in estimation accuracy could be achieved by optimizing DGE protocol parameters such as the anchoring enzymes and digestion time.
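
The difference between unique tag counting and EM-based estimation mentioned in the abstract can be illustrated with a toy example. The sketch below is not the DGE-EM algorithm itself: it is a generic expectation-maximization loop that splits each ambiguous tag fractionally among the isoforms it is compatible with, and it ignores incomplete digestion, sequencing errors, and tag position. The function name `em_tag_abundance` and the input dictionaries are hypothetical, introduced only for illustration.

```python
def em_tag_abundance(tag_counts, tag_to_isoforms, n_iter=200, tol=1e-9):
    """Toy EM estimate of relative isoform frequencies from tag counts.

    tag_counts:      dict mapping tag -> observed count (hypothetical input)
    tag_to_isoforms: dict mapping tag -> list of compatible isoforms
    Returns a dict mapping isoform -> estimated share of all tags.
    """
    isoforms = sorted({i for isos in tag_to_isoforms.values() for i in isos})
    total = sum(tag_counts.values())
    # Start from a uniform guess over isoforms.
    freq = {i: 1.0 / len(isoforms) for i in isoforms}

    for _ in range(n_iter):
        expected = dict.fromkeys(isoforms, 0.0)
        # E-step: split each tag's count among compatible isoforms
        # in proportion to the current frequency estimates.
        for tag, count in tag_counts.items():
            if count == 0:
                continue
            compatible = tag_to_isoforms[tag]
            norm = sum(freq[i] for i in compatible)
            for i in compatible:
                expected[i] += count * freq[i] / norm
        # M-step: re-estimate frequencies from the expected counts.
        new_freq = {i: expected[i] / total for i in isoforms}
        delta = max(abs(new_freq[i] - freq[i]) for i in isoforms)
        freq = new_freq
        if delta < tol:
            break
    return freq


# Tags t1 and t2 are unique to isoforms A and B; t3 matches both.
counts = {"t1": 60, "t2": 20, "t3": 20}
compat = {"t1": ["A"], "t2": ["B"], "t3": ["A", "B"]}
print(em_tag_abundance(counts, compat))
```

In this example, unique tag counting would discard the 20 copies of the shared tag `t3`, whereas the EM loop keeps them and apportions them between A and B; the estimates settle at roughly 0.75 for A and 0.25 for B.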


Digital Gene Expression; Illumina Genome Analyzer; Incomplete Digestion; Gene Expression Estimation; Digital Gene Expression Library
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Marius Nicolae (1)
  • Ion Măndoiu (1)

  1. Computer Science & Engineering Department, University of Connecticut, Storrs, USA
