Bioinformatics Research and Applications

Volume 6674 of the series Lecture Notes in Computer Science pp 392-403

Accurate Estimation of Gene Expression Levels from DGE Sequencing Data

  • Marius Nicolae, Computer Science & Engineering Department, University of Connecticut
  • Ion Măndoiu, Computer Science & Engineering Department, University of Connecticut

Abstract

Two main transcriptome sequencing protocols have been proposed in the literature: the more commonly used shotgun sequencing of full-length mRNAs (RNA-Seq) and 3’-tag digital gene expression (DGE). In this paper we present a novel expectation-maximization algorithm, called DGE-EM, for inference of gene-specific expression levels from DGE tags. Unlike previous methods, our algorithm takes into account alternative splicing isoforms and tags that map to multiple locations in the genome, and corrects for incomplete digestion and sequencing errors. The open source Java/Scala implementation of the DGE-EM algorithm is freely available at http://dna.engr.uconn.edu/software/DGE-EM/.
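
To illustrate the general idea of resolving ambiguously mapped tags with expectation-maximization, the following is a minimal generic sketch, not the DGE-EM algorithm itself. It ignores isoforms, incomplete digestion and sequencing errors. The function name `em_multimap`, the input layout (a list of tag classes, each with a count and per-gene compatibility weights), and the toy data are all assumptions made for illustration.

```python
def em_multimap(tag_classes, genes, n_iter=200, tol=1e-9):
    """Estimate relative gene frequencies from possibly multi-mapped tags.

    tag_classes: list of (count, {gene: weight}) pairs (hypothetical layout);
                 weight is the assumed probability of the tag given the gene.
    genes:       list of gene names.
    Returns a dict mapping each gene to its estimated fraction (sums to 1).
    """
    # Start from a uniform guess over genes.
    theta = {g: 1.0 / len(genes) for g in genes}
    for _ in range(n_iter):
        # E-step: split each tag count among compatible genes in
        # proportion to the current estimate times the weight.
        expected = dict.fromkeys(genes, 0.0)
        for count, weights in tag_classes:
            denom = sum(theta[g] * w for g, w in weights.items())
            if denom == 0.0:
                continue  # tag incompatible with every gene under theta
            for g, w in weights.items():
                expected[g] += count * theta[g] * w / denom
        total = sum(expected.values())
        if total == 0.0:
            raise ValueError("no tag could be assigned to any gene")
        # M-step: renormalize expected counts into frequencies.
        new_theta = {g: expected[g] / total for g in genes}
        delta = max(abs(new_theta[g] - theta[g]) for g in genes)
        theta = new_theta
        if delta < tol:
            break
    return theta


# Toy data (invented): 90 tags unique to A, 10 unique to B, 100 shared.
toy_classes = [(90, {"A": 1.0}), (10, {"B": 1.0}), (100, {"A": 1.0, "B": 1.0})]
estimates = em_multimap(toy_classes, ["A", "B"])
```

On this toy input the shared tags are divided in proportion to the evidence from the unique tags, so the estimates approach 0.9 for A and 0.1 for B. A fuller model would replace the unit weights with probabilities that reflect cleavage efficiency and sequencing error.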

Experimental results on real DGE data generated from reference RNA samples show that our algorithm outperforms commonly used estimation methods based on unique tag counting. Furthermore, the accuracy of DGE-EM estimates is comparable to that obtained by state-of-the-art estimation algorithms from RNA-Seq data for the same samples. Results of a comprehensive simulation study assessing the effect of various experimental parameters suggest that further improvements in estimation accuracy could be achieved by optimizing DGE protocol parameters such as the anchoring enzymes and digestion time.