Accurate Estimation of Gene Expression Levels from DGE Sequencing Data
- Cite this paper as:
- Nicolae M., Măndoiu I. (2011) Accurate Estimation of Gene Expression Levels from DGE Sequencing Data. In: Chen J., Wang J., Zelikovsky A. (eds) Bioinformatics Research and Applications. ISBRA 2011. Lecture Notes in Computer Science, vol 6674. Springer, Berlin, Heidelberg
Two main transcriptome sequencing protocols have been proposed in the literature: the most commonly used shotgun sequencing of full-length mRNAs (RNA-Seq) and 3’-tag digital gene expression (DGE). In this paper, we present a novel expectation-maximization algorithm, called DGE-EM, for inference of gene-specific expression levels from DGE tags. Unlike previous methods, our algorithm takes into account alternative splicing isoforms and tags that map to multiple locations in the genome, and corrects for incomplete digestion and sequencing errors. The open-source Java/Scala implementation of the DGE-EM algorithm is freely available at http://dna.engr.uconn.edu/software/DGE-EM/.
Experimental results on real DGE data generated from reference RNA samples show that our algorithm outperforms commonly used estimation methods based on unique tag counting. Furthermore, the accuracy of DGE-EM estimates is comparable to that obtained by state-of-the-art estimation algorithms from RNA-Seq data for the same samples. Results of a comprehensive simulation study assessing the effect of various experimental parameters suggest that further improvements in estimation accuracy could be achieved by optimizing DGE protocol parameters such as the anchoring enzymes and digestion time.
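To illustrate the general idea of using expectation-maximization to handle tags that map to multiple locations, the following is a minimal Python sketch. It is not the DGE-EM algorithm itself: it ignores splicing isoforms, incomplete digestion, and sequencing errors, and the function name `em_expression`, the input layout, and the toy counts are all assumptions made for illustration only.

```python
def em_expression(tag_classes, n_iter=200, tol=1e-9):
    """Generic EM for ambiguous tag assignment (illustrative, not DGE-EM).

    tag_classes: list of (count, genes) pairs, where `count` is the number
    of observed tags and `genes` lists the genes those tags are compatible
    with (hypothetical input format).
    Returns a dict mapping each gene to its estimated relative frequency.
    """
    genes = sorted({g for _, gs in tag_classes for g in gs})
    # Start from a uniform estimate over all genes.
    freq = {g: 1.0 / len(genes) for g in genes}

    for _ in range(n_iter):
        # E-step: split each tag count among its compatible genes in
        # proportion to the current frequency estimates.
        assigned = {g: 0.0 for g in genes}
        for count, gs in tag_classes:
            denom = sum(freq[g] for g in gs)
            if denom == 0.0:
                continue  # no compatible gene has support; skip this class
            for g in gs:
                assigned[g] += count * freq[g] / denom

        # M-step: re-estimate frequencies from the fractional counts.
        total = sum(assigned.values())
        new_freq = {g: assigned[g] / total for g in genes}

        # Stop once the estimates no longer change appreciably.
        delta = max(abs(new_freq[g] - freq[g]) for g in genes)
        freq = new_freq
        if delta < tol:
            break
    return freq


if __name__ == "__main__":
    # Toy data (made up): 90 tags unique to gene A, 10 unique to gene B,
    # and 100 tags that match both genes.
    toy = [(90, ["A"]), (10, ["B"]), (100, ["A", "B"])]
    print(em_expression(toy))
```

On this toy input the ambiguous tags are progressively reassigned toward gene A, and the estimates approach 0.9 for A and 0.1 for B. Discarding the 100 shared tags, as unique tag counting does, would throw away half of the data; the iterative reassignment uses all of it.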