Computational Statistics, Volume 22, Issue 1, pp 49–69

Using a VOM model for reconstructing potential coding regions in EST sequences

Authors

  • A. Shmilovici
    • Department of Information Systems Engineering, Ben-Gurion University
  • Irad Ben-Gal
    • Department of Industrial Engineering, Tel-Aviv University
Original Paper

DOI: 10.1007/s00180-007-0021-8

Cite this article as:
Shmilovici, A. & Ben-Gal, I. Computational Statistics (2007) 22: 49. doi:10.1007/s00180-007-0021-8

Abstract

This paper presents a method for annotating coding and noncoding DNA regions by using variable order Markov (VOM) models. A main advantage of using VOM models is that their order may vary for different sequences, depending on the sequences’ statistics. As a result, VOM models are more flexible with respect to model parameterization and can be trained on relatively short sequences and on low-quality datasets, such as expressed sequence tags (ESTs). The paper presents a modified VOM model for detecting and correcting the insertion and deletion sequencing errors that are commonly found in ESTs. In a series of experiments, the proposed method is found to be robust to random errors in these sequences.
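To make the idea of a context-dependent model order concrete, here is a minimal sketch. It assumes a simple count-based VOM with back-off: for each position it uses the longest preceding context seen at least `min_count` times in training (up to `max_depth`), with add-one smoothing. It then scores a sequence by the log-likelihood ratio between a "coding" and a "noncoding" model. This is not necessarily the context-tree construction used in the paper, and it does not implement the insertion/deletion correction. The names `VOMModel`, `max_depth`, `min_count` and `log_odds` are hypothetical, and the training strings are toy data.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class VOMModel:
    """Toy variable-order Markov model over DNA (illustrative sketch only).

    Hypothetical parameters: max_depth bounds the context length, and
    min_count is the number of observations a context needs before it is
    trusted. When a long context is too rare, a shorter suffix is used
    instead, so the effective Markov order varies from position to position.
    """

    def __init__(self, max_depth=4, min_count=2):
        self.max_depth = max_depth
        self.min_count = min_count
        # counts[context][symbol]: how often `symbol` followed `context`
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            for i, sym in enumerate(seq):
                # Record the symbol under every context of length 0..max_depth.
                for d in range(min(i, self.max_depth) + 1):
                    self.counts[seq[i - d:i]][sym] += 1
        return self

    def prob(self, history, sym):
        """P(sym | longest sufficiently frequent suffix of history)."""
        longest = min(len(history), self.max_depth)
        for d in range(longest, -1, -1):
            # .get avoids inserting unseen contexts into the defaultdict.
            node = self.counts.get(history[len(history) - d:], {})
            total = sum(node.values())
            if total >= self.min_count or d == 0:
                # Add-one (Laplace) smoothing over the 4-letter alphabet.
                return (node.get(sym, 0) + 1) / (total + len(ALPHABET))

    def log_likelihood(self, seq):
        return sum(math.log(self.prob(seq[:i], seq[i])) for i in range(len(seq)))


def log_odds(seq, coding, noncoding):
    """Positive score: seq is better explained by the coding model."""
    return coding.log_likelihood(seq) - noncoding.log_likelihood(seq)


if __name__ == "__main__":
    # Toy training data only; real use would train on annotated sequences.
    coding = VOMModel().fit(["ATGGCC" * 10])
    noncoding = VOMModel().fit(["ATATTAATATAT" * 5])
    print(log_odds("ATGGCCATGGCC", coding, noncoding))
    print(log_odds("ATATATATAT", coding, noncoding))
```

The back-off rule is what makes the order "variable": in a well-sampled region the model conditions on up to `max_depth` preceding bases, while for rare contexts it falls back to shorter ones instead of estimating from a handful of counts. This is one reason such models can be trained on short or noisy inputs.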

Keywords

Variable order Markov model · Coding and noncoding DNA · Context tree · Gene annotation · Sequencing error detection and correction

Copyright information

© Springer-Verlag 2007