Using a VOM model for reconstructing potential coding regions in EST sequences
This paper presents a method for annotating coding and noncoding DNA regions using variable order Markov (VOM) models. A main advantage of VOM models is that their order may vary across different subsequences (contexts), depending on the statistics of those subsequences. As a result, VOM models are more flexible with respect to model parameterization and can be trained on relatively short sequences and on low-quality datasets, such as expressed sequence tags (ESTs). The paper also presents a modified VOM model for detecting and correcting the insertion and deletion sequencing errors that are commonly found in ESTs. In a series of experiments, the proposed method was found to be robust to random errors in these sequences.
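The abstract does not spell out how the VOM model is built or applied, so the following is only a minimal sketch under stated assumptions, not the authors' construction. It assumes a simple variable-order model that backs off to the longest context observed at least `min_count` times (a simplification of context-tree pruning) and labels a sequence as coding or noncoding by the sign of a log-likelihood ratio between two trained models. The class `VOMModel`, the function `classify`, the parameters `max_depth`, `min_count` and `alpha`, and the toy training strings are all hypothetical; the insertion/deletion correction step is not modelled.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class VOMModel:
    """Hypothetical minimal variable-order Markov model over DNA.

    Training counts which symbol follows every context of length
    0..max_depth. Scoring uses the longest suffix of the history that was
    seen at least min_count times, so the effective order varies from
    position to position with the training statistics.
    """

    def __init__(self, max_depth=5, min_count=2, alpha=0.5):
        self.max_depth = max_depth  # assumed: longest context considered
        self.min_count = min_count  # assumed: support required to use a context
        self.alpha = alpha          # assumed: additive smoothing constant
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            for i, sym in enumerate(seq):
                # Record sym after each context of length 0..max_depth ending before i
                for d in range(min(self.max_depth, i) + 1):
                    self.counts[seq[i - d:i]][sym] += 1
        return self

    def _context(self, history):
        # Try the longest candidate suffix first, then back off to shorter ones
        for d in range(min(self.max_depth, len(history)), -1, -1):
            ctx = history[len(history) - d:]
            node = self.counts.get(ctx)
            if node is not None and sum(node.values()) >= self.min_count:
                return ctx
        return ""  # empty context (order 0) as the last resort

    def log_prob(self, sym, history):
        node = self.counts.get(self._context(history), {})
        total = sum(node.values())
        # Additive smoothing so unseen symbols never get probability zero
        return math.log((node.get(sym, 0) + self.alpha)
                        / (total + self.alpha * len(ALPHABET)))

    def score(self, seq):
        # Log-likelihood of the whole sequence under this model
        return sum(self.log_prob(sym, seq[:i]) for i, sym in enumerate(seq))


def classify(seq, coding_model, noncoding_model):
    """Label seq by the sign of the coding/noncoding log-likelihood ratio."""
    llr = coding_model.score(seq) - noncoding_model.score(seq)
    return ("coding" if llr > 0 else "noncoding"), llr


if __name__ == "__main__":
    # Toy training data, invented for illustration only
    coding = VOMModel().fit(["ATGGCCGCCGCCGCCGCCGCC", "GCCGCCGCCGCCGCCGCCGCC"])
    noncoding = VOMModel().fit(["ATATATATTTATATAATATAT", "TTATATAATTATATATAAATA"])
    for query in ["GCCGCCGCCGCC", "ATATATATAT"]:
        label, llr = classify(query, coding, noncoding)
        print(query, label, round(llr, 2))
```

The back-off rule is what makes the order "variable" here: a well-supported long context such as `GCCG` is used directly, while a rare or unseen context falls back to a shorter suffix. This is also why such a model can be trained on short or noisy data; a real gene finder would additionally need frame-aware (codon-position) statistics and a proper pruning criterion.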
Keywords: Variable order Markov model; Coding and noncoding DNA; Context tree; Gene annotation; Sequencing error detection and correction