Bayesian Detection of Coding Regions in DNA/RNA Sequences Through Event Factoring
We describe a Bayesian inference method for identifying protein-coding regions (active or residual) in DNA or RNA sequences. Its main feature is that the conditional and a priori probabilities required by Bayes’s formula are computed by factoring each event (a possible annotation of a nucleotide string) into a concatenation of shorter events that are believed to be independent. This factoring lets us obtain fast yet reliable estimates of these parameters from readily available databases, whereas estimating the probabilities of unfactored events would require databases and tables of astronomical size. Promising results were obtained in tests with natural and artificial genomes.
Keywords: coding regions, ab initio, DNA tagging, Bayesian inference
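To make the factoring idea concrete, here is a minimal Python sketch under a simplified model that is an assumption, not the authors' method. Each nucleotide carries a label `N` (non-coding) or `C` (coding). Both the sequence likelihood and the annotation prior factor into independent per-nucleotide events, and coding nucleotides are scored with codon-position frequency tables. The tables `NONCODING`, `CODING` and `LABEL_PRIOR` and the functions `log_likelihood`, `log_prior` and `posterior` are hypothetical names holding made-up values; in the described method such parameters would be estimated from sequence databases. The sketch scores a small hand-picked set of candidate annotations rather than searching over all of them.

```python
import math

# Hypothetical per-nucleotide frequencies for non-coding DNA (illustrative values).
NONCODING = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

# Hypothetical nucleotide frequencies for codon positions 1, 2 and 3 inside a
# coding region (illustrative values, not estimated from any database).
CODING = [
    {"A": 0.25, "C": 0.2, "G": 0.4, "T": 0.15},
    {"A": 0.3, "C": 0.25, "G": 0.15, "T": 0.3},
    {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
]

# Hypothetical prior probability of each label for a single nucleotide.
LABEL_PRIOR = {"N": 0.7, "C": 0.3}


def log_likelihood(seq, labels):
    """log P(seq | labels), factored into independent per-nucleotide events.

    The codon position of a coding nucleotide is counted from the start of
    its coding run (a simplifying assumption of this sketch).
    """
    total = 0.0
    phase = 0
    for base, lab in zip(seq, labels):
        if lab == "C":
            total += math.log(CODING[phase % 3][base])
            phase += 1
        else:
            total += math.log(NONCODING[base])
            phase = 0
    return total


def log_prior(labels):
    """log P(labels), factored into independent per-nucleotide label events."""
    return sum(math.log(LABEL_PRIOR[lab]) for lab in labels)


def posterior(seq, candidates):
    """P(labels | seq) for each candidate annotation, via Bayes's formula.

    Normalizes over the given candidates only; joint log-probabilities are
    shifted by their maximum before exponentiating to avoid underflow.
    """
    joint = [log_likelihood(seq, a) + log_prior(a) for a in candidates]
    m = max(joint)
    weights = [math.exp(j - m) for j in joint]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    seq = "GACGAC"
    candidates = ["NNNNNN", "CCCCCC", "NNNCCC"]
    for labels, p in zip(candidates, posterior(seq, candidates)):
        print(labels, round(p, 4))
```

Under these toy assumptions the model needs only 4 + 12 + 2 = 18 numbers, whereas a table indexed by every (string, annotation) pair of length n would need 4^n × 2^n = 8^n entries. This illustrates why estimating unfactored events directly is impractical.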