Abstract
Determining which parts of a DNA sequence are coding remains an open and important problem in bioinformatics. This problem is called gene prediction or gene finding, and it consists of locating the most likely gene structure in a genomic sequence.
Subject to some restrictions, gene structure prediction can be treated as a search problem. Evolutionary computation approaches can be used to address it, although their performance depends on the discriminative power of the statistical measures used to extract useful features from the sequence.
In this study, we test six different content statistics to determine which are most relevant in an evolutionary search for coding and non-coding regions of human DNA. We conduct this comparative study on human chromosomes 3, 19 and 21.
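To make the notion of a content statistic concrete, the sketch below scores sequence windows by the Shannon entropy of their in-frame 3-mer distribution. The abstract does not name the six statistics tested, so this is only an illustrative assumption of what such a measure might look like, not the authors' method; the function names, window size, and shift are hypothetical.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 3, step: int = 3) -> float:
    """Shannon entropy (in bits) of the k-mers read every `step` bases.

    Illustrative content statistic only; the choice of k and step is an assumption.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]
    # Skip k-mers containing ambiguous bases such as N.
    kmers = [m for m in kmers if set(m) <= set("ACGT")]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def window_scores(seq: str, window: int = 120, shift: int = 60, k: int = 3):
    """Return (start, entropy) pairs for sliding windows over the sequence."""
    last_start = max(len(seq) - window, 0)
    return [
        (start, kmer_entropy(seq[start:start + window], k))
        for start in range(0, last_start + 1, shift)
    ]


if __name__ == "__main__":
    toy = "ATGGCCAAGCTGACCGGT" * 20  # hypothetical toy sequence, not real data
    for start, score in window_scores(toy):
        print(start, round(score, 3))
```

In an evolutionary search, per-window scores of this kind could feed the fitness function that ranks candidate segmentations into coding and non-coding regions; how the paper combines them is not stated in the abstract.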
This work has been financed in part by the Excellence in Research Projects P07-TIC-2682.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pérez-Rodríguez, J., Arroyo-Peña, A.G., García-Pedrajas, N. (2012). A Comparative Study of Content Statistics of Coding Regions in an Evolutionary Computation Framework for Gene Prediction. In: Jiang, H., Ding, W., Ali, M., Wu, X. (eds) Advanced Research in Applied Artificial Intelligence. IEA/AIE 2012. Lecture Notes in Computer Science, vol 7345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31087-4_22
DOI: https://doi.org/10.1007/978-3-642-31087-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31086-7
Online ISBN: 978-3-642-31087-4
eBook Packages: Computer Science, Computer Science (R0)