A Comparative Study of Content Statistics of Coding Regions in an Evolutionary Computation Framework for Gene Prediction

Pérez-Rodríguez, Javier; Arroyo-Peña, Alexis G.; García-Pedrajas, Nicolás

doi:10.1007/978-3-642-31087-4_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7345)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

Determining which parts of a DNA sequence are coding is an unsolved and relevant problem in bioinformatics. This problem, known as gene prediction or gene finding, consists of locating the most likely gene structure in a genomic sequence.

Under certain restrictions, gene structure prediction can be treated as a search problem. Evolutionary computation approaches can be used to address it, although their performance depends on the discriminative power of the statistical measures used to extract useful features from the sequence.
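To make the search formulation concrete, here is a minimal sketch under a much simpler setting than a real gene model: an individual is a single candidate coding interval, and its fitness is the sum of hypothetical per-position content scores over that interval. The names `interval_fitness`, `mutate`, `crossover` and `evolve` are illustrative assumptions, and the binary tournament selection and steady-state replacement are generic evolutionary-algorithm choices, not the authors' published operators. A real gene structure would also need multiple exons, splice signals and reading-frame constraints.

```python
import random
from typing import List, Tuple

# A candidate coding region as a half-open interval [start, end).
Interval = Tuple[int, int]


def interval_fitness(scores: List[float], iv: Interval) -> float:
    """Hypothetical fitness: sum of per-position content scores inside the interval."""
    start, end = iv
    return sum(scores[start:end])


def mutate(iv: Interval, n: int, rng: random.Random, max_shift: int = 3) -> Interval:
    """Shift one boundary by a small random amount, keeping 0 <= start < end <= n."""
    start, end = iv
    if rng.random() < 0.5:
        start = min(max(0, start + rng.randint(-max_shift, max_shift)), end - 1)
    else:
        end = max(min(n, end + rng.randint(-max_shift, max_shift)), start + 1)
    return (start, end)


def crossover(a: Interval, b: Interval, rng: random.Random) -> Interval:
    """Take a start from one parent and an end from the other; fall back to a copy if invalid."""
    start = rng.choice((a[0], b[0]))
    end = rng.choice((a[1], b[1]))
    return (start, end) if start < end else a


def evolve(scores: List[float], pop_size: int = 30, steps: int = 2000, seed: int = 0) -> Interval:
    """Steady-state evolutionary search for a high-scoring interval."""
    rng = random.Random(seed)
    n = len(scores)

    def fit(iv: Interval) -> float:
        return interval_fitness(scores, iv)

    def random_interval() -> Interval:
        start = rng.randrange(n)
        return (start, rng.randint(start + 1, n))

    pop = [random_interval() for _ in range(pop_size)]
    for _ in range(steps):
        # Binary tournament selection for each parent.
        a = max(rng.sample(pop, 2), key=fit)
        b = max(rng.sample(pop, 2), key=fit)
        child = mutate(crossover(a, b, rng), n, rng)
        # Steady-state replacement: the child replaces the worst member only if it is better.
        worst = min(range(pop_size), key=lambda i: fit(pop[i]))
        if fit(child) > fit(pop[worst]):
            pop[worst] = child
    return max(pop, key=fit)


if __name__ == "__main__":
    # Toy scores: +1 for G/C, -1 for A/T, a stand-in for a real content statistic.
    seq = "ATATATAT" + "GCGCGGCCGC" + "ATATTATA"
    scores = [1.0 if c in "GC" else -1.0 for c in seq]
    best = evolve(scores, seed=1)
    print(best, interval_fitness(scores, best))
```

With the toy +1/-1 scores, the search should drift toward the GC-rich block. Because it is stochastic, on sequences this short its result can be checked against an exhaustive scan of all intervals; the quality of the answer in a real setting depends entirely on how discriminative the per-position scores are.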

In this study, we test six different content statistics to determine which of them are most relevant in an evolutionary search for coding and non-coding regions of human DNA. We conduct this comparative study on human chromosomes 3, 19 and 21.
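The abstract does not name the six statistics that were compared, so the following is only a generic illustration of what a content statistic computes over a sequence window: GC content, GC content at third codon positions (assuming the window starts in reading frame 0), overlapping k-mer (k-tuple) counts, and the Shannon entropy of the k-mer distribution. The function names are hypothetical, and these measures are not claimed to be the ones evaluated in the paper.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in the window."""
    seq = seq.upper()
    return sum(c in "GC" for c in seq) / len(seq) if seq else 0.0


def gc3_content(seq: str) -> float:
    """GC fraction at third codon positions, assuming the window starts in frame 0."""
    return gc_content(seq[2::3])


def kmer_counts(seq: str, k: int) -> Counter:
    """Counts of overlapping k-mers (k-tuples) in the window."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy, in bits, of the overlapping k-mer distribution."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    window = "ATGGCCGCGTAA"
    print(gc_content(window), gc3_content(window), shannon_entropy(window, k=2))
```

Each function maps a window to a single number. Whichever statistics are used, their usefulness in an evolutionary search lies in how well those numbers separate coding from non-coding windows.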

This work has been financed in part by the Excellence in Research Projects P07-TIC-2682.

Author information

Authors and Affiliations

Department of Computing and Numerical Analysis, University of Córdoba, Spain
Javier Pérez-Rodríguez, Alexis G. Arroyo-Peña & Nicolás García-Pedrajas

Editor information

Editors and Affiliations

School of Software, Dalian University of Technology, Dalian, China
He Jiang
Department of Computer Science, University of Massachusetts Boston, 100 Morrissey Boulevard, 02125-3393, Boston, MA, USA
Wei Ding
Department of Computer Science, Texas State University San Marcos, 601 University Drive, 78666-4616, San Marcos, TX, USA
Moonis Ali
Department of Computer Science, University of Vermont, Burlington, VT, USA
Xindong Wu

About this paper

Cite this paper

Pérez-Rodríguez, J., Arroyo-Peña, A.G., García-Pedrajas, N. (2012). A Comparative Study of Content Statistics of Coding Regions in an Evolutionary Computation Framework for Gene Prediction. In: Jiang, H., Ding, W., Ali, M., Wu, X. (eds) Advanced Research in Applied Artificial Intelligence. IEA/AIE 2012. Lecture Notes in Computer Science, vol 7345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31087-4_22

DOI: https://doi.org/10.1007/978-3-642-31087-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31086-7
Online ISBN: 978-3-642-31087-4
eBook Packages: Computer Science, Computer Science (R0)
