Evolving Regular Expression-Based Sequence Classifiers for Protein Nuclear Localisation
A number of bioinformatics tools use regular expression (RE) matching to locate protein or DNA sequence motifs that have been discovered by researchers in the laboratory. For example, patterns representing nuclear localisation signals (NLSs) are used to predict nuclear localisation. NLSs are not yet well understood, and so the set of currently known NLSs may be incomplete. Here we use genetic programming (GP) to generate RE-based classifiers for nuclear localisation. While the approach is a supervised one (with respect to protein location), it is unsupervised with respect to already-known NLSs. It therefore has the potential to discover new NLS motifs. We apply both tree-based and linear GP to the problem. The inclusion of predicted secondary structure in the input does not improve performance. Benchmarking shows that our majority classifiers are competitive with existing tools. The evolved REs are usually “NLS-like” and work is underway to analyse these for novelty.
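To make the idea of an RE-based majority classifier concrete, the following is a minimal Python sketch. It is not the evolved classifier from this work: the three patterns are hypothetical, hand-written examples that are only loosely "NLS-like" (clusters of the basic residues K and R), and the function names `votes` and `predict_nuclear` are assumptions made for illustration. Each pattern casts one vote if it matches anywhere in the protein sequence, and the sequence is predicted nuclear when more than half of the patterns match.

```python
import re

# Hypothetical, hand-written patterns for illustration only.
# They are NOT the regular expressions evolved in the paper.
PATTERNS = [
    re.compile(r"K[KR].[KR]"),              # short basic cluster (monopartite-like)
    re.compile(r"[KR]{2}.{10,12}[KR]{3}"),  # two basic clusters separated by a spacer (bipartite-like)
    re.compile(r"[KR]{4}"),                 # run of four basic residues
]


def votes(sequence):
    """Return one boolean per pattern: True if the pattern matches anywhere."""
    return [bool(p.search(sequence)) for p in PATTERNS]


def predict_nuclear(sequence):
    """Majority vote: predict nuclear if more than half of the patterns match."""
    v = votes(sequence)
    return sum(v) * 2 > len(v)


if __name__ == "__main__":
    # Toy peptide fragments, not a benchmark set.
    for seq in ["PKKKRKV", "KRPAATKKAGQAKKKK", "MAGSLTPEDLV"]:
        print(seq, votes(seq), predict_nuclear(seq))
```

In a GP setting, the hand-written list would be replaced by expressions produced by the evolutionary search, with a fitness measure computed from predictions on proteins of known location. The voting step itself would remain this simple.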
Keywords: Regular Expression; Nucleic Acid Research; Linear Genetic Programming; Protein Nuclear Localisation; Nuclear Localisation Signal Motif