Evolving Regular Expressions for GeneChip Probe Performance Prediction

  • William B. Langdon
  • Andrew P. Harrison
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5199)


Affymetrix High Density Oligonucleotide Arrays (HDONA) simultaneously measure the expression of thousands of genes using millions of probes. We use correlations between measurements for the same gene across 6685 human tissue samples from NCBI’s GEO database to indicate the quality of individual HG-U133A probes. Low concordance indicates a poor probe. Regular expressions can be data mined using a Backus-Naur form (BNF) context-free grammar and strongly typed genetic programming, written in gawk and using egrep. The automatically produced motif is better at predicting poor DNA sequences than an existing human-generated RE, suggesting that runs of Cytosine and Guanine, and mixtures, should all be avoided.
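The probe-quality idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' gawk/egrep system. It assumes that a probe's concordance is summarised as the median Pearson correlation with the other probes for the same gene, and that a candidate regular expression is then applied to each probe's DNA sequence. The function names, the toy data, and the pattern `G{4,}|C{4,}` are hypothetical choices made for illustration; the pattern is not the evolved motif from the paper.

```python
import re
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def median_correlation(probe_values):
    """Map each probe id to the median of its correlations with the
    other probes that target the same gene (assumed quality score)."""
    scores = {}
    for pid, vals in probe_values.items():
        others = [pearson(vals, v) for q, v in probe_values.items() if q != pid]
        scores[pid] = median(others)
    return scores


def flag_by_regex(sequences, pattern):
    """Map each probe id to True if the regular expression matches its sequence."""
    rx = re.compile(pattern)
    return {pid: bool(rx.search(seq)) for pid, seq in sequences.items()}


if __name__ == "__main__":
    # Toy measurements across four samples for four probes of one gene.
    values = {
        "p1": [1, 2, 3, 4],
        "p2": [2, 4, 6, 8],
        "p3": [4, 3, 2, 1],   # disagrees with the other probes
        "p4": [1, 3, 5, 7],
    }
    # Made-up 25-base probe sequences; only p3 contains a run of G.
    seqs = {
        "p1": "ATGCATGCATGCATGCATGCATGCA",
        "p2": "TACGTACGTACGTACGTACGTACGT",
        "p3": "ATGGGGGCATTACGATCGATTTACA",
        "p4": "CATGACTGACTGATCAGTCAGTCAT",
    }
    print(median_correlation(values))
    print(flag_by_regex(seqs, r"G{4,}|C{4,}"))
```

In this toy setting a pattern would be judged by how well its matches coincide with the probes whose median correlation is low; the paper's actual fitness measure is not given in the abstract, so none is assumed here.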


Genetic Programming · Regular Expression · Median Correlation · Grammatical Evolution · Linear Genetic Programming





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • William B. Langdon (1)
  • Andrew P. Harrison (1)
  1. Departments of Mathematical, Biological Sciences and, Computing and Electronic Systems, University of Essex, UK
