A Case Study on Grammatical-Based Representation for Regular Expression Evolution

Conference paper
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 71)


Regular expressions, or simply regex, have been widely used for decades as a powerful pattern-matching and text-extraction tool. Although they provide a powerful and flexible notation for defining and retrieving patterns from text, the syntax and grammatical rules of regex notations are not easy to use, or even to understand. Any regex can be represented as a deterministic or non-deterministic finite automaton, so it is possible to design a representation that builds a regex automatically, together with an optimization algorithm able to find the best regex in terms of complexity. This paper introduces both a graph-based representation for regex and a particular heuristic-based evolutionary computing algorithm, based on grammatical features of this language, applied to a particular data extraction problem.


Regular expressions, Grammatical-based representation, Evolutionary algorithms





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Departamento de Informática, Universidad Autónoma de Madrid, Madrid, Spain
  2. Departamento de Automática, Universidad de Alcalá, Alcalá de Henares, Madrid, Spain
