Variable Length-Based Genetic Representation to Automatically Evolve Wrappers

Conference paper
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 71)


The Web has been the star service on the Internet; however, the sheer volume of information available and its decentralized nature make it intrinsically difficult to locate, extract and compose information. An automatic approach is required to handle this huge amount of data. In this paper we present a machine learning algorithm based on Genetic Algorithms that generates a set of complex wrappers able to extract information from the Web. The paper presents an experimental evaluation of these wrappers over a set of basic data sets.
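As a rough illustration of the idea in the abstract, the following is a minimal sketch of a variable-length genetic algorithm that evolves a regular expression from labelled example strings. It is not the authors' algorithm: the gene alphabet `GENES`, the length cap `MAX_LEN`, the fitness function, the cut-and-splice crossover and all parameter values are assumptions chosen only to keep the example small.

```python
import random
import re

# Hypothetical gene alphabet: each gene is a self-contained regex fragment,
# so any concatenation of genes is a syntactically valid pattern.
GENES = [r"\d+", r"[a-z]+", r"[A-Z]+", "-", "@", r"\."]
MAX_LEN = 12  # assumed cap on chromosome length to limit bloat


def decode(chromosome):
    """Concatenate the genes into a regular expression string."""
    return "".join(chromosome)


def fitness(chromosome, positives, negatives, length_penalty=0.01):
    """Reward accepted positives and rejected negatives; penalize length."""
    pattern = re.compile(decode(chromosome))
    hits = sum(1 for s in positives if pattern.fullmatch(s))
    rejections = sum(1 for s in negatives if not pattern.fullmatch(s))
    return hits + rejections - length_penalty * len(chromosome)


def cut_and_splice(a, b, rng):
    """Variable-length crossover: independent cut points in each parent."""
    i = rng.randint(0, len(a))
    j = rng.randint(0, len(b))
    child = (a[:i] + b[j:])[:MAX_LEN]
    return child or [rng.choice(GENES)]  # never return an empty chromosome


def mutate(chromosome, rng, rate=0.2):
    """Replace each gene by a random one with probability `rate`."""
    return [rng.choice(GENES) if rng.random() < rate else g for g in chromosome]


def evolve(positives, negatives, pop_size=30, generations=40, seed=0):
    """Evolve a chromosome whose regex separates positives from negatives."""
    rng = random.Random(seed)
    population = [[rng.choice(GENES) for _ in range(rng.randint(1, 5))]
                  for _ in range(pop_size)]

    def score(c):
        return fitness(c, positives, negatives)

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [population[0]]             # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(cut_and_splice(a, b, rng), rng))
        population = children
    return max(population, key=score)
```

Calling `decode(evolve(["12-34", "5-6"], ["abc", "12"]))` returns the pattern string of the best individual found. Because the two cut points are chosen independently, offspring length can differ from both parents, which is what makes the representation variable-length in this sketch.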


Keywords (machine-generated, not supplied by the authors): Genetic Algorithm, Regular Expression, Regular Language, Chromosome Length, Automaton Learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Departamento de Automática, Universidad de Alcalá, Alcalá de Henares, Madrid, Spain
  2. Departamento de Informática, Universidad Autónoma de Madrid, Madrid, Spain
