Variable Length-Based Genetic Representation to Automatically Evolve Wrappers
The Web has become the leading service on the Internet; however, the vast amount of information available and its decentralized nature make it intrinsically difficult to locate, extract and compose information. An automatic approach is required to handle this huge amount of data. In this paper we present a machine learning algorithm based on Genetic Algorithms that generates a set of complex wrappers able to extract information from the Web. The paper presents an experimental evaluation of these wrappers over a set of basic data sets.
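The abstract does not describe the representation in detail. What follows is therefore only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes the following:

- A wrapper is a regular expression.
- A chromosome is a variable-length list of regex building blocks, drawn from the hypothetical `GENES` alphabet.
- Fitness is classification accuracy over hypothetical positive and negative example strings.

Cut-and-splice crossover (independent cut points in each parent) and insert/delete mutation let the chromosome length change during evolution.

```python
import random
import re

# Hypothetical building blocks (an assumption, not from the paper): each gene is a
# self-contained regex atom, so any concatenation of genes is a valid pattern.
GENES = [r"[a-z]+", r"[0-9]+", r"\w+", r"@", r"\.", r"-"]
MIN_LEN, MAX_LEN = 1, 10


def to_regex(chromosome):
    """Decode a variable-length chromosome into a regular expression string."""
    return "".join(chromosome)


def fitness(chromosome, positives, negatives):
    """Accuracy: positives fully matched plus negatives rejected, over all examples."""
    pattern = re.compile(to_regex(chromosome))
    hits = sum(1 for s in positives if pattern.fullmatch(s))
    hits += sum(1 for s in negatives if not pattern.fullmatch(s))
    return hits / (len(positives) + len(negatives))


def random_chromosome(rng):
    return [rng.choice(GENES) for _ in range(rng.randint(MIN_LEN, MAX_LEN))]


def crossover(a, b, rng):
    """Cut-and-splice: independent cut points, so the child's length can differ."""
    child = a[:rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]
    if not child:  # never return an empty chromosome
        child = [rng.choice(GENES)]
    return child[:MAX_LEN]  # truncate to the maximum length


def mutate(chromosome, rng):
    """Replace, insert or delete one gene, respecting the length bounds."""
    c = list(chromosome)
    op = rng.choice(("replace", "insert", "delete"))
    if op == "replace":
        c[rng.randrange(len(c))] = rng.choice(GENES)
    elif op == "insert" and len(c) < MAX_LEN:
        c.insert(rng.randint(0, len(c)), rng.choice(GENES))
    elif op == "delete" and len(c) > MIN_LEN:
        del c[rng.randrange(len(c))]
    return c


def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    best = max(rng.sample(range(len(pop)), k), key=lambda i: scores[i])
    return pop[best]


def evolve(positives, negatives, pop_size=60, generations=40, p_mut=0.3, seed=0):
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c, positives, negatives) for c in pop]
        elite = pop[max(range(pop_size), key=lambda i: scores[i])]
        nxt = [elite]  # elitism: the best individual survives unchanged
        while len(nxt) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            if rng.random() < p_mut:
                child = mutate(child, rng)
            nxt.append(child)
        pop = nxt
    scores = [fitness(c, positives, negatives) for c in pop]
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


if __name__ == "__main__":
    pos = ["john@example.com", "ana@mail.org", "bob@site.net"]
    neg = ["john.example.com", "john@", "12345", "ana-mail.org"]
    best, score = evolve(pos, neg)
    print(to_regex(best), score)
```

Because every gene is a complete regex atom, every chromosome decodes to a compilable pattern, and no repair step is needed. The sketch applies no length penalty, so chromosomes may grow up to `MAX_LEN` without improving fitness.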
Keywords: Genetic Algorithm, Regular Expression, Regular Language, Chromosome Length, Automaton Learning