Variable Length-Based Genetic Representation to Automatically Evolve Wrappers
Conference paper
Abstract
The Web has been the star service on the Internet; however, the sheer volume of information available and its decentralized nature make it intrinsically difficult to locate, extract, and compose information. An automatic approach is required to handle this huge amount of data. In this paper we present a machine learning algorithm based on Genetic Algorithms that generates a set of complex wrappers able to extract information from the Web. The paper presents an experimental evaluation of these wrappers over a set of basic data sets.
Keywords
Genetic Algorithm, Regular Expression, Regular Language, Chromosome Length, Automaton Learning
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© Springer-Verlag Berlin Heidelberg 2010