Abstract
A topic-driven crawler chooses the best URLs to pursue during web crawling, but it is difficult to evaluate which of the downloaded URLs are the best. This paper presents several important metrics and an evaluation function that ranks URLs by the relevance of their pages. We also discuss a GA-based approach to evaluating the function, in which the best combination of the metrics' weights is discovered through the evolutionary process of a genetic algorithm (GA). The experiment shows that the performance is promising, especially for a popular topic.
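As a rough illustration of the idea in the abstract, the sketch below scores each URL with a weighted sum of metric values and lets a small real-coded GA search for the weights. It is not the authors' implementation: the abstract does not name the metrics, the form of the evaluation function, or the GA operators, so the weighted-sum form, the three placeholder metrics, the toy labeled samples, the precision-at-k fitness, and the selection, crossover, and mutation choices are all assumptions made for this example.

```python
import random

# Toy data (assumed): each sample is (metric values for one URL, 1 if the page was relevant).
# The three metric columns are placeholders, e.g. anchor-text similarity,
# parent-page similarity, and link popularity; the paper's actual metrics may differ.
SAMPLES = [
    ([0.9, 0.8, 0.1], 1), ([0.8, 0.6, 0.3], 1), ([0.7, 0.9, 0.2], 1),
    ([0.2, 0.1, 0.9], 0), ([0.1, 0.3, 0.8], 0), ([0.3, 0.2, 0.7], 0),
    ([0.6, 0.5, 0.4], 1), ([0.4, 0.2, 0.6], 0),
]

def score(weights, metrics):
    """Weighted sum of metric values for one URL (assumed form of the evaluation function)."""
    return sum(w * m for w, m in zip(weights, metrics))

def normalize(weights):
    """Clip weights to be non-negative and rescale them so they sum to 1."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(clipped)] * len(clipped)
    return [w / total for w in clipped]

def fitness(weights, samples, k):
    """Precision at k: fraction of relevant pages among the k highest-scored URLs."""
    ranked = sorted(samples, key=lambda s: score(weights, s[0]), reverse=True)
    return sum(label for _, label in ranked[:k]) / k

def evolve(samples, n_metrics, k=3, pop_size=20, generations=30, seed=0):
    """Evolve a weight vector with truncation selection, arithmetic crossover, Gaussian mutation."""
    rng = random.Random(seed)
    population = [normalize([rng.random() for _ in range(n_metrics)])
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, samples, k), reverse=True)
        parents = population[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            alpha = rng.random()
            # Arithmetic crossover: a random convex combination of the two parents.
            child = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
            # Gaussian mutation applied to one randomly chosen weight.
            i = rng.randrange(n_metrics)
            child[i] += rng.gauss(0.0, 0.1)
            children.append(normalize(child))
        population = parents + children
    return max(population, key=lambda w: fitness(w, samples, k))

best = evolve(SAMPLES, n_metrics=3)
print("weights:", best, "precision@3:", fitness(best, SAMPLES, 3))
```

In a real crawler the fitness would have to come from measured crawl quality on a target topic rather than from a hand-labeled toy set, and the weights found for one topic need not transfer to another.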
Copyright information
© 2006 Springer
About this paper
Cite this paper
Peng, T., Zuo, W., Liu, Y. (2006). Genetic Algorithm for Evaluation Metrics in Topical Web Crawling. In: Liu, G., Tan, V., Han, X. (eds) Computational Methods. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-3953-9_30
DOI: https://doi.org/10.1007/978-1-4020-3953-9_30
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-3952-2
Online ISBN: 978-1-4020-3953-9
eBook Packages: Engineering, Engineering (R0)