Abstract
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers. Topic-specific web crawlers have been developed to collect web pages relevant to particular topics of interest from the Internet. Based on an analysis of the HITS algorithm, this paper proposes a new algorithm, P-HITS, for topic-specific web crawlers. P-HITS introduces probability into URL selection to achieve better global optimality, and it incorporates the metadata of hyperlinks to better predict the relevance of web pages. Experimental results indicate that the proposed algorithm performs better.
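To make the abstract's terms concrete, the following is a minimal Python sketch of the classic HITS hub/authority iteration (Kleinberg, 1998), together with a probability-proportional ("roulette-wheel") URL picker of the general kind the abstract alludes to. It is not the authors' P-HITS: the abstract does not give their formulas. The anchor-text relevance measure, the blending weight `alpha`, and all function names (`hits`, `anchor_relevance`, `link_score`, `select_url`) are hypothetical illustrations.

```python
import random
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # page URL -> URLs it links to


def hits(graph: Graph, iterations: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Classic HITS: iterate mutually reinforcing hub and authority scores."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of p = sum of hub scores of the pages that link to p.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub of p = sum of the (updated) authority scores of the pages p links to.
        new_hub = {n: sum(new_auth[dst] for dst in graph.get(n, [])) for n in nodes}
        # L2-normalise each vector so the scores stay bounded between iterations.
        for scores in (new_auth, new_hub):
            norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
            for n in scores:
                scores[n] /= norm
        hub, auth = new_hub, new_auth
    return hub, auth


def anchor_relevance(anchor_text: str, topic_terms: Set[str]) -> float:
    """HYPOTHETICAL link-metadata signal: share of (lower-case) topic terms in the anchor text."""
    if not topic_terms:
        return 0.0
    words = set(anchor_text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def link_score(authority: float, anchor_text: str, topic_terms: Set[str], alpha: float = 0.5) -> float:
    """HYPOTHETICAL blend of link-structure and anchor-text evidence; alpha is an assumed weight."""
    return alpha * authority + (1.0 - alpha) * anchor_relevance(anchor_text, topic_terms)


def select_url(frontier: Dict[str, float], rng: random.Random) -> str:
    """Roulette-wheel selection: pick a URL with probability proportional to its score,
    so lower-scored URLs keep some chance of being crawled instead of strict best-first order."""
    urls = list(frontier)
    weights = [max(frontier[u], 0.0) for u in urls]  # clip negatives; weights must be >= 0
    if sum(weights) == 0.0:
        return rng.choice(urls)  # no signal yet: fall back to a uniform choice
    return rng.choices(urls, weights=weights, k=1)[0]
```

For example, on the graph `{"a": ["b", "c"], "b": ["c"], "d": ["c"]}`, page `c` receives the highest authority score because every hub links to it. Sampling in proportion to the score, rather than always expanding the top-ranked URL, is one common way to trade some greediness for broader exploration of the frontier.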
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zong, X., Shen, Y., Liao, X. (2005). Improvement of HITS for Topic-Specific Web Crawler. In: Huang, DS., Zhang, XP., Huang, GB. (eds) Advances in Intelligent Computing. ICIC 2005. Lecture Notes in Computer Science, vol 3644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11538059_55
DOI: https://doi.org/10.1007/11538059_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28226-6
Online ISBN: 978-3-540-31902-3
eBook Packages: Computer Science (R0)