Abstract
Crawler technology uses web crawlers to capture the content of web pages and extract valuable information from it. A web crawler is a program that automatically retrieves web pages; it downloads pages from the Web for search engines and is an important component of a search engine. This paper introduces the architecture of the distributed Hadoop platform and, building on the working principle of the web crawler, proposes a design scheme for a distributed crawler based on Hadoop and P2P technology. It then determines the system layout of the distributed crawler, its module partition, and its flow control, through which the crawler's capabilities can be realized.
Acknowledgements
This paper is supported by the Henan Key Laboratory for Big Data Processing & Analytics of Electronic Commerce, by the science and technology research major project of the Henan Province Education Department (17B520026, 15A120012), and by the Key Scientific Research Projects of Henan Province Universities (17A880020).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, H., Li, K., Fan, G. (2019). An Improved Strategy of Distributed Network Crawler Based on Hadoop and P2P. In: Abawajy, J., Choo, KK., Islam, R., Xu, Z., Atiquzzaman, M. (eds) International Conference on Applications and Techniques in Cyber Security and Intelligence ATCI 2018. ATCI 2018. Advances in Intelligent Systems and Computing, vol 842. Springer, Cham. https://doi.org/10.1007/978-3-319-98776-7_101
DOI: https://doi.org/10.1007/978-3-319-98776-7_101
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98775-0
Online ISBN: 978-3-319-98776-7
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)