An Improved Strategy of Distributed Network Crawler Based on Hadoop and P2P

  • Conference paper
International Conference on Applications and Techniques in Cyber Security and Intelligence (ATCI 2018)

Abstract

Crawler technology uses web crawlers to capture the content of web pages and extract valuable information from it. A web crawler is a program that automatically fetches web pages, downloading them from the Web on behalf of a search engine, and it is therefore an important component of any search engine. This paper introduces the architecture of the distributed Hadoop platform and, building on the working principle of web crawlers, proposes a design scheme for a distributed crawler based on Hadoop and P2P technology. It ultimately determines the system layout of the distributed crawler, its module partitioning, and its process flow control, so that the functions of the distributed crawler can be realized.
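
The abstract does not say how crawl work is divided among the peers, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a host-hash partitioning rule (a common scheme in distributed crawlers): every URL is routed to the peer whose number equals a stable hash of its host name modulo the number of peers, so each site is crawled by exactly one node. The names `owner_node`, `crawl`, and the pluggable `fetch` callable are hypothetical; an in-memory dictionary stands in for real HTTP fetching so the sketch runs offline.

```python
import hashlib
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def owner_node(url, num_nodes):
    """Hypothetical P2P partitioning rule: hash the host name so all URLs of one site map to the same peer.

    MD5 is used only as a stable hash; Python's built-in hash() of a str is salted per process.
    """
    host = urlsplit(url).netloc.lower()
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def crawl(seeds, fetch, node_id, num_nodes, max_pages=100):
    """Breadth-first crawl executed by one peer.

    `fetch(url)` returns HTML text or None (assumed interface). URLs owned by other
    peers are not fetched; they are collected in `forwarded` for hand-off.
    """
    frontier = deque()
    seen = set()      # URLs already queued or forwarded (de-duplication)
    forwarded = {}    # peer id -> set of URLs that peer should crawl
    pages = {}        # URL -> HTML fetched by this peer

    def route(url):
        if url in seen:
            return
        seen.add(url)
        owner = owner_node(url, num_nodes)
        if owner == node_id:
            frontier.append(url)
        else:
            forwarded.setdefault(owner, set()).add(url)

    for url in seeds:
        route(url)

    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts before routing.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")):
                route(absolute)

    return pages, forwarded


if __name__ == "__main__":
    web = {  # tiny in-memory stand-in for the Web
        "http://a.example/": '<a href="/x">x</a> <a href="http://b.example/">b</a>',
        "http://a.example/x": '<a href="/">home</a>',
        "http://b.example/": '<a href="/y">y</a>',
    }
    node = owner_node("http://a.example/", 2)
    pages, forwarded = crawl(["http://a.example/"], web.get, node, 2)
    print(sorted(pages), {k: sorted(v) for k, v in forwarded.items()})
```

In a full system each peer would run such a loop and send its `forwarded` sets to the owning peers, while fetched pages could be stored on HDFS for later MapReduce processing. Simple modulo hashing reassigns most hosts whenever a peer joins or leaves; consistent hashing is the usual refinement when peer membership changes often.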

Acknowledgements

This paper is supported by the Henan Key Laboratory for Big Data Processing & Analytics of Electronic Commerce, by the Major Science and Technology Research Project of the Education Department of Henan Province (17B520026, 15A120012), and by the Key Scientific Research Projects of Henan Province Universities (17A880020).

Author information

Correspondence to Hongsheng Xu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, H., Li, K., Fan, G. (2019). An Improved Strategy of Distributed Network Crawler Based on Hadoop and P2P. In: Abawajy, J., Choo, KK., Islam, R., Xu, Z., Atiquzzaman, M. (eds) International Conference on Applications and Techniques in Cyber Security and Intelligence ATCI 2018. ATCI 2018. Advances in Intelligent Systems and Computing, vol 842. Springer, Cham. https://doi.org/10.1007/978-3-319-98776-7_101
