Applied Methods and Techniques for Modeling and Control on Micro-Blog Data Crawler

Part of the Lecture Notes in Control and Information Sciences book series (LNCIS, volume 452)


Models can provide mechanisms to improve system performance. This chapter presents applied methods and techniques for modeling and controlling a micro-blog data crawler. With the rapid development of social studies and social networks, millions of people post, comment on, or share opinions on micro-blog platforms every day, and as a result produce and spread opinions and sentiments on many topics. The micro-blog has become an effective platform for discovering and mining social opinions, and doing so requires crawling the relevant micro-blog data. A traditional web crawler has difficulty crawling micro-blog data in the usual way, because the data is generated dynamically and rapidly with Web 2.0 techniques such as AJAX. As most micro-blogs' official platforms do not offer suitable tools or RPC interfaces for collecting large volumes of data effectively and efficiently, we present an algorithm for modeling and controlling a micro-blog data crawler based on simulating browser behavior. This requires analyzing the simulated browser's behavior to obtain the request URLs to simulate, and then parsing and analyzing the URL requests that are sent, in the order of the data sequence. The experimental results and their analysis show the feasibility of the approach. Further work is outlined at the end.
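
The browser-simulation idea summarized above can be illustrated with a minimal Python sketch. It is not the chapter's algorithm: the endpoint `BASE_URL`, the `uid` and `page` parameters, the `posts` field in the JSON reply, and the function names `build_request_urls` and `crawl` are all hypothetical, chosen only to show request URLs being generated in sequence, sent in that order, and parsed.

```python
import json
from typing import Callable, Dict, Iterator, List
from urllib.parse import urlencode

# Hypothetical AJAX endpoint. In practice the URL pattern would be read
# off a capture of the browser's own traffic; none is given in the source.
BASE_URL = "https://microblog.example.com/ajax/timeline"


def build_request_urls(user_id: str, pages: int) -> List[str]:
    """Return the paged request URLs in the order a browser would send them."""
    return [
        f"{BASE_URL}?{urlencode({'uid': user_id, 'page': page})}"
        for page in range(1, pages + 1)
    ]


def crawl(urls: List[str], fetch: Callable[[str], str]) -> Iterator[Dict]:
    """Send each request in sequence and yield the posts in each JSON reply."""
    for url in urls:
        payload = json.loads(fetch(url))
        # Assumed response shape: {"posts": [{"id": ..., "text": ...}, ...]}
        for post in payload.get("posts", []):
            yield post
```

The `fetch` callable is passed in rather than hard-coded, so the same loop can be driven by a real HTTP client that carries the browser's headers and cookies, or by a stub that returns canned JSON during testing.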


Keywords: Models, Social networks, Micro-blog, Crawler



Some earlier work was done at the Beijing Institute of Technology with the help of Dr. Hua-ping Zhang and Prof. Yin-ping Zhao. This work is sponsored by the National Science Foundation of Hebei Province (No. F2013208105) and the National Science Foundation of China (No. 61272362). It is also sponsored by the Hebei Province Scientific and Technical Key Task (No. 12213516D).



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China
  2. Comrise Company, Hazlet, USA
