Abstract
Obtaining data is a precondition for research on micro-blogging services. Because micro-blog Web pages use Web 2.0 techniques such as AJAX, their content is generated dynamically, which makes them hard for traditional Web page crawlers to crawl. Micro-blogging services, however, provide APIs through which well-structured data can be obtained easily. Based on these APIs, a software architecture for micro-blogging service crawlers, named MBCrawler, is designed. The architecture is modular and scalable, so it can be adapted to the specific features of different micro-blogging services. SinaMBCrawler, a crawler application for Sina Weibo based on MBCrawler, has been developed. It automatically invokes the Sina Weibo APIs to crawl data and saves the crawled data into a local database.
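The abstract does not give the internals of MBCrawler, so the following is only a minimal sketch of the general idea it describes: a service-specific API adapter plugged into a generic crawl loop that stores results in a local database. Every name here (MicroBlogApi, FakeApi, crawl, the users and follows tables) is a hypothetical choice for illustration, not the authors' design, and FakeApi returns canned data instead of calling any real Sina Weibo endpoint.

```python
from __future__ import annotations

import sqlite3
from collections import deque
from typing import Iterable, Protocol


class MicroBlogApi(Protocol):
    """Hypothetical adapter interface; one implementation per service."""

    def get_user(self, user_id: int) -> dict: ...
    def get_friend_ids(self, user_id: int) -> list[int]: ...


class FakeApi:
    """Stand-in adapter that serves canned data (no network access)."""

    def __init__(self, graph: dict[int, list[int]]) -> None:
        self.graph = graph

    def get_user(self, user_id: int) -> dict:
        return {"id": user_id, "name": f"user{user_id}"}

    def get_friend_ids(self, user_id: int) -> list[int]:
        return self.graph.get(user_id, [])


def crawl(api: MicroBlogApi, db: sqlite3.Connection,
          seeds: Iterable[int], max_users: int = 100) -> int:
    """Breadth-first crawl from the seed users; returns users visited."""
    db.execute("CREATE TABLE IF NOT EXISTS users "
               "(id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS follows "
               "(src INTEGER, dst INTEGER, PRIMARY KEY (src, dst))")
    queue = deque(seeds)
    seen: set[int] = set()
    while queue and len(seen) < max_users:
        uid = queue.popleft()
        if uid in seen:
            continue  # already crawled this user
        seen.add(uid)
        user = api.get_user(uid)
        db.execute("INSERT OR REPLACE INTO users VALUES (?, ?)",
                   (user["id"], user["name"]))
        for fid in api.get_friend_ids(uid):
            db.execute("INSERT OR IGNORE INTO follows VALUES (?, ?)",
                       (uid, fid))
            queue.append(fid)  # schedule the followed user for crawling
    db.commit()
    return len(seen)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    visited = crawl(FakeApi({1: [2, 3], 2: [3], 3: [1]}), conn, seeds=[1])
    print(visited, "users crawled")
```

Under this sketch, supporting another micro-blogging service would mean writing another adapter with the same two methods, while the crawl loop and storage code stay unchanged. A real adapter would also need authentication and rate-limit handling, which are omitted here.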
Acknowledgments
This work is supported by the Fundamental Research Funds for the Central Universities under grant ZZ1224.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lu, G., Liu, S., Lü, K. (2013). MBCrawler: A Software Architecture for Micro-Blog Crawler. In: Lu, W., Cai, G., Liu, W., Xing, W. (eds) Proceedings of the 2012 International Conference on Information Technology and Software Engineering. Lecture Notes in Electrical Engineering, vol 212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34531-9_13
DOI: https://doi.org/10.1007/978-3-642-34531-9_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34530-2
Online ISBN: 978-3-642-34531-9
eBook Packages: Engineering (R0)