Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 212))

Abstract

Obtaining data is a precondition for research on micro-blogging services. Because micro-blog Web pages are generated dynamically with Web 2.0 techniques such as AJAX, traditional Web page crawlers have difficulty crawling them. Micro-blogging services, however, provide APIs through which well-structured data can be obtained easily. MBCrawler, a software architecture for micro-blogging service crawlers, is designed on top of the APIs provided by micro-blogging services. The architecture is modular and scalable, so it can be adapted to the specific features of different micro-blogging services. SinaMBCrawler, a crawler application for Sina Weibo based on MBCrawler, has been developed. It automatically invokes the Sina Weibo APIs to crawl data and saves the crawled data into a local database.
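The abstract does not show MBCrawler's internal design, so the following is only a minimal sketch of the general idea it describes: a service-specific API adapter behind a common interface, a crawl loop that invokes it, and a local database that stores the results. Every name here (`MicroBlogAPI`, `FakeAPI`, `fetch_statuses`, `fetch_friend_ids`, `crawl`, the `statuses` table) is hypothetical, the breadth-first traversal is an assumption, and no real Sina Weibo endpoint is called.

```python
import sqlite3
from collections import deque
from typing import Iterable, Protocol


class MicroBlogAPI(Protocol):
    """Hypothetical adapter interface; one implementation per service."""

    def fetch_statuses(self, user_id: str) -> Iterable[dict]: ...

    def fetch_friend_ids(self, user_id: str) -> Iterable[str]: ...


class FakeAPI:
    """In-memory stand-in for a real service, so the sketch runs offline."""

    def __init__(self, graph: dict, statuses: dict):
        self.graph = graph
        self.statuses = statuses

    def fetch_statuses(self, user_id: str) -> Iterable[dict]:
        return self.statuses.get(user_id, [])

    def fetch_friend_ids(self, user_id: str) -> Iterable[str]:
        return self.graph.get(user_id, [])


def crawl(api: MicroBlogAPI, seed_user: str, db: sqlite3.Connection,
          max_users: int = 100) -> int:
    """Crawl breadth-first from seed_user; return the number of users visited."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS statuses ("
        "id TEXT PRIMARY KEY, user_id TEXT, text TEXT)"
    )
    queue = deque([seed_user])
    seen = {seed_user}
    visited = 0
    while queue and visited < max_users:
        user = queue.popleft()
        visited += 1
        # Store each status once; re-crawled statuses are ignored.
        for status in api.fetch_statuses(user):
            db.execute(
                "INSERT OR IGNORE INTO statuses VALUES (?, ?, ?)",
                (status["id"], user, status["text"]),
            )
        # Enqueue users that have not been discovered yet.
        for friend in api.fetch_friend_ids(user):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    db.commit()
    return visited
```

In this sketch, supporting another micro-blogging service would mean writing another class with the same two methods and passing it to `crawl` unchanged, which is one way to read the abstract's claim of modularity. A real adapter would also need authentication and rate-limit handling, which are omitted here.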

Acknowledgments

This work is supported by the Fundamental Research Funds for the Central Universities (grant ZZ1224).

Author information

Corresponding author

Correspondence to Gang Lu.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lu, G., Liu, S., Lü, K. (2013). MBCrawler: A Software Architecture for Micro-Blog Crawler. In: Lu, W., Cai, G., Liu, W., Xing, W. (eds) Proceedings of the 2012 International Conference on Information Technology and Software Engineering. Lecture Notes in Electrical Engineering, vol 212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34531-9_13

  • DOI: https://doi.org/10.1007/978-3-642-34531-9_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34530-2

  • Online ISBN: 978-3-642-34531-9

  • eBook Packages: Engineering, Engineering (R0)
