Mining Micro-blogs: Opportunities and Challenges

Abstract

This chapter investigates whether and how micro-messaging technologies such as Twitter can be harnessed to obtain valuable information. The distinctive characteristics of micro-blogging services, such as being user oriented, create opportunities for applications to exploit the content of these sites. However, the same characteristics become a weakness when it comes to data modelling and analysis of the messages. These sites contain very large amounts of unstructured, noisy data, some of it false or missing, which makes data mining difficult. The chapter first reviews some potential applications of micro-messaging services and then gives some insight into the challenges faced by data mining applications. Later in the chapter, the characteristics of a real data set collected from Twitter are analysed. The chapter closes with three case studies that demonstrate applications of micro-blogging services.

Notes

  1. The data is based on the output of the Stanford parser over a sample of 0.3 million English tweets, as identified using automatic language identification over our primary data set.

  2. There is a minor impulse on the 25th of April. This impulse is explained not by any single event but by the accumulation of a large number of minor events in the aftermath of the earthquake.

Author information

Corresponding author

Correspondence to Yang Liao.

Copyright information

© 2012 Springer-Verlag London

About this chapter

Cite this chapter

Liao, Y. et al. (2012). Mining Micro-blogs: Opportunities and Challenges. In: Abraham, A. (ed.) Computational Social Networks. Springer, London. https://doi.org/10.1007/978-1-4471-4054-2_6

  • DOI: https://doi.org/10.1007/978-1-4471-4054-2_6

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4053-5

  • Online ISBN: 978-1-4471-4054-2

  • eBook Packages: Computer Science (R0)
