Abstract
This chapter investigates whether and how micro-messaging technologies such as Twitter can be harnessed to obtain valuable information. The distinctive characteristics of micro-blogging services, such as being user-oriented, give applications the opportunity to use the content of these sites to their advantage. However, the same characteristics become a weakness when it comes to data modelling and analysis of the messages. These sites contain very large amounts of unstructured, noisy data, often with false or missing values, which makes data mining difficult. The chapter first reviews some potential applications of micro-messaging services and then provides some insight into the challenges faced by data mining applications. It then analyses the characteristics of a real data set collected from Twitter. Finally, three case studies illustrate applications of micro-blogging services.
Notes
1. The data is based on the output of the Stanford parser over a 0.3-million sample of English tweets, as identified using automatic language identification over our primary data set.
2. There is a minor impulse on the 25th of April. This impulse cannot be explained by any single event but by the accumulation of a large number of minor events in the aftermath of the earthquake.
Copyright information
© 2012 Springer-Verlag London
About this chapter
Cite this chapter
Liao, Y. et al. (2012). Mining Micro-blogs: Opportunities and Challenges. In: Abraham, A. (eds) Computational Social Networks. Springer, London. https://doi.org/10.1007/978-1-4471-4054-2_6
DOI: https://doi.org/10.1007/978-1-4471-4054-2_6
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4053-5
Online ISBN: 978-1-4471-4054-2
eBook Packages: Computer Science (R0)