Mining Micro-blogs: Opportunities and Challenges

Liao, Yang; Moshtaghi, Masud; Han, Bo; Karunasekera, Shanika; Kotagiri, Ramamohanarao; Baldwin, Timothy; Harwood, Aaron; Pattison, Philippa

doi:10.1007/978-1-4471-4054-2_6

Chapter
First Online: 01 January 2012

Abstract

This chapter investigates whether and how micro-messaging technologies such as Twitter can be harnessed to obtain valuable information. The distinctive characteristics of micro-blogging services, such as being user oriented, give different applications the opportunity to use the content of these sites to their advantage. However, the same characteristics become a weakness when it comes to data modelling and analysis of the messages. These sites contain very large amounts of unstructured, noisy data with false or missing values, which makes data mining difficult. The chapter first reviews some potential applications of micro-messaging services and then provides insight into the challenges faced by data mining applications. Later in the chapter, the characteristics of a real data set collected from Twitter are analysed. Finally, the application of micro-blogging services is demonstrated through three case studies.

Notes

1. The data is based on the output of the Stanford parser over a sample of 0.3 million English tweets, identified using automatic language identification over our primary data set.
2. There is a minor impulse on the 25th of April. This impulse cannot be attributed to any single event; it reflects the accumulation of a large number of minor events in the aftermath of the earthquake.

Author information

Authors and Affiliations

Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, VIC, Australia
Yang Liao, Masud Moshtaghi, Bo Han, Shanika Karunasekera, Ramamohanarao Kotagiri, Timothy Baldwin & Aaron Harwood
Faculty of Medicine, Dentistry and Health Sciences, Psychological Sciences, The University of Melbourne, Melbourne, VIC, Australia
Philippa Pattison

Corresponding author

Correspondence to Yang Liao.

Editor information

Editors and Affiliations

Auburn, 98071, USA
Ajith Abraham

About this chapter

Cite this chapter

Liao, Y. et al. (2012). Mining Micro-blogs: Opportunities and Challenges. In: Abraham, A. (eds) Computational Social Networks. Springer, London. https://doi.org/10.1007/978-1-4471-4054-2_6

DOI: https://doi.org/10.1007/978-1-4471-4054-2_6
Published: 14 June 2012
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4053-5
Online ISBN: 978-1-4471-4054-2
eBook Packages: Computer Science, Computer Science (R0)
