
Text Mining and Real-Time Analytics of Twitter Data: A Case Study of Australian Hay Fever Prediction

Part of the Lecture Notes in Computer Science book series (LNISA, volume 11148)

Abstract

Social media platforms such as Twitter contain a wealth of user-generated data and have, over time, become a virtual treasure trove of information for knowledge discovery, with applications in healthcare, politics and social initiatives, to name a few. Despite the evident benefits of exploring tweets, processing such data poses numerous challenges, given the specific characteristics of tweets. This study briefly outlines the steps involved in manipulating Twitter data and gives examples of the machine learning algorithms most commonly used in text analysis. It concludes with a case study on Australian hay fever prediction that applies selected techniques described in the outline. The case study demonstrates Twitter real-time analytics for health condition surveillance, using interactive visualisations to assist knowledge discovery and the dissemination of findings. The results show the potential of social media to play an important role in extracting meaningful results and guiding decision makers.
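The abstract refers to the steps involved in manipulating Twitter data before machine learning is applied. The paper's own pipeline is not reproduced on this page, so what follows is only a minimal Python sketch under stated assumptions: it lowercases a tweet, strips URLs and @mentions, keeps hashtag words without the `#`, drops a small hypothetical stopword list, and weights the remaining terms with plain TF-IDF (term frequency times the log of inverse document frequency). Stemming is omitted for brevity. The names `preprocess`, `tf_idf` and `STOPWORDS`, and the sample tweets, are illustrative and not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption); a real pipeline
# would use a fuller, possibly Twitter-specific, list.
STOPWORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "my", "i", "it", "so", "today"}


def preprocess(tweet: str) -> list[str]:
    """Normalise one tweet into a list of content tokens."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = text.replace("#", " ")              # keep hashtag words, drop the '#'
    tokens = re.findall(r"[a-z]+", text)       # alphabetic tokens only
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: number of tweets containing each term at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    # Hypothetical sample tweets, for illustration only.
    tweets = [
        "My #hayfever is terrible today, sneezing nonstop! https://t.co/xyz",
        "Pollen count is high in Melbourne @weather",
        "Sneezing and itchy eyes, hayfever season again",
    ]
    docs = [preprocess(t) for t in tweets]
    for tokens, w in zip(docs, tf_idf(docs)):
        top = max(w, key=w.get) if w else None
        print(tokens, "-> top term:", top)
```

Because the IDF factor is log(N / df), a term that occurs in every tweet of the collection gets a weight of zero. In a stream already filtered on a keyword such as "hay fever", that keyword would therefore add nothing, while rarer symptom words would be weighted more heavily.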

Keywords

  • Twitter
  • Machine learning
  • Text mining
  • Information retrieval
  • Knowledge discovery

Chapter length: 12 pages

Author information

Correspondence to Sudha Subramani.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Subramani, S., Michalska, S., Wang, H., Whittaker, F., Heyward, B. (2018). Text Mining and Real-Time Analytics of Twitter Data: A Case Study of Australian Hay Fever Prediction. In: Siuly, S., Lee, I., Huang, Z., Zhou, R., Wang, H., Xiang, W. (eds) Health Information Science. HIS 2018. Lecture Notes in Computer Science, vol 11148. Springer, Cham. https://doi.org/10.1007/978-3-030-01078-2_12

  • DOI: https://doi.org/10.1007/978-3-030-01078-2_12

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01077-5

  • Online ISBN: 978-3-030-01078-2

  • eBook Packages: Computer Science, Computer Science (R0)