Identifying Topical Shifts in Twitter Streams: An Integration of Non-negative Matrix Factorisation, Sentiment Analysis and Structural Break Models for Large Scale Data

Luber, Mattias; Weisser, Christoph; Säfken, Benjamin; Silbersdorff, Alexander; Kneib, Thomas; Kis-Katos, Krisztina

doi:10.1007/978-3-030-87031-7_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12887)

Included in the following conference series:

Multidisciplinary International Symposium on Disinformation in Open Online Media


Abstract

We propose an integration of Non-negative Matrix Factorisation, sentiment analysis and structural break models to identify significant topical shifts on the social media platform Twitter. For the topic modelling, we compare Latent Dirichlet Allocation and Non-negative Matrix Factorisation in terms of their applicability to short text documents. Sentiment is extracted with the rule-based VADER model. Structural breaks in the relative frequency and daily sentiment of topics over time are identified with the Bai-Perron model. Combining these methods, we provide a valuable and easy-to-use exploratory tool for social scientists to study the discourse on Twitter over time. Detecting statistically significant shifts in topics over time enables researchers to perform statistical inference and test hypotheses about the discourse on Twitter. The framework is implemented efficiently so that it runs on average consumer hardware in a reasonable amount of time. A case study with COVID-19 related tweets in the UK is provided. Our method is validated by using the timestamps of the COVID-19 related tweets to link the topical shifts to real-world events.
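
To make the structural break step more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest case, a piecewise-constant mean with a fixed, user-chosen number of breaks, and finds the break dates that minimise the total sum of squared residuals by dynamic programming, which is the idea underlying least-squares break dating. The function names, the `min_size` parameter and the toy series are all hypothetical. The sketch does not perform the significance tests or the selection of the number of breaks that the full Bai-Perron procedure provides.

```python
from itertools import accumulate


def segment_ssr(y):
    """Return ssr(i, j): squared deviations of y[i:j] from its own mean."""
    s1 = [0.0] + list(accumulate(y))                 # prefix sums of y
    s2 = [0.0] + list(accumulate(v * v for v in y))  # prefix sums of y squared

    def ssr(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    return ssr


def find_breaks(y, n_breaks, min_size=2):
    """Least-squares break dates for a piecewise-constant mean.

    Returns (breaks, total_ssr); each break is the index at which a new
    segment starts. The number of breaks is fixed in advance (an assumption
    of this sketch).
    """
    n = len(y)
    if (n_breaks + 1) * min_size > n:
        raise ValueError("series too short for the requested segments")
    ssr = segment_ssr(y)
    inf = float("inf")
    # cost[k][j]: minimal SSR when y[:j] is split into k + 1 segments
    cost = [[inf] * (n + 1) for _ in range(n_breaks + 1)]
    prev = [[0] * (n + 1) for _ in range(n_breaks + 1)]
    for j in range(min_size, n + 1):
        cost[0][j] = ssr(0, j)
    for k in range(1, n_breaks + 1):
        for j in range((k + 1) * min_size, n + 1):
            for i in range(k * min_size, j - min_size + 1):
                c = cost[k - 1][i] + ssr(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    prev[k][j] = i
    # Walk back through the stored split points to recover the break dates.
    breaks, j = [], n
    for k in range(n_breaks, 0, -1):
        j = prev[k][j]
        breaks.append(j)
    return sorted(breaks), cost[n_breaks][n]


# Toy daily topic share with two level shifts (made-up numbers).
daily_share = [0.1] * 10 + [0.5] * 10 + [0.2] * 10
breaks, total_ssr = find_breaks(daily_share, n_breaks=2, min_size=3)
print(breaks)
```

On this toy series the sketch places the breaks at indices 10 and 20, where the level of the series changes. In a pipeline like the one described above, the input would instead be the daily relative frequency or the mean daily sentiment of a single topic.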



Author information

Authors and Affiliations

Georg-August-Universität Göttingen, Göttingen, Germany
Mattias Luber, Christoph Weisser, Benjamin Säfken, Alexander Silbersdorff, Thomas Kneib & Krisztina Kis-Katos
Campus-Institut Data Science (CIDAS), Göttingen, Germany
Christoph Weisser, Benjamin Säfken, Alexander Silbersdorff, Thomas Kneib & Krisztina Kis-Katos

Authors

Mattias Luber
Christoph Weisser
Benjamin Säfken
Alexander Silbersdorff
Thomas Kneib
Krisztina Kis-Katos

Corresponding author

Correspondence to Christoph Weisser.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Jonathan Bright
Utrecht University, Utrecht, The Netherlands
Anastasia Giachanou
University of Leeds, Leeds, UK
Viktoria Spaiser
Boise State University, Boise, ID, USA
Francesca Spezzano
University of Oxford, Oxford, UK
Anna George
University of Oxford, Oxford, UK
Alexandra Pavliuc


Cite this paper

Luber, M., Weisser, C., Säfken, B., Silbersdorff, A., Kneib, T., Kis-Katos, K. (2021). Identifying Topical Shifts in Twitter Streams: An Integration of Non-negative Matrix Factorisation, Sentiment Analysis and Structural Break Models for Large Scale Data. In: Bright, J., Giachanou, A., Spaiser, V., Spezzano, F., George, A., Pavliuc, A. (eds) Disinformation in Open Online Media. MISDOOM 2021. Lecture Notes in Computer Science, vol 12887. Springer, Cham. https://doi.org/10.1007/978-3-030-87031-7_3


DOI: https://doi.org/10.1007/978-3-030-87031-7_3
Published: 15 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87030-0
Online ISBN: 978-3-030-87031-7
eBook Packages: Computer Science, Computer Science (R0)
