Abstract
Social media platforms such as Twitter affect users in diverse societies both positively and negatively; one of Twitter’s negative effects is the use of hateful and offensive language. Hate speech fosters prejudice and harms the vulnerable, and emotions are always associated with hateful and offensive actions. This work addressed hate and offensive tweet detection and low-level emotion classification, using 28 labels to train transformer models in three ways (model 1: \({BERT}_{{G28}}\); model 2: \({BERT}_{{G27}}\); model 3: \({RoBERTa}_{{G27}}\)) before predicting the emotions of the hateful and offensive tweets. Model 1 was trained on all 28 low-level labels, and models 2 and 3 were trained on 27 labels, excluding the neutral label. For the identified hate and offensive tweets, this study also performed topic modeling to extract the themes discussed, spatiotemporal trend analysis to determine where and when the tweets occurred, and event summarization. GoEmotions and Ekman were used for direct and indirect assessment, respectively, to evaluate the models’ precision, recall, and F1-score. In terms of precision, model 1 outperformed Google Research on GoEmotions. Furthermore, model 2 and model 3 outperformed Google Research on both the GoEmotions and the Ekman evaluations in terms of precision and F1-score. Overall, model 2 was the best model for both recall and F1-score, while model 3 performed better for precision. Because models 2 and 3 were trained on samples without the neutral label, model 2 assigned emotion labels to 27% and model 3 to 29% out of the 30% of hate and offensive tweets that model 1 predicted as neutral. This is a significant improvement: eliminating the false neutral class allows emotions to be assigned to tweets that are not truly neutral.
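The two ideas the abstract relies on, training without the neutral label and scoring each label by precision, recall, and F1-score, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes multi-label examples stored as (text, set of label strings) pairs, and the names `drop_neutral` and `precision_recall_f1` are hypothetical.

```python
from typing import List, Set, Tuple

NEUTRAL = "neutral"  # assumed string name of the neutral label


def drop_neutral(
    examples: List[Tuple[str, Set[str]]]
) -> List[Tuple[str, Set[str]]]:
    """Remove the neutral label and discard examples left with no label."""
    kept = []
    for text, labels in examples:
        remaining = labels - {NEUTRAL}
        if remaining:  # skip examples that were purely neutral
            kept.append((text, remaining))
    return kept


def precision_recall_f1(
    gold: List[Set[str]], pred: List[Set[str]], label: str
) -> Tuple[float, float, float]:
    """Per-label precision, recall, and F1 for multi-label predictions."""
    tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
    fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
    fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative toy data, not drawn from the study.
toy = [("a", {"anger"}), ("b", {"neutral"}), ("c", {"joy", "neutral"})]
train_27 = drop_neutral(toy)

gold = [{"anger"}, {"joy"}, {"anger"}]
pred = [{"anger"}, {"anger"}, {"joy"}]
anger_scores = precision_recall_f1(gold, pred, "anger")
```

Averaging the per-label scores over all labels would give macro-averaged figures; whether the study reports macro or another average is not stated in the abstract.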
Funding
This work was partially supported by NSF—USA CNS-2219614, CNS-2219615 and the Missouri University of Science and Technology’s Kummer Institute for Student Success, Research and Economic Development through the Kummer Innovation and Entrepreneurship Doctoral Fellowship.
Author information
Contributions
AA was involved in methodology, formal analysis, software, writing—original draft, writing—review and editing, investigation. SM helped in conceptualization, data curation, writing—review and editing, supervision, funding acquisition, project administration. LN contributed to conceptualization, writing—review and editing, supervision, funding acquisition, project administration.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Adesokan, A., Madria, S. & Nguyen, L. HatEmoTweet: low-level emotion classifications and spatiotemporal trends of hate and offensive COVID-19 tweets. Soc. Netw. Anal. Min. 13, 136 (2023). https://doi.org/10.1007/s13278-023-01132-6