Abstract
The widespread presence of offensive content is a major issue in social media. This has motivated the development of computational models to identify such content in posts or conversations. Most of these models, however, treat offensive language identification as an isolated task. Very recently, a few datasets have been annotated with post-level offensiveness and related phenomena, such as offensive tokens, humor, engaging content, etc., creating the opportunity of modeling related tasks jointly which will help improve the explainability of offensive language detection systems and potentially aid human moderators. This study proposes a novel multi-task learning (MTL) architecture that can predict: (1) offensiveness at both post and token levels in English; and (2) offensiveness and related subjective tasks such as humor, engaging content, and gender bias identification in multilingual settings. Our results show that the proposed multi-task learning architecture outperforms current state-of-the-art methods trained to identify offense at the post level. We further demonstrate that MTL outperforms single-task learning (STL) across different tasks and language combinations.
Similar content being viewed by others
Data availability
Data is available at https://github.com/imdiptanu/MAD.
Code availability
Code is available at https://github.com/imdiptanu/MAD.
References
Abdi, S., Bagherzadeh, J., Gholami, G., & Tajbakhsh, M. S. (2021). Using an auxiliary dataset to improve emotion estimation in users’ opinions. Journal of Intelligent Information Systems, 56(3), 581–603. https://doi.org/10.1007/s10844-021-00643-y
Abu Farha, I., & Magdy, W. (2020). Multitask learning for Arabic offensive language and hate-speech detection. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection (pp. 86–90). Marseille, France: European Language Resource Association. https://aclanthology.org/2020.osact-1.14
Antoun, W., Baly, F., & Hajj, H. (2020). AraBERT: Transformer-based model for Arabic language understanding. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection (pp. 9–15). Marseille, France: European Language Resource Association. https://aclanthology.org/2020.osact-1.2
Basile, V., Bosco, C., Fersini, E., Nozza, D., Patti, V., Rangel Pardo, F. M., Rosso, P., & Sanguinetti, M. (2019). SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter. In Proceedings of the 13th International Workshop on Semantic Evaluation (pp. 54–63). Minneapolis, Minnesota, USA: Association for Computational Linguistics. https://doi.org/10.18653/v1/S19-2007, https://aclanthology.org/S19-2007
Caruana, R. (1997). Multitask learning. Machine Learning, 28(1), 41–75. https://doi.org/10.1023/A:1007379606734
Casavantes, M., Aragón, M. E., González, L. C., & Montes-y Gómez, M. (2023). Leveraging posts’ and authors’ metadata to spot several forms of abusive comments in twitter. Journal of Intelligent Information Systems. https://doi.org/10.1007/s10844-023-00779-z
Castro, S., Hazarika, D., Pérez-Rosas, V., Zimmermann, R., Mihalcea, R., & Poria, S. (2019). Towards multimodal sarcasm detection (an _Obviously_ perfect paper). In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 4619–4629). Florence, Italy: Association for Computational Linguistics. https://aclanthology.org/P19-1455
Chan, B., Schweter, S., & Möller, T. (2020). German’s next language model. In Proceedings of the 28th International Conference on Computational Linguistics (pp. 6788–6796). Barcelona, Spain (Online): International Committee on Computational Linguistics. https://doi.org/10.18653/v1/2020.coling-main.598, https://aclanthology.org/2020.coling-main.598
Chang, J. P., Cheng, J., & Danescu-Niculescu-Mizil, C. (2020). Don’t let me be misunderstood:comparing intentions and perceptions in online discussions. In Proceedings of The Web Conference 2020, WWW ’20 (pp. 2066–2077). New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3366423.3380273
Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, ICML ’08 (pp. 160–167). New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/1390156.1390177
Çöltekin, Ç. (2020). A corpus of Turkish offensive language on social media. In Proceedings of the Twelfth Language Resources and Evaluation Conference (pp. 6174–6184). Marseille, France: European Language Resources Association. https://aclanthology.org/2020.lrec-1.758
Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., & Stoyanov, V. (2020). Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 8440–8451). Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.747, https://aclanthology.org/2020.acl-main.747
Dai, W., Yu, T., Liu, Z., & Fung, P. (2020). Kungfupanda at SemEval-2020 task 12: BERT-based multi-TaskLearning for offensive language detection. In Proceedings of the Fourteenth Workshop on Semantic Evaluation pages 2060–2066, Barcelona (online). International Committee for Computational Linguistics. https://doi.org/10.18653/v1/2020.semeval-1.272, https://aclanthology.org/2020.semeval-1.272
Danilevsky, M., Qian, K., Aharonov, R., Katsis, Y., Kawas, B., & Sen, P. (2020). A survey of the state of explainable AI for natural language processing. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing (pp. 447–459). Suzhou, China: Association for Computational Linguistics. https://aclanthology.org/2020.aacl-main.46
Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017). Automated hate speech detection and the problem of offensive language. Proceedings of the International AAAI Conference on Web and Social Media, 11(1), 512–515. https://doi.org/10.1609/icwsm.v11i1.14955
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4171–4186). Minneapolis, Minnesota: Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423, https://aclanthology.org/N19-1423
Djandji, M., Baly, F., Antoun, W., & Hajj, H. (2020). Multi-task learning using AraBert for offensive language detection. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection (pp. 97–101). Marseille, France: European Language Resource Association. https://aclanthology.org/2020.osact-1.16
Ein-Dor, L., Halfon, A., Gera, A., Shnarch, E., Dankin, L., Choshen, L., Danilevsky, M., Aharonov, R., Katz, Y., & Slonim, N. (2020). Active Learning for BERT: An Empirical Study. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 7949–7962). Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.638, https://aclanthology.org/2020.emnlp-main.638
Girshick, R. (2015). Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
Kakwani, D., Kunchukuttan, A., Golla, S., N.C., G., Bhattacharyya, A., Khapra, M. M., & Kumar, P. (2020). IndicNLPSuite: Monolingual corpora, evaluation benchmarks and pre-trained multilingual language models for Indian languages. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 4948–4961). Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.445, https://aclanthology.org/2020.findings-emnlp.445
Kanfoud, M. R., & Bouramoul, A. (2022). Senticode: A new paradigm for one-time training and global prediction in multilingual sentiment analysis. Journal of Intelligent Information Systems, 59(2), 501–522. https://doi.org/10.1007/s10844-022-00714-8
Kumar, R., Ojha, A. K., Malmasi, S., & Zampieri, M. (2018). Benchmarking aggression identification in social media. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018) (pp. 1–11). Santa Fe, New Mexico, USA: Association for Computational Linguistics. https://aclanthology.org/W18-4401
Kumar, R., Ojha, A. K., Malmasi, S., & Zampieri, M. (2020). Evaluating aggression identification in social media. In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (pp. 1–5). Marseille, France: European Language Resources Association (ELRA). https://aclanthology.org/2020.trac-1.1
Kumar, R., Ratan, S., Singh, S., Nandi, E., Devi, L. N., Bhagat, A., Dawer, Y., Lahiri, B., & Bansal, A. (2021). ComMA@ICON: Multilingual gender biased and communal language identification task at ICON-2021. In Proceedings of the 18th International Conference on Natural Language Processing: Shared Task on Multilingual Gender Biased and Communal Language Identification (pp. 1–12). NIT Silchar: NLP Association of India (NLPAI). https://aclanthology.org/2021.icon-multigen.1
Liu, P., Qiu, X., & Huang, X. (2017). Adversarial multi-task learning for text classification. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1–10). Vancouver, Canada: Association for Computational Linguistics. https://doi.org/10.18653/v1/P17-1001, https://aclanthology.org/P17-1001
Liu, X., He, P., Chen, W., & Gao, J. (2019a). Multi-task deep neural networks for natural language understanding. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 4487–4496), Florence, Italy: Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1441, https://aclanthology.org/P19-1441
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019b). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692.
Mathew, B., Saha, P., Yimam, S. M., Biemann, C., Goyal, P., & Mukherjee, A. (2021). Hatexplain: A benchmark dataset for explainable hate speech detection. Proceedings of the AAAI Conference on Artificial Intelligence, 35(17), 14867–14875. https://doi.org/10.1609/aaai.v35i17.17745
Meaney, J. A., Wilson, S., Chiruzzo, L., Lopez, A., & Magdy, W. (2021a). SemEval 2021 task 7: HaHackathon, detecting and rating humor and offense. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021) (pp. 105–119). Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.semeval-1.9, https://aclanthology.org/2021.semeval-1.9
Meaney, J. A., Wilson, S., Chiruzzo, L., Lopez, A., & Magdy, W. (2021b). SemEval 2021 task 7: HaHackathon, detecting and rating humor and offense. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021) (pp. 105–119). Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.semeval-1.9, https://aclanthology.org/2021.semeval-1.9
Modha, S., Mandl, T., Shahi, G. K., Madhu, H., Satapara, S., Ranasinghe, T., & Zampieri, M. (2022). Overview of the hasoc subtrack at fire 2021: Hate speech and offensive content identification in english and indo-aryan languages and conversational hate speech. In Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE ’21 (pp. 1–3). New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3503162.3503176
Mosbach, M., Andriushchenko, M., & Klakow, D. (2021). On the stability of fine-tuning BERT: Misconceptions, explanations, and strong baselines. In International Conference on Learning Representations. https://openreview.net/forum?id=nzpLWnVAyah
Mubarak, H., Darwish, K., Magdy, W., Elsayed, T., & Al-Khalifa, H. (2020). Overview of OSACT4 Arabic offensive language detection shared task. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection (pp. 48–52). Marseille, France: European Language Resource Association. https://aclanthology.org/2020.osact-1.7
Mubarak, H., Rashed, A., Darwish, K., Samih, Y., & Abdelali, A. (2021). Arabic offensive language on Twitter: Analysis and experiments. In Proceedings of the Sixth Arabic Natural Language Processing Workshop (pp. 126–135). Kyiv, Ukraine (Virtual): Association for Computational Linguistics. https://aclanthology.org/2021.wanlp-1.13
Nelatoori, K. B., & Kommanti, H. B. (2022). Multi-task learning for toxic comment classification and rationale extraction. Journal of Intelligent Information Systems. https://doi.org/10.1007/s10844-022-00726-4
Pandey, R., & Singh, J. P. (2023). Bert-lstm model for sarcasm detection in code-mixed social media post. Journal of Intelligent Information Systems, 60(1), 235–254. https://doi.org/10.1007/s10844-022-00755-z
Pitenis, Z., Zampieri, M., Ranasinghe, T. (2020) Offensive language identification in Greek. In Proceedings of the Twelfth Language Resources and Evaluation Conference (pp. 5113–5119). Marseille, France: European Language Resources Association. https://aclanthology.org/2020.lrec-1.629
Ranasinghe, T., & Zampieri, M. (2020) Multilingual offensive language identification with cross-lingual embeddings. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 5838–5844). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.470, https://aclanthology.org/2020.emnlp-main.470
Ranasinghe, T., & Zampieri, M. (2021). MUDES: Multilingual detection of offensive spans. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations (pp. 144–152). Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-demos.17, https://aclanthology.org/2021.naacl-demos.17
Risch, J., & Krestel, R. (2020). Bagging BERT models for robust aggression identification. In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (pp. 55–61). Marseille, France: European Language Resources Association (ELRA). https://aclanthology.org/2020.trac-1.9
Risch, J., Stoll, A., Wilms, L., & Wiegand, M. (2021). Overview of the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments. In Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments (pp. 1–12). Duesseldorf, Germany: Association for Computational Linguistics. https://aclanthology.org/2021.germeval-1.1
Rosa, H., Pereira, N., Ribeiro, R., Ferreira, P. C., Carvalho, J. P., Oliveira, S., Coheur, L., Paulino, P., Simão, A. V., & Trancoso, I. (2019). Automatic cyberbullying detection: A systematic review. Computers in Human Behavior, 93, 333–345.
Rosenthal, S., Atanasova, P., Karadzhov, G., Zampieri, M., & Nakov, P. (2021). SOLID: A large-scale semi-supervised dataset for offensive language identification. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 915–928). Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.findings-acl.80, https://aclanthology.org/2021.findings-acl.80
Sarkar, D. (2021). An empirical study of offensive language in online interactions.
Sarkar, D., Zampieri, M., Ranasinghe, T., & Ororbia, A. (2021). fBERT: A neural transformer for identifying offensive content. In Findings of the Association for Computational Linguistics: EMNLP 2021 (pp. 1792–1798). Punta Cana, Dominican Republic: Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.findings-emnlp.154, https://aclanthology.org/2021.findings-emnlp.154
Satapara, S., Majumder, P., Mandl, T., Modha, S., Madhu, H., Ranasinghe, T., Zampieri, M., North, K., & Premasiri, D. (2023). Overview of the hasoc subtrack at fire 2022: Hate speech and offensive content identification in english and indo-aryan languages. In Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation FIRE ’22, (pp. 4–7). New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3574318.3574326
Schwartz, R., Dodge, J., Smith, N. A., & Etzioni, O. (2020). Green ai. Commun ACM, 63(12), 54–63. https://doi.org/10.1145/3381831
Sellam, T., Yadlowsky, S., Tenney, I., Wei, J., Saphra, N., D’Amour, A., Linzen, T., Bastings, J., Turc, I. R., Eisenstein, J., Das, D., & Pavlick, E. (2022). The multiBERTs: BERT reproductions for robustness analysis. In International Conference on Learning Representations. https://openreview.net/forum?id=K0E_F0gFDgA
Sigurbergsson, G. I., & Derczynski, L. (2020). Offensive language and hate speech detection for Danish. In Proceedings of the Twelfth Language Resources and Evaluation Conference (pp. 3498–3508). Marseille, France: European Language Resources Association. https://aclanthology.org/2020.lrec-1.430
Skenduli, M. P., Biba, M., Loglisci, C., Ceci, M., & Malerba, D. (2021). Mining emotion-aware sequential rules at user-level from micro-blogs. Journal of Intelligent Information Systems, 57(2), 369–394. https://doi.org/10.1007/s10844-021-00647-8
Talat, Z., Thorne, J., & Bingel, J. (2018). Bridging the Gaps: Multi Task Learning for Domain Transfer of Hate Speech Detection (pp. 29–55). https://doi.org/10.1007/978-3-319-78583-7_3
Vohra, A., & Garg, R. (2023). Deep learning based sentiment analysis of public perception of working from home through tweets. Journal of Intelligent Information Systems, 60(1), 255–274. https://doi.org/10.1007/s10844-022-00736-2
Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N., & Kumar, R. (2019a). Predicting the type and target of offensive posts in social media. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 1415–1420). Minneapolis, Minnesota: Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.findings-acl.80, https://aclanthology.org/2021.findings-acl.80
Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N., & Kumar, R. (2019b). SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval). In Proceedings of the 13th International Workshop on Semantic Evaluation (pp. 75–86). Minneapolis, Minnesota, USA: Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1144, https://aclanthology.org/N19-1144
Zampieri, M., Nakov, P., Rosenthal, S., Atanasova, P., Karadzhov, G., Mubarak, H., Derczynski, L., Pitenis, Z., & Çöltekin, Ç. (2020). SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020). In Proceedings of the Fourteenth Workshop on Semantic Evaluation (pp. 1425–1447). Barcelona (online): International Committee for Computational Linguistics. https://doi.org/10.18653/v1/2020.semeval-1.188, https://aclanthology.org/2020.semeval-1.188
Zhang, J., Chang, J., Danescu-Niculescu-Mizil, C., Dixon, L., Hua, Y., Taraborelli, D., & Thain, N. (2018). Conversations gone awry: Detecting early signs of conversational failure. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1350–1361). Melbourne, Australia: Association for Computational Linguistics. https://doi.org/10.18653/v1/P18-1125, https://aclanthology.org/P18-1125
Zhang, Y., & Yang, Q. (2022). A survey on multi-task learning. IEEE Transactions on Knowledge and Data Engineering, 34(12), 5586–5609. https://doi.org/10.1109/TKDE.2021.3070203
Zhao, X., Li, H., Shen, X., Liang, X., & Wu, Y. (2018). A modulation module for multi-task learning with applications in image retrieval. In V. Ferrari, M. Hebert, C. Sminchisescu, & Y. Weiss (Eds.), Computer Vision - ECCV 2018 (pp. 415–432). Cham: Springer International Publishing.
Acknowledgements
We would like to thank the creators of the datasets for making them available.
Author information
Authors and Affiliations
Contributions
Marcos Zampieri - Problem formulation, Conducting experiments, Writing, Supervising. Tharindu Ranasinghe - Coding, Conducting experiments, Writing. Diptanu Sarkar - Coding, Conducting experiments, Writing. Alex Ororbia - Problem formulation, Writing, Supervising.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no confict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zampieri, M., Ranasinghe, T., Sarkar, D. et al. Offensive language identification with multi-task learning. J Intell Inf Syst 60, 613–630 (2023). https://doi.org/10.1007/s10844-023-00787-z
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10844-023-00787-z