Offensive language identification with multi-task learning

  • Research
Journal of Intelligent Information Systems

Abstract

The widespread presence of offensive content is a major issue in social media. This has motivated the development of computational models to identify such content in posts or conversations. Most of these models, however, treat offensive language identification as an isolated task. Very recently, a few datasets have been annotated with post-level offensiveness and related phenomena, such as offensive tokens, humor, and engaging content. These annotations create the opportunity to model related tasks jointly, which can help improve the explainability of offensive language detection systems and potentially aid human moderators. This study proposes a novel multi-task learning (MTL) architecture that can predict: (1) offensiveness at both post and token levels in English; and (2) offensiveness and related subjective tasks such as humor, engaging content, and gender bias identification in multilingual settings. Our results show that the proposed multi-task learning architecture outperforms current state-of-the-art methods trained to identify offense at the post level. We further demonstrate that MTL outperforms single-task learning (STL) across different tasks and language combinations.
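
As a rough illustration of what joint training over post-level and token-level labels can look like, the sketch below combines two cross-entropy terms into a single objective. It is not the architecture proposed in the article: the function names, the mixing weight `alpha`, and the toy probabilities are all assumptions made for illustration only.

```python
import math


def cross_entropy(probs, gold):
    """Negative log-likelihood of the gold class under a probability vector."""
    return -math.log(probs[gold])


def joint_loss(post_probs, post_gold, token_probs, token_gold, alpha=0.5):
    """Weighted sum of a post-level loss and a mean token-level loss.

    alpha is a hypothetical mixing weight, not a value taken from the article.
    """
    post_loss = cross_entropy(post_probs, post_gold)
    token_loss = sum(
        cross_entropy(p, g) for p, g in zip(token_probs, token_gold)
    ) / len(token_gold)
    return alpha * post_loss + (1 - alpha) * token_loss


# Toy example: one post with three tokens; class 1 means "offensive".
post_probs = [0.2, 0.8]                              # post-level prediction
token_probs = [[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]]   # per-token predictions
token_gold = [0, 1, 0]                               # only the second token is offensive

loss = joint_loss(post_probs, 1, token_probs, token_gold, alpha=0.5)
```

Setting `alpha` to 1.0 in this sketch reduces the objective to the post-level loss alone, which corresponds to the single-task setting that the abstract contrasts with MTL.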

Data availability

Data is available at https://github.com/imdiptanu/MAD.

Code availability

Code is available at https://github.com/imdiptanu/MAD.

Acknowledgements

We would like to thank the creators of the datasets for making them available.

Author information

Contributions

Marcos Zampieri - Problem formulation, Conducting experiments, Writing, Supervising. Tharindu Ranasinghe - Coding, Conducting experiments, Writing. Diptanu Sarkar - Coding, Conducting experiments, Writing. Alex Ororbia - Problem formulation, Writing, Supervising.

Corresponding author

Correspondence to Marcos Zampieri.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zampieri, M., Ranasinghe, T., Sarkar, D. et al. Offensive language identification with multi-task learning. J Intell Inf Syst 60, 613–630 (2023). https://doi.org/10.1007/s10844-023-00787-z

  • DOI: https://doi.org/10.1007/s10844-023-00787-z
