
Data Augmentation for Improving Explainability of Hate Speech Detection

  • Research Article - Computer Engineering and Computer Science
  • Published in: Arabian Journal for Science and Engineering

Abstract

This paper presents a novel data augmentation-based approach for developing explainable deep learning models for hate speech detection. Hate speech is widely prevalent on online social media but is difficult to detect automatically due to the challenges of natural language processing and the complexity of hate speech itself. Moreover, the decisions of existing solutions offer only constrained explainability because limited annotated data are available for training and testing the models. This work therefore proposes text-based data augmentation to improve both the performance and the explainability of deep learning models. Techniques based on easy data augmentation, bidirectional encoder representations from transformers, and back translation are used to augment the data. Convolutional neural network and long short-term memory models are trained on the augmented data and evaluated on two publicly available hate speech datasets. The LIME and integrated gradients methods are used to retrieve explanations of the deep learning models. A diagnostic study on test samples checks whether the models improve as a result of the data augmentation. The experimental results verify that the proposed approach improves the explainability as well as the accuracy of hate speech detection.
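
The abstract names three augmentation families but this excerpt gives no implementation detail. As a rough illustration rather than the authors' code, the following Python sketch shows EDA-style synonym replacement using NLTK's WordNet; the function name and parameters are illustrative.

```python
import random
from nltk.corpus import wordnet  # requires: nltk.download("wordnet")

def synonym_replacement(sentence, n=2):
    """EDA-style augmentation: swap up to n words for WordNet synonyms."""
    words = sentence.split()
    candidates = [i for i, w in enumerate(words) if wordnet.synsets(w)]
    random.shuffle(candidates)
    for i in candidates[:n]:
        synonyms = {lemma.name().replace("_", " ")
                    for syn in wordnet.synsets(words[i])
                    for lemma in syn.lemmas()}
        synonyms.discard(words[i])  # do not "replace" a word with itself
        if synonyms:
            words[i] = random.choice(sorted(synonyms))
    return " ".join(words)
```

Back translation paraphrases a sentence by translating it into a pivot language and back into English. This excerpt does not state which translation models the authors used; the sketch below assumes Hugging Face MarianMT checkpoints as one common choice.

```python
from transformers import MarianMTModel, MarianTokenizer

def back_translate(texts, pivot="fr"):
    """Paraphrase by round-tripping English -> pivot language -> English."""
    def translate(batch, model_name):
        tok = MarianTokenizer.from_pretrained(model_name)
        model = MarianMTModel.from_pretrained(model_name)
        enc = tok(batch, return_tensors="pt", padding=True, truncation=True)
        out = model.generate(**enc)
        return [tok.decode(t, skip_special_tokens=True) for t in out]
    forth = translate(texts, f"Helsinki-NLP/opus-mt-en-{pivot}")
    return translate(forth, f"Helsinki-NLP/opus-mt-{pivot}-en")
```

Each augmented variant keeps the label of its source sentence, so the training set grows without additional annotation; BERT-based contextual augmentation works analogously by masking tokens and sampling replacements from the language model.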


Data Availability

The datasets analyzed during the current study are available at https://www.dropbox.com/s/21wtzy9arc5skr8/ICWSM18 and https://data.mendeley.com/datasets/jf4pzyvnpj/1.

Notes

  1. https://data.mendeley.com/datasets/jf4pzyvnpj/1.

  2. https://www.nltk.org/.

  3. https://nlp.stanford.edu/projects/glove/.

  4. https://www.tensorflow.org/tutorials/images/cnn.

  5. https://www.tensorflow.org/text/tutorials/text_classification_rnn.

  6. https://github.com/marcotcr/lime (a usage sketch follows these notes).

  7. https://github.com/SeldonIO/alibi.

  8. https://github.com/varunkumar-dev/TransformersDataAugmentation.
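
To complement the tool links above, here is a minimal, self-contained sketch of querying LIME (note 6) for a token-level explanation of a text classifier. The toy TF-IDF plus logistic-regression pipeline stands in for the paper's CNN and LSTM models, since LIME only needs a function from raw strings to class probabilities; all names and data here are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Toy stand-in classifier (illustrative labels: 1 = hate, 0 = non-hate).
texts = ["i hate you and your kind", "have a wonderful day",
         "you people are disgusting", "thanks for the kind words"]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

# LIME perturbs the input text and fits a local linear surrogate,
# attributing the prediction to individual tokens.
explainer = LimeTextExplainer(class_names=["non-hate", "hate"])
exp = explainer.explain_instance("you people are awful",
                                 clf.predict_proba, num_features=4)
print(exp.as_list())  # [(token, weight), ...] sorted by |weight|
```

Integrated gradients (available via the alibi library, note 7) instead attributes a prediction to input embeddings by integrating gradients along a path from a baseline, so it requires direct access to the differentiable model rather than a black-box probability function.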


Acknowledgements

Not applicable.

Funding

Not applicable.

Author information


Contributions

Authors 1 and 2 conceived and designed the analysis and wrote the manuscript. Author 3 performed the analysis and compiled the implementation results. All authors reviewed the manuscript.

Corresponding author

Correspondence to Parmeet Kaur.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical Approval and Consent to Participate

Not applicable.

Human and Animal Ethics

Not applicable.

Consent for Publication

All authors have given consent to submit the manuscript in its present form. Consent from others is not applicable.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ansari, G., Kaur, P. & Saxena, C. Data Augmentation for Improving Explainability of Hate Speech Detection. Arab J Sci Eng 49, 3609–3621 (2024). https://doi.org/10.1007/s13369-023-08100-4

