Abstract
The paper presents a novel data augmentation-based approach to developing explainable deep learning models for hate speech detection. Hate speech is widely prevalent on online social media but difficult to detect automatically, owing to the challenges of natural language processing and the complexity of hate speech. Moreover, the decisions of existing solutions offer limited explainability because only limited annotated data are available for training and testing models. This work therefore proposes text-based data augmentation to improve both the performance and the explainability of deep learning models. Techniques based on easy data augmentation, bidirectional encoder representations from transformers, and back translation are used for data augmentation. Convolutional neural network and long short-term memory models are trained on the augmented data and evaluated on two publicly available hate speech detection datasets. LIME and integrated gradients are used to retrieve explanations of the deep learning models. A diagnostic study is conducted on test samples to check whether the models improve as a result of the data augmentation. The experimental results verify that the proposed approach improves both the explainability and the accuracy of hate speech detection.
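To make the augmentation idea concrete, the sketch below illustrates two of the easy data augmentation (EDA) operations the abstract refers to, random swap and random deletion, which need no external lexicon. This is a minimal illustration, not the authors' implementation; the function names and parameter defaults are our own assumptions.

```python
import random

def random_swap(words, n=1):
    # EDA random swap: exchange two randomly chosen word positions n times.
    words = words.copy()
    for _ in range(n):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    # EDA random deletion: drop each word independently with probability p,
    # keeping at least one word so the example is never emptied.
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

def augment(sentence, n_aug=4, p=0.1):
    # Produce n_aug augmented variants of one labeled training example
    # by applying a randomly chosen EDA operation to each copy.
    words = sentence.split()
    variants = []
    for _ in range(n_aug):
        if random.random() < 0.5:
            variants.append(" ".join(random_swap(words)))
        else:
            variants.append(" ".join(random_deletion(words, p)))
    return variants

random.seed(0)
print(augment("this comment is offensive and should be flagged"))
```

In a full EDA pipeline these would be combined with synonym replacement and random insertion (which require a thesaurus such as WordNet), and each augmented sentence would inherit the label of its source example.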
Data Availability
The datasets analyzed during the current study are available at https://www.dropbox.com/s/21wtzy9arc5skr8/ICWSM18 and https://data.mendeley.com/datasets/jf4pzyvnpj/1.
References
Wright, M.F.: Cyberbullying in cultural context. J. Cross-Cultural Psychol. 48(8), 1136–1137 (2017). https://doi.org/10.1177/0022022117723107
MacAvaney, S.; Yao, H.-R.; Yang, E.; Russell, K.; Goharian, N.; Frieder, O.: Hate speech detection: challenges and solutions. PLoS ONE 14(8), 1–16 (2019). https://doi.org/10.1371/journal.pone.0221152
Agrawal, S.; Awekar, A.: Deep learning for detecting cyberbullying across multiple social media platforms. arXiv:1801.06482 (2018)
Dadvar, M.; Eckert, K.: Cyberbullying detection in social networks using deep learning based models; a reproducibility study. arxiv:1812.08046 (2018)
Zhang, Z.; Robinson, D.; Tepper, J.A.: Detecting hate speech on Twitter using a convolution-GRU based deep neural network. In: ESWC (2018)
Phanomtip, A.; Sueb-in, T.; Vittayakorn, S.: Cyberbullying detection on tweets. In: 2021 18th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), pp. 295–298 (2021). https://doi.org/10.1109/ECTI-CON51831.2021.9454848
Mishra, P.; Del Tredici, M.; Yannakoudakis, H.; Shutova, E.: Author profiling for abuse detection. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 1088–1098. Association for Computational Linguistics, Santa Fe, New Mexico, USA (2018). https://aclanthology.org/C18-1093
Waseem, Z.; Hovy, D.: Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In: Proceedings of the NAACL Student Research Workshop, pp. 88–93. Association for Computational Linguistics, San Diego, California (2016). https://doi.org/10.18653/v1/N16-2013. https://aclanthology.org/N16-2013
Mathew, B.; Saha, P.; Yimam, S.M.; Biemann, C.; Goyal, P.; Mukherjee, A.: HateXplain: a benchmark dataset for explainable hate speech detection (2020)
Ribeiro, M.T.; Singh, S.; Guestrin, C.: "Why should I trust you?": explaining the predictions of any classifier (2016)
Simonyan, K.; Vedaldi, A.; Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps (2014)
Zeiler, M.D.; Fergus, R.: Visualizing and understanding convolutional networks (2013)
Castro, J.; Gomez, D.; Tejada, J.: Polynomial calculation of the Shapley value based on sampling. Comput. Oper. Res. 36, 1726–1730 (2009). https://doi.org/10.1016/j.cor.2008.04.004
Atanasova, P.; Simonsen, J.G.; Lioma, C.; Augenstein, I.: A diagnostic study of explainability techniques for text classification (2020)
DeYoung, J.; Jain, S.; Rajani, N.F.; Lehman, E.; Xiong, C.; Socher, R.; Wallace, B.C.: ERASER: a benchmark to evaluate rationalized NLP models. arXiv:1911.03429 (2019)
Beddiar, D.R.; Jahan, M.S.; Oussalah, M.: Data expansion using back translation and paraphrasing for hate speech detection. Online Soc. Netw. Media 24, 100153 (2021). https://doi.org/10.1016/j.osnem.2021.100153
Feng, S.Y.; Gangal, V.; Wei, J.; Chandar, S.; Vosoughi, S.; Mitamura, T.; Hovy, E.: A survey of data augmentation approaches for NLP (2021)
Chen, H.; Ji, Y.: Improving the explainability of neural sentiment classifiers via data augmentation. arXiv:1909.04225 (2019)
Doran, D.; Schulz, S.; Besold, T.R.: What does explainable AI really mean? A new conceptualization of perspectives. arXiv:1710.00794 (2017)
Hagras, H.: Toward human-understandable, explainable AI. Computer 51(9), 28–36 (2018)
Došilović, F.K.; Brčić, M.; Hlupić, N.: Explainable artificial intelligence: a survey. In: 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 0210–0215 (2018). IEEE
Samek, W.; Müller, K.-R.: Towards explainable artificial intelligence. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, pp. 5–22. Springer, Cham (2019)
Lundberg, S.M.; Lee, S.-I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Ribeiro, M.T.; Singh, S.; Guestrin, C.: "Why should I trust you?": explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (2016)
Kindermans, P.-J.; Schütt, K.T.; Alber, M.; Müller, K.-R.; Erhan, D.; Kim, B.; Dähne, S.: Learning how to explain neural networks: patternnet and patternattribution. arXiv:1705.05598 (2017)
Saxena, C.; Garg, M.; Saxena, G.: Explainable causal analysis of mental health on social media data. arXiv:2210.08430 (2022)
Garg, M.; Saxena, C.; Saha, S.; Krishnan, V.; Joshi, R.; Mago, V.: CAMS: an annotated corpus for causal analysis of mental health issues in social media posts. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference, pp. 6387–6396 (2022)
Lei, T.; Barzilay, R.; Jaakkola, T.: Rationalizing neural predictions. arXiv:1606.04155 (2016)
Caruana, R.; Lou, Y.; Gehrke, J.; Koch, P.; Sturm, M.; Elhadad, N.: Intelligible models for healthcare: predicting pneumonia risk and hospital 30-day readmission. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1721–1730 (2015)
Amann, J.; Blasimme, A.; Vayena, E.; Frey, D.; Madai, V.I.: Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med. Inform. Decis. Mak. 20(1), 1–9 (2020)
Pope, P.E.; Kolouri, S.; Rostami, M.; Martin, C.E.; Hoffmann, H.: Explainability methods for graph convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10772–10781 (2019)
Zablocki, É.; Ben-Younes, H.; Pérez, P.; Cord, M.: Explainability of vision-based autonomous driving systems: review and challenges. arXiv:2101.05307 (2021)
Mahajan, A.; Shah, D.; Jafar, G.: Explainable AI approach towards toxic comment classification, pp. 849–858 (2021). https://doi.org/10.1007/978-981-33-4367-2_81
Danilevsky, M.; Qian, K.; Aharonov, R.; Katsis, Y.; Kawas, B.; Sen, P.: A survey of the state of explainable AI for natural language processing. arXiv:2010.00711 (2020)
Badimala, P.; Mishra, C.; Modam Venkataramana, R.K.; Bukhari, S.; Dengel, A.: A study of various text augmentation techniques for relation classification in free text, pp. 360–367 (2019). https://doi.org/10.5220/0007311003600367
Feng, S.Y.; Gangal, V.; Wei, J.; Chandar, S.; Vosoughi, S.; Mitamura, T.; Hovy, E.: A survey of data augmentation approaches for NLP. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 968–988. Association for Computational Linguistics, Online (2021). https://doi.org/10.18653/v1/2021.findings-acl.84. https://aclanthology.org/2021.findings-acl.84
Wei, J.; Zou, K.: EDA: easy data augmentation techniques for boosting performance on text classification tasks. arXiv:1901.11196 (2019)
Ng, N.; Yee, K.; Baevski, A.; Ott, M.; Auli, M.; Edunov, S.: Facebook FAIR's WMT19 news translation task submission. arXiv:1907.06616 (2019)
Kobayashi, S.: Contextual augmentation: data augmentation by words with paradigmatic relations (2018)
Kumar, V.; Choudhary, A.; Cho, E.: Data augmentation using pre-trained transformer models. In: Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems, pp. 18–26. Association for Computational Linguistics, Suzhou, China (2020). https://www.aclweb.org/anthology/2020.lifelongnlp-1.3
Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/D14-1181. https://aclanthology.org/D14-1181
van Aken, B.; Risch, J.; Krestel, R.; Löser, A.: Challenges for toxic comment classification: an in-depth error analysis. In: Proceedings of the 2nd Workshop on Abusive Language Online (ALW2) (2018)
Sundararajan, M.; Taly, A.; Yan, Q.: Axiomatic attribution for deep networks. In: International Conference on Machine Learning, pp. 3319–3328. PMLR (2017)
Nguyen, D.: Comparing automatic and human evaluation of local explanations for text classification. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1069–1078. Association for Computational Linguistics, New Orleans, Louisiana (2018). https://doi.org/10.18653/v1/N18-1097. https://aclanthology.org/N18-1097
Chen, J.; Song, L.; Wainwright, M.J.; Jordan, M.I.: L-shapley and c-shapley: efficient model interpretation for structured data. arXiv:1808.02610 (2018)
Baccianella, S.; Esuli, A.; Sebastiani, F.: SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). European Language Resources Association (ELRA), Valletta, Malta (2010)
Salminen, J.; Almerekhi, H.; Milenković, M.; Jung, S.-G.; An, J.; Kwak, H.; Jansen, B.J.: Anatomy of online hate: developing a taxonomy and machine learning models for identifying and classifying hate in online news media. In: Twelfth International AAAI Conference on Web and Social Media (2018)
Ansari, G.; Garg, M.; Saxena, C.: Data augmentation for mental health classification on social media. arXiv:2112.10064 (2021)
Acknowledgements
Not applicable.
Funding
Not applicable.
Author information
Contributions
Authors 1 and 2 conceived and designed the analysis and wrote the manuscript. Author 3 performed the analysis and compiled the results of the implementation. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical Approval and Consent to participate
Not applicable.
Human and Animal Ethics
Not applicable.
Consent for Publication
All authors have given consent to submit the manuscript in its present form. Consent from others is not applicable.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ansari, G., Kaur, P. & Saxena, C. Data Augmentation for Improving Explainability of Hate Speech Detection. Arab J Sci Eng 49, 3609–3621 (2024). https://doi.org/10.1007/s13369-023-08100-4