Effective hate-speech detection in Twitter data using recurrent neural networks

Abstract

This paper addresses the problem of detecting hateful content in social media. We propose a detection scheme based on an ensemble of Recurrent Neural Network (RNN) classifiers that incorporates features derived from user-related information, such as a user’s tendency towards racism or sexism. These features are fed to the classifiers together with word frequency vectors derived from the textual content. We evaluate our approach on a publicly available corpus of 16k tweets. The results show that our scheme distinguishes racist and sexist messages from normal text and achieves higher classification quality than current state-of-the-art algorithms.
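The abstract names three ingredients: per-user tendency features, word frequency vectors, and an ensemble of classifiers. The Python sketch below illustrates how such inputs and a combined decision might be assembled. It is not the authors' implementation: the function names, the rank-based token encoding, the `(user, label)` history layout, and the majority-vote rule are all assumptions made for illustration, and the RNN classifiers themselves are left out.

```python
from collections import Counter

# Class names used for illustration; the three-way split is an assumption.
LABELS = ("neutral", "racism", "sexism")


def user_tendencies(history):
    """Return, per user, the fraction of their labelled tweets in each class.

    `history` is a list of (user, label) pairs (hypothetical layout).
    """
    per_user = {}
    for user, label in history:
        per_user.setdefault(user, Counter())[label] += 1
    return {
        user: tuple(counts[label] / sum(counts.values()) for label in LABELS)
        for user, counts in per_user.items()
    }


def build_vocabulary(tweets):
    """Index tokens by corpus frequency: the most frequent token gets 1."""
    counts = Counter(tok for tokens in tweets for tok in tokens)
    ranked = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: rank for rank, tok in enumerate(ranked, start=1)}


def frequency_vector(tokens, vocabulary):
    """Encode a tweet as frequency ranks; unknown tokens map to 0."""
    return [vocabulary.get(tok, 0) for tok in tokens]


def majority_vote(predictions):
    """Combine the labels predicted by the ensemble members."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    history = [("u1", "racism"), ("u1", "neutral"), ("u2", "sexism")]
    vocab = build_vocabulary([["a", "b", "a"], ["b", "a", "c"]])
    print(user_tendencies(history)["u1"])
    print(frequency_vector(["a", "z", "c"], vocab))
    print(majority_vote(["racism", "neutral", "racism"]))
```

In a full pipeline, each ensemble member would receive a tweet's encoded tokens together with its author's tendency tuple, and the members' predicted labels would then be combined.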

Notes

  1. http://www.statmt.org/moses/

  2. The small discrepancy between the class quantities observed here and those reported for the original dataset is due to the fact that, at the time we performed the evaluation, a number of tweets were not retrievable.

  3. https://github.com/fchollet/keras

Acknowledgements

This work has been supported by Telenor Research, Norway, through the collaboration project between NTNU and Telenor. It has been carried out at the Telenor – NTNU AI-Lab.

Author information

Corresponding author

Correspondence to Georgios K. Pitsilis.

About this article

Cite this article

Pitsilis, G.K., Ramampiaro, H. & Langseth, H. Effective hate-speech detection in Twitter data using recurrent neural networks. Appl Intell 48, 4730–4742 (2018). https://doi.org/10.1007/s10489-018-1242-y

  • DOI: https://doi.org/10.1007/s10489-018-1242-y

Keywords

  • Text classification
  • Micro-blogging
  • Hate-speech
  • Deep learning
  • Recurrent neural networks
  • Twitter