Neural Network Hate Deletion: Developing a Machine Learning Model to Eliminate Hate from Online Comments

  • Joni Salminen
  • Juhani Luotolahti
  • Hind Almerekhi
  • Bernard J. Jansen
  • Soon-gyo Jung
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11193)


We propose a method for rewriting hateful online comments as non-hateful ones without losing the understandability and original meaning of the comments. To accomplish this, we retrieve and classify 301,153 hateful and 1,041,490 non-hateful comments from the Facebook and YouTube channels of a large international media organization that is a target of considerable online hate. We supplement this dataset with 10,000 Reddit comments manually labeled for hatefulness. Using these two datasets, we train a neural network to distinguish hateful from non-hateful linguistic patterns. The model we develop, Neural Network Hate Deletion (NNHD), computes how hateful the sentences of a social media comment are and deletes those that score above a given threshold, using a language dependency tree. We evaluate the results by comparing crowd workers' perceptions of hatefulness and understandability before and after the transformation, and find that our method reduces hatefulness without a significant loss of understandability. In some cases, removing hateful elements improves understandability by reducing the linguistic complexity of the comment. In addition, we find that NNHD retains the original meaning satisfactorily on average but is not perfect in this regard. In terms of practical implications, NNHD could be used on social media platforms to suggest more neutral language to agitated online users.
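The score-and-threshold step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `toy_hate_score`, `PLACEHOLDER_LEXICON`, `split_sentences`, `delete_hateful_sentences`, and the default threshold of 0.2 are all hypothetical names and values chosen for illustration. A tiny word list stands in for the trained neural network, and whole sentences are dropped, so the dependency-tree step that NNHD uses is omitted here.

```python
import re
from typing import Callable, List

# Hypothetical stand-in for the trained neural network: a tiny placeholder
# lexicon. The real NNHD scorer is a learned model, not a word list.
PLACEHOLDER_LEXICON = {"idiots", "scum"}


def toy_hate_score(sentence: str) -> float:
    """Return the share of tokens found in the placeholder lexicon (0.0 to 1.0)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if not tokens:
        return 0.0
    return sum(t in PLACEHOLDER_LEXICON for t in tokens) / len(tokens)


def split_sentences(comment: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", comment.strip()) if s]


def delete_hateful_sentences(
    comment: str,
    score: Callable[[str], float] = toy_hate_score,
    threshold: float = 0.2,  # assumed value, not taken from the paper
) -> str:
    """Keep only the sentences whose score does not exceed the threshold."""
    kept = [s for s in split_sentences(comment) if score(s) <= threshold]
    return " ".join(kept)


if __name__ == "__main__":
    example = (
        "The article was interesting. These people are idiots! "
        "I hope they cover this again."
    )
    print(delete_hateful_sentences(example))
```

In this sketch the middle sentence scores 0.25 (one lexicon hit in four tokens), exceeds the assumed threshold, and is removed; the other two sentences are kept unchanged. Passing a different `score` callable is how a trained classifier would be plugged in.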


Keywords: Online hate · Toxic comments · Hate deletion · Neural networks



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Joni Salminen (1, 2)
  • Juhani Luotolahti (2)
  • Hind Almerekhi (1)
  • Bernard J. Jansen (1)
  • Soon-gyo Jung (1)

  1. Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
  2. University of Turku, Turku, Finland
