
Neural Network Hate Deletion: Developing a Machine Learning Model to Eliminate Hate from Online Comments

Conference paper, Internet Science (INSCI 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11193)

Abstract

We propose a method for transforming hateful online comments into non-hateful ones while preserving their understandability and original meaning. To accomplish this, we retrieve and classify 301,153 hateful and 1,041,490 non-hateful comments from the Facebook and YouTube channels of a large international media organization that is a target of considerable online hate. We supplement this dataset with 10,000 Reddit comments manually labeled for hatefulness. Using these two datasets, we train a neural network to distinguish the linguistic patterns of hateful and non-hateful comments. The model we develop, Neural Network Hate Deletion (NNHD), computes how hateful the sentences of a social media comment are and, if they exceed a given threshold, deletes them using a language dependency tree. We evaluate the results by comparing crowd workers’ perceptions of hatefulness and understandability before and after transformation, and find that our method reduces hatefulness without a significant loss of understandability. In some cases, removing hateful elements improves understandability by reducing the linguistic complexity of the comment. In addition, we find that NNHD retains the original meaning satisfactorily on average, though it is not perfect in this regard. In terms of practical implications, NNHD could be used on social media platforms to suggest more neutral language to agitated online users.
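The score-then-delete idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `TOY_SCORES` lookup is a hypothetical stand-in for the trained neural network, the dependency tree is built by hand rather than produced by a parser, and the rule of deleting the whole subtree under a high-scoring token is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical per-word scores standing in for the trained neural network.
TOY_SCORES = {"stupid": 0.9, "idiots": 0.95}


@dataclass
class Node:
    """A token in a dependency tree together with its dependents."""
    index: int                      # position of the token in the sentence
    word: str
    children: List["Node"] = field(default_factory=list)


def hate_score(word: str) -> float:
    """Toy scorer: look the word up in TOY_SCORES (0.0 if absent)."""
    return TOY_SCORES.get(word.lower(), 0.0)


def prune(node: Node, threshold: float) -> Optional[Node]:
    """Copy the tree, dropping every subtree whose head exceeds the threshold."""
    if hate_score(node.word) > threshold:
        return None                 # the head and all its dependents are removed
    kept = []
    for child in node.children:
        pruned = prune(child, threshold)
        if pruned is not None:
            kept.append(pruned)
    return Node(node.index, node.word, kept)


def linearize(node: Optional[Node]) -> str:
    """Turn a (possibly pruned) tree back into text in original word order."""
    if node is None:
        return ""
    tokens = []
    stack = [node]
    while stack:
        current = stack.pop()
        tokens.append((current.index, current.word))
        stack.extend(current.children)
    return " ".join(word for _, word in sorted(tokens))


# "Those stupid reporters never check the facts", parsed by hand.
sentence = Node(4, "check", [
    Node(2, "reporters", [Node(0, "Those"), Node(1, "stupid")]),
    Node(3, "never"),
    Node(6, "facts", [Node(5, "the")]),
])

cleaned = linearize(prune(sentence, threshold=0.5))
print(cleaned)
```

Pruning at the subtree level is what keeps the remaining text grammatical in this toy case: removing a modifier takes its own dependents with it, while the head it attached to stays in place. If the root of the sentence itself scores above the threshold, the whole sentence is dropped.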


Notes

  1. We compiled the list of hate words by combining open coding done on our dataset with lists of profanity or swear words available online: http://www.bannedwordlist.com/lists/swearWords.txt; https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/blob/master/en; https://www.frontgatemedia.com/a-list-of-723-bad-words-to-blacklist-and-how-to-use-facebooks-moderation-tool/; http://onlineslangdictionary.com/lists/most-vulgar-words/.

  2. fastText is a library, created by Facebook, for learning word embeddings and classifying sentences.
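Combining several word lists into one lexicon, as Note 1 describes, might look like the following sketch. The function names, the normalization steps (lowercasing, trimming, collapsing whitespace), and the placeholder entries are all assumptions for illustration; the source does not say how the lists were cleaned or merged.

```python
from typing import Iterable, List


def normalize(entry: str) -> str:
    """Lowercase an entry and collapse surrounding and internal whitespace."""
    return " ".join(entry.strip().lower().split())


def merge_lexicons(*sources: Iterable[str]) -> List[str]:
    """Merge any number of word lists into one sorted, de-duplicated list."""
    merged = set()
    for source in sources:
        for entry in source:
            term = normalize(entry)
            if term:                # skip blank lines
                merged.add(term)
    return sorted(merged)


# Placeholder entries: one list from open coding, one downloaded list.
open_coded = ["Jerk", "  dimwit "]
downloaded = ["jerk", "", "waste  of space"]

lexicon = merge_lexicons(open_coded, downloaded)
print(lexicon)
```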


Corresponding author

Correspondence to Joni Salminen.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Salminen, J., Luotolahti, J., Almerekhi, H., Jansen, B.J., Jung, Sg. (2018). Neural Network Hate Deletion: Developing a Machine Learning Model to Eliminate Hate from Online Comments. In: Bodrunova, S. (eds) Internet Science. INSCI 2018. Lecture Notes in Computer Science, vol 11193. Springer, Cham. https://doi.org/10.1007/978-3-030-01437-7_3

  • DOI: https://doi.org/10.1007/978-3-030-01437-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01436-0

  • Online ISBN: 978-3-030-01437-7

  • eBook Packages: Computer Science, Computer Science (R0)
