Automatic hate speech detection using killer natural language processing optimizing ensemble deep learning approach

Abstract

Over the last decade, the growing use of social media has been accompanied by a rise in hateful activity on social networks. Hate speech is among the most dangerous of these activities, and users of platforms such as YouTube, Facebook, and Twitter need protection from it. This paper introduces a method that combines natural language processing with machine learning to predict hate speech on social media websites. After the social media text is collected, stemming, token splitting, character removal, and inflection elimination are performed before the hate speech recognition process. The collected data is then examined using a killer natural language processing optimization ensemble deep learning approach (KNLPEDNN). This method detects hate speech on social media websites through a learning process that classifies text as neutral, offensive, or hate language. The performance of the system is evaluated using overall accuracy, F-score, precision, and recall. The system attained low error values (mean square error 0.019, cross-entropy loss 0.015, and logarithmic loss 0.0238) and 98.71% accuracy.
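
The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' KNLPEDNN implementation: the suffix list, the `preprocess`, `stem`, and `evaluate` helpers, and the choice of macro averaging are all assumptions made for illustration, and the toy suffix stripper stands in for a real stemmer.

```python
import re
from collections import Counter

# The three classes named in the abstract.
LABELS = ("neutral", "offensive", "hate")

# Hypothetical suffix list for a toy stemmer (an assumption, not from the
# paper); a real system would use an established stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, remove URLs, mentions and non-letters, tokenize, stem."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # drop URLs and @mentions
    text = re.sub(r"[^a-z\s]", " ", text)           # character removal
    tokens = text.split()                           # token splitting
    return [stem(token) for token in tokens]        # crude inflection removal


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall and F-score."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative

    precisions, recalls, f_scores = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        p = tp[label] / predicted if predicted else 0.0
        r = tp[label] / actual if actual else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f_scores.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_score": sum(f_scores) / n,
    }
```

Macro averaging weights the three classes equally, which matters when hate speech is much rarer than neutral text; the abstract does not say which averaging the authors used.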

Acknowledgements

The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through Research Group No. RG-1439-088.

Author information

Corresponding author

Correspondence to Zafer Al-Makhadmeh.

Cite this article

Al-Makhadmeh, Z., Tolba, A. Automatic hate speech detection using killer natural language processing optimizing ensemble deep learning approach. Computing 102, 501–522 (2020). https://doi.org/10.1007/s00607-019-00745-0

Keywords

  • Social media
  • YouTube
  • Facebook
  • Twitter
  • Hate speech
  • Killer natural language processing optimizing ensemble deep learning approach

Mathematics Subject Classification

  • 01-00
  • 01-02
  • 11Axx
  • 03-04
  • 03Bxx
  • 39-00