An Ensemble Method for Radicalization and Hate Speech Detection Online Empowered by Sentic Computing

Abstract

The dramatic growth of the Web has motivated researchers to extract knowledge from enormous repositories and to exploit that knowledge in myriad applications. In this study, we focus on natural language processing (NLP) and, more concretely, on the emerging field of affective computing, to explore the automated understanding of human emotions from text. This paper continues previous efforts to adapt affective techniques to different areas in order to gain new insights. We propose two novel feature extraction methods that use the existing sentic computing resources AffectiveSpace and SenticNet. These methods are efficient approaches for extracting affect-aware representations from text. In addition, we present a machine learning framework that uses an ensemble of different features to improve overall classification performance. Following the description of this approach, we also study the effects of known feature extraction methods such as TF-IDF and SIMilarity-based sentiment projectiON (SIMON). We perform a thorough evaluation of the proposed features across five different datasets that cover radicalization and hate speech detection tasks. To compare the different approaches fairly, we conduct a statistical test that ranks the studied methods. The results indicate that combining affect-aware features with the studied textual representations effectively improves performance. We also propose a criterion that considers both classification performance and computational complexity to select among the different methods.
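
The feature-ensemble idea described in the abstract can be illustrated with a minimal sketch. It assumes that "ensemble of different features" means concatenating a textual representation (here a smoothed TF-IDF variant) with an affect-aware vector. The toy lexicon, its scores, and all function names are hypothetical and written only for illustration; they do not reproduce AffectiveSpace, SenticNet, SIMON, or the authors' implementation, and the downstream classifier is not shown.

```python
import math
from collections import Counter

# Hypothetical toy affect lexicon: word -> (pleasantness, aggression).
# The values are invented for illustration and are NOT taken from SenticNet.
TOY_AFFECT_LEXICON = {
    "hate": (-0.9, 0.8),
    "attack": (-0.6, 0.9),
    "love": (0.9, -0.5),
    "peace": (0.8, -0.7),
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def fit_tfidf(corpus):
    """Learn a sorted vocabulary and smoothed IDF weights from a list of documents."""
    docs = [set(tokenize(doc)) for doc in corpus]
    vocab = sorted(set().union(*docs))
    n_docs = len(docs)
    idf = {
        word: math.log((1 + n_docs) / (1 + sum(word in doc for doc in docs))) + 1.0
        for word in vocab
    }
    return vocab, idf


def tfidf_vector(text, vocab, idf):
    """Weight raw term counts by IDF; out-of-vocabulary words are ignored."""
    counts = Counter(tokenize(text))
    return [counts[word] * idf[word] for word in vocab]


def affect_vector(text, lexicon=TOY_AFFECT_LEXICON):
    """Average the lexicon scores of the matched words; zeros if nothing matches."""
    hits = [lexicon[token] for token in tokenize(text) if token in lexicon]
    if not hits:
        return [0.0, 0.0]
    return [sum(dimension) / len(hits) for dimension in zip(*hits)]


def ensemble_features(text, vocab, idf):
    """Concatenate the textual and the affect-aware representations."""
    return tfidf_vector(text, vocab, idf) + affect_vector(text)


# Usage: fit on a tiny corpus, then build the combined vector for a new text.
corpus = ["we love peace", "they hate and attack"]
vocab, idf = fit_tfidf(corpus)
features = ensemble_features("they hate peace", vocab, idf)
```

The resulting vector has one entry per vocabulary word plus two affect dimensions, and could be passed to any standard classifier. Concatenation is only one possible way to combine features; the paper itself should be consulted for the combination actually evaluated.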

Notes

  1. http://www.noswearing.com/

  2. https://hatebase.org/

  3. https://tone-analyzer-demo.mybluemix.net/

  4. We use SenticNet 5. A newer version of the resource, SenticNet 6 [12], has recently been released and could also be used with this model.

  5. https://www.kaggle.com/fifthtribe/how-isis-uses-twitter

  6. https://cnn.com

  7. https://nytimes.com

  8. https://github.com/t-davidson/hate-speech-and-offensive-language

  9. https://sentic.net/downloads/

  10. https://github.com/gsi-upm/sentic-computing-radical-hate

References

  1. Hendler J, Shadbolt N, Hall W, Berners-Lee T, Weitzner D. Web science: an interdisciplinary approach to understanding the web. Commun ACM. 2008;51(7):60–9. https://doi.org/10.1145/1364782.1364798.

  2. Cambria E, White B. Jumping NLP curves: A review of natural language processing research. IEEE Comput Intell Mag. 2014;9(2):48–57. https://doi.org/10.1109/MCI.2014.2307227.

  3. Dashtipour K, Poria S, Hussain A, Cambria E, Hawalah AY, Gelbukh A, Zhou Q. Multilingual sentiment analysis: state of the art and independent comparison of techniques. Cogn Comput. 2016;8(4):757–71. https://doi.org/10.1007/s12559-016-9421-9.

  4. Tao J, Tan T. Affective computing: A review. In International Conference on Affective computing and intelligent interaction. Springer, 2005. pp. 981–95. https://doi.org/10.1007/11573548_125.

  5. Crowston K, Allen EE, Heckman R. Using natural language processing technology for qualitative data analysis. Int J Soc Res Methodol. 2012;15(6):523–43. https://doi.org/10.1080/13645579.2011.625764.

  6. Cambria E, Hussain A. Sentic computing: A common-sense-based framework for concept-level sentiment analysis. Cogn Comput. 2015;7:183–5. https://doi.org/10.1007/s12559-015-9325-0.

  7. Araque O, Zhu G, Iglesias CA. A semantic similarity-based perspective of affect lexicons for sentiment analysis. Knowl-Based Syst. 2019;165:346–59. https://doi.org/10.1016/j.knosys.2019.105184. http://www.sciencedirect.com/science/article/pii/S095070511930526X.

  8. Cambria E, Fu J, Bisio F, Poria S. AffectiveSpace 2: Enabling affective intuition for concept-level sentiment analysis. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. AAAI Press, 2015. pp. 508–14.

  9. Cambria E, Poria S, Hazarika D, Kwok K. SenticNet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings. In Thirty-Second AAAI Conference on Artificial Intelligence. 2018. pp. 1795–802. https://doi.org/10.1109/MIS.2017.4531228.

  10. Cambria E, Poria S, Gelbukh A, Thelwall M. Sentiment analysis is a big suitcase. IEEE Intell Syst. 2017;32(6):74–80. https://doi.org/10.1109/MIS.2017.4531228.

  11. Cambria E. Affective computing and sentiment analysis. IEEE Intell Syst. 2016;31(2):102–7. https://doi.org/10.1109/MIS.2016.31.

  12. Cambria E, Li Y, Xing FZ, Poria S, Kwok K. SenticNet 6: Ensemble application of symbolic and subsymbolic ai for sentiment analysis. CIKM’20, Oct 20-24. 2020. pp. 105–14. https://doi.org/10.1145/3340531.3412003.

  13. Dragoni M, Poria S, Cambria E. Ontosenticnet: A commonsense ontology for sentiment analysis. IEEE Intell Syst. 2018;33(3):77–85. https://doi.org/10.1109/MIS.2018.033001419.

  14. Weichselbraun A, Gindl S, Fischer F, Vakulenko S, Scharl A. Aspect-based extraction and analysis of affective knowledge from social media streams. IEEE Intell Syst. 2017;32(3):80–8. https://doi.org/10.1109/MIS.2017.57.

  15. Chen M, Wang S, Liang PP, Baltrušaitis T, Zadeh A, Morency LP. Multimodal sentiment analysis with word-level fusion and reinforcement learning. In Proceedings of the 19th ACM International Conference on Multimodal Interaction. 2017. pp. 163–71. https://doi.org/10.1145/3136755.3136801.

  16. Zadeh A, Chen M, Poria S, Cambria E, Morency LP. Tensor fusion network for multimodal sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark, sep 2017. Association for Computational Linguistics. pp. 1103–14. https://doi.org/10.18653/v1/D17-1115. https://www.aclweb.org/anthology/D17-1115.

  17. Chen X, Sun Y, Athiwaratkun B, Cardie C, Weinberger K. Adversarial deep averaging networks for cross-lingual sentiment classification. Transactions of the Association for Computational Linguistics. 2018;6:557–70. https://doi.org/10.1162/tacl_a_00039.

  18. Esuli A, Moreo A, Sebastiani F. Cross-lingual sentiment quantification. IEEE Intell Syst. 2020;35(3):106–14. https://doi.org/10.1109/MIS.2020.2979203.

  19. Liu R, Shi Y, Ji C, Jia M. A survey of sentiment analysis based on transfer learning. IEEE Access. 2019;7:85401–12. https://doi.org/10.1109/ACCESS.2019.2925059.

  20. Hussain A, Cambria E. Semi-supervised learning for big social data analysis. Neurocomputing. 2018;275:1662–73. https://doi.org/10.1016/j.neucom.2017.10.010. http://www.sciencedirect.com/science/article/pii/S0925231217316363.

  21. Park S, Lee J, Kim K. Semi-supervised distributed representations of documents for sentiment analysis. Neural Netw. 2019;119:139–50. https://doi.org/10.1016/j.neunet.2019.08.001. http://www.sciencedirect.com/science/article/pii/S0893608019302187.

  22. Lo SL, Cambria E, Chiong R, Cornforth D. A multilingual semi-supervised approach in deriving singlish sentic patterns for polarity detection. Knowl-Based Syst. 2016;105:236–47. https://doi.org/10.1016/j.knosys.2016.04.024. http://www.sciencedirect.com/science/article/pii/S0950705116300764.

  23. Xia Y, Cambria E, Hussain A, Zhao H. Word polarity disambiguation using bayesian model and opinion-level features. Cogn Comput. 2015;7(3):369–80. https://doi.org/10.1007/s12559-014-9298-4.

  24. Vechtomova O. Disambiguating context-dependent polarity of words: An information retrieval approach. Inf Process Manag. 2017;53(5):1062–79. https://doi.org/10.1016/j.ipm.2017.03.007. http://www.sciencedirect.com/science/article/pii/S0306457316305416.

  25. Araque O, Corcuera-Platas I, Sánchez-Rada JF, Iglesias CA. Enhancing deep learning sentiment analysis with ensemble techniques in social applications. Expert Systems with Applications. 2017;77:236–46. https://doi.org/10.1016/j.eswa.2017.02.002. http://www.sciencedirect.com/science/article/pii/S0957417417300751.

  26. Emre Isik Y, Görmez Y, Kaynar O, Aydin Z. Nsem: Novel stacked ensemble method for sentiment analysis. In 2018 International Conference on Artificial Intelligence and Data Processing (IDAP). 2018. pp. 1–4. https://doi.org/10.1109/IDAP.2018.8620913.

  27. Akhtar MS, Ekbal A, Cambria E. How intense are you? predicting intensities of emotions and sentiments using stacked ensemble. IEEE Comput Intell Mag. 2020;15(1):64–75.

  28. Al-Azani S, El-Alfy ESM. Using word embedding and ensemble learning for highly imbalanced data sentiment analysis in short arabic text. In ANT/SEIT. 2017. pp. 359–366. https://doi.org/10.1016/j.procs.2017.05.365.

  29. Sarkar K. A stacked ensemble approach to bengali sentiment analysis. In: Tiwary US, Chaudhury S, editors. Intelligent Human Computer Interaction. Cham: Springer International Publishing; 2020. pp. 102–111.

  30. Oussous A, Lahcen AA, Belfkih S. Improving sentiment analysis of moroccan tweets using ensemble learning. In International Conference on Big Data, Cloud and Applications. Springer, 2018. pp. 91–104. https://doi.org/10.1007/978-3-319-96292-4_8.

  31. Bandhakavi A, Wiratunga N, Massie S, Padmanabhan D. Lexicon generation for emotion detection from text. IEEE Intell Syst. 2017;32(1):102–8.

  32. Araque O, Gatti L, Staiano J, Guerini M. Depechemood++: a bilingual emotion lexicon built through simple yet powerful techniques. IEEE Trans Affect Comput. 2019. pp. 17877–91. https://doi.org/10.1109/TAFFC.2019.2934444.

  33. Correa D, Sureka A. Solutions to detect and analyze online radicalization: a survey. arXiv preprint 2013. arXiv:1301.4916.

  34. Fernandez M, Asif M, Alani H. Understanding the roots of radicalisation on Twitter. In Proceedings of the 10th ACM Conference on Web Science, WebSci ’18, pp. 1–10, New York, NY, USA, 2018. ACM. http://doi.acm.org/10.1145/3201064.3201082.

  35. Agarwal S, Sureka A. Topic-specific youtube crawling to detect online radicalization. In International Workshop on Databases in Networked Information Systems. Springer, 2015. pp. 133–51. https://doi.org/10.1007/978-3-319-16313-0_10.

  36. Rowe M, Saif H. Mining pro-isis radicalisation signals from social media users. In Proceedings of the tenth international AAAI conference on web and social media (ICWSM 2016). pp. 329–38.

  37. Ferrara E, Wang WQ, Varol O, Flammini A, Galstyan A. Predicting online extremism, content adopters, and interaction reciprocity. In International conference on social informatics. Springer, 2016. pp. 22–39. https://doi.org/10.1007/978-3-319-47874-6_3.

  38. Agarwal S, Sureka A. Applying social media intelligence for predicting and identifying on-line radicalization and civil unrest oriented threats. arXiv preprint 2015. arXiv:1511.06858.

  39. López-Sánchez D, Revuelta J, de la Prieta F, Corchado JM. Towards the automatic identification and monitoring of radicalization activities in twitter. In International Conference on Knowledge Management in Organizations. Springer, 2018. pp. 589–99. https://doi.org/10.1007/978-3-319-95204-8_49.

  40. Abbasi A, Chen H. Affect intensity analysis of dark web forums. In 2007 IEEE Intelligence and Security Informatics. IEEE, 2007. pp. 282–8. https://doi.org/10.1109/ISI.2007.379486.

  41. Chalothorn T, Ellman J. Affect analysis of radical contents on web forums using sentiwordnet. International Journal of Innovation Management and Technology. 2013;4(1):122–4.

  42. Pennebaker JW, Francis ME, Booth RJ. Linguistic inquiry and word count: LIWC 2001. Mahwah: Lawrence Erlbaum Associates, 71(2001):2001.

  43. Vergani M, Bliuc A-M. The evolution of the ISIS language: a quantitative analysis of the language of the first year of Dabiq magazine. Sicurezza, Terrorismo e Società Security, Terrorism and Society. 2015;2(2):7–20.

  44. Ghajar-Khosravi S, Kwantes P, Derbentseva N, Huey L. Quantifying salient concepts discussed in social media content: A case study using twitter content written by radicalized youth. Journal of Terrorism Research. 2016;7(2):79–90. https://doi.org/10.15664/jtr.1241.

  45. Jurek A, Mulvenna MD, Bi Y. Improved lexicon-based sentiment analysis for social media analytics. Security Informatics. 2015;4(1):1–13. https://doi.org/10.1186/s13388-015-0024-x.

  46. Saif H, Dickinson T, Kastler L, Fernandez M, Alani H. A semantic graph-based approach for radicalisation detection on social media. In European Semantic Web Conference. Springer, 2017. pp. 571–87. https://doi.org/10.1007/978-3-319-58068-5_35.

  47. Dewan P, Suri A, Bharadhwaj V, Mithal A, Kumaraguru P. Towards understanding crisis events on online social networks through pictures. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 2017. pp. 439–46. https://doi.org/10.1145/3110025.3110062.

  48. Bermingham A, Conway M, McInerney L, O’Hare N, Smeaton AF. Combining social network analysis and sentiment analysis to explore the potential for online radicalisation. In Social Network Analysis and Mining, 2009. ASONAM’09. International Conference on Advances in. IEEE, 2009. pp. 231–6. https://doi.org/10.1109/ASONAM.2009.31.

  49. Agarwal S, Sureka A. Using knn and svm based one-class classifier for detecting online radicalization on twitter. In International Conference on Distributed Computing and Internet Technology. Springer, 2015. pp. 431–42. https://doi.org/10.1007/978-3-319-14977-6_47.

  50. Ashcroft M, Fisher A, Kaati L, Omer E, Prucha N. Detecting jihadist messages on twitter. In Intelligence and Security Informatics Conference (EISIC), 2015 European, IEEE, 2015. pp. 161–4. https://doi.org/10.1109/EISIC.2015.27.

  51. Fortuna P, Nunes S. A survey on automatic detection of hate speech in text. ACM Comput Surv. 2018;51(4):7. https://doi.org/10.1145/3232676.

  52. Dadvar M, Jong FD, Ordelman R, Trieschnigg D. Improved cyberbullying detection using gender information. In Proceedings of the Twelfth Dutch-Belgian Information Retrieval Workshop (DIR 2012). University of Ghent, 2012. pp. 23–5.

  53. Dinakar K, Reichart R, Lieberman H. Modeling the detection of textual cyberbullying. In Fifth International AAAI Conference on Weblogs and Social Media. 2011. https://ojs.aaai.org/index.php/ICWSM/article/view/14209.

  54. Nobata C, Tetreault J, Thomas A, Mehdad Y, Chang Y. Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web. 2016. pp. 145–53. https://doi.org/10.1145/2872427.2883062.

  55. Nandhini BS, Sheeba J. Cyberbullying detection and classification using information retrieval algorithm. In Proceedings of the 2015 International Conference on Advanced Research in Computer Science Engineering & Technology (ICARCSET 2015). pp. 1–5. https://doi.org/10.1145/2743065.2743085.

  56. Burnap P, Williams ML. Us and them: identifying cyber hate on twitter across multiple protected characteristics. EPJ Data Science. 2016;5(1):11. https://doi.org/10.1140/epjds/s13688-016-0072-6.

  57. Greevy E, Smeaton AF. Classifying racist texts using a support vector machine. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2004. pp. 468–9. https://doi.org/10.1145/1008992.1009074.

  58. Kwok I, Wang Y. Locate the hate: Detecting tweets against blacks. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence. AAAI Press, 2013. pp. 1621–2.

  59. Badjatiya P, Gupta S, Gupta M, Varma V. Deep learning for hate speech detection in tweets. In Proceedings of the 26th International Conference on World Wide Web Companion. 2017. pp. 759–60. https://doi.org/10.1145/3041021.3054223.

  60. Davidson T, Warmsley D, Macy M, Weber I. Automated hate speech detection and the problem of offensive language. In Proceedings of the 11th International AAAI Conference on Web and Social Media, ICWSM. 2017. pp. 512–5.

  61. Liu S, Forss T. Combining n-gram based similarity analysis with sentiment analysis in web content classification. In KDIR. 2014. pp. 530–7. https://doi.org/10.5220/0005170305300537.

  62. Mehdad Y, Tetreault J. Do characters abuse more than words? In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2016. pp. 299–303. https://doi.org/10.18653/v1/W16-3638.

  63. Burnap P, Williams ML. Cyber hate speech on twitter: An application of machine classification and statistical modeling for policy and decision making. Policy Internet. 2015;7(2):223–42. https://doi.org/10.1002/poi3.85.

  64. Warner W, Hirschberg J. Detecting hate speech on the world wide web. In Proceedings of the second workshop on language in social media. Association for Computational Linguistics, 2012. pp. 19–26.

  65. Agarwal S, Sureka A. Characterizing linguistic attributes for automatic classification of intent based racist/radicalized posts on tumblr micro-blogging website. arXiv preprint 2017. arXiv:1701.04931.

  66. Hutto CJ, Gilbert E. Vader: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth International AAAI Conference on Weblogs and Social Media, 2014.

  67. Del Vigna F, Cimino A, Dell’Orletta F, Petrocchi M, Tesconi M. Hate me, hate me not: Hate speech detection on facebook. In Proceedings of the First Italian Conference on Cybersecurity (ITASEC17). 2017. pp. 86–95.

  68. Gitari ND, Zuping Z, Damien H, Long J. A lexicon-based approach for hate speech detection. International Journal of Multimedia and Ubiquitous Engineering. 2015;10(4):215–30. https://doi.org/10.14257/ijmue.2015.10.4.21.

  69. Thelwall M. The heart and soul of the web? sentiment strength detection in the social web with sentistrength. In Cyberemotions. Springer, 2017. pp. 119–34. https://doi.org/10.1007/978-3-319-43639-5_7.

  70. Djuric N, Zhou J, Morris R, Grbovic M, Radosavljevic V, Bhamidipati N. Hate speech detection with comment embeddings. In Proceedings of the 24th international conference on world wide web. 2015. pp. 29–30. https://doi.org/10.1145/2740908.2742760.

  71. Le Q, Mikolov T. Distributed representations of sentences and documents. In International Conference on Machine Learning. 2014. pp. 1188–96.

  72. Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics. 2017;5:135–46. https://doi.org/10.1162/tacl_a_00051.

  73. Khatua A, Cambria E, Khatua A. Sounds of silence breakers: exploring sexual violence on twitter. In 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 2018. pp. 397–400. https://doi.org/10.1109/ASONAM.2018.8508576.

  74. Zhang Z, Luo L. Hate speech detection: A solved problem? the challenging case of long tail on twitter. Semantic Web. 2019;10(5):925–45. https://doi.org/10.3233/SW-180338.

  75. Mathew B, Dutt R, Goyal P, Mukherjee A. Spread of hate speech in online social media. In Proceedings of the 10th ACM Conference on Web Science. 2019. pp. 173–82. https://doi.org/10.1145/3292522.3326034.

  76. Araque O, Iglesias CA. An Approach for Radicalization Detection Based on Emotion Signals and Semantic Similarity. IEEE Access. 2020;8:17877–91. https://doi.org/10.1109/ACCESS.2020.2967219.

  77. Araque O, Gatti L, Kalimeri K. MoralStrength: Exploiting a moral lexicon and embedding similarity for moral foundations prediction. Knowl-Based Syst. 2019;105184:11. https://doi.org/10.1016/j.knosys.2019.105184.

  78. Benito D, Araque O, Iglesias CA. GSI-UPM at SemEval-2019 Task 5: Semantic Similarity and Word Embeddings for Multilingual Detection of Hate Speech Against Immigrants and Women on Twitter. In Proceedings of the 13th International Workshop on Semantic Evaluation. Minneapolis, Minnesota, USA, 2019. Association for Computational Linguistics. pp. 396–403. https://doi.org/10.18653/v1/S19-2070. https://www.aclweb.org/anthology/S19-2070.

  79. Baeza-Yates R, Ribeiro-Neto B et al. Modern information retrieval, volume 463. ACM press New York, 1999.

  80. Gambhir HK. Dabiq: The strategic messaging of the islamic state. Institute for the Study of War, 15, 2014.

  81. Mahzam R. Rumiyah: Jihadist propaganda and information warfare in cyberspace. Counter Terrorist Trends and Analyses. 2017;9(3):8–14. http://www.jstor.org/stable/26351502.

  82. Azman NA. Islamic state (is) propaganda: Dabiq and future directions of islamic state. Counter Terrorist Trends and Analyses. 2016;8(10):3–8. https://doi.org/10.1145/3041021.3054223.

  83. Basile V, Bosco C, Fersini E, Nozza D, Patti V, Pardo FMR, Rosso P, Sanguinetti M. Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter. In Proceedings of the 13th International Workshop on Semantic Evaluation. 2019. pp. 54–63. https://doi.org/10.18653/v1/S19-2007.

  84. Fersini E, Nozza D, Rosso P. Overview of the evalita 2018 task on automatic misogyny identification (ami). EVALITA Evaluation of NLP and Speech Tools for Italian. 2018;12:59. https://doi.org/10.4000/books.aaccademia.4497.

  85. Demšar J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7(Jan):1–30, 2006.

Acknowledgements

The authors would like to thank Miriam Fernandez and Harith Alani for sharing part of the data used in this research.

Funding

This work was supported by the European Union’s Horizon 2020 Research and Innovation Programme under project Participation (grant agreement no. SEP-210655026) and by the Spanish Ministry of Science and Innovation through project COGNOS (PID2019-105484RB-I00).

Author information

Corresponding author

Correspondence to Oscar Araque.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Araque, O., Iglesias, C.A. An Ensemble Method for Radicalization and Hate Speech Detection Online Empowered by Sentic Computing. Cogn Comput 14, 48–61 (2022). https://doi.org/10.1007/s12559-021-09845-6

  • DOI: https://doi.org/10.1007/s12559-021-09845-6

Keywords

  • Sentic computing
  • Affective computing
  • Radicalization detection
  • Hate speech detection
  • Machine learning
  • Natural language processing