Multi-input integrative learning using deep neural networks and transfer learning for cyberbullying detection in real-time code-mix data

  • Special Issue Paper
Multimedia Systems

Abstract

Automatic detection of cyberbullying in social media content is a natural language understanding task that can be framed as generic text classification. Cultural diversity, country-specific trending topics and hashtags, the unconventional use of typographical resources such as capitals, punctuation and emojis, and the easy availability of native-language keyboards all add to the variety and volume of user-generated content, compounding the linguistic challenges. This research focuses on cyberbullying detection in code-mix data, specifically Hinglish, which refers to the juxtaposition of words from the Hindi and English languages. We explore the problem of cyberbullying prediction and propose MIIL-DNN, a multi-input integrative learning model based on deep neural networks. MIIL-DNN combines information from three sub-networks to detect and classify bullying content in real-time code-mix data. It takes three inputs: English language features, Hindi language features (transliterated Hindi converted to the Hindi language) and typographic features. Each input is learned separately by its own sub-network: a capsule network for English, a bi-LSTM for Hindi and an MLP for the typographic features. The three learned representations are then combined into one unified representation, which serves as the input to a final regression output with linear activation. The advantage of this model-level multi-lingual fusion is that it operates on the unique distribution of each input type without increasing the dimensionality of the input space. The robustness of the technique is validated on two datasets created by scraping data from the popular social networking sites Twitter and Facebook. Experimental evaluation shows that MIIL-DNN achieves high performance, measured by the area under the ROC curve (AUC-ROC), on both datasets.
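
The model-level fusion described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. Each sub-network is replaced by a single small dense layer with ReLU, standing in for the capsule network, bi-LSTM and MLP. The names `FusionModel`, `Branch` and `typographic_features`, the hidden size, the random untrained weights and the three typographic counts are all assumptions made for illustration. The sketch only shows the data flow: three separately encoded inputs, one concatenated representation, and a linear output.

```python
import random
import string


def typographic_features(text):
    """Hypothetical typographic features: capitals, punctuation, emoji-like characters."""
    capitals = sum(1 for ch in text if ch.isupper())
    punctuation = sum(1 for ch in text if ch in string.punctuation)
    # Assumption: code points at or above U+1F300 are treated as emojis.
    emojis = sum(1 for ch in text if ord(ch) >= 0x1F300)
    return [float(capitals), float(punctuation), float(emojis)]


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases for one dense layer (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs, weights, bias, activation=None):
    """Fully connected layer: one output per weight row."""
    outputs = []
    for row, b in zip(weights, bias):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(max(0.0, z) if activation == "relu" else z)
    return outputs


class Branch:
    """Stand-in for one sub-network; it sees only its own input type."""

    def __init__(self, rng, n_in, n_hidden):
        self.weights, self.bias = init_layer(rng, n_in, n_hidden)

    def __call__(self, features):
        return dense(features, self.weights, self.bias, activation="relu")


class FusionModel:
    """Three branches, concatenated, followed by a single linear output unit."""

    def __init__(self, english_dim, hindi_dim, typo_dim, hidden=4, seed=0):
        rng = random.Random(seed)
        self.english = Branch(rng, english_dim, hidden)
        self.hindi = Branch(rng, hindi_dim, hidden)
        self.typographic = Branch(rng, typo_dim, hidden)
        self.head_w, self.head_b = init_layer(rng, 3 * hidden, 1)

    def fuse(self, english, hindi, typographic):
        # Fusion happens on the learned representations, not on the raw inputs.
        return self.english(english) + self.hindi(hindi) + self.typographic(typographic)

    def predict(self, english, hindi, typographic):
        fused = self.fuse(english, hindi, typographic)
        return dense(fused, self.head_w, self.head_b)[0]  # linear activation


# Usage with made-up feature vectors for the two language inputs.
typo = typographic_features("WHY are you so dumb!!! \U0001F621")
model = FusionModel(english_dim=6, hindi_dim=5, typo_dim=len(typo))
score = model.predict([0.1] * 6, [0.2] * 5, typo)
```

Because each branch has its own weights, the input dimensions of the three feature types never need to match; only the concatenated hidden vector (here 3 × 4 = 12 values) reaches the output unit.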

Author information

Corresponding author

Correspondence to Akshi Kumar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kumar, A., Sachdeva, N. Multi-input integrative learning using deep neural networks and transfer learning for cyberbullying detection in real-time code-mix data. Multimedia Systems 28, 2027–2041 (2022). https://doi.org/10.1007/s00530-020-00672-7

  • DOI: https://doi.org/10.1007/s00530-020-00672-7
