
Transfer Learning for Detecting Hateful Sentiments in Code Switched Language

Part of the Algorithms for Intelligent Systems book series (AIS)

Abstract

With the phenomenal increase in the penetration of social media in linguistically diverse regions, online conversations have become more casual and multilingual. The rise of informal, code-switched multilingual text makes it difficult for automated systems to monitor hate speech. Such speech is further obscured through spelling variations, code-mixing, homophones, homonyms, and the absence of standardized grammar rules. Machine transliteration can convert code-switched text into a single script, but it risks a semantic breakdown of the text. To overcome this drawback, this chapter investigates transfer learning. CNN-based neural models are first trained on a large dataset of hateful tweets in a chosen primary language and then retrained on a small transliterated dataset in the same language. Transfer learning reuses already learned features for a specialized task through cross-domain knowledge transfer. As a result, hate speech classification on a large English corpus can serve as the source task for obtaining pre-trained deep learning classifiers for the target task: classifying tweets from other code-switched languages translated into English. The chapter studies the effects of different popular word embeddings and of additional supervised inputs, such as LIWC features, the presence of profanities, and sentiment. The aim is to derive the most representative combination of input settings that can help achieve state-of-the-art hate speech detection on code-switched multilingual short texts from Twitter.
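The abstract describes a two-stage schedule: train on a large source-language corpus, then retrain on a small converted code-switched set. The sketch below illustrates that schedule. It is a minimal sketch under stated assumptions, not the chapter's method. It swaps the CNN for a plain logistic-regression classifier over hashed bag-of-words features (standing in for word embeddings). The toy data, `DIM`, learning rates, and epoch counts are all hypothetical placeholders chosen only for illustration.

```python
import math
import random
import zlib
from typing import Dict, List, Sequence, Tuple

DIM = 2 ** 16  # size of the hashed feature space (hypothetical choice)
Example = Tuple[str, int]  # (tweet text, label): 1 = hateful, 0 = not hateful


def featurize(text: str) -> Dict[int, float]:
    """Sparse hashed bag-of-words counts; a simple stand-in for word embeddings."""
    feats: Dict[int, float] = {}
    for token in text.lower().split():
        idx = zlib.crc32(token.encode("utf-8")) % DIM
        feats[idx] = feats.get(idx, 0.0) + 1.0
    return feats


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(weights: Sequence[float], bias: float, text: str) -> float:
    """Probability that `text` is hateful under a logistic-regression model."""
    z = bias + sum(weights[i] * v for i, v in featurize(text).items())
    return sigmoid(z)


def train(data: List[Example], init_weights: Sequence[float], init_bias: float,
          epochs: int, lr: float, seed: int = 0) -> Tuple[List[float], float]:
    """SGD on log-loss starting from the given parameters (copied, never mutated)."""
    w, b = list(init_weights), init_bias
    order = list(range(len(data)))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)
        for k in order:
            text, label = data[k]
            feats = featurize(text)
            z = b + sum(w[i] * v for i, v in feats.items())
            grad = sigmoid(z) - label  # derivative of log-loss w.r.t. z
            for i, v in feats.items():
                w[i] -= lr * grad * v
            b -= lr * grad
    return w, b


# Hypothetical toy data; "toxic_*" tokens are placeholders for abusive words.
source = [  # stands in for the large English hate speech corpus
    ("you are toxic_a", 1), ("you are toxic_b", 1),
    ("have a nice day", 0), ("you are nice", 0),
]
target = [  # stands in for the small code-switched set converted to English
    ("you are toxic_a yaar", 1), ("you are toxic_c", 1),
    ("you are very nice yaar", 0), ("good day bhai", 0),
]

# Stage 1: train on the source task from scratch.
src_w, src_b = train(source, [0.0] * DIM, 0.0, epochs=50, lr=0.5)

# Stage 2: retrain on the target task, initialised from the stage-1 weights.
# The smaller learning rate is a common fine-tuning heuristic (an assumption here).
tgt_w, tgt_b = train(target, src_w, src_b, epochs=200, lr=0.1)

print(f"target P(hateful) = {predict(tgt_w, tgt_b, 'you are toxic_c'):.3f}")
```

Starting stage 2 from `src_w` rather than from zeros is the transfer step. With a CNN, the analogous step would initialise the network from the source-task model before retraining on the smaller target set.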

Keywords

  • Hate speech
  • Code-switching
  • Transfer learning
  • Multilingual
  • Social media
  • Offensive text classification


  • DOI: 10.1007/978-981-15-1216-2_7
  • Chapter length: 34 pages
Figs. 1–12 (figure images not included in this preview)


Author information

Corresponding author

Correspondence to Rajiv Ratn Shah.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Rajput, K., Kapoor, R., Mathur, P., Hitkul, Kumaraguru, P., Shah, R.R. (2020). Transfer Learning for Detecting Hateful Sentiments in Code Switched Language. In: Agarwal, B., Nayak, R., Mittal, N., Patnaik, S. (eds) Deep Learning-Based Approaches for Sentiment Analysis. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-1216-2_7

  • DOI: https://doi.org/10.1007/978-981-15-1216-2_7

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-1215-5

  • Online ISBN: 978-981-15-1216-2

  • eBook Packages: Engineering, Engineering (R0)