Skip to main content
Log in

Faux Hate: unravelling the web of fake narratives in spreading hateful stories: a multi-label and multi-class dataset in cross-lingual Hindi-English code-mixed text

  • Original Paper
  • Published:
Language Resources and Evaluation Aims and scope Submit manuscript

Abstract

Social media has undeniably transformed the way people communicate; however, it also comes with unquestionable drawbacks, notably the proliferation of fake and hateful comments. Recent observations have indicated that these two issues often coexist, with discussions on hate topics frequently being dominated by the fake. Therefore, it has become imperative to explore the role of fake narratives in the dissemination of hate in contemporary times. In this direction, the proposed article introduces a novel data set known as the Faux Hate Multi-Label Data set (FHMLD) comprising 8014 fake-instigated hateful comments in Hindi-English code-mixed text. To the best of our knowledge, this marks the first endeavour to bring together both fake and hateful content within a unified framework. Further, the proposed data set is collected from diverse platforms such as YouTube and Twitter to mitigate user-associated bias. To investigate a relation between the presence of fake narratives and its impact on the intensity of the hate, this study presents a statistical analysis using the Chi-square test. The statistical findings indicate that the calculated \(\chi ^2\) value is greater than the value from the standard table, leading to the rejection of the null hypothesis. Additionally, the current study present baseline methods for categorizing multi-class and multi-label data set, utilizing syntactical and semantic features at both word and sentence levels. The experimental results demonstrate that the fastText and SVM based method outperforms others models with an accuracy of 71% and 58% for binary fake–hate and severity prediction respectively.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Notes

  1. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/origins-of-the-virus.

  2. https://thewire.in/communalism/delhi-riots-misinformation-radicalisation-social-media.

  3. https://www.bbc.com/news/world-asia-india-59338245.

  4. https://twitter.com/.

  5. https://www.youtube.com/.

  6. https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers.

  7. https://www.facebook.com/.

  8. https://www.instagram.com/.

  9. https://items.ssrc.org/disinformation-democracy-and-conflict-prevention/classifying-and-identifying-the-intensity-of-hate-speech/.

  10. https://items.ssrc.org/disinformation-democracy-and-conflict-prevention/classifying-and-identifying-the-intensity-of-hate-speech/.

  11. https://items.ssrc.org/disinformation-democracy-and-conflict-prevention/classifying-and-identifying-the-intensity-of-hate-speech/.

  12. https://twitter.com/.

  13. https://www.youtube.com/.

  14. https://www.altnews.in/.

  15. https://www.boomlive.in/fact-check.

  16. https://www.factchecker.in/fact-check.

  17. https://factly.in/.

  18. https://github.com/JustAnotherArchivist/snscrape.

  19. https://pypi.org/project/youtube-comment-scraper-python/.

  20. https://transparency.fb.com/en-gb/policies/community-standards/hate-speech/.

  21. In Table 7 L: low, M: medium, H: high, I: individual, O: collective, and R: religion.

  22. https://matplotlib.org/.

  23. https://scikit-learn.org/.

  24. https://pypi.org/project/gensim/.

  25. https://huggingface.co/models.

  26. https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html.

  27. In Fig. 10 label 0: Nill, O: Collective, I: Individual, R: Religion, L: Low severity, M: Medium severity, H: High severity.

References

  • Akash, B., Badam, J., Raju, K., & Chakraborty, D. (2021). A poster on learnings from an attempt to build an NLP-based fake news classification system for Hindi. In ACM SIGCAS conference on computing and sustainable societies (pp. 397–401).

  • Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2), 211–236.

    Article  Google Scholar 

  • Banerjee, S., Sarkar, M., Agrawal, N., Saha, P., & Das, M. (2021). Exploring transformer based models to identify hate speech and offensive content in English and Indo-Aryan languages. arXiV Preprint. arXiv:2111.13974

  • Bhardwaj, M., Akhtar, M. S., Ekbal, A., Das, A., & Chakraborty, T. (2020). Hostility detection dataset in Hindi. . arXiV Preprint. arXiv:2011.03588

  • Biradar, S., & Saumya, S. (2022). Iiitdwd@ tamilnlp-acl2022: Transformer-based approach to classify abusive content in dravidian code-mixed text. In Proceedings of the 2nd workshop on speech and language technologies for Dravidian languages (pp. 100–104).

  • Biradar, S., Saumya, S., & Chauhan, A. (2021). Hate or non-hate: Translation based hate speech identification in code-mixed hinglish data set. In 2021 IEEE international conference on big data (Big Data) (pp. 2470–2475). IEEE.

  • Biradar, S., Saumya, S., & Chauhan, A. (2022a). Combating the infodemic: Covid-19 induced fake news recognition in social media networks. Complex & Intelligent Systems, 9(3), 2879–2891.

  • Biradar, S., Saumya, S., & Chauhan, A. (2022c). Fighting hate speech from bilingual hinglish speaker’s perspective, a transformer-and translation-based approach. Social Network Analysis and Mining, 12(1), 87.

    Article  Google Scholar 

  • Biradar, S., Saumya, S., Kumar, A., & Singh, A. (2022b). Pradvis vac: A socio-demographic dataset for determining the level of hatred severity in a low-resource hinglish language. ACM Transactions on Asian and Low-Resource Language Information Processing. https://doi.org/10.1145/3573199

  • Bisht, A., Singh, A., Bhadauria, H., Virmani, J., & Kriti. (2020). Detection of hate speech and offensive language in Twitter data using LSTM model. In Recent trends in image and signal processing in computer vision (pp. 243–264).

  • Bohra, A., Vijay, D., Singh, V., Akhtar, S. S., & Shrivastava, M. (2018). A dataset of Hindi-English code-mixed social media text for hate speech detection. In Proceedings of the second workshop on computational modeling of people’s opinions, personality, and emotions in social media (pp. 36–41).

  • Bojanowski, P., Grave, É., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146.

    Article  Google Scholar 

  • Caselli, T., Basile, V., Mitrović, J., & Granitzer, M. (2021). Hatebert: Retraining bert for abusive language detection in English. In Proceedings of the 5th workshop on online abuse and harms (WOAH 2021) (pp. 17–25).

  • Chopra, S., Sawhney, R., Mathur, P., & Shah, R. R. (2020). Hindi-english hate speech detection: Author profiling, debiasing, and practical perspectives. Proceedings of the AAAI Conference on Artificial Intelligence, 34, 386–393.

    Article  Google Scholar 

  • Chowdhury, S. A., Mubarak, H., Abdelali, A., Jung, S.-G., Jansen, B. J., & Salminen, J. (2020). A multi-platform arabic news comment dataset for offensive language detection. In Proceedings of the twelfth language resources and evaluation conference (pp. 6203–6212).

  • De, A., Bandyopadhyay, D., Gain, B., & Ekbal, A. (2021). A transformer-based approach to multilingual fake news detection in low-resource languages. Transactions on Asian and Low-Resource Language Information Processing, 21(1), 1–20.

    Google Scholar 

  • Devlin, J., Chang, M.-W., Lee, K., Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint. arXiv:1810.04805

  • EFSAS. (2021). The role of fake news in fueling hate speech and extremism online; promoting adequate measures for tackling the phenomenon

  • Farooqi, Z. M., Ghosh, S., & Shah, R. R. (2021). Leveraging transformers for hate speech detection in conversational code-mixed tweets. European Foundation for South Asian Studies (EFSAS).

  • Fharook, S., Ahmed, S. S., Rithika, G., Budde, S. S., Saumya, S., & Biradar, S. (2022). Are you a hero or a villain? a semantic role labelling approach for detecting harmful memes. In Proceedings of the workshop on combating online hostile posts in regional languages during emergency situations (pp. 19–23).

  • Gollatz, K., & Jenner, L. (2018). Hate speech and fake news—how two concepts got intertwined and politicised. HIIG Digital Society Blog. https://doi.org/10.5281/zenodo.1197174

  • Jafri, F. A., Siddiqui, M.A., Thapa, S., Rauniyar, K., Naseem, U., & Razzak, I. (2023). Uncovering political hate speech during indian election campaign: A new low-resource dataset and baselines. arXiv preprint. arXiv:2306.14764

  • Kar, D., Bhardwaj, M., Samanta, S., & Azad, A. P. (2021). No rumours please! a multi-indic-lingual approach for covid fake-tweet detection. In 2021 Grace Hopper Celebration India (GHCI) (pp. 1–5). IEEE.

  • Mandl, T., Modha, S., Kumar M. A., & Chakravarthi, B. R. (2020). Overview of the HASOC track at fire 2020: Hate speech and offensive language identification in Tamil, Malayalam, Hindi, English and German. In Proceedings of the 12th annual meeting of the forum for information retrieval evaluation (pp. 29–32).

  • Mathur, P., Sawhney, R., Ayyar, M., & Shah, R. (2018). Did you offend me? Classification of offensive tweets in Hinglish language. In Proceedings of the 2nd workshop on abusive language online (ALW2) (pp. 138–148).

  • Mehta, M., Pandey, U., Chaudhary, Y., Sharma, R., Gill, I., Gupta, D., & Khanna, A. (2021). Hindi text classification: A review. In 2021 3rd international conference on advances in computing, communication control and networking (ICAC3N) (pp. 839–843). IEEE.

  • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint. arXiv:1301.3781

  • Nie, J.-B. (2020). In the shadow of biological warfare: Conspiracy theories on the origins of covid-19 and enhancing global governance of biosafety as a matter of urgency. Journal of Bioethical Inquiry, 17(4), 567–574.

    Article  Google Scholar 

  • Sánchez-Junquera, J., Chulvi, B., Rosso, P., & Ponzetto, S. P. (2021). How do you speak about immigrants? Taxonomy and stereoimmigrants dataset for identifying stereotypes about immigrants. Applied Sciences, 11(8), 3610.

    Article  Google Scholar 

  • Shankar Biradar, S. S., & Chauhan, A. (2021). mbert based model for identification of offensive content in South Indian languages. In Working notes of FIRE 2021-forum for information retrieval evaluation (online). CEUR.

  • Sharma, D. K., & Garg, S. (2021). Machine learning methods to identify Hindi fake news within social-media. In 2021 12th International conference on computing communication and networking technologies (ICCCNT) (pp. 1–6). IEEE.

  • Shekhar, C., Bagla, B., Maurya, K. K., & Desarkar, M. S. (2021). Walk in wild: An ensemble approach for hostility detection in hindi posts. arXiv preprint. arXiv:2101.06004

  • Shvets, A., Fortuna, P., Soler, J., & Wanner, L. (2021). Targets and aspects in social media hate speech. In Proceedings of the 5th workshop on online abuse and harms (WOAH 2021) (pp. 179–190).

  • Singhal, S., Shah, R. R., & Kumaraguru, P. (2021). Factorization of fact-checks for low resource indian languages. arXiv preprint. arXiv:2102.11276

  • Sreelakshmi, K., Premjith, B., & Soman, K. (2020). Detection of hate speech text in hindi-english code-mixed data. Procedia Computer Science, 171, 737–744.

    Article  Google Scholar 

  • Srivastava, A., Hasan, M., Yagnik, B., Walambe, R., & Kotecha, K. (2021). Role of artificial intelligence in detection of hateful speech for Hinglish data on social media. In Applications of artificial intelligence and machine learning: Select proceedings of ICAAAIML 2020 (pp. 83–95). Springer.

  • Zhang, X., Cao, J., Li, X., Sheng, Q., Zhong, L., & Shu, K. (2021). Mining dual emotion for fake news detection. Proceedings of the Web Conference, 2021, 3465–3476.

    Google Scholar 

Download references

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and Affiliations

Authors

Contributions

All the authors have equally contributed towards manuscript building.

Corresponding author

Correspondence to Shankar Biradar.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Ethical approval

This article does not contain any studies with animals performed by any of the authors. This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Biradar, S., Saumya, S. & Chauhan, A. Faux Hate: unravelling the web of fake narratives in spreading hateful stories: a multi-label and multi-class dataset in cross-lingual Hindi-English code-mixed text. Lang Resources & Evaluation (2024). https://doi.org/10.1007/s10579-024-09732-0

Download citation

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s10579-024-09732-0

Keywords

Navigation