Faux Hate: unravelling the web of fake narratives in spreading hateful stories: a multi-label and multi-class dataset in cross-lingual Hindi-English code-mixed text

Biradar, Shankar; Saumya, Sunil; Chauhan, Arun

doi:10.1007/s10579-024-09732-0

Faux Hate: unravelling the web of fake narratives in spreading hateful stories: a multi-label and multi-class dataset in cross-lingual Hindi-English code-mixed text

Original Paper
Published: 16 April 2024

(2024)
Cite this article

Language Resources and Evaluation Aims and scope Submit manuscript

Shankar Biradar¹,
Sunil Saumya² &
Arun Chauhan³

63 Accesses
Explore all metrics

Abstract

Social media has undeniably transformed the way people communicate; however, it also comes with unquestionable drawbacks, notably the proliferation of fake and hateful comments. Recent observations have indicated that these two issues often coexist, with discussions on hate topics frequently being dominated by the fake. Therefore, it has become imperative to explore the role of fake narratives in the dissemination of hate in contemporary times. In this direction, the proposed article introduces a novel data set known as the Faux Hate Multi-Label Data set (FHMLD) comprising 8014 fake-instigated hateful comments in Hindi-English code-mixed text. To the best of our knowledge, this marks the first endeavour to bring together both fake and hateful content within a unified framework. Further, the proposed data set is collected from diverse platforms such as YouTube and Twitter to mitigate user-associated bias. To investigate a relation between the presence of fake narratives and its impact on the intensity of the hate, this study presents a statistical analysis using the Chi-square test. The statistical findings indicate that the calculated \(\chi ^2\) value is greater than the value from the standard table, leading to the rejection of the null hypothesis. Additionally, the current study present baseline methods for categorizing multi-class and multi-label data set, utilizing syntactical and semantic features at both word and sentence levels. The experimental results demonstrate that the fastText and SVM based method outperforms others models with an accuracy of 71% and 58% for binary fake–hate and severity prediction respectively.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

ETHOS: a multi-label hate speech detection dataset

Article Open access 04 January 2022

Hate Speech Detection in the Bengali Language: A Dataset and Its Baseline Evaluation

Introducing the Gab Hate Corpus: defining and applying hate-based rhetoric to social media posts at scale

Article 16 January 2022

Notes

https://www.who.int/emergencies/diseases/novel-coronavirus-2019/origins-of-the-virus.
https://thewire.in/communalism/delhi-riots-misinformation-radicalisation-social-media.
https://www.bbc.com/news/world-asia-india-59338245.
https://twitter.com/.
https://www.youtube.com/.
https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers.
https://www.facebook.com/.
https://www.instagram.com/.
https://items.ssrc.org/disinformation-democracy-and-conflict-prevention/classifying-and-identifying-the-intensity-of-hate-speech/.
https://items.ssrc.org/disinformation-democracy-and-conflict-prevention/classifying-and-identifying-the-intensity-of-hate-speech/.
https://items.ssrc.org/disinformation-democracy-and-conflict-prevention/classifying-and-identifying-the-intensity-of-hate-speech/.
https://twitter.com/.
https://www.youtube.com/.
https://www.altnews.in/.
https://www.boomlive.in/fact-check.
https://www.factchecker.in/fact-check.
https://factly.in/.
https://github.com/JustAnotherArchivist/snscrape.
https://pypi.org/project/youtube-comment-scraper-python/.
https://transparency.fb.com/en-gb/policies/community-standards/hate-speech/.
In Table 7 L: low, M: medium, H: high, I: individual, O: collective, and R: religion.
https://matplotlib.org/.
https://scikit-learn.org/.
https://pypi.org/project/gensim/.
https://huggingface.co/models.
https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html.
In Fig. 10 label 0: Nill, O: Collective, I: Individual, R: Religion, L: Low severity, M: Medium severity, H: High severity.

References

Akash, B., Badam, J., Raju, K., & Chakraborty, D. (2021). A poster on learnings from an attempt to build an NLP-based fake news classification system for Hindi. In ACM SIGCAS conference on computing and sustainable societies (pp. 397–401).
Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2), 211–236.
Article Google Scholar
Banerjee, S., Sarkar, M., Agrawal, N., Saha, P., & Das, M. (2021). Exploring transformer based models to identify hate speech and offensive content in English and Indo-Aryan languages. arXiV Preprint. arXiv:2111.13974
Bhardwaj, M., Akhtar, M. S., Ekbal, A., Das, A., & Chakraborty, T. (2020). Hostility detection dataset in Hindi. . arXiV Preprint. arXiv:2011.03588
Biradar, S., & Saumya, S. (2022). Iiitdwd@ tamilnlp-acl2022: Transformer-based approach to classify abusive content in dravidian code-mixed text. In Proceedings of the 2nd workshop on speech and language technologies for Dravidian languages (pp. 100–104).
Biradar, S., Saumya, S., & Chauhan, A. (2021). Hate or non-hate: Translation based hate speech identification in code-mixed hinglish data set. In 2021 IEEE international conference on big data (Big Data) (pp. 2470–2475). IEEE.
Biradar, S., Saumya, S., & Chauhan, A. (2022a). Combating the infodemic: Covid-19 induced fake news recognition in social media networks. Complex & Intelligent Systems, 9(3), 2879–2891.
Biradar, S., Saumya, S., & Chauhan, A. (2022c). Fighting hate speech from bilingual hinglish speaker’s perspective, a transformer-and translation-based approach. Social Network Analysis and Mining, 12(1), 87.
Article Google Scholar
Biradar, S., Saumya, S., Kumar, A., & Singh, A. (2022b). Pradvis vac: A socio-demographic dataset for determining the level of hatred severity in a low-resource hinglish language. ACM Transactions on Asian and Low-Resource Language Information Processing. https://doi.org/10.1145/3573199
Bisht, A., Singh, A., Bhadauria, H., Virmani, J., & Kriti. (2020). Detection of hate speech and offensive language in Twitter data using LSTM model. In Recent trends in image and signal processing in computer vision (pp. 243–264).
Bohra, A., Vijay, D., Singh, V., Akhtar, S. S., & Shrivastava, M. (2018). A dataset of Hindi-English code-mixed social media text for hate speech detection. In Proceedings of the second workshop on computational modeling of people’s opinions, personality, and emotions in social media (pp. 36–41).
Bojanowski, P., Grave, É., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146.
Article Google Scholar
Caselli, T., Basile, V., Mitrović, J., & Granitzer, M. (2021). Hatebert: Retraining bert for abusive language detection in English. In Proceedings of the 5th workshop on online abuse and harms (WOAH 2021) (pp. 17–25).
Chopra, S., Sawhney, R., Mathur, P., & Shah, R. R. (2020). Hindi-english hate speech detection: Author profiling, debiasing, and practical perspectives. Proceedings of the AAAI Conference on Artificial Intelligence, 34, 386–393.
Article Google Scholar
Chowdhury, S. A., Mubarak, H., Abdelali, A., Jung, S.-G., Jansen, B. J., & Salminen, J. (2020). A multi-platform arabic news comment dataset for offensive language detection. In Proceedings of the twelfth language resources and evaluation conference (pp. 6203–6212).
De, A., Bandyopadhyay, D., Gain, B., & Ekbal, A. (2021). A transformer-based approach to multilingual fake news detection in low-resource languages. Transactions on Asian and Low-Resource Language Information Processing, 21(1), 1–20.
Google Scholar
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint. arXiv:1810.04805
EFSAS. (2021). The role of fake news in fueling hate speech and extremism online; promoting adequate measures for tackling the phenomenon
Farooqi, Z. M., Ghosh, S., & Shah, R. R. (2021). Leveraging transformers for hate speech detection in conversational code-mixed tweets. European Foundation for South Asian Studies (EFSAS).
Fharook, S., Ahmed, S. S., Rithika, G., Budde, S. S., Saumya, S., & Biradar, S. (2022). Are you a hero or a villain? a semantic role labelling approach for detecting harmful memes. In Proceedings of the workshop on combating online hostile posts in regional languages during emergency situations (pp. 19–23).
Gollatz, K., & Jenner, L. (2018). Hate speech and fake news—how two concepts got intertwined and politicised. HIIG Digital Society Blog. https://doi.org/10.5281/zenodo.1197174
Jafri, F. A., Siddiqui, M.A., Thapa, S., Rauniyar, K., Naseem, U., & Razzak, I. (2023). Uncovering political hate speech during indian election campaign: A new low-resource dataset and baselines. arXiv preprint. arXiv:2306.14764
Kar, D., Bhardwaj, M., Samanta, S., & Azad, A. P. (2021). No rumours please! a multi-indic-lingual approach for covid fake-tweet detection. In 2021 Grace Hopper Celebration India (GHCI) (pp. 1–5). IEEE.
Mandl, T., Modha, S., Kumar M. A., & Chakravarthi, B. R. (2020). Overview of the HASOC track at fire 2020: Hate speech and offensive language identification in Tamil, Malayalam, Hindi, English and German. In Proceedings of the 12th annual meeting of the forum for information retrieval evaluation (pp. 29–32).
Mathur, P., Sawhney, R., Ayyar, M., & Shah, R. (2018). Did you offend me? Classification of offensive tweets in Hinglish language. In Proceedings of the 2nd workshop on abusive language online (ALW2) (pp. 138–148).
Mehta, M., Pandey, U., Chaudhary, Y., Sharma, R., Gill, I., Gupta, D., & Khanna, A. (2021). Hindi text classification: A review. In 2021 3rd international conference on advances in computing, communication control and networking (ICAC3N) (pp. 839–843). IEEE.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint. arXiv:1301.3781
Nie, J.-B. (2020). In the shadow of biological warfare: Conspiracy theories on the origins of covid-19 and enhancing global governance of biosafety as a matter of urgency. Journal of Bioethical Inquiry, 17(4), 567–574.
Article Google Scholar
Sánchez-Junquera, J., Chulvi, B., Rosso, P., & Ponzetto, S. P. (2021). How do you speak about immigrants? Taxonomy and stereoimmigrants dataset for identifying stereotypes about immigrants. Applied Sciences, 11(8), 3610.
Article Google Scholar
Shankar Biradar, S. S., & Chauhan, A. (2021). mbert based model for identification of offensive content in South Indian languages. In Working notes of FIRE 2021-forum for information retrieval evaluation (online). CEUR.
Sharma, D. K., & Garg, S. (2021). Machine learning methods to identify Hindi fake news within social-media. In 2021 12th International conference on computing communication and networking technologies (ICCCNT) (pp. 1–6). IEEE.
Shekhar, C., Bagla, B., Maurya, K. K., & Desarkar, M. S. (2021). Walk in wild: An ensemble approach for hostility detection in hindi posts. arXiv preprint. arXiv:2101.06004
Shvets, A., Fortuna, P., Soler, J., & Wanner, L. (2021). Targets and aspects in social media hate speech. In Proceedings of the 5th workshop on online abuse and harms (WOAH 2021) (pp. 179–190).
Singhal, S., Shah, R. R., & Kumaraguru, P. (2021). Factorization of fact-checks for low resource indian languages. arXiv preprint. arXiv:2102.11276
Sreelakshmi, K., Premjith, B., & Soman, K. (2020). Detection of hate speech text in hindi-english code-mixed data. Procedia Computer Science, 171, 737–744.
Article Google Scholar
Srivastava, A., Hasan, M., Yagnik, B., Walambe, R., & Kotecha, K. (2021). Role of artificial intelligence in detection of hateful speech for Hinglish data on social media. In Applications of artificial intelligence and machine learning: Select proceedings of ICAAAIML 2020 (pp. 83–95). Springer.
Zhang, X., Cao, J., Li, X., Sheng, Q., Zhong, L., & Shu, K. (2021). Mining dual emotion for fake news detection. Proceedings of the Web Conference, 2021, 3465–3476.
Google Scholar

Download references

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and Affiliations

Computer Science and Engineering, Indian Institute of Information Technology, Dharwad, Karnataka, India
Shankar Biradar
Data Science and Intelligent Systems, Indian Institute of Information Technology, Dharwad, Karnataka, India
Sunil Saumya
Computer Science and Engineering, Graphic Era University, Dehradun, Uttarakhand, India
Arun Chauhan

Authors

Shankar Biradar
View author publications
You can also search for this author in PubMed Google Scholar
Sunil Saumya
View author publications
You can also search for this author in PubMed Google Scholar
Arun Chauhan
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

All the authors have equally contributed towards manuscript building.

Corresponding author

Correspondence to Shankar Biradar.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Ethical approval

This article does not contain any studies with animals performed by any of the authors. This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Biradar, S., Saumya, S. & Chauhan, A. Faux Hate: unravelling the web of fake narratives in spreading hateful stories: a multi-label and multi-class dataset in cross-lingual Hindi-English code-mixed text. Lang Resources & Evaluation (2024). https://doi.org/10.1007/s10579-024-09732-0

Download citation

Accepted: 05 March 2024
Published: 16 April 2024
DOI: https://doi.org/10.1007/s10579-024-09732-0

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Faux Hate: unravelling the web of fake narratives in spreading hateful stories: a multi-label and multi-class dataset in cross-lingual Hindi-English code-mixed text

Abstract

Access this article

Similar content being viewed by others

ETHOS: a multi-label hate speech detection dataset

Hate Speech Detection in the Bengali Language: A Dataset and Its Baseline Evaluation

Introducing the Gab Hate Corpus: defining and applying hate-based rhetoric to social media posts at scale

Notes

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Faux Hate: unravelling the web of fake narratives in spreading hateful stories: a multi-label and multi-class dataset in cross-lingual Hindi-English code-mixed text

Abstract

Access this article

Similar content being viewed by others

ETHOS: a multi-label hate speech detection dataset

Hate Speech Detection in the Bengali Language: A Dataset and Its Baseline Evaluation

Introducing the Gab Hate Corpus: defining and applying hate-based rhetoric to social media posts at scale

Notes

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation