1 Introduction

Recently, equality, diversity and inclusion (EDI) has attracted widespread attention, with a focus on the protected classes of gender and race. Work on EDI began as early as the 1960s, but it is only now that the interpretation of diversity has broadened to include other demographics such as the lesbian, gay, bisexual, transgender, queer/questioning (one’s sexual or gender identity), intersex, and asexual/aromantic/agender (LGBTQIA+) community, women in the fields of science, technology, engineering and mathematics (STEM), and persons with disabilities [1]. Inclusion refers to making an individual feel like they are a part of a group or organisation, in terms of both the formal and informal environment [2, 3]. Another essential part of this wheel is bias. People have both conscious and unconscious biases, which lead to explicit and implicit stereotyping, respectively. To counter bias, much training has been provided to school students [4] and to employees at various levels [5]. However, it is only very recently that artificial intelligence (AI) researchers have started looking at biases, especially gender bias [6]. Language technologies in AI are expected to have a growing influence over our lives in the internet era. Nevertheless, from the perspective of language technologies research, EDI for the minority LGBTQIA+ community and other marginalised populations has not been treated with the same urgency or importance as other topics. It is important that the language technologies we develop consider the inclusion of all communities to support social integration.

Online social media platforms such as Facebook, Twitter and YouTube have encouraged millions of people to express themselves and share their opinions. These platforms also provide a medium for many marginalised people to look for support online [7,8,9]. The emergence of the infectious disease COVID-19 exposed the entire population to a disease without a specific pharmacological treatment; the exponential levels of infection have deeply affected countries across the world, and the pandemic forced public places to close temporarily [10]. Several areas have been affected worldwide, and the fear of losing loved ones grew as even essential services such as schools, hospitals and mental health care centres remained closed [11]. As a consequence, people turned to online forums for their informational and emotional needs. In some areas and for some people, online social networking has been the only means of ensuring social connectedness and seeking social support during the COVID-19 pandemic [12].

Online social networking provides a platform for individuals to be in the know and to be known, both of which become more significant with greater social integration. Social integration is essential for the overall well-being of every individual, but especially for vulnerable individuals who are more prone to social exclusion. A sense of belonging and community is an essential aspect of people’s mental health, which influences both psychological and physical well-being [13]. The importance of social inclusion in the online lives of marginalised populations, such as women in the fields of STEM, people who belong to the LGBTQIA+ community, racial minorities and people with disabilities, has been studied, and it has been shown that the online lives of vulnerable individuals have a significant impact on their mental health [14,15,16]. However, the contents of social media comments or posts may be negative, hateful, offensive or abusive since there is no mediating authority.

Comments and posts on social media have been analysed to find and stop the spread of negativity using methods such as hate speech detection [17], offensive language identification [18,19,20] and abusive language detection [21]. However, according to [22], technologies developed for the detection of abusive language do not consider the potential biases of the datasets they are trained on. Systematic racial bias in these datasets causes abusive language detection itself to be biased, which may result in discrimination against one group over another and negatively impact minorities and marginalised people. As language is a major part of communication, it should be inclusive. Language technology used by a large internet community has a direct impact on people across the globe. We should turn our attention towards spreading positivity instead of curbing an individual’s freedom of speech by removing negative comments. However, hope speech detection should be conducted alongside hate speech detection; otherwise, hope speech detection by itself may introduce bias while the perpetrators of negative and harmful comments continue to act unchecked on the web.

Therefore, in our research, we focused on hope speech. Hope is commonly associated with the promise, potential, support, reassurance, suggestions or inspiration provided to participants by their peers during periods of illness, stress, loneliness and depression [23]. Psychologists, sociologists and social workers from the Association of Hope have concluded that hope can also be a useful tool for saving people from suicide or self-harm [24]. The ’Hope Speech’ delivered by gay rights activist Harvey Milk on the steps of San Francisco City Hall during a mass rally to celebrate California Gay Freedom Day on 25 June 1978 (Footnote 1) inspired millions to demand rights that ensure EDI [25]. Recently, [26] analysed how to use hope speech from social media texts to defuse tensions between two nuclear powers (India and Pakistan) and to support marginalised Rohingya refugees [27]. They experimented with detecting hope versus non-hope. However, to the best of our knowledge, no prior work has explored hope speech for women in STEM, LGBTQIA+ individuals, racial minorities or people with disabilities in general.

Moreover, although people from various linguistic backgrounds are exposed to online social media language, English remains at the centre of ongoing trends in language technology research. Recently, some studies have been conducted on high-resourced languages such as Arabic, German, Hindi and Italian. However, such studies usually use monolingual corpora and do not examine code-switched textual data. Code-switching is a phenomenon where an individual switches between two or more languages in a single utterance [28]. We have introduced a dataset for hope speech identification not only in English but also in the under-resourced code-switched Tamil (ISO 639-3: tam), Malayalam (ISO 639-3: mal) and Kannada (ISO 639-3: kan) languages.

  • We have proposed to encourage hope speech rather than take away an individual’s freedom of speech by detecting and removing a negative comment.

  • We applied our annotation schema to create a multilingual hope speech dataset for EDI. This is a new large-scale dataset of English, Tamil (code-mixed) and Malayalam (code-mixed) YouTube comments with high-quality annotations of the target classes.

  • We performed experiments on the Hope Speech dataset for Equality, Diversity and Inclusion (HopeEDI) using different state-of-the-art machine learning and deep learning models to create benchmark systems.

2 Related works

When it comes to crawling social media data, there are many works on YouTube mining [29, 30], which mainly focus on exploiting user comments. [31] performed opinion mining and a trend analysis on YouTube comments: they analysed the sentiments to identify trends, seasonality and forecasts, and found that user sentiments are well correlated with real-world events. [32] conducted a systematic study on opinion mining targeting YouTube comments; the authors developed a comment corpus containing 35K manually labelled comments for modelling the opinion polarity of comments based on tree kernel models. [33] and [34] collected comments from YouTube and created manually annotated corpora for sentiment analysis of the under-resourced Tamil and Malayalam languages.

Methods to mitigate gender bias in natural language processing (NLP) have been extensively studied for the English language [35]. Some studies have investigated gender bias beyond the English language using machine translation to French [36] and other languages [37]. [38] studied the gender and dialect bias in automatically generated captions on YouTube. Technologies for abusive language [39, 40], hate speech [17, 41] and offensive language detection [42,43,44] are being developed and applied without considering the potential biases [22, 45, 46]. However, current gender debiasing methods in NLP are not sufficient to debias other issues related to EDI in the end-to-end systems of many language technology applications; this causes unrest and escalates the issues with EDI besides leading to greater inequality on digital platforms [47].

The use of counter-narratives (i.e. informed textual responses) is another strategy that has received the attention of researchers recently [48, 49]. The counter-narrative approach was proposed to respect the right to freedom of speech and avoid over-blocking. [50] created and released a dataset for counterspeech using comments from YouTube. However, the core idea of directly intervening with textual responses can escalate hostility, even though it is advantageous for the writer to understand why their comment or post has been deleted or blocked and then favourably change the discourse and attitudes presented in their comments. Thus, we directed our attention to finding positive information, such as hope, and encouraging such activities.

Recently, [26] and [27] analysed how to use hope speech from social media text to defuse tensions between two nuclear powers (India and Pakistan) and to support minority Rohingya refugees. However, the authors’ definition of hope was confined to defusing tensions and preventing violence; it did not take into account other perspectives on hope and EDI. The authors did not provide further information such as the inter-annotator agreement, the diversity among annotators or details about the dataset, and the dataset is not publicly available for research. It was created in English, Hindi and other languages known to the Rohingya people. Our work differs from previous works in that we have defined hope speech for EDI and introduced a dataset for EDI in English, Tamil and Malayalam. To the best of our knowledge, this is the first work to create a dataset for EDI in Tamil and Malayalam, which are under-resourced languages.

3 Hope speech

Hope is an upbeat state of mind based on a desire for positive outcomes in one’s life or the world at large, and it is both present- and future-oriented [23]. Inspirational talks about how people deal with and overcome adversity may also provide hope. Hope speech instils optimism and resilience, which have a beneficial impact on many parts of life [51], including college life [52] and other situations that put us at risk [53]. For our problem, we defined hope speech as ’YouTube comments/posts that offer support, reassurance, suggestions, inspiration and insight’.

Hope speech reflects the notion that one can discover, and become motivated to use, pathways towards one’s desired goals. Our approach sought to shift the dominant mindset away from a focus on discrimination, loneliness or the negative aspects of life and towards promoting confidence, offering support and highlighting positive characteristics based on individual comments. Thus, we instructed annotators that if a comment or post meets any of the following conditions, then it should be annotated as hope speech.

  • The comment contains inspiration provided to participants by their peers and others and/or offers support, reassurance, suggestions and insight.

  • The comment promotes well-being and satisfaction (past), joy, sensual pleasures and happiness (present).

  • The comment triggers constructive cognition about the future—optimism, hope and faith.

  • The comment contains an expression of love, courage, interpersonal skill, aesthetic sensibility, perseverance, forgiveness, tolerance, future-mindedness, praise for talents and wisdom.

  • The comment promotes the values of EDI.

  • The comment brings out the survival story of a gay, lesbian or transgender individual, a woman in science or a COVID-19 survivor.

  • The comment talks about fairness in the industry (e.g. [I do not think banning all apps is right; we should ban only the apps which are not safe]).

  • The comment explicitly talks about a hopeful future (e.g. [We will survive these things]).

  • The comment explicitly talks about and says no to division in any form.

Non-hope speech includes comments that do not bring positivity, such as the following:

  • The comment uses racially, ethnically, sexually or nationally motivated slurs.

  • The comment expresses hate towards a minority group.

  • The comment is highly prejudiced and attacks people without thinking about the consequences.

  • The comment does not inspire hope in the reader’s mind.

Non-hope speech is different from hate speech. Some examples are provided below.

  • ’How is that the same thing???’ This is non-hope speech, but it is not hate speech.

  • ’Society says don’t assume, but they assume to anyways’ This is non-hope speech, but it is not hate speech.

No hate speech or offensive language detection dataset is available for code-mixed Tamil or code-mixed Malayalam, and the existing datasets do not take into account LGBTQIA+ people, women in STEM or other minority or under-represented groups. Thus, we cannot use existing hate speech or offensive language detection datasets to detect hope or non-hope speech for the EDI of minorities.

4 Dataset construction

We concentrated on gathering data from comments on YouTube (Footnote 2), the most widely used platform in the world for commenting on and publicly expressing opinions about topics and videos. We did not use comments from LGBTQIA+ people’s personal coming-out stories since they contained personal information. For English, we gathered data on recent EDI themes such as women in STEM, LGBTQIA+ concerns, COVID-19, Black Lives Matter, the United Kingdom (UK) versus China, and the United States of America (USA) and Australia versus China. The data was collected from videos by individuals from English-speaking nations such as Australia, Canada, Ireland, the UK, the USA and New Zealand.

For Tamil and Malayalam, we gathered data from India on recent themes such as LGBTQIA+ concerns, COVID-19, women in STEM, the Indo-China border conflict and Dravidian affairs. India is a multilingual and multiracial country. Linguistically, India is split into three major language families: Dravidian, Indo-Aryan and Tibeto-Burman. The ongoing Indo-China border conflict has sparked online bigotry towards persons with East-Asian features despite the fact that they are Indians from the North East. Similarly, in Tamil Nadu, the National Education Policy, which calls for the adoption of Sanskrit or Hindi, has exacerbated concerns about the linguistic autonomy of Dravidian languages. We used the YouTube comment scraper (Footnote 3) to collect comments on the aforementioned subjects from November 2019 to June 2020. We believe that the data we have shared will help to reduce animosity and promote optimism. Our dataset was created as a multilingual resource to enable cross-lingual research and analysis: it includes comments in English, Tamil and Malayalam, along with a small number of comments in other languages.

4.1 Code-mixing

When a speaker employs two or more languages within a single utterance, it is known as code-mixing. It is prevalent in the social media discourse of multilingual speakers. Code-mixing has long been connected with a lack of formal or informal linguistic expertise; nevertheless, studies show that it is common in user-generated social media content. In a multilingual country like India, code-mixing is a frequent occurrence [54,55,56,57]. Our Tamil and Malayalam datasets are code-mixed since our data was collected from YouTube. In our corpus, we found all three forms of code-mixing: tag switching, inter-sentential switching and intra-sentential switching. Our corpus also includes code-mixing between the Latin and native scripts.

4.2 Ethical concerns

Data collected from social media is extremely sensitive, especially when it concerns minorities such as the LGBTQIA+ community or women. We have taken great care to reduce the risk of the data revealing an individual’s identity by removing personal information, such as names (other than celebrity names), from the dataset. However, in order to investigate EDI, we needed to retain information on race, gender, sexual orientation, ethnicity and philosophical views. The annotators only viewed anonymised posts and agreed not to contact the author of any comment. Only researchers who agree to follow ethical norms will be given access to the dataset for research purposes. After a lengthy debate with our local EDI committee members (Footnote 4), we opted not to ask the annotators for racial information. Owing to recent events, the EDI committee was strongly against the collection of racial information, believing that it would split people according to their racial origin. Thus, we recorded only the nationality of the annotators.

Table 1 Annotators

4.3 Annotation set-up

After the data collection phase, we cleaned the data using Langdetect (Footnote 5) to identify the language of each comment and removed comments that were not in the specified languages. However, owing to code-mixing at various levels, comments in other languages were unintentionally included in the cleaned Tamil and Malayalam corpora. Finally, based on our definition from Sect. 3, we used three classes: hope speech, non-hope speech and a third class (Other languages) introduced to account for comments that were not in the intended language. These classes were chosen since they provide a sufficient degree of generalisation for describing the comments in the EDI hope speech dataset.
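The language-filtering step could be implemented along the following lines. This is a minimal sketch assuming the comments are available as a plain list of strings; the allowed language codes are taken from the paper, but the function name and example comments are illustrative only.

```python
# Sketch of the Langdetect-based cleaning step described above.
from langdetect import detect, DetectorFactory
from langdetect.lang_detect_exception import LangDetectException

DetectorFactory.seed = 0  # langdetect is non-deterministic by default; fix the seed


def filter_comments(comments, allowed_langs=("en", "ta", "ml")):
    """Keep comments whose detected language is in allowed_langs."""
    kept = []
    for text in comments:
        try:
            lang = detect(text)      # returns an ISO 639-1 code such as 'en', 'ta', 'ml'
        except LangDetectException:  # raised for empty or undecodable strings
            continue
        if lang in allowed_langs:
            kept.append((text, lang))
    return kept


# Romanised Tamil/Malayalam is often misidentified, which is why the
# 'Other languages' label was still needed after this filtering step.
sample = ["We will survive these things", "kashtam thaan, irundhaalum muyarchi seivom"]
print(filter_comments(sample))
```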

4.4 Annotators

We created Google forms to collect annotations from the annotators. To maintain the quality of annotation, each form was limited to 100 comments and each page to ten comments. We collected information on each annotator’s gender, educational background and preferred medium of instruction in order to understand annotator diversity and avoid bias. The annotators were warned that the comments might contain profanity and hostile material, and they could stop annotating if they found the comments too upsetting or unmanageable. We trained annotators by directing them to YouTube videos on EDI (Footnotes 6, 7, 8, 9). Each form was annotated by at least three individuals. After the annotators completed the first form of 100 comments, the annotations were manually validated in a warm-up phase. This strategy was used to help the annotators acquire a better knowledge of EDI and focus on the project. Following this initial stage, a few annotators withdrew from the project and their annotations were discarded. The remaining annotators were asked to review the EDI videos and the annotation guidelines again. Table 1 shows the statistics pertaining to the annotators. The annotators for the English comments came from Australia, Ireland, the United Kingdom and the United States of America. We obtained Tamil annotations from persons from both Tamil Nadu (India) and Sri Lanka. Graduate and postgraduate students made up the majority of the annotators.

Table 2 Corpus statistics

4.5 Inter-annotator Agreement

We used majority voting to aggregate the hope speech annotations from the multiple annotators; comments that did not receive a majority label in the first round were collected and added to a second Google form so that more annotators could label them. We calculated the inter-annotator agreement following the last round of annotation. We quantified the clarity of the annotation and reported the inter-annotator agreement using Krippendorff’s alpha, a statistical measure of annotator agreement that indicates how reliable the resulting annotations are [58]. Although Krippendorff’s alpha \((\alpha )\) is computationally more involved, it was the most relevant measure in our case since the comments were annotated by more than two annotators and not every comment was labelled by the same annotators. It is unaffected by missing data, allows for variation in sample sizes, categories and the number of raters and may be used with any measurement level, including nominal, ordinal, interval and ratio. \(\alpha \) is characterised by the following:

$$\begin{aligned} \alpha = 1 - \frac{D_o}{D_e} \end{aligned}$$
(1)

\(D_o\) is the observed disagreement between the labels assigned by the annotators, and \(D_e\) is the disagreement expected when the labelling can be attributed to chance rather than to the inherent properties of the comments.

$$\begin{aligned} D_o = \frac{1}{n}\sum _{c}\sum _{k}o_{ck}\,{}_{\mathrm {metric}}\delta ^2_{ck} \end{aligned}$$
(2)
$$\begin{aligned} D_e = \frac{1}{n(n-1)} \sum _{c}\sum _{k}n_c \cdot n_k\,{}_{\mathrm {metric}}\delta ^2_{ck} \end{aligned}$$
(3)

Here, \(o_{ck}\), \(n_c\), \(n_k\) and \(n\) refer to the frequencies of values in the coincidence matrix, and metric refers to the level of measurement, such as nominal, ordinal, interval or ratio; Krippendorff’s alpha applies to all of these metrics. The range of \(\alpha \) is \(0 \le \alpha \le 1\): when \(\alpha \) is 1, there is perfect agreement between the annotators, and when it is 0, the agreement is entirely due to chance. It is customary to require \(\alpha \ge 0.800\). A reasonable rule of thumb that allows tentative conclusions to be drawn requires \(0.67 \le \alpha \le 0.8\), while \(\alpha \ge 0.653\) is the lowest conceivable limit. For computing Krippendorff’s alpha [59], we utilised nltk (Footnote 10). Our annotations achieved agreement values of 0.63, 0.76 and 0.85 for English, Tamil and Malayalam, respectively, using the nominal metric. Previous research on sentiment analysis and offensive language identification for code-switched Tamil and Malayalam reported agreements of 0.69 for Tamil and 0.87 for Malayalam in sentiment analysis, and 0.74 for Tamil and 0.83 for Malayalam in offensive language identification. Our inter-annotator agreement (IAA) values for hope speech are close to those reported in previous research on sentiment analysis and offensive language identification in Dravidian languages.
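The aggregation and agreement computation described above could be reproduced roughly as sketched below. The (coder, item, label) triple format follows nltk's AnnotationTask API, which the paper states was used for computing alpha; the example annotations and helper function are illustrative assumptions.

```python
# Sketch of majority-vote aggregation and Krippendorff's alpha with nltk.
from collections import Counter
from nltk.metrics.agreement import AnnotationTask

# (coder, item, label) triples, one per annotator judgement; placeholder data.
triples = [
    ("A1", "c1", "Hope_speech"), ("A2", "c1", "Hope_speech"), ("A3", "c1", "Non_hope_speech"),
    ("A1", "c2", "Non_hope_speech"), ("A2", "c2", "Non_hope_speech"), ("A3", "c2", "Non_hope_speech"),
]


def aggregate(triples):
    """Majority vote per comment; comments with no majority go to a second round."""
    labels_per_item = {}
    for _, item, label in triples:
        labels_per_item.setdefault(item, []).append(label)
    final, no_majority = {}, []
    for item, labels in labels_per_item.items():
        label, count = Counter(labels).most_common(1)[0]
        if count > len(labels) / 2:
            final[item] = label
        else:
            no_majority.append(item)
    return final, no_majority


print(aggregate(triples))

# Krippendorff's alpha with nltk's default nominal (binary) distance.
task = AnnotationTask(data=triples)
print(round(task.alpha(), 3))
```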

4.6 Corpus statistics

Our dataset contains 59,354 YouTube comments in total: 28,451 in English, 20,198 in Tamil and 10,705 in Malayalam. Each subset also contains some comments labelled as belonging to other languages. The distribution of our dataset is depicted in Table 2. We used the nltk tool to tokenise the words and sentences in the comments and obtain corpus statistics. Tamil and Malayalam have a broad vocabulary as a result of the various types of code-switching that take place.
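The corpus statistics could be computed along the following lines. This is a minimal sketch assuming the comments are available as a list of strings; nltk's word tokeniser stands in for whatever exact tokenisation was used, and the example comments are placeholders.

```python
# Sketch of corpus statistics with nltk: comment, token and vocabulary counts.
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)  # tokeniser models


def corpus_stats(comments):
    tokens = [tok for text in comments for tok in word_tokenize(text)]
    return {"comments": len(comments),
            "tokens": len(tokens),
            "vocabulary": len(set(t.lower() for t in tokens))}


english_comments = ["We will survive these things", "How is that the same thing???"]
print(corpus_stats(english_comments))
```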

Table 3 shows the class-wise distribution of the annotated dataset. The data is skewed, with the majority of the comments labelled as non-hope speech. An automatic detection system that can handle such imbalanced data is essential for being genuinely useful in the age of ever-growing user-generated content on internet platforms. Using the fully annotated dataset, a train set, a development set and a test set were produced.
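One plausible way to produce the splits while preserving the skewed label distribution is a stratified split, sketched below with scikit-learn. The 80/10/10 ratio, the synthetic data and the use of stratification are assumptions for illustration, not the procedure stated in the paper.

```python
# Sketch of a stratified train/development/test split with scikit-learn.
from sklearn.model_selection import train_test_split

# Synthetic placeholder data; the real inputs would be the annotated comments and labels.
comments = [f"comment {i}" for i in range(100)]
labels = ["Hope_speech" if i % 10 == 0 else "Non_hope_speech" for i in range(100)]  # skewed, as in the corpus

# First carve out the test set, then split the remainder into train and development.
x_rest, x_test, y_rest, y_test = train_test_split(
    comments, labels, test_size=0.10, stratify=labels, random_state=42)
x_train, x_dev, y_train, y_dev = train_test_split(
    x_rest, y_rest, test_size=1 / 9, stratify=y_rest, random_state=42)  # 1/9 of 90% = 10% overall

print(len(x_train), len(x_dev), len(x_test))
```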

A few samples from the dataset, together with their translations and hope speech class annotations, are shown below.

  • kashtam thaan. irundhaalum muyarchi seivom (“It is indeed difficult. Let us try it out though.”) Hope speech

  • uff. China mon vannallo (“Phew! Here comes the Chinese guy.”) Non-hope speech

  • paambu kari saappittu namma uyirai vaanguranunga (“These guys (Chinese) eat snake meat and make our lives miserable.”) Non-hope speech

Table 3 Class-wise data distribution

4.7 Problematic examples

We found some problematic comments during the process of annotation.

  • ’God gave us a choice.’ This sentence was interpreted by some annotators as hopeful and by others as not hopeful.

  • Sri Lankan Tamilar history patti pesunga—Please speak about the history of the Tamil people in Sri Lanka. Inter-sentential switch in the Tamil corpus written using the Latin script. The history of the Tamil people in Sri Lanka is seen as both hopeful and non-hopeful due to the recent civil war.

  • Bro helo app ku oru alternate appa solunga—Bro, tell me an alternate app for the Helo app. Intra-sentential and tag switching in the Tamil corpus written using the Latin script.

Table 4 Train-development-test data distribution
Table 5 Precision, recall and F-score for English: support is the number of actual occurrences of the class in the specified dataset
Table 6 Precision, recall and F-score for Tamil: support is the number of actual occurrences of the class in the specified dataset
Table 7 Precision, recall and F-score for Malayalam: support is the number of actual occurrences of the class in the specified dataset

5 Benchmark experiments

We benchmarked our dataset by applying a broad range of standard classifiers despite its imbalanced class distribution, and the results were promising. The experiments used term frequency-inverse document frequency (TF-IDF) features computed over the tokens of each comment. To build the baseline classifiers, we utilised the scikit-learn package (https://scikit-learn.org/stable/). An alpha of 0.7 was used for the multinomial Naive Bayes model, and we employed a grid search for the k-nearest neighbours (KNN), support vector machine (SVM), decision tree and logistic regression classifiers. Detailed information on the classifier parameters will be made available with the code.
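A minimal sketch of the TF-IDF baseline pipeline is shown below. The multinomial Naive Bayes alpha of 0.7 is stated in the paper; the vectoriser settings, the parameter grid for the SVM and the placeholder data are assumptions, since the paper defers those details to the released code.

```python
# Sketch of the TF-IDF baseline classifiers with scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report

# Placeholder data; the real inputs are the HopeEDI train and test splits.
x_train = ["We will survive these things", "How is that the same thing???"] * 10
y_train = ["Hope_speech", "Non_hope_speech"] * 10
x_test, y_test = x_train, y_train  # stand-in for the held-out test set

# Multinomial Naive Bayes with alpha = 0.7, as stated in the paper.
nb = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB(alpha=0.7))])
nb.fit(x_train, y_train)

# Grid search over an illustrative parameter space for a linear SVM.
svm = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())])
grid = GridSearchCV(svm, {"clf__C": [0.1, 1, 10]}, scoring="f1_macro", cv=3)
grid.fit(x_train, y_train)

print(classification_report(y_test, grid.predict(x_test)))
```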

RoBERTa, Facebook AI’s upgraded version of the BERT model [60], achieved state-of-the-art results on numerous natural language understanding (NLU) tasks, including GLUE [61] and SQuAD [62]. RoBERTa improves on BERT by training for a longer period of time on longer sequences, increasing the amount of training data, eliminating the next-sentence prediction objective during pre-training and modifying the masking pattern used during pre-training, among other things. XLM-RoBERTa is a transformer-based masked language model (MLM) created with the goal of improving cross-lingual language understanding (XLU). It was trained on 2.5 terabytes of filtered CommonCrawl data [63] covering one hundred languages, and it was found to surpass its multilingual MLM competitors mBERT [60] and XLM [64] in terms of performance.
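One way to fine-tune XLM-RoBERTa for the three-class task is sketched below using the Hugging Face transformers library. The checkpoint name, hyper-parameters, label ordering and dataset wrapper are illustrative assumptions, not the settings used for the benchmarks reported here.

```python
# Sketch of fine-tuning XLM-RoBERTa for three-way hope speech classification.
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

LABELS = ["Hope_speech", "Non_hope_speech", "Other_languages"]


class HopeDataset(Dataset):
    """Wraps raw texts and string labels as tokenised tensors for the Trainer."""
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = [LABELS.index(l) for l in labels]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item


tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base",
                                                           num_labels=len(LABELS))

train_ds = HopeDataset(["We will survive these things"], ["Hope_speech"], tokenizer)

args = TrainingArguments(output_dir="hope-xlmr", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```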

Using the training dataset, we trained our models; the development dataset was used to fine-tune the hyper-parameters, and the models were assessed by predicting labels for the held-out test set, as shown in Table 4. The performance of the classification was measured using the macro-averaged F-score, which is obtained by averaging the per-class F-scores so that every class receives equal weight. This choice was made owing to the uneven class distribution, which causes well-known measures of performance such as accuracy and the micro-averaged F-score to be less representative of actual performance. Since the performance on all classes is important, we also present the precision, recall and weighted F-score of the individual classes in addition to the overall performance. Tables 5, 6 and 7 provide the precision, recall and F-score results of the baseline classifiers on the HopeEDI test set, along with the support from the test data.

As demonstrated, all of the models performed poorly as a result of the class imbalance. On the HopeEDI dataset, the SVM classifier showed the worst performance, with macro-averaged F-scores of 0.32, 0.21 and 0.28 for English, Tamil and Malayalam, respectively. The decision tree achieved a higher macro F-score than logistic regression for English and Malayalam, while for Tamil both classifiers fared comparably well. Although we applied language identification techniques to eliminate non-intended-language comments from our dataset, annotators still marked some comments as ’Other languages’, and not consistently, which introduced a further inconsistency into our dataset. The macro scores for English were mostly lower as a result of the ’Other languages’ category; in the case of English, this could have been prevented by simply eliminating those comments from the dataset. However, this label was necessary for Tamil and Malayalam since the comments in these languages were code-mixed and written in a script that was not native to the language (the Latin script). The distribution of data for the Tamil language was roughly equal between the hope and non-hope classes.

We evaluated the usefulness of our dataset through the machine learning experiments described above. Owing to its novel method of data collection and annotation, we believe the HopeEDI dataset has the potential to revolutionise the field of language technology and that it will open up new directions for further research on positivity.

6 Task description

We also organised a shared task to invite more researchers to perform hope speech detection and benchmark the data. For this task, we defined hope speech as ’YouTube comments/posts that offer support, reassurance, suggestions, inspiration and insight’. A comment or post in the corpus may contain more than one sentence, but the average length of a comment is one sentence. The annotations in the corpus were made at the comment/post level. The training, development and test datasets were supplied to the participants in English, Tamil and Malayalam.

6.1 Training phase

During the first phase, participants were provided with the training and development data in order to build hope speech detection systems for one or more of the three languages. Participants could perform cross-validation on the training data or use the development set for early evaluations and hyper-parameter tuning. The objective of this step was to ensure that the participants’ systems were ready before the test data was released. In total, 137 people registered and downloaded the data in all three languages.

6.2 Testing phase

The test dataset was provided without the gold labels on CodaLab during this phase. Participants were given Google forms for submitting their predictions. They were allowed to submit their predictions as many times as they wished, with the best entry being picked for evaluation and the creation of the rank list. The outcomes were compared against the gold-standard labels. The classification performance was assessed across all classes in terms of the weighted-averaged precision, recall and F-score, where the weighted averages are the support-weighted means of the per-label scores. The metric used for preparing the rank list was the weighted F1 score. Participants were encouraged to check their systems using the sklearn classification report (Footnote 11). The final test phase included 30, 31 and 31 participants for the Tamil, Malayalam and English languages, respectively.
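A short sketch of how the ranking metric could be computed is given below; the gold and predicted label lists are placeholders, and the support-weighted averaging is what scikit-learn's classification_report and f1_score with average="weighted" provide.

```python
# Sketch of the shared-task evaluation: weighted precision, recall and F1 with scikit-learn.
from sklearn.metrics import classification_report, f1_score

gold = ["Hope_speech", "Non_hope_speech", "Non_hope_speech", "Other_languages"]
pred = ["Hope_speech", "Non_hope_speech", "Hope_speech", "Other_languages"]

# Per-class scores plus support-weighted averages, as participants were asked to report.
print(classification_report(gold, pred, digits=3))

# The ranking metric: support-weighted F1 across all classes.
print("weighted F1:", round(f1_score(gold, pred, average="weighted"), 3))
```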

7 Systems

Table 8 Rank list based on F1-score along with other evaluation metrics (precision and recall) for the Tamil language
Table 9 Rank list based on F1-score along with other evaluation metrics (precision and recall) for the Malayalam language
Table 10 Rank list based on F1 score along with other evaluation metrics (precision and recall) for the English language

7.1 System descriptions

In this section, we have summarised the systems implemented by the participants to complete the shared task. For more details, please refer to the shared task papers submitted by the authors.

  • [71] participated in identifying hope speech classes in the English, Tamil and Malayalam datasets. They presented a two-phase mechanism to detect hope speech: in the first phase, they built a classifier to identify the language of the text, and in the second phase, they created a classifier to identify the class labels. The authors used the language models SBERT, FNN and BERT inference. They achieved the 3rd, 4th and 2nd ranks in Tamil, Malayalam and English, respectively.

  • [76] used context-aware string embeddings for word representations and recurrent neural networks (RNNs) and pooled document embeddings for text representation. Their proposed methodology achieved a higher performance than the baseline results. The highest weighted average F-scores of 0.93, 0.56 and 0.84 for English, Tamil and Malayalam were reported on the final evaluation test set. The proposed models outperformed baselines by 3%, 2% and 11% in absolute terms for English, Tamil and Malayalam.

  • [73] performed experiments by taking advantage of the pre-processing and transfer learning models. They showed that the pre-trained multilingual BERT model with convolution neural networks provided the best results. Their model ranked 1st, 3rd and 4th on the English, Malayalam-English and Tamil-English code-mixed datasets, respectively.

  • [83] trained the data using transformer models, specifically mBERT for Tamil and Malayalam and BERT for English, and achieved weighted average F1 scores of 0.38, 0.81 and 0.92 for Tamil, Malayalam and English, respectively. They achieved the ranks of 14, 4 and 2 for Tamil, Malayalam and English, respectively.

  • [84] experimented with several transformer-based models, including BERT, ALBERT, DistilBERT, XLM-RoBERTa and MuRIL, to classify the comments in the English, Malayalam and Tamil datasets. ULMFiT achieved a weighted average F1 score of 0.91 on the English data, mBERT achieved 0.57 on the Malayalam data and distilmBERT achieved 0.37 on the Tamil data. They secured the 15th, 12th and 3rd ranks for predictions on the Tamil, Malayalam and English datasets, respectively.

  • [78] used various machine learning- and deep learning-based models (SVM, logistic regression, convolutional neural network and RNN) to identify the hope speech in the given YouTube comments. The best-performing model on English data was 2-parallel CNN-LSTM with GloVe and Word2Vec embeddings, and it reported a weighted average F1 score of 0.91 and 0.90 for the development and test sets, respectively. Similarly, the best-performing model on Tamil and Malayalam data was obtained from 3-parallel Bi-LSTM. For Tamil, the reported F1 scores were 0.56 and 0.54 on the development and test datasets, respectively. Similarly, for Malayalam, the reported weighted F1 scores were 0.78 and 0.79 on the development and test datasets, respectively.

  • [75] used TF-IDF character n-grams and pre-trained MuRIL embeddings for text representation as well as logistic regression and linear SVM for classification. Their best approach achieved the 2nd, 8th and 5th ranks with weighted F1 scores of 0.92, 0.75 and 0.57 in English, Malayalam-English and Tamil-English on the test dataset.

  • [77] fine-tuned the RoBERTa pre-training model based on three datasets: English, Tamil and Malayalam. The F1 scores of their models in the Tamil and Malayalam sub-tasks reached 0.56 and 0.78, respectively, and the F1 score in the English sub-task reached 0.93, achieving the 1st rank.

  • [70] used the attention mechanism to adjust the weights of all the output layers of XLM-RoBERTa to make full use of the information extracted from each layer, and they used the weighted sum of all the output layers to complete the classification task. They used the stratified k fold method to address class imbalance. They achieved weighted average F1 scores of 0.59, 0.84 and 0.92 for Tamil, Malayalam and English languages, which ranked 3rd, 2nd and 2nd, respectively.

  • [68] used a method and model that combine the XLM-RoBERTa pre-training language model and the TF-IDF algorithm. They secured the 1st, 2nd and 3rd ranks on the English, Malayalam and Tamil datasets, respectively.

  • [85] used fine-tuned BERT and k fold cross-validation to accomplish classification on the English dataset. They achieved a final F1 score of 0.93 and secured the 1st rank for the English language.

  • [72] demonstrated that even very simple baseline algorithms perform reasonably well in this task if provided with enough training data. However, their best-performing algorithm was a cross-lingual transfer learning approach where they fine-tuned XLM-RoBERTa. The model achieved the 1st rank for Malayalam and English and the 4th rank for Tamil.

  • [66], in their paper, described their approach of fine-tuning RoBERTa for hope speech detection in English and fine-tuning XLM-RoBERTa for hope speech detection in the Tamil and Malayalam languages. They ranked 1st in English (F1 = 0.93), 1st in Tamil (F1 = 0.61) and 3rd in Malayalam (F1 = 0.83).

  • [86] described a transformer-based BERT model for hope speech detection. Their model achieved a weighted averaged F1 score of 0.93 on the test set for English. They showed that the BERT model helped in providing better contextual representation of words in a comment and that the language identification model assisted in detecting comments in the ‘Other languages’ category. They also explored the use of other transformer models such as RoBERTa, XLNet, Albert, FLAIR and ELMo for a superior hope speech detection.

  • [82] proposed a BiLSTM with an attention-based approach to solving hope speech detection, and using this approach, they achieved an F1 score of 0.73 (9th rank) in the Malayalam–English dataset.

  • [80] experimented with two approaches. In the first approach, they used contextual embeddings to train classifiers using logistic regression-, random forest-, SVM- and LSTM-based models. The second approach involved using a majority voting ensemble of 11 models that were obtained by fine-tuning pre-trained transformer models (BERT, AL-BERT, RoBERTa and IndicBERT) after adding an output layer. They found that the second approach was superior for English, Tamil and Malayalam. They got a weighted F1 score of 0.93, 0.75 and 0.49 for English, Malayalam and Tamil, respectively. They ranked 1st in English, 8th in Malayalam and 11th in Tamil.

  • [79] achieved an F-score of 0.93, ranking 1st on the leaderboard for English comments. Their paper used pre-trained transformers and paraphrase generation for data augmentation.

  • [67] employed various machine learning (SVM, LR and ensemble), deep learning (CNN + BiLSTM) and transformer-based (m-BERT, Indic-BERT, XLNet and XLM-R) methods. They showed that XLM-R outperformed all other techniques by gaining a weighted F1 score of 0.93, 0.60 and 0.85, respectively, for the English, Tamil and Malayalam languages. Their team achieved the 1st, 2nd and 1st ranks in these three tasks, respectively.

  • [81] used the XLM-RoBERTa model and proposed an excellent multilingual model to complete the classification task.

  • [69] created three models, namely CoHope-ML, CoHope-NN and CoHope-TL, based on an ensemble of classifiers, a Keras neural network (NN) and a BiLSTM with a Conv1D model, respectively. The CoHope-ML and CoHope-NN models were trained on a feature set comprising character sequences extracted from sentences combined with words for the Malayalam-English and Tamil-English code-mixed texts, and a combination of word and character n-grams along with syntactic word n-grams for the English text. The CoHope-TL model consisted of three major parts: training a tokeniser, training a BERT language model (LM) and then using the pre-trained BERT LM as weights in the BiLSTM-Conv1D model. Out of the three proposed models, CoHope-ML (the best among them) obtained the 1st, 2nd and 3rd ranks with weighted F1 scores of 0.85, 0.92 and 0.59 for the Malayalam-English, English and Tamil-English texts, respectively.

  • [65] extended the work of Arora (2020a) by using their strategy to synthetically generate code-mixed data for training a transformer-based RoBERTa model, which they used in an ensemble along with their pre-trained ULMFiT. They presented a RoBERTa language model for code-mixed Tamil, which they had pre-trained from scratch. Using transfer learning, they fine-tuned the RoBERTa and ULMFiT language models on the downstream tasks of offensive language identification (OLI) and hope speech detection (HSD). They secured the 4th rank in the former task using an ensemble of classifiers trained on RoBERTa and ULMFiT and the 1st rank in the latter task using the classifier based on ULMFiT.

8 Results and discussion

Overall, we received a total of 31, 31 and 30 submissions for the English, Malayalam and Tamil tasks, respectively. It is interesting to note that the top-performing teams in all three languages predominantly used XLM-RoBERTa to complete the shared task. One of the top-ranking teams for English used context-aware string embeddings for word representations and RNNs as well as pooled document embeddings for text representation. Among the other submissions, Bi-LSTM was popular, and several other machine learning and deep learning models were also used; however, they did not achieve good results compared to the RoBERTa-based models.

The top scores were 0.61, 0.85 and 0.93 for Tamil, Malayalam and English, respectively. The scores ranged between 0.37 and 0.61 for Tamil, 0.49 and 0.85 for Malayalam, and 0.61 and 0.93 for English. The F1 scores of all the submissions on the Tamil dataset were considerably lower than those for Malayalam and English. It is not surprising that the English scores were better, as many approaches used variations of pre-trained transformer-based models trained on English data. Due to code-mixing at various levels, the scores were naturally lower for the Malayalam and Tamil datasets. Of these two, the submitted systems performed worse on the Tamil data. Identifying the exact reasons for the poor performance on Tamil requires further research. However, one possible explanation could be that the distribution of the ’Hope_speech’ and ’Non_hope_speech’ classes in Tamil was starkly different from that in English and Malayalam: in the other two languages, the number of non-hope speech comments was significantly higher than the number of hope speech comments.

9 Conclusion

As online content increases massively, it is necessary to encourage positivity, such as hope speech, on online forums in order to induce compassion and acceptable social behaviour. In this paper, we presented the largest manually annotated dataset for hope speech detection in English, Tamil and Malayalam, consisting of 28,451, 20,198 and 10,705 comments, respectively. We believe that this dataset will facilitate future research on encouraging positivity. Our aim is to promote research on hope speech and to encourage positive content on online social media in order to ensure EDI. In the future, we plan to extend this study by introducing a larger dataset with further fine-grained classification and content analysis.