Multilingual hope speech detection in English and Dravidian languages

Recent work on language technology has aimed to identify negative language such as hate speech and cyberbullying and to improve offensive language detection in order to moderate social media platforms. Most of these systems rely on machine learning models trained on labelled datasets. Such models have succeeded in identifying negativity and removing it from the platform. However, more research has recently been conducted on preserving freedom of speech on social media. Instead of deleting supposedly offensive speech, we developed a multilingual dataset to identify hope speech in comments and promote positivity. This paper presents a multilingual hope speech dataset that promotes equality, diversity and inclusion (EDI) in English, Tamil, Malayalam and Kannada. It was collected to promote positivity and ensure EDI in language technology. Our dataset is unique in that it contains data collected from the LGBTQIA+ community, persons with disabilities and women in science, engineering, technology and management (STEM). We also report benchmark results for various machine learning models: we experimented on the Hope Speech dataset for Equality, Diversity and Inclusion (HopeEDI) using different state-of-the-art machine learning and deep learning models to create benchmark systems.


Introduction
Recently, equality, diversity and inclusion (EDI) has attracted widespread attention with a focus on the protected classes of gender and race. The movement started as early as 1960, but it is only now that the interpretation of diversity has broadened to include other demographics such as the lesbian, gay, bisexual, transgender, queer/questioning (one's sexual or gender identity), intersex, and asexual/aromantic/agender (LGBTQIA+) community, women in the fields of science, engineering, technology and management (STEM), and persons with disabilities [1]. Inclusion refers to making an individual feel like they are a part of a group or organisation, in terms of both the formal and informal environment [2,3]. Another essential part of this wheel is bias. People have both conscious and unconscious biases, which lead to explicit and implicit stereotyping, respectively. To counter bias, much training has been provided to school students [4] and to employees at various levels [5]. However, it is only very recently that artificial intelligence (AI) researchers have started looking at biases, especially gender bias [6]. Language technologies in AI are expected to have a growing influence over our lives in the internet era. Nevertheless, from the perspective of language technologies research, EDI for the minority LGBTQIA+ community or other marginalised populations has not been considered with the urgency or importance given to other topics. It is important that the language technologies developed consider the inclusion of all communities for social integration.
Online social media platforms such as Facebook, Twitter and YouTube have encouraged millions of people to express themselves and share their opinions. These platforms also provide a medium for many marginalised people to look for support online [7][8][9]. The emergence of the infectious disease COVID-19 exposed the entire population to a disease without a specific pharmacological treatment; the exponential levels of infection have deeply affected countries across the world, and the pandemic forced public places to remain closed temporarily [10]. Several areas have been affected worldwide, and the fear of losing loved ones caused even basic necessities such as schools, hospitals and mental health care centres to remain closed [11]. As a consequence, people were forced to turn to online forums for their informational and emotional needs. In some areas and for some people, online social networking has been the only means of ensuring social connectedness and seeking social support during the COVID-19 pandemic [12].
Online social networking provides a platform for networked individuals to be in the know and to be known, both of which become more significant with more prominent social integration. Social integration is essential for the overall well-being of every individual, but most importantly for vulnerable individuals, who are more prone to social exclusion. A sense of belonging and community is an essential aspect of people's mental health, which influences both psychological and physical well-being [13]. The importance of social inclusion in the online lives of marginalised populations, such as women in the fields of STEM, people who belong to the LGBTQIA+ community, racial minorities or people with disabilities, has been studied, and it has been shown that the online life of vulnerable individuals has a significant impact on their mental health [14][15][16]. However, the content of social media comments or posts may be negative, hateful, offensive or abusive since there is no mediating authority.
Comments and posts on social media have been analysed to find and stop the spread of negativity using methods such as hate speech detection [17], offensive language identification [18][19][20] and abusive language detection [21]. However, according to [22], technologies developed for the detection of abusive language do not consider the potential biases of the dataset that they are trained on. The systematic racial bias in the datasets causes abusive language detection to be biased, and this may result in discrimination against one group over another. This will have a negative impact on minorities or marginalised people. As language is a major part of communication, it should be inclusive. A large internet community that uses language technology has a direct impact on people across the globe. We should turn our attention towards spreading positivity instead of curbing an individual's freedom of speech by removing negative comments. However, hope speech detection should be done alongside hate speech detection; otherwise, hope speech detection by itself may lead to bias while perpetrators of negative and harmful comments continue to run wild on the web. Therefore, in our research, we focused on hope speech. Hope is commonly associated with the promise, potential, support, reassurance, suggestions or inspiration provided to participants by their peers during periods of illness, stress, loneliness and depression [23]. Psychologists, sociologists and social workers from the Association of Hope have concluded that hope can also be a useful tool for saving people from suicide or self-harm [24]. The 'Hope Speech' delivered by gay rights activist Harvey Milk on the steps of the San Francisco City Hall during a mass rally to celebrate California Gay Freedom Day on 25 June 1978 inspired millions to demand rights that ensure EDI [25].
Recently, [26] analysed how to use hope speech from social media texts to diffuse tensions between two nuclear-powered nations (India and Pakistan) and support marginalised Rohingya refugees [27]. They experimented with detecting hope versus non-hope. However, to the best of our knowledge, no prior work has explored hope speech for women in STEM, LGBTQIA+ individuals, racial minorities or people with disabilities in general.
Moreover, although people from various linguistic backgrounds are getting exposed to online social media language, English remains at the centre of ongoing trends in language technology research. Recently, some research studies have been conducted on high-resourced languages such as Arabic, German, Hindi and Italian. However, such studies usually use monolingual corpora and do not examine code-switched textual data. Code-switching is a phenomenon where the individual switches between two or more languages in a single utterance [28]. We have introduced a dataset for hope speech identification not only in English but also in the under-resourced code-switched Tamil (ISO 639-3: tam), Malayalam (ISO 639-3: mal) and Kannada (ISO 639-3: kan) languages.
Our main contributions are as follows:
- We have proposed to encourage hope speech rather than take away an individual's freedom of speech by detecting and removing negative comments.
- We applied the schema to create a multilingual hope speech dataset for EDI. This is a new large-scale dataset of English, Tamil (code-mixed) and Malayalam (code-mixed) YouTube comments with high-quality annotation of the target.
- We performed experiments on the Hope Speech dataset for Equality, Diversity and Inclusion (HopeEDI) using different state-of-the-art machine learning and deep learning models to create benchmark systems.

Related works
When it comes to crawling social media data, there are many works on YouTube mining [29,30], which mainly focus on exploiting user comments. [31] performed opinion mining and a trend analysis on YouTube comments. The researchers analysed the sentiments to identify their trends, seasonality and forecasts, and it was found that user sentiments are well correlated with the influence of real-world events. [32] conducted a systematic study on opinion mining targeting YouTube comments. The authors developed a comment corpus containing 35K manually labelled comments for modelling the opinion polarity of comments based on tree kernel models. [33] and [34] collected comments from YouTube and created manually annotated corpora for sentiment analysis of the under-resourced Tamil and Malayalam languages. Methods to mitigate gender bias in natural language processing (NLP) have been extensively studied for the English language [35]. Some studies have investigated gender bias beyond the English language using machine translation to French [36] and other languages [37]. [38] studied gender and dialect bias in automatically generated captions on YouTube. Technologies for abusive language [39,40], hate speech [17,41] and offensive language detection [42][43][44] are being developed and applied without considering the potential biases [22,45,46]. However, current gender debiasing methods in NLP are not sufficient to debias other issues related to EDI in the end-to-end systems of many language technology applications; this causes unrest and escalates EDI issues besides leading to greater inequality on digital platforms [47].
The use of counter-narratives (i.e. informed textual responses) is another strategy that has received the attention of researchers recently [48,49]. The counter-narrative approach was proposed to respect the right to freedom of speech and avoid over-blocking. [50] created and released a dataset for counterspeech using comments from YouTube. However, the core idea of directly intervening with textual responses escalates hostility, even though it is advantageous for the writer to understand why their comment or post has been deleted or blocked and then favourably change the discourse and attitudes presented in their comments. Thus, we directed our attention to finding positive information such as hope and encouraging such activities.
Recently, work by [26] and [27] analysed how to use hope speech from social media text to diffuse tension between two nuclear-powered nations (India and Pakistan) and support minority Rohingya refugees. However, the authors' definition of hope was confined to diffusing tensions and preventing violence. It did not take into account other perspectives on hope and EDI. The authors did not provide further information such as the inter-annotator agreement, diversity among annotators and details about the dataset, and the dataset is not publicly available for research. It was created in English, Hindi and other languages known to the Rohingyas. Our work differs from the previous works in that we have defined hope speech for EDI and introduced a dataset for EDI in English, Tamil and Malayalam. To the best of our knowledge, this is the first work to create a dataset for EDI in Tamil and Malayalam, which are under-resourced languages.

Hope speech
Hope is an upbeat state of mind based on a desire for positive outcomes in one's life or the world at large, and it is both present- and future-oriented [23]. Inspirational talks about how people deal with and overcome adversity may also provide hope. Hope speech instills optimism and resilience, which have a beneficial impact on many parts of life [51], including college life [52] and other factors that put us at risk [53]. For our problem, we defined hope speech as 'YouTube comments/posts that offer support, reassurance, suggestions, inspiration and insight'.
The notion that one may uncover and become motivated to use routes to their desired goals is reflected in hope speech. Our approach sought to shift the dominant mindset away from a focus on discrimination, loneliness or the negative aspects of life and towards a focus on promoting confidence, offering support and creating positive characteristics based on individual remarks. Thus, we instructed annotators that if a comment or post meets the following conditions, then it should be annotated as hope speech.
- The comment contains inspiration provided to participants by their peers and others and/or offers support, reassurance, suggestions and insight (e.g. [We will survive these things]).
- The comment explicitly talks about and says no to division in any form.
Non-hope speech includes comments that do not bring positivity, such as the following:
- The comment uses racially, ethnically, sexually or nationally motivated slurs.
- The comment produces hate towards a minority group.
- The comment is highly prejudiced and attacks people without thinking about the consequences.
- The comment does not inspire hope in the reader's mind.
Non-hope speech is different from hate speech. Some examples are provided below.
- 'How is that the same thing???' This is non-hope speech, but it is not hate speech.
- 'Society says don't assume, but they assume to anyways.' This is non-hope speech, but it is not hate speech.

A hate speech or offensive language detection dataset is not available for code-mixed Tamil and code-mixed Malayalam, and the existing datasets do not take into account LGBTQIA+ people, women in STEM or other minority or under-represented groups. Thus, we cannot use the existing hate speech or offensive language detection datasets to detect hope or non-hope speech for the EDI of minorities.

Dataset construction
We concentrated on gathering information from YouTube comments, as YouTube is the most widely used platform in the world for commenting on and publicly expressing opinions about topics or videos. We did not use comments from LGBTQIA+ people's personal coming-out stories since they contained personal information.
For English, we gathered comments on recent EDI themes such as women in STEM, LGBTQIA+ concerns, COVID-19, Black Lives Matter, the United Kingdom (UK) versus China, and the United States of America (USA) and Australia versus China. The comments were collected from recordings of individuals from English-speaking nations such as Australia, Canada, Ireland, the United Kingdom, the United States of America and New Zealand. For Tamil and Malayalam, we gathered data from India on recent themes such as LGBTQIA+ concerns, COVID-19, women in STEM, the Indo-China war and Dravidian affairs. India is a multilingual and multiracial country. In terms of linguistics, India is split into three major language families: Dravidian, Indo-Aryan and Tibeto-Burman. The ongoing Indo-China border conflict has sparked online bigotry towards persons with East-Asian features, despite the fact that they are Indians from the North East. Similarly, in Tamil Nadu, the National Education Policy, which calls for the adoption of Sanskrit or Hindi, has exacerbated concerns about the linguistic autonomy of Dravidian languages. We used a YouTube comment scraper to collect the comments.
From November 2019 to June 2020, we gathered data on the aforementioned subjects. We believe that the data we have shared will help to reduce animosity and promote optimism. Our dataset was created as a multilingual resource to enable cross-lingual research and analysis. It includes hope speech in English, Tamil and Malayalam.

Code-mixing
When a speaker employs two or more languages in a single utterance, it is known as code-mixing. It is prevalent in the social media discourse of multilingual speakers. Code-mixing has long been connected with a lack of formal or informal linguistic expertise. It is, nevertheless, common in user-generated social media material, according to studies. In a multilingual country like India, code-mixing is quite a frequent occurrence [54][55][56][57]. Our Tamil and Malayalam datasets are code-mixed since our data was collected from YouTube. In our corpus, we found all three forms of code-mixing: tag switching, inter-sentential and intra-sentential code-mixing. Our corpus also includes code-mixing between the Latin and native scripts.

Ethical concerns
Data collected from social media is extremely sensitive, especially when it concerns minorities such as the LGBTQIA+ community or women. We have taken great care to reduce the danger of the data revealing an individual's identity by eliminating personal information, such as names (but not celebrity names), from the dataset. However, in order to investigate EDI, we needed to keep track of information on race, gender, sexual orientation, ethnicity and philosophical views. The annotators only viewed anonymised posts and promised not to contact the author of any comment. Only researchers who agree to follow ethical norms will be given access to the dataset for research purposes. After a lengthy debate with our local EDI committee members, we opted not to ask the annotators for racial information. Due to recent events, the EDI committee was strongly against the collection of racial information, believing that it would split people according to their racial origin. Thus, we recorded only the nationality of the annotators.

Annotation set-up
After the data collection phase, we cleaned the data using Langdetect to identify the language of each comment and removed comments that were not in the specified languages. However, owing to code-mixing at various levels, comments in other languages were unintentionally included in the cleaned corpus of Tamil and Malayalam comments. Finally, based on our description from Sect. 3, we identified three groups, two of which were hope and non-hope; the last group ('Other languages') was introduced to account for comments that were not in the required language. These classes were chosen since they provided a sufficient amount of generalisation for describing the comments in the EDI hope speech dataset.

Annotators
We created Google forms to collect annotations from the annotators. To maintain the quality of annotation, each form was limited to 100 comments and each page to ten comments. We collected information on each annotator's gender, educational background and preferred medium of instruction in order to understand the annotators' diversity and avoid bias. The annotators were warned that the comments might contain profanity and hostile material. If an annotator deemed the remarks too upsetting or unmanageable, they were offered the choice of ceasing to annotate. We trained annotators by directing them to YouTube videos on EDI. Each form was annotated by at least three individuals. After the annotators marked the first form with 100 comments, the findings were manually validated in a warm-up phase. This strategy was utilised to help them acquire a better knowledge of EDI and focus on the project. Following the initial stage of annotating their first form, a few annotators withdrew from the project, and their annotations were deleted. The annotators were told to conduct another review of the EDI videos and annotation guidelines. Table 1 shows the statistics pertaining to the annotators. The annotators for the English-language comments came from Australia, Ireland, the United Kingdom and the United States of America. We were able to obtain annotations in Tamil from persons from both Tamil Nadu in India and Sri Lanka. Graduate and postgraduate students made up the majority of the annotators.

Inter-annotator agreement
We used majority voting to aggregate the hope speech annotations from several annotators; the comments that did not receive a majority label in the first round were collected and added to a second Google form so that more annotators could contribute to them. We calculated the inter-annotator agreement following the last round of annotation. We quantified the clarity of the annotation and reported the inter-annotator agreement using Krippendorff's alpha. Krippendorff's alpha is a statistical measure of annotator agreement that indicates how well the resulting data corresponds to actual data [58]. Although Krippendorff's alpha (α) is computationally demanding, it was more relevant in our case since the comments were annotated by more than two annotators and not all sentences were annotated by the same annotators. It is unaffected by missing data; allows for variation in sample sizes, categories and the number of raters; and may be used with any measurement level, including nominal, ordinal, interval and ratio. α is defined as

α = 1 − D_o / D_e,

where D_o is the observed disagreement between the labels assigned by the annotators, and D_e is the disagreement expected when the coding can be attributed to chance rather than to the inherent properties of the labels themselves.
The two disagreement terms are computed from the coincidence matrix as

D_o = (1/n) Σ_c Σ_k o_ck · δ²_metric(c, k),
D_e = (1/(n(n−1))) Σ_c Σ_k n_c · n_k · δ²_metric(c, k).

Here, o_ck, n_c, n_k and n refer to the frequencies of values in the coincidence matrix, and δ_metric is the difference function for the chosen metric or level of measurement, such as nominal, ordinal, interval or ratio; Krippendorff's alpha applies to all these metrics. The range of α is between '0' and '1', i.e. 1 ≥ α ≥ 0. When α is '1', there is perfect agreement between the annotators, and when it is '0', the agreement is entirely due to chance. It is customary to require α ≥ 0.800. A reasonable rule of thumb that allows for tentative conclusions to be drawn requires 0.67 ≤ α ≤ 0.8, while α ≥ 0.667 is the lowest conceivable limit. For computing Krippendorff's alpha [59], we utilised nltk. Our corpora have a broad vocabulary as a result of the various types of code-switching that take place. Table 3 shows the distribution of the annotated dataset by label. The data was found to be skewed, with most of the comments classified as non-hope speech. An automatic detection system that can manage imbalanced data is essential for being truly successful in the age of user-generated content on internet platforms, which is becoming increasingly popular. Using the fully annotated dataset, a train set, a development set and a test set were produced.
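For the nominal case (δ² = 0 when labels match, 1 otherwise), the formulas above can be sketched in pure Python. This is a minimal illustration of the computation, not the nltk implementation we actually used, and the toy annotations are invented for the example:

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: one list of labels per annotated item (any number of coders
    per item). Implements alpha = 1 - D_o / D_e via the coincidence
    matrix o_ck.
    """
    o = Counter()  # coincidence matrix o_ck
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # items with a single annotation carry no agreement information
        for c, k in permutations(labels, 2):
            o[(c, k)] += 1.0 / (m - 1)
    n_c = Counter()
    for (c, _k), v in o.items():
        n_c[c] += v  # marginal frequencies n_c
    n = sum(n_c.values())
    # Nominal difference function: delta^2 = 0 if c == k else 1.
    d_o = sum(v for (c, k), v in o.items() if c != k) / n
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n * (n - 1))
    return 1.0 - d_o / d_e

# Toy annotations: three hypothetical annotators per comment.
items = [
    ["Hope", "Hope", "Hope"],
    ["Non-hope", "Non-hope", "Hope"],
    ["Non-hope", "Non-hope", "Non-hope"],
]
print(round(krippendorff_alpha_nominal(items), 3))  # → 0.6
```

In practice we relied on nltk's agreement utilities, which also handle the ordinal, interval and ratio metrics; the sketch above only covers the nominal case used for our three categorical labels.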

Corpus statistics
A few samples from the dataset, together with their translations and hope speech class annotations, are shown below.
- kashtam thaan. irundhaalum muyarchi seivom – 'It is indeed difficult. Let us try it out though.' (Hope speech)
- uff. China mon vannallo – 'Phew! Here comes the Chinese guy.' (Non-hope speech)
- paambu kari saappittu namma uyirai vaanguranunga – 'These guys (Chinese) eat snake meat and make our lives miserable.' (Non-hope speech)

Problematic examples
We found some problematic comments during the process of annotation.

Benchmark experiments
We benchmarked our dataset by utilising a broad range of common classifiers despite its imbalanced label distribution, and the results were quite promising. The experiments were conducted on term frequency-inverse document frequency (TF-IDF) features computed over the tokens of each document. To generate baseline classifiers, we utilised the scikit-learn package (https://scikit-learn.org/stable/). Alpha = 0.7 was used for the multinomial Naive Bayes model. We employed a grid search for the k-nearest neighbours (KNN), support vector machine (SVM), decision tree and logistic regression models. Detailed information on the parameters of the classifiers will be made available in the code. Facebook AI's RoBERTa model, an upgraded version of the BERT model [60], has achieved state-of-the-art results on numerous natural language understanding (NLU) tasks, including GLUE [61] and SQuAD [62]. RoBERTa improves on BERT by training for a longer period of time on longer sequences, increasing the amount of training data, eliminating the next-sentence prediction objective during pre-training and modifying the masking pattern used during pre-training, among other things. XLM-RoBERTa was created with the goal of improving cross-lingual language understanding (XLU) by utilising a transformer-based masked language model (MLM). XLM-RoBERTa was trained on 2.5 terabytes of CommonCrawl data [63] covering one hundred languages. It was found that XLM-RoBERTa surpasses its multilingual MLM competitors mBERT [60] and XLM [64] in terms of performance.
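A TF-IDF baseline of the kind described above can be sketched with scikit-learn. The toy comments and label names below are invented stand-ins for the HopeEDI training split; only the alpha = 0.7 setting for multinomial Naive Bayes comes from our experiments:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy stand-in data; the real system is trained on the HopeEDI training split.
train_texts = [
    "you are not alone, we support you",
    "together we will overcome this",
    "nobody wants you here",
    "this community is a disgrace",
]
train_labels = ["Hope_speech", "Hope_speech", "Non_hope_speech", "Non_hope_speech"]

model = Pipeline([
    ("tfidf", TfidfVectorizer()),      # token-level TF-IDF features
    ("nb", MultinomialNB(alpha=0.7)),  # alpha = 0.7, as in our baseline
])
model.fit(train_texts, train_labels)
print(model.predict(["stay strong, we will support you"]))
```

The other baselines (KNN, SVM, decision tree, logistic regression) slot into the same pipeline, with their hyperparameters chosen via `sklearn.model_selection.GridSearchCV`.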
Using the training dataset, we trained our models; the development dataset was used to fine-tune the hyperparameters, and the models were assessed by predicting labels for the held-out test set, as shown in Table 4. Classification performance was measured using the macro-averaged F-score, which is derived by averaging the per-class F-scores (each the harmonic mean of precision and recall). This decision was made owing to the uneven class distribution, which causes well-known measures of performance such as accuracy and the micro-averaged F-score to be less representative of actual performance. Since the overall performance on all classes is important, we also present the precision, recall and weighted F-score of the individual classes in addition to the overall performance. Three tables in this section provide the precision, recall and F-score results of the HopeEDI test set for the baseline classifiers, along with the support from the test data: Table 5, Table 6 and Table 7.
As demonstrated, all of the models performed poorly as a result of the class imbalance. On the HopeEDI dataset, the SVM classifier showed the worst performance, with macro-averaged F-scores of 0.32, 0.21 and 0.28 for English, Tamil and Malayalam, respectively. The decision tree achieved a higher macro F-score than logistic regression for English and Malayalam, while for Tamil both classifiers fared well. Although we applied language identification techniques to eliminate non-intended language comments from our dataset, annotators still marked some comments as 'Other languages', though not consistently in all cases; this introduced another inconsistency into our dataset. The majority of the macro scores were lower for English as a result of the 'Other languages' category. In the case of English, this could have been prevented by simply eliminating those comments from the dataset. However, this label was required for Tamil and Malayalam since the comments in these languages were code-mixed and written in a script that was not native to the language (Latin script). The distribution of data for the Tamil language was roughly equal between the hope and non-hope classes.
The usefulness of our dataset was evaluated through the use of machine learning techniques, which we carried out in our trials. Due to its novel method of data collection and annotation, we believe the HopeEDI dataset has the potential to revolutionise the field of language technology. We believe that it will open up new directions in the future for further research on positivity.

Task description
We also organised a shared task to invite more researchers to perform hope speech detection and benchmark the data. For our problem, we defined hope speech as 'YouTube comments/posts that offer support, reassurance, suggestions, inspiration and insight'. A comment or post within the corpus may contain more than one sentence, but the average

Training phase
During the first phase, participants were provided with training, validation and development data in order to train and develop hope speech detection systems for one or more of the three languages. Cross-validation on the training data was an option, as was using the validation dataset for early evaluations and the development set for hyper-parameter tuning. The objective of this step was to guarantee that the participants' systems were ready for review before the test data was released. In total, 137 people registered and downloaded the data in all three languages.

Testing phase
The test dataset was provided without the gold labels in CodaLab during this phase. Participants were given Google forms to fill out in order to submit their predictions. They were given the option of submitting their findings as many times as they wished, with the best entry being picked for assessment and the creation of the rank list. The outcomes were compared to the gold-standard labels. Across all classes, the classification system's performance was assessed in terms of the weighted averaged precision, recall and F-score. The weighted averaged scores were calculated as the support-weighted mean per label. The metric used for preparing the rank list was the weighted F1 score. Participants were encouraged to check their systems using the sklearn classification report. The final test included 30, 31 and 31 participants for the Tamil, Malayalam and English languages, respectively.

System descriptions
In this section, we have summarised the systems implemented by the participants to complete the shared task. For more details, please refer to the shared task papers submitted by the authors.
- [71] participated in identifying hope speech classes in the English, Tamil and Malayalam datasets. They presented a two-phase mechanism to detect hope speech. In the first phase, they built a classifier to identify the language of the text. In the second phase, they created a classifier to identify the class labels. The authors used the language models SBERT, FNN and BERT inference. They achieved the 3rd, 4th and 2nd ranks in Tamil, Malayalam and English, respectively.
- [76] used context-aware string embeddings for word representations and recurrent neural networks (RNNs) and pooled document embeddings for text representation. Their proposed methodology achieved a higher performance than the baseline results. The highest weighted average F-scores of 0.93, 0.56 and 0.84 for English, Tamil and Malayalam were reported on the final evaluation test set. The proposed models outperformed the baselines by 3%, 2% and 11% in absolute terms for English, Tamil and Malayalam.
- [73] performed experiments taking advantage of pre-processing and transfer learning models. They showed that the pre-trained multilingual BERT model with convolutional neural networks provided the best results. Their model ranked 1st, 3rd and 4th on the English, Malayalam-English and Tamil-English code-mixed datasets, respectively.
- [83] trained the data using transformer models, specifically mBERT for Tamil and Malayalam and BERT for English, and achieved weighted average F1 scores of 0.38, 0.81 and 0.92 for Tamil, Malayalam and English, respectively. They achieved the ranks of 14, 4 and 2 for Tamil, Malayalam and English, respectively.
- [84] experimented with several transformer-based models, including BERT, ALBERT, DistilBERT, XLM-RoBERTa and MuRIL, to classify the dataset into the given classes.
- [70] used the attention mechanism to adjust the weights of all the output layers of XLM-RoBERTa to make full use of the information extracted from each layer, and they used the weighted sum of all the output layers to complete the classification task. They used the stratified k-fold method to address class imbalance. They achieved weighted average F1 scores of 0.59, 0.84 and 0.92 for the Tamil, Malayalam and English languages, which ranked 3rd, 2nd and 2nd, respectively.
- [68] used a method and model that combines the XLM-RoBERTa pre-trained language model and the TF-IDF algorithm. They secured the 1st, 2nd and 3rd ranks on the English, Malayalam and Tamil datasets, respectively.
- [85] used fine-tuned BERT and k-fold cross-validation to accomplish classification on the English dataset. They achieved a final F1 score of 0.93 and secured the 1st rank for the English language.
- [72] demonstrated that even very simple baseline algorithms perform reasonably well in this task if provided with enough training data. However, their best-performing algorithm was a cross-lingual transfer learning approach in which they fine-tuned XLM-RoBERTa. The model achieved the 1st rank for Malayalam and English and the 4th rank for Tamil.
- [66], in their paper, described their approach of fine-tuning RoBERTa for hope speech detection in English and fine-tuning XLM-RoBERTa for hope speech detection in the Tamil and Malayalam languages. They ranked 1st in English (F1 = 0.93), 1st in Tamil (F1 = 0.61) and 3rd in Malayalam (F1 = 0.83).
- [86] described a transformer-based BERT model for hope speech detection. Their model achieved a weighted average F1 score of 0.93 on the test set for English. They showed that the BERT model helped in providing a better contextual representation of words in a comment and that the language identification model assisted in detecting comments in the 'Other languages' category. They also explored the use of other transformer models such as RoBERTa, XLNet, ALBERT, FLAIR and ELMo for superior hope speech detection.
- [82] proposed a BiLSTM with an attention-based approach to solving hope speech detection, and using this approach, they achieved an F1 score of 0.73 (9th rank) on the Malayalam-English dataset.
- [80] experimented with two approaches. In the first approach, they used contextual embeddings to train classifiers using logistic regression-, random forest-, SVM- and LSTM-based models. The second approach involved using a majority voting ensemble of 11 models that were obtained by fine-tuning pre-trained transformer models (BERT, ALBERT, RoBERTa and IndicBERT) after adding an output layer.
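The majority-vote step of an ensemble like the one in [80]'s second approach can be illustrated in isolation. The three sets of predictions below are invented, standing in for the outputs of their 11 fine-tuned transformer models:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label predictions by majority vote.

    predictions: a list with one inner list of labels per model,
    all of the same length (one label per test sample).
    """
    voted = []
    for sample_preds in zip(*predictions):
        # Keep whichever label most models agreed on for this sample.
        label, _count = Counter(sample_preds).most_common(1)[0]
        voted.append(label)
    return voted

# Three hypothetical models that disagree on some samples; the ensemble
# keeps the label that the majority of models produced.
model_a = ["Hope", "Non_hope", "Hope", "Non_hope"]
model_b = ["Hope", "Hope", "Hope", "Non_hope"]
model_c = ["Non_hope", "Non_hope", "Hope", "Non_hope"]

print(majority_vote([model_a, model_b, model_c]))
# → ['Hope', 'Non_hope', 'Hope', 'Non_hope']
```

With an odd number of models and two classes, ties cannot occur; this is one reason voting ensembles are often built from an odd number of base classifiers, as in the 11-model ensemble described above.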
They found that the second approach was superior for English and Tamil.
- [65] extended the work of Arora (2020a), as they used their strategy to synthetically generate code-mixed data for training a transformer-based RoBERTa model and used it in an ensemble along with their pre-trained ULMFiT. They presented the RoBERTa language model for code-mixed Tamil, which they had pre-trained from scratch. Using transfer learning, they fine-tuned the RoBERTa and ULMFiT language models on the downstream tasks of offensive language identification (OLI) and hope speech detection (HSD). They secured the 4th rank in the former task using an ensemble of classifiers trained on RoBERTa and ULMFiT and the 1st rank in the latter task using the classifier based on ULMFiT.

Results and discussion
Overall, we received a total of 31, 31 and 30 submissions for the English, Malayalam and Tamil tasks, respectively. It is interesting to note that the top-performing teams in all three languages predominantly used XLM-RoBERTa to complete the shared task. One of the top-ranking teams for English used context-aware string embeddings for word representations and RNNs as well as pooled document embeddings for text representation. Among the other submissions, although Bi-LSTM was popular, other machine learning and deep learning models were also used. However, they did not achieve good results compared to the RoBERTa-based models. The top scores were 0.61, 0.85 and 0.93 for Tamil, Malayalam and English, respectively. The scores ranged between 0.37 and 0.61, 0.49 and 0.85, and 0.61 and 0.93 for the Tamil, Malayalam and English datasets, respectively. It can be seen that the F1 scores of all the submissions on the Tamil dataset were considerably lower than those on Malayalam and English. It is not surprising that the English scores were better, as many approaches used variations of pre-trained transformer-based models trained on English data. Due to code-mixing at various levels, the scores were naturally lower for the Malayalam and Tamil datasets. Of these two, the submitted systems performed worse on the Tamil data. Identifying the exact reasons for the poor performance on Tamil requires further research. However, one possible explanation could be that the distribution of the 'Hope_speech' and 'Non_hope_speech' classes in Tamil was starkly different from that in English and Malayalam. In the remaining two languages, the number of non-hope speech comments was significantly higher than the number of hope speech comments.
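The weighted average F1 used to score the submissions weights each class's F1 by its support, so on skewed data the majority class dominates the reported number. A minimal sketch with invented gold labels and predictions (the 8:2 split loosely mimics the non-hope-heavy English distribution; scikit-learn's `f1_score` computes both averages):

```python
from sklearn.metrics import f1_score

# Invented labels for illustration only: 8 non-hope and 2 hope comments,
# with one error in each direction.
gold = ["Non_hope"] * 8 + ["Hope"] * 2
pred = ["Non_hope"] * 7 + ["Hope"] + ["Hope", "Non_hope"]

# 'macro' averages the per-class F1 scores equally; 'weighted' weights
# each class by its support, so the majority class dominates.
macro = f1_score(gold, pred, average="macro")
weighted = f1_score(gold, pred, average="weighted")
print(round(macro, 3), round(weighted, 3))
```

Here the minority 'Hope' class has F1 = 0.5 and the majority 'Non_hope' class F1 = 0.875, giving a weighted score of 0.8 versus a macro score of about 0.69: a system can post a high weighted F1 while handling the minority hope class poorly, which is worth bearing in mind when comparing scores across the differently distributed Tamil, Malayalam and English datasets.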

Conclusion
As online content increases massively, it is necessary to encourage positivity, such as in the form of hope speech on online forums, to induce compassion and acceptable social behaviour. In this paper, we presented the largest manually annotated dataset for hope speech detection in English, Tamil and Malayalam, consisting of 28,451, 20,198 and 10,705 comments, respectively. We believe that this dataset will facilitate future research on encouraging positivity. We aimed to promote research on hope speech and encourage positive content on online social media for ensuring EDI. In the future, we plan to extend the study by introducing a larger dataset with further fine-grained classification and content analysis.

Funding
Open Access funding provided by the IReL Consortium. This research has not been funded by any company or organisation.

Data availability and material
The datasets used in this paper were obtained from https://huggingface.co/datasets/hope_edi.

Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies conducted by any of the authors that involve human participants or animals. The authors complied with the ethical standards.
Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.