1 Introduction

Microblogging and social networks wield significant influence today across a wide range of domains, encompassing daily communication, the sharing of ideas, opinions, emotions, and reactions, shopping behavior, political discourse, and responses to crises, to name a few (Kapoor et al., 2018). Over the past few years, researchers have shown a growing interest in text-based sentiment and emotion detection on online social networks, particularly Twitter and Facebook (Zimbra et al., 2018).

The vast amount of text generated by Twitter users serves as a rich source for capturing people’s emotions, which are integral to human life and strongly influence people’s behaviors and actions (Wang et al., 2012). Emotion detection in short texts, such as social media posts, has a high impact on sectors including industry, health, security, and education, with a wide range of applications such as e-learning environments, depression monitoring (Zucco et al., 2017), detecting mental disorders (Aragon et al., 2021), inferring personality traits, detecting suicide-related content and emotions (Schoene et al., 2022), hate speech detection, cyber-bullying identification, event detection, disease tracking, and cyber threat detection.

However, detecting emotions in social network data is a non-trivial task due to the brevity of the text, especially since Twitter users often employ non-standard language (irony, sarcasm, and humor) to express their emotional state (Canales et al., 2019). Additionally, tweets are characterized by a prevalence of informal and slang words, misspellings, hashtags, emoticons, and abbreviations, making interpretation challenging for automated emotion detection models (Kusal et al., 2021).

Emotion models form the foundation of the emotion-sensing process, with three main modeling approaches: categorical, dimensional, and componential emotion models. The categorical emotion model assumes that there is a small number of significant emotions that are independent of one another. Two predominant emotion models for emotion classification are Plutchik’s model (Plutchik, 1980) with eight basic (primary) emotions and Ekman’s model (Ekman, 1993) with six basic emotions.

Various learning approaches are employed for text emotion detection, including lexicon-based (Mohammad & Turney, 2013), rule-based (Krommyda et al., 2020), machine learning-based (Wood & Ruder, 2016; Yousaf et al., 2020), and deep learning-based approaches (Colnerič & Demšar, 2018; Polignano et al., 2019; Kastrati et al., 2022).

Conventional machine learning and deep learning models are widely used to build sentiment analysis and emotion recognition systems (Kastrati et al., 2022; Imran et al., 2020; Edalati et al., 2021). More recently, deep neural networks, including CNN and RNN variants (such as LSTM, BiLSTM, and GRU), have gained popularity for their state-of-the-art performance in various natural language processing (NLP) tasks (Kastrati & Biba, 2021). Supervised learning is the most widely used approach in machine learning, including deep and shallow learning (LeCun et al., 2015). However, training supervised learning models requires a large amount of human-labeled data, which is not always available for real-world applications, and text emotion detection is no exception (Wood & Ruder, 2016). Furthermore, high-quality datasets for text emotion research have been scarce. Most existing datasets with multiclass emotion annotations are either too small and/or highly imbalanced to adequately support supervised emotion learning (Kang et al., 2020).

To address this challenge, we have collected a large-scale emotion dataset of tweets from Twitter. Inspired by the research study conducted in Batra et al. (2021), emotion-indicative emojis are used for the automatic labeling of the dataset. Several supervised conventional machine learning and deep learning models, including transformer-based ones, are then tested on the newly collected dataset to establish baseline results and to examine an approach to sentiment polarity and emotion detection that better suits the dataset, aiming to improve the performance of the classifier models.

1.1 Study objective and research questions

This study focuses on automatic labeling techniques for very large-scale tweet datasets for sentiment and emotion analysis tasks using distant supervision with emojis. It also investigates the training of deep neural networks on our large-scale dataset for classifying both sentiment polarity and emotions.

Against this background, we formulated the main research objective: to improve the effectiveness of sentiment polarity and emotion classification using a very large-scale dataset, automatically labeled through distant supervision with emojis, together with deep learning models.

According to the objective above, the following research questions were raised:

  • RQ1: How can we automatically create a large-scale emotion dataset by utilizing emotion-indicative emojis available in tweets for sentiment polarity and emotion classification tasks?

  • RQ2: How do the size of training data and class imbalance affect the performance of conventional machine learning algorithms and deep neural networks?

  • RQ3: To what extent do pre-trained word embedding techniques and attention mechanisms improve sentiment and emotion classification performance?

1.2 Contribution

The core contributions of this work are:

  • Collecting and curating a real-world large-scale dataset of tweets that are automatically labeled with categorical emotions based on Ekman’s model using distant supervision with emotion-indicative emojis.

  • Providing new knowledge on the performance comparison of supervised conventional machine learning algorithms and deep neural networks for sentiment polarity and emotion classification on the created dataset.

  • Proposing a multi-layer BiLSTM model with pre-trained word embeddings and an attention mechanism for classifying both sentiment polarity and emotions (multiclass classification).

  • Providing an ablation analysis of the effects of dataset size, the number of classes, and class imbalance on classification performance.

2 Related work

During the past decade, several studies have been conducted on sentiment analysis tasks in Twitter posts. Most of these studies can generally be grouped into two main research directions based on their core contributions: i) data curation/labeling techniques for sentiment analysis tasks, and ii) polarity/emotion classification. The first group entails studies concerning data collection and (semi-)automatic labeling techniques. For instance, the research work conducted in Go et al. (2009) was the first to introduce distant supervision labels (emoticons) for classifying the sentiment polarity of tweets. The study presents one of the most widely used Twitter sentiment datasets for sentiment analysis tasks, known as Sentiment140. Another similar study that uses a distant supervision strategy for automatic labeling is presented in Davidov et al. (2010). In particular, hashtags and text emoticons for sentiment annotation are applied in both studies to generate labels. A similar study that applies not only emoticons and hashtags but also emojis as distantly supervised labels to detect Plutchik’s emotions is conducted in Suttles and Ide (2013).

There is another strand of research that focuses on creating datasets for the emotion detection task. For example, the research study in Mohammad and Kiritchenko (2015) presents the Twitter Emotion Corpus, annotated using distant supervision with emotion-specific hashtags. An extended dataset called the Tweet Emotion Intensity dataset is presented later in Mohammad and Bravo-Marquez (2017), where the authors created the first dataset of tweets annotated for anger, fear, joy, and sadness intensities using the best-worst scaling technique. The researchers in Kralj Novak et al. (2015) present the first emoji sentiment lexicon, known as the Emoji Sentiment Ranking, as well as a sentiment map that consists of the 751 most frequently used emojis. The sentiment of the emojis is computed from the sentiment of the tweets in which they occur.

A similar work was conducted in Batra et al. (2021), where the authors presented a dataset containing around 1.1 million Urdu tweets collected over two months. They employed a heuristic labeling approach that allowed multi-label emotions. Furthermore, the dataset is characterized by a severe class-imbalance problem. In contrast to the study in Batra et al. (2021), our research work differs in both data collection and heuristic labeling. We collected tweets posted over the last 10 years with an almost proportional daily-based distribution, which helps to reduce bias during data collection. Additionally, our collected dataset is balanced, with an equal number of samples among the six basic emotion categories, even though some emotions are more representative than others on Twitter. Furthermore, our selection heuristic for determining the true label of tweets containing multiple emojis that refer to different emotions maintained a strict one-emotion-per-tweet policy.

The second group of research works focuses on polarity and emotion classification using conventional machine learning algorithms and deep neural networks. For instance, such a study is conducted in Polignano et al. (2019), where the authors proposed a classification approach for emotion detection from text using deep neural networks, including BiLSTM and CNN, with self-attention and three pre-trained word embeddings for word encoding. Another example where LSTM models are used for estimating sentiment polarity and emotions from Covid-19-related tweets is proposed in Imran et al. (2020) and in Batra et al. (2021). The latter study also introduced a new approach employing emoticons as a unique and novel way to validate deep learning models on tweets extracted from Twitter. Another study focusing on emotion recognition using both emoticons and text with LSTM is conducted in Islam et al. (2020).

In Kastrati et al. (2022), the authors conducted a set of experiments on their distantly supervised labeled dataset using conventional machine learning and deep learning models for sentiment polarity and multiclass emotion detection tasks. According to the authors, deep neural networks such as BiLSTM and CNN-BiLSTM outperformed other models in both sentiment polarity and multiclass emotion classification tasks.

From the literature reviewed above, we observed that numerous articles focus on distant supervision with hashtags and emoticons, and only a few of them use emojis as noisy labels for automatically labeling tweet datasets for sentiment and emotion analysis tasks. However, emojis are used far more extensively than hashtags and present a more faithful representation of a user’s emotional state. Moreover, most of those studies experimented with small and imbalanced tweet datasets, which are often domain-specific. Furthermore, in most of these studies, the researchers treated the multiclass problem of emotion classification as a binary problem. Our research work differs from the above-mentioned studies in many aspects, including distant supervision with emojis, the size of the dataset, timeline coverage, and the variety of deep learning models. Additionally, we experimented mainly with the emotion-balanced dataset and treated emotion classification as a multiclass classification task.

3 Design and research methodology

This study uses a quantitative approach composed of five major phases. The first phase entails the collection of emoji tweets on Twitter posted between 01 January 2012 and 31 December 2021. To collect enough tweets for our needs, we selected the 41 emotion-indicative emojis used in Batra et al. (2021) and then collected tweets that contained at least one of the selected emojis and were tagged by Twitter as English (retweets excluded). In the second phase of this study, text pre-processing is performed: extra tweet attributes (author id, creation date, language, source, etc.) and duplicate tweets are removed, emojis are extracted from the tweets, hashtags/mentions, URLs, emails, phone numbers, and non-ASCII characters are removed, and tweets of five characters or fewer are discarded. Additionally, all tweets are converted to lowercase. In the third phase, the automatic labeling of the collected tweets is carried out using distant supervision with emotion-indicative emojis. Consequently, all emoji tweets are classified into one of Ekman’s six basic emotion categories: anger, disgust, fear, joy, sadness, or surprise. In the fourth phase, the tweets are transformed into an appropriate numerical representation to be fed into the classifiers. More precisely, a bag-of-words approach (TF-IDF) is used with conventional machine learning algorithms, and three different pre-trained word embeddings (GloVe, GloVe Twitter, and FastText) are used with deep neural networks. The final phase of the study involves the sentiment analyzer for binary classification and the emotion analyzer for multiclass emotion classification. The analyzer comprises several classifiers, including conventional machine learning models and deep neural networks, for sentiment polarity and emotion classification. A high-level architecture of the proposed sentiment and emotion analyzer, depicting all five phases elaborated above, is illustrated in Fig. 1.
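To make the second phase concrete, the following is a minimal Python sketch of the cleaning steps described above; it is illustrative rather than the exact pipeline used in the study, and it assumes the third-party emoji package is available.

```python
import re

import emoji  # third-party `emoji` package, assumed available

URL_RE   = re.compile(r"https?://\S+|www\.\S+")
TAG_RE   = re.compile(r"[@#]\w+")                  # mentions and hashtags
EMAIL_RE = re.compile(r"\S+@\S+")
PHONE_RE = re.compile(r"\+?\d[\d\s\-()]{7,}\d")

def clean_tweet(text):
    """Return (extracted_emojis, cleaned_text), or None if the cleaned tweet is too short."""
    emojis = [ch for ch in text if ch in emoji.EMOJI_DATA]   # extract emojis first
    for pattern in (URL_RE, TAG_RE, EMAIL_RE, PHONE_RE):
        text = pattern.sub(" ", text)
    text = text.encode("ascii", "ignore").decode()           # drop non-ASCII (incl. emoji characters)
    text = re.sub(r"\s+", " ", text).strip().lower()
    if len(text) <= 5:                                       # discard very short tweets
        return None
    return emojis, text
```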

Fig. 1 High-level architecture of the proposed solution

Table 1 Number of tweets among emotion and sentiment classes (D1)
Table 2 Intentionally balanced among emotion classes (D2)

4 Experimental settings

This section briefly describes the dataset (emoji tweets) as well as the classifier models used to perform the sentiment and emotion classification tasks.

4.1 Dataset

The dataset utilized for carrying out the diverse range of experiments in this study consists of 17.5 million tweets (more precisely 17,516,837 tweets) posted within 10 years, respectively, between January 1, 2012, and December 31, 2021, with an almost proportional daily-based distribution. The whole data collection process was conducted through Twitter API v2 for academic research product track using Python 3.

Manually labeling the tweets would have been practically impossible even for a large team and, in any case, a labor-intensive, time-consuming, and error-prone task given the quantity of data. We therefore labeled the tweets using distant supervision with emojis for emotion labeling, whereas the polarity associated with a tweet is inferred directly from its emotion. More precisely, the positive polarity class comprises the two positive emotions (joy and surprise), and the negative polarity class is derived from the negative emotions (anger, fear, disgust, and sadness). Conventional machine learning and deep neural networks, including transformer-based models, were then employed for the binary classification of tweets into positive or negative classes and for the multiclass classification of emotions (e.g., anger, fear, joy, and sadness).
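As a minimal illustration of this emotion-to-polarity mapping (assuming the emotion labels described above), the derivation can be expressed as a simple lookup:

```python
# Polarity is derived from the emotion label: joy/surprise -> positive,
# anger/fear/disgust/sadness -> negative.
EMOTION_TO_POLARITY = {
    "joy": "positive", "surprise": "positive",
    "anger": "negative", "fear": "negative",
    "disgust": "negative", "sadness": "negative",
}

def polarity_of(emotion):
    return EMOTION_TO_POLARITY[emotion]

# Example: polarity_of("fear") -> "negative"
```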

As shown in Table 1, the original dataset is approximately balanced across the sentiment polarity classes (48.3% positive and 51.7% negative) but imbalanced across the emotion classes.

Table 2 shows statistics of the intentionally emotion-balanced dataset (D2) that comprises 12 million tweets. We randomly selected 2 million tweets for each emotion class from the original D1 dataset and as a result, we obtained a well-balanced emotion dataset (16.7% for each emotion).

Table 3 shows statistics for the D1a dataset. It is a subset of the original D1 dataset without the disgust and surprise emotions, as these two emotions overlap with other emotions. The D1a dataset remains balanced for the sentiment polarity classes and was used in our experiments for the task of sentiment polarity classification (except in Section 5.7, where the whole D1 dataset was used).

Table 3 Dataset statistics after removing disgust and surprise (D1a)
Fig. 2 Distribution of tweets (a) per year and (b) per emoji (values in thousands)

4.1.1 Dataset statistics

The number of tweets across years and the top 10 emojis are illustrated in Fig. 2. As shown, the number of tweets per year ranged between 1.6 and 1.8 million. Among the most commonly used emojis, “Face with Tears of Joy” emerged as the dominant one with 5,447 thousand tweets, followed by “Face Screaming in Fear” with 3,768 thousand tweets; the last of the top 10 was “Smiling Face with Smiling Eyes” with 696 thousand tweets (Fig. 2).

4.1.2 Distant supervision of tweets

The concept behind distant supervision involves the automatic labeling of data in order to leverage large amounts of it. Such data is referred to as distantly supervised or weakly annotated, as the quality is not great, but the quantity is (Byrkjeland et al., 2018).

The distant supervision used in this study employs the emoji heuristic labeling algorithm, enabling the automatic labeling of training sets for sentiment and emotion classification tasks.

To create training labels of one emotion per tweet, we used the following simple heuristic. The emotion-indicative emojis determine the emotion of a tweet. When a tweet contains multiple emojis expressing different emotions, the emoji that occurs most frequently determines the emotion. When ambiguity remains because several emojis occur with the same frequency but express different emotions, the algorithm considers the sentiment scores calculated in Kralj Novak et al. (2015) and selects the emotion associated with the emoji having the largest sentiment score. Using this approach, we efficiently and automatically labeled a large-scale dataset of 17.5 million tweets, facilitating the training of models for sentiment and emotion classification tasks.
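The heuristic can be sketched as follows; the emoji-to-emotion mapping and the sentiment scores are placeholders standing in for the study's 41-emoji list and the Emoji Sentiment Ranking of Kralj Novak et al. (2015).

```python
from collections import Counter

EMOJI_TO_EMOTION = {"\U0001F602": "joy", "\U0001F631": "fear"}   # placeholder subset of the 41 emojis
EMOJI_SENTIMENT  = {"\U0001F602": 0.5, "\U0001F631": -0.1}       # placeholder sentiment scores

def label_tweet(emojis):
    """Assign a single Ekman emotion to a tweet, given its extracted emojis."""
    known = [e for e in emojis if e in EMOJI_TO_EMOTION]
    if not known:
        return None
    counts = Counter(known).most_common()
    # 1) the most frequent emoji decides the emotion
    if len(counts) == 1 or counts[0][1] > counts[1][1]:
        return EMOJI_TO_EMOTION[counts[0][0]]
    # 2) tie between emojis: pick the tied emoji with the largest sentiment score
    tied = [e for e, c in counts if c == counts[0][1]]
    best = max(tied, key=lambda e: EMOJI_SENTIMENT.get(e, 0.0))
    return EMOJI_TO_EMOTION[best]
```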

4.1.3 Dataset tagging

The main purpose of this dataset was to collect only English tweets that contain emotion-indicative emojis and to tag each tweet with the emojis present in it for sentiment and emotion analysis. For this purpose, we designed a query that, for each day over the 10 years, extracts tweets containing the selected emojis, written in English, and excluding retweets.
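As an illustration only (not the authors' collection script), a daily full-archive request through the Twitter API v2 academic research track could look like the following sketch; the bearer token is a placeholder and the query combines an emoji keyword with the lang:en and -is:retweet operators.

```python
import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"   # placeholder
SEARCH_URL = "https://api.twitter.com/2/tweets/search/all"   # full-archive search (academic track)

def fetch_day(emoji_char, day, max_results=500):
    """Fetch up to max_results English, non-retweet tweets containing `emoji_char` posted on `day`."""
    params = {
        "query": f"{emoji_char} lang:en -is:retweet",
        "start_time": f"{day}T00:00:00Z",
        "end_time": f"{day}T23:59:59Z",
        "max_results": max_results,
    }
    headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}
    resp = requests.get(SEARCH_URL, headers=headers, params=params)
    resp.raise_for_status()
    return resp.json()

# e.g. fetch_day("\U0001F602", "2015-06-01")   # Face with Tears of Joy, one day of the 10-year window
```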

We intended to ensure that this dataset is suitable for the tasks of sentiment and emotion analysis. We used the list of 41 emotion-indicative emojis to categorize tweets based on Ekman’s emotion model. Consequently, sentiment polarity is derived from the emotions as positive or negative, depending on the emotion category. For example, tweets belonging to the joy and surprise emotion categories are labeled with the positive sentiment class, while the negative emotions (sadness, disgust, fear, anger) are labeled with the negative sentiment polarity label. An excerpt of the dataset is shown in Table 4.

Table 4 Example of mapping emojis to emotion labels
Table 5 Configuration and accuracy of the deep learning models

4.2 Architecture and parameter settings

This section presents a brief overview of the deep neural network and transformer-based architectures and their parameter/configuration settings applied in our experiments for this study.

4.2.1 Deep neural networks

To perform the tasks of sentiment polarity and emotion classification on our Twitter dataset, we employed five different supervised deep neural networks: one-dimensional CNN, LSTM, GRU, BiLSTM, and a hybrid CNN_BiLSTM. These architectures were chosen because they are well suited to modeling sequential text data. Additionally, we performed experiments with two state-of-the-art transformer-based architectures, BERT and RoBERTa. Table 5 presents the various deep neural networks along with their model configurations (experiment settings).

Due to space limitations, we have opted to showcase and elaborate on the best-performing architecture, allowing us to delve deeper into the details of the most effective choice. Figure 3 illustrates the BiLSTM architecture applied to our dataset for the sentiment polarity task. The architecture comprises the following components: an embedding layer with 300-dimensional word embeddings, a dropout layer, four BiLSTM layers with 256, 128, 64, and 32 units, each using a ReLU activation function, and an attention layer. The output layer consists of two neurons, one for each class (positive or negative), and employs a sigmoid activation function to produce probability-like predictions for each class. The classification model is trained using the logarithmic loss function and the Adam gradient-based optimization algorithm.
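A Keras sketch approximating this architecture is given below; the embedding size, layer widths, attention layer, and output layer follow the description above, while details not stated in the text (e.g., exactly how attention is wired) are assumptions.

```python
from tensorflow.keras import Model, layers

VOCAB_SIZE, EMB_DIM, MAX_LEN = 200_000, 300, 30

inp = layers.Input(shape=(MAX_LEN,))
x = layers.Embedding(VOCAB_SIZE, EMB_DIM)(inp)        # optionally initialized with pre-trained vectors
x = layers.SpatialDropout1D(0.3)(x)
for units in (256, 128, 64, 32):                      # four stacked BiLSTM layers
    x = layers.Bidirectional(
        layers.LSTM(units, activation="relu", return_sequences=True))(x)
x = layers.Attention()([x, x])                        # self-attention over the sequence (assumed wiring)
x = layers.Flatten()(x)
out = layers.Dense(2, activation="sigmoid")(x)        # one neuron per polarity class

model = Model(inp, out)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```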

The same model is used for the emotion classification task. However, since we are dealing with a multiclass classification problem, we replace the classification loss function with ’categorical_crossentropy’. We also adjust the number of units in the output layer to 4, one for each emotion class (anger, fear, joy, or sadness). Additionally, we replace the sigmoid activation function with softmax to output probability-like predictions for each emotion. This adaptation of the binary classification model to a multiclass task is applied to all other architectures presented in the following.
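Continuing the sketch above (reusing inp and x), the multiclass variant only swaps the output layer and the loss:

```python
out = layers.Dense(4, activation="softmax")(x)        # anger, fear, joy, sadness
model = Model(inp, out)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
```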

4.2.2 Parameter settings

All deep neural network models for sentiment and emotion classification in this study are implemented in Keras (https://keras.io, accessed on 15 May 2022). Keras is a simple and robust deep learning library for Python used for constructing neural networks; it is a high-level framework based on TensorFlow, developed at Google. Scikit-learn (https://scikit-learn.org/stable/, accessed on 10 May 2022), a simple, efficient, and open-source tool for predictive data analysis in Python, is used for developing the conventional machine learning classifiers in this study. The maximum number of words used by the tokenizer was set to 200,000, and the input tweet sequences are padded to 30 tokens.

The following parameter settings are used to conduct the experiments. The dataset is divided into two subsets: a training set and a test set. The training set consists of 90% of the data, while the remaining 10% is used for testing the model. Training ran for up to 50 epochs, with the EarlyStopping criterion (monitor = “val_loss”, patience = 3) used to stop the classifiers. A batch size of 2048 gave us the best results. Deep recurrent neural networks such as LSTM and BiLSTM generally suffer from overfitting. To avoid overfitting in our deep neural networks, we used a dropout strategy in which certain units (neurons) are temporarily removed from the network model along with their incoming and outgoing connections. Dropout prevents model units from co-adapting too much to the training data and thus leads to better generalization on the test set as well (Srivastava et al., 2014). Dropout was applied between layers using the Keras dropout layer, with the dropout rate set to 0.3 (SpatialDropout1D(0.3)).
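A minimal sketch of this training configuration is shown below, assuming cleaned tweets and labels are already available and reusing the model from the architecture sketch above; the validation split used to monitor val_loss is an assumption.

```python
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words=200_000)              # cap the vocabulary at 200,000 words
tokenizer.fit_on_texts(tweets)                        # `tweets` and `labels` assumed prepared
X = pad_sequences(tokenizer.texts_to_sequences(tweets), maxlen=30)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.1)

early_stop = EarlyStopping(monitor="val_loss", patience=3)
model.fit(X_train, y_train,
          validation_split=0.1,                       # assumed source of val_loss
          epochs=50, batch_size=2048,
          callbacks=[early_stop])
```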

On the other hand, the hyperparameters used for the pre-trained transformer-based models (BERT and RoBERTa) in our experiments are as follows: the activation function is ReLU, both models use the AdamW optimization algorithm, the batch size is set to 32 for BERT and 8 for RoBERTa, and the number of epochs is 3 for both models.

Fine-tuning transformer-based models on large-scale datasets such as ours, which include millions of instances, poses a significant challenge due to their substantial computational complexity, extensive model size, and demanding memory requirements. Moreover, our experiments were conducted using Colab Pro+, where we encountered restrictions on the maximum runtime, set at 24 hours, and faced the high cost of acquiring additional compute units. To address these challenges in fine-tuning both transformer-based models (BERT and RoBERTa), we implemented a random sampling strategy. This involved creating two distinct subsets, each comprising one million tweets. One subset was designated for sentiment analysis, whereas the other served as the foundation for the emotion classification task. By employing a random sampling strategy, we not only accommodated practical limitations but also ensured that the selected subsets maintained representative diversity.
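For illustration (the actual implementation follows the source cited below), a minimal Hugging Face sketch of this fine-tuning setup could look as follows; the dataset objects, column name, and sequence length are assumptions, and BERT is shown (RoBERTa is analogous with batch size 8).

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)                # 2 for polarity, 4 for emotions

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

train_ds = sampled_train.map(tokenize, batched=True)  # 1M-tweet random sample (datasets.Dataset assumed)
test_ds  = sampled_test.map(tokenize, batched=True)

args = TrainingArguments(output_dir="bert-sentiment",
                         num_train_epochs=3,
                         per_device_train_batch_size=32)   # 8 for RoBERTa; AdamW is the default optimizer

trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=test_ds)
trainer.train()
```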

The source code for the transformer-based architectures in our experiments is obtained from Teja (2021).

Fig. 3 BiLSTM Architecture

4.3 Pretrained word embeddings

In this study, we compared the results of deep learning models obtained using three different pre-trained word embeddings: GloVe, GloVe Twitter, and FastText (a loading sketch is given after the list below).

  • GloVe stands for Global Vectors for Word Representation, proposed in Pennington et al. (2014). The model is an unsupervised learning algorithm developed at Stanford for obtaining vector representations of words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations show interesting linear substructures of the word vector space. The GloVe version used in this work is trained on a Wikipedia 2014 dump combined with Gigaword 5, comprising 6 billion tokens and a 400-thousand-word vocabulary, uncased, with 300-dimensional vectors. Further details on the process of training GloVe word embeddings are given in Pennington et al. (2014).

  • GloVe Twitter is a pre-trained GloVe embedding trained on 2 billion tweets (27 billion tokens) with a 1.2-million-word vocabulary, uncased.

  • FastText, proposed in Bojanowski et al. (2017), is an extension of the word2vec model that represents words as character n-grams. It comprises a vocabulary of 2 million words and word n-grams, is case-sensitive, and was obtained from 600 billion tokens of data crawled from generic Internet web pages by the Common Crawl nonprofit organization (Polignano et al., 2019). Further details on the training process can be found in Bojanowski et al. (2017).
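As referenced above, the following sketch shows one common way to initialize a Keras Embedding layer with such pre-trained vectors; the file name is illustrative (GloVe-style text format shown; FastText .vec files follow the same layout) and the tokenizer is the one from the training-configuration sketch.

```python
import numpy as np

EMB_DIM = 300
embeddings_index = {}
with open("glove.6B.300d.txt", encoding="utf-8") as f:   # illustrative file name
    for line in f:
        word, *vec = line.rstrip().split(" ")
        embeddings_index[word] = np.asarray(vec, dtype="float32")

num_words = min(200_000, len(tokenizer.word_index) + 1)
embedding_matrix = np.zeros((num_words, EMB_DIM))
for word, i in tokenizer.word_index.items():
    if i < num_words and word in embeddings_index:
        embedding_matrix[i] = embeddings_index[word]      # rows of unknown words stay zero

# Passed to the embedding layer of the model sketch above, e.g.:
# layers.Embedding(num_words, EMB_DIM, weights=[embedding_matrix], trainable=False)
```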

4.4 Attention mechanism

An attention mechanism is a feature that equips models with the ability to focus on specific words or phrases within the text. It works by assigning different weights to each word in the text, enabling the models to capture the context and most significant information. When it comes to analyzing tweets, the attention mechanism aims to assign more weight to sentiment-carrying words in order to better grasp the overall sentiment expressed in that particular tweet. Mathematically, an attention mechanism can be defined using (1).

$$\begin{aligned} Attention(Q,K,V) = softmax(\frac{QK^{T}}{\sqrt{d_{k}}})V \end{aligned}$$
(1)

where,

K represents a key vector. A key vector is a word embedding from a sequence of word embeddings constituting a tweet.

Q indicates a query vector tasked with understanding the sentiment expressed in the tweet.

V denotes value vectors.

\(d_{k}\) denotes the dimension of the key vectors; the dot products are scaled by \(\sqrt{d_{k}}\).

The attention mechanism computes the attention scores by calculating the dot product between the key vectors K and the query vector Q, divided by \(\sqrt{d_{k}}\), the square root of the key dimension, which controls the magnitude of the scores. These attention scores are converted into weights using a softmax function.

After calculating the weights, a context vector C is obtained by computing a weighted sum of the value vectors V.

$$\begin{aligned} C = \sum _{i }Attention(Q,K_{i},V_{i}) \end{aligned}$$
(2)

The context vector C contains information that is contextually important for understanding the sentiment of the tweet. This context-aware representation is then used as input to the sentiment analysis model to classify the sentiment of the tweet as positive or negative based on the attended information.
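A small NumPy sketch of the scaled dot-product attention in (1)-(2) is given below for a single query; shapes and values are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (d_k,), K: (n, d_k), V: (n, d_v) -> context vector C of shape (d_v,)."""
    d_k = K.shape[-1]
    scores = K @ Q / np.sqrt(d_k)        # one scaled score per word embedding
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax over the sequence
    return weights @ V                   # weighted sum of value vectors = context vector C

# Toy example: a "tweet" of 4 word embeddings with dimension 8
rng = np.random.default_rng(0)
K = V = rng.normal(size=(4, 8))
Q = rng.normal(size=8)
C = scaled_dot_product_attention(Q, K, V)
```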

5 Experimental results

This section provides the experimental results obtained from various sentiment and emotion classifiers trained and validated on our dataset. The configuration settings for the deep learning models employed in our dataset are given in Table 5.

The findings illustrated in Fig. 4 show that the best performance is achieved by deep learning models in both the sentiment polarity and emotion classification tasks. Class-wise performance with respect to the F1 score for the sentiment polarity and emotion classification tasks is also given in Fig. 4. For the sake of space, we present results obtained from only the two best-performing models, one from conventional machine learning and one from deep learning.

Fig. 4 F1 score of best-performing algorithms on (a) sentiment polarity and (b) emotion classification tasks

Table 6 Performance of ML and DL models for sentiment polarity classification

5.1 Sentiment polarity classification

We used the D1a dataset (described in Section 4.1) for training conventional machine learning and deep learning models for sentiment polarity classification. Table 6 summarizes the models’ performance on 10% test data.

The deep BiLSTM with FastText pre-trained word embeddings and an attention layer outperforms the other deep learning models in sentiment polarity classification on our dataset, achieving an F1 score of 70.92%. This represents an average performance improvement of 3.3 percentage points compared to the baseline results. Additionally, the deep BiLSTM + FastText model demonstrates superior overall accuracy compared to the RoBERTa model, surpassing it by 2.62 percentage points. Moreover, it outperforms the BERT model by 1.05 percentage points.

5.2 Emotion classification

We used the D2 dataset (described in Section 4.1), excluding disgust and surprise, to train various deep learning models for multiclass emotion classification. Table 7 summarizes the models’ performance on 10% test data. The deep BiLSTM with FastText pre-trained word embeddings and an attention layer outperforms other deep learning models for multiclass emotion classification task, achieving an F1 score of 54.85%. This marks an average performance improvement of 4.4 percentage points over the baseline results. Moreover, the overall accuracy of the deep BiLSTM + FastText model surpasses that of the BERT model by 0.2 percentage points and the RoBERTa model by 1.0 percentage points.

Table 7 Performance of ML and DL models for emotion classification

5.3 Effect of attention mechanism

In this section, we examine the impact of the attention mechanism on capturing long-range dependencies in the collected tweets. For this purpose, an attention layer considering a global and a local context is used on top of the BiLSTM to extract high-level features. The global context characterizes the entire tweet and can be too broad; the local context, in contrast, is defined over a small window of varying size. In our case, we used a window size of 8 words, based on Kastrati et al. (2021), as the optimal context for extracting semantic features with the attention mechanism.

This section provides an overview of the impact of the attention mechanism on the performance of BiLSTM for the sentiment classification task. More precisely, we carried out experiments with two network configurations. In the first, the network consists of BiLSTM layers with 256, 128, 64, and 32 units with ReLU, followed by a Flatten layer. In the second, the network is extended with an attention layer integrated on top of the BiLSTM. The results obtained for the sentiment classification task using both architectures (without and with the attention mechanism) with respect to precision, recall, and F1 score are summarized in Table 8. As can be seen from Table 8, there is no performance improvement when the BiLSTM model is used with the attention mechanism; the results are almost the same. Regarding the class-wise performance, there is a subtle shift, indicating that the performance on the negative sentiment class improved at the cost of the positive one: the F1 score for negative sentiment increased from 70.00% to 70.22% but decreased for positive sentiment from 68.57% to 68.30%.

Table 8 Performance of BiLSTM with and without attention for sentiment polarity classification
Table 9 Performance of BiLSTM with and without attention for emotion classification

Table 9 shows the obtained results for the multiclass emotion classification task using both architectures (without and with attention mechanism) with respect to precision, recall, and F1 score. As can be seen from Table 9, there is no impact on the overall performance of the classifier for the task of emotion classification when the attention mechanism is used. However, subtle shifts were observed in the class-specific metrics, indicating that the performance of certain classes improved at the cost of others. For example, for fear emotion, the F1 score increased from 52.04% to 52.36%, and for sadness from 48.13% to 48.60% but this was at the cost of anger and joy emotions.

5.4 Effect of static word embeddings

In this section, we present the results obtained from the set of experiments conducted to see the impact of general-purpose pre-trained word embeddings on the sentiment and emotion classification tasks.

Tables 10 and 11 show the impact of the three pre-trained word embeddings (GloVe, GloVe Twitter, and FastText) on the best-performing model, BiLSTM, for the sentiment polarity and emotion classification tasks. The results show that all three pre-trained word embeddings initialize the word vectors for the dataset effectively, with the F1 score differing only slightly across the word vector methods. Furthermore, BiLSTM with FastText pre-trained word embeddings produced the best results, followed by BiLSTM with GloVe Twitter. This shows that pre-trained word embeddings, and FastText in particular, substantially affect the accuracy of the entire model.

Table 10 F1 score of BiLSTM model with different word embeddings for sentiment polarity classification
Table 11 F1 score of BiLSTM model with different word embeddings for emotion classification

Similarly, the three pre-trained word embeddings used with the BiLSTM model for emotion classification had a substantial impact on performance improvement, even better than for the sentiment polarity classification task, as shown in Table 11.

5.5 Effect of having multiple classes

Recognizing that multiclass classification is characterized by several challenges, we aimed to delve deeper and gain better insight into the effects of multiclass classification of emotions. To accomplish this, we initially performed a multiclass classification of emotions based on Ekman’s six basic emotions, achieving an accuracy of 41.33%. Seeking a more comprehensive understanding, we performed a chi-squared test to identify the top 20 terms (top 10 uni-grams and top 10 bi-grams) most correlated with each emotion class. The analysis revealed that many terms overlapped between the classes. Specifically, the anger and disgust emotion classes share many common terms, making their distinction challenging. Similarly, fear and surprise share numerous common terms, further complicating the differentiation between these emotion classes.
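A sketch of this term analysis with scikit-learn is given below: for each emotion class, uni-grams and bi-grams are ranked by their chi-squared score against a one-vs-rest label. The variables texts and emotions stand for the cleaned tweets and their labels and are assumptions, as is the min_df threshold.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2

vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=5)       # uni-grams and bi-grams
X = vectorizer.fit_transform(texts)
features = np.array(vectorizer.get_feature_names_out())

for emotion in sorted(set(emotions)):
    scores, _ = chi2(X, [int(e == emotion) for e in emotions])   # one-vs-rest chi-squared scores
    ranked = features[np.argsort(np.nan_to_num(scores))[::-1]]
    unigrams = [t for t in ranked if " " not in t][:10]
    bigrams  = [t for t in ranked if " " in t][:10]
    print(emotion, unigrams, bigrams)
```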

Accordingly, we continued our experiments by exploring the effect of removing these problematic classes. We started by removing instances belonging to surprise from our dataset, guided by its high level of overlap with fear, which introduces ambiguity in classification. Additionally, we acknowledged the complexity of surprise, as some instances of this class are associated with positive valence and others with negative valence (Mohammad, 2021). As a result, our best-performing classifier (BiLSTM + FastText) demonstrated an improvement of almost 6.5 percentage points in accuracy compared to the 6-class results.

Observing this change in performance, we further reduced the number of emotional classes by removing disgust, which is also known for its complexity and higher overlap with anger according to the chi-square test. This decision led to an even more substantial improvement, with an additional 7.2 percentage points in accuracy compared to the 5-class results.

Continuing in this manner, our exploration culminated in binary classification, where we assessed the classifier’s accuracy for every pair of emotion classes. The results for binary classification were promising, and the highest accuracy achieved was 79.22% for the [fear vs disgust] comparison. Almost all binary emotion classifications achieved an accuracy higher than 70%, except for two pairs: [anger vs disgust] with an accuracy of 68.4%, and [fear vs surprise] with the lowest accuracy of 60.7%, because of the complexity and overlap problems mentioned above. The average accuracy obtained over all binary classification pairs was 73.0%. A detailed analysis of the effect of multiple classes is provided in the following subsections.

5.5.1 Six emotion classes

In our first experiment, we performed the multiclass classification of emotions based on Ekman’s six basic emotions. For this experiment, we used the entire dataset of 12 million tweets (2 million tweets per emotion class). Table 12 shows the precision, recall, and F1 score of the best-performing model, BiLSTM + FastText, on the balanced dataset for Ekman’s six basic emotions: anger, disgust, fear, joy, sadness, and surprise. The obtained weighted-average F1 score is 41.33%.

Table 12 Precision, Recall, and F1 score for the 6-class emotion classification

5.5.2 Five emotion classes

Table 13 shows the Precision, Recall, and F1 score of the best-performing model, BiLSTM + FastText, on the dataset with five discrete emotions: anger, disgust, fear, joy, and sadness. Here we drop the sixth category, surprise, as it has the worst performance (it overlaps with the fear emotion) and degrades the weighted-average performance.

Table 13 Precision, Recall, and F1 score for the 5-class emotion classification
Table 14 Precision, Recall, and F1 score for the 4-class emotion classification

5.5.3 Four emotion classes

Table 14 shows the results with regard to the Precision, Recall, and F1 score of the best-performing model, BiLSTM + FastText, on the emotion-balanced dataset with four discrete emotions: anger, fear, joy, and sadness. Here we drop disgust, as it is characterized by class overlap with anger and degrades the weighted-average performance.

5.5.4 Three emotion classes

Table 15 shows the Precision, Recall, and F1 score of the best-performing model, BiLSTM + FastText, on the emotion-balanced dataset with three basic emotions: anger, joy, and sadness. Here we drop fear, leaving only three basic emotions.

Table 15 Precision, Recall, and F1 score for the 3-class emotion classification
Table 16 Precision, Recall, and F1 score for the 2-class emotion classification

5.5.5 Two emotion classes

Table 16 shows the Precision, Recall, and F1 score of the best-performing model, BiLSTM + FastText, on the emotion-balanced dataset with two basic emotions: joy and sadness.

Table 17 Precision, Recall, F1 score, and Accuracy of best-performing model for binary emotion classification

Table 17 shows the weighted-average Precision, Recall, F1 score, and Accuracy of the best-performing model, BiLSTM + FastText, on the emotion-balanced dataset for all pairs of emotions (binary classification). Almost all binary emotion classification tasks achieved an accuracy higher than 70%, except for two pairs: [anger vs disgust] with an accuracy of 68.4%, and [fear vs surprise] with the lowest at 60.7%. The average classification accuracy is 73.0%. It is worth mentioning that each emotion class comprises the same number of tweets (2 million per emotion).

5.6 Effect of the size of training data

In this section, we examine the effect of increasing the size of the training data on the accuracy of the best-performing classifier, BiLSTM with FastText. Since most existing studies on sentiment analysis and emotion identification in tweets are conducted on datasets comprising a few thousand tweets, we expect to derive new insights and benefits from using large training data. In our case, we started with a small sample of 20 thousand (20K) randomly selected tweets and progressively increased the number of tweets up to 13.3 million (13M) for sentiment classification and 8 million (8M) for the emotion classification task.
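A sketch of this experiment is shown below: the same model is retrained on progressively larger random subsets and evaluated on the held-out test set. The subset sizes, the build_model factory, the one-hot label format, and the reuse of X_train, y_train, X_test, y_test, and early_stop from the earlier sketches are all assumptions.

```python
import numpy as np
from sklearn.metrics import f1_score

subset_sizes = [20_000, 100_000, 500_000, 2_000_000, 8_000_000]   # illustrative steps
results = {}
for n in subset_sizes:
    idx = np.random.choice(len(X_train), size=n, replace=False)   # random subset of the training data
    m = build_model()                                  # assumed factory for the BiLSTM sketch above
    m.fit(X_train[idx], y_train[idx], epochs=50, batch_size=2048,
          validation_split=0.1, callbacks=[early_stop], verbose=0)
    preds = m.predict(X_test).argmax(axis=1)
    results[n] = f1_score(y_test.argmax(axis=1), preds, average="weighted")
```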

Figure 5 shows the result of training the BiLSTM with FastText classifier on each subset in the sequence and the F1 score achieved by the model on the 10% test instances for the sentiment and emotion classification tasks.

Fig. 5 F1 score of the best-performing deep learning model with varied sizes of training data for (a) sentiment and (b) emotion classification

From Fig. 5 we observe that, as the size of the training data increases from 20 thousand (20K) to 13.3 million (13M), the F1 score for the sentiment classification task grows from 60.98% to 70.92%, an improvement of nearly 10 percentage points over the smallest subset. For the emotion classification task, increasing the training data from 20 thousand (20K) to 8 million (8M) yields F1 scores between 44.45% and 54.85%, an improvement of more than 10 percentage points over the smallest subset.

5.7 Effect of class imbalance

In this section, we investigate the effects of class imbalance on the performance of deep neural network-based classifiers for the sentiment polarity and emotion classification tasks. The investigation is performed using the best-performing model, i.e., BiLSTM with FastText, trained on our newly created Twitter dataset. The experimental results show that the performance of the classifier deteriorates when class imbalance exists in the training data. Specifically, our best-performing classifier showed a bias towards the majority class, namely the joy class. To overcome this issue, for the set of experiments conducted in this study, we intentionally balanced the dataset to have an equal number of instances in all classes. Table 18 summarizes the classifier’s performance on 10% test data, obtained using the BiLSTM + FastText classifier on the imbalanced dataset for the multiclass emotion classification task. Observe that the dataset has an unequal distribution of emotions: the joy emotion class has a larger number of instances than the other emotion classes and, as a consequence, the classifier exhibits a bias towards this majority class. More precisely, the F1 score for the joy emotion class is much higher (63.58%) than for the other emotion classes, while poor performance can be seen for two other emotions (sadness and surprise), especially on the minority class, where the achieved F1 score is 20.23%. The difference between the weighted averages of the Precision and F1 score (Precision: 47.91%, F1 score: 44.82%) shown in Table 18 is a result of this class imbalance.

Table 18 Precision, Recall, and F1 score for emotion classification on the imbalanced dataset

Table 19 summarizes the classifier’s performance on 10% test data, obtained using the BiLSTM + FastText classifier on the emotion-balanced dataset for the multiclass emotion classification task. The dataset used here is balanced: we randomly selected 2 million tweets for each emotion category from the original D1 dataset. As mentioned in the previous paragraph, here there is almost no difference between the weighted averages of the Precision and F1 score.

Table 19 Precision, Recall, and F1 score for emotion classification on the balanced dataset

Tables 20 and 21 summarize the classifier’s performance on the imbalanced and well-balanced datasets for the sentiment polarity classification task. As can be seen from the right-most column of Table 20 (# of instances), the dataset has an unequal distribution of sentiment polarity classes. The negative class has a larger number of instances than the positive class and, as a consequence, the classifier exhibits a bias towards the majority class. The difference in class-wise performance is around 14 percentage points with regard to the F1 score.

Table 21 summarizes the classifier’s performance on 10% test data, obtained using the BiLSTM + FastText classifier on the sentiment-balanced dataset for the sentiment classification task. The dataset used here is almost equally balanced after removing tweets belonging to the disgust and surprise emotion classes. As can be seen, there is almost no difference between the weighted averages of the Precision and F1 score.

Table 20 Precision, Recall, and F1 score for sentiment classification on the imbalanced dataset
Table 21 Precision, Recall, and F1 score for sentiment classification on the balanced dataset

6 Discussion

Returning to the research questions (RQs) posed at the beginning of this study, we can now affirm that it is possible to automatically create a large-scale dataset with emotion labels (i.e., emotions-indicative emojis) for sentiment polarity and emotion classification tasks. This approach demonstrates several advantages, as outlined below.

  • RQ1: The emotion-indicative emojis in tweets are provided by the tweet’s author, which is more natural and reliable as they represent the author’s intended interpretation or emotional state, in contrast with emotion labels of other datasets given by a few annotators. In addition, emojis can also emphasize or strengthen the emotion or sentiment conveyed by a message, and they can serve as explicit sentiment markers. In contrast, manually annotated datasets are generally expensive in terms of time and money, which greatly limits the size of training data. Furthermore, manual annotation is often inefficient and error-prone, as detecting emotions in tweets can be difficult even for humans.

  • RQ2a: The size of the training data has a substantial effect on the performance of deep neural networks, which tend to require very large amounts of training data. A large dataset, in turn, can provide comprehensive coverage of emotional moments in our daily lives. Based on our experiments with different sizes of training data (randomly sampled from our dataset), we demonstrated that by training deep neural networks with more data, their performance continues to improve for both sentiment polarity and emotion recognition tasks.

  • RQ2b: Regarding the class imbalance issue, it is characteristic of almost all real-world datasets, and our dataset is no exception. The number of emojis that belong to the joy emotion class is larger than the number of emojis used to query the other emotion classes. As a result, we obtained a larger number of tweets for the joy class than for the other classes and, as a consequence, the performance obtained from our experiments with the imbalanced dataset was biased by the high proportion of the dominant class (joy). To overcome the imbalance problem, for the set of experiments conducted in this study, we intentionally balanced the dataset by randomly selecting an equal number of instances for all classes.

  • RQ3: We demonstrated that pre-trained word embeddings such as GloVe, GloVe Twitter, and FastText have a substantial impact on the performance of deep neural networks. Specifically, the findings reveal that BiLSTM with FastText pre-trained word embeddings and an attention layer provided the best performance on our dataset for the sentiment polarity and emotion classification tasks, with an F1 score of 70.92% for sentiment and 54.85% for multiclass emotion classification (anger, fear, joy, and sadness). However, regarding the attention mechanism, the findings revealed that it has little or no impact on the performance of our models for the sentiment and emotion classification tasks.

Nevertheless, the study identifies further possible improvements with regard to the quality of the collected data (tweets), along the following lines.

  • (i) Because of the very large size of the dataset, we were not able to manually verify all the emotion tweets, and it is known that data obtained by distant supervision is often noisy. We should further investigate applying some heuristics to remove irrelevant tweets and incorrect annotations.

  • (ii) Our newly created dataset does not contain tweets with neutral labels, which is a common problem faced by automatically collecting training data for emotion analysis, as there are tweets (text) that convey no emotion. We should further investigate to find a solution to automatically identify collected neutral tweets.

  • (iii) The dataset collected in its original form is imbalanced. The number of emojis that convey the joy emotion is a few times larger than the number of emojis for the other emotion classes. To obtain a well-balanced dataset, one possible way is to design a more efficient collection approach that concentrates more on collecting tweets from the minority classes.

Regarding the performance of the classifiers, based on the experimental results, deep neural networks (1D-CNN, LSTM, GRU, BiLSTM, and CNN_BiLSTM) and transformer-based (BERT and RoBERTa) models generally outperform conventional machine learning models (NB, LR, Linear SVC, DT, and ADB). This advantage can be attributed to the capabilities of deep neural networks and transformer-based models to learn multiple layers of representations (multiple feature learning) that improve data mining results and classification modeling (Bengio et al., 2013, 2009).

It is worth mentioning that the performance of all the deep learning models utilized in this study is improved by using pre-trained word embeddings such as GloVe, GloVe Twitter, and FastText, but there was little or no improvement from using the attention mechanism.

Despite the better classification performance that deep neural networks and transformer-based models offered on our sentiment and emotion classification tasks, there are still certain benefits of using conventional machine learning models for these tasks: they are easier to implement, generally require less data for training, and are financially and computationally cheap, as they can run on legacy CPUs.

These findings suggest that, in general, the results are encouraging given that tweets pose several challenges for automatic natural language processing tasks, including sentiment and emotion analysis. These challenges span both technical and linguistic aspects, such as short texts (originally restricted to 140 characters, extended to 280 characters in November 2017), creative uses of language (sarcasm, irony, humor, and metaphor), and terms not found in dictionaries, including misspellings, creatively spelled words, hashtagged words, emoticons, and abbreviations, many of which convey emotions (Mohammad, 2021).

7 Conclusion

In this study, we explored the tasks of sentiment polarity and multiclass emotion classification. We presented and evaluated the use of emotion-indicative emojis to automatically label a large-scale dataset of tweets with basic categorical emotions they express based on Ekman’s model. We created this extensive dataset by selectively collecting only English tweets that contain emotion-indicative emojis and tagging each tweet using a distant supervision approach with emojis that are present in tweets for sentiment and emotion analysis purposes.

Supervised conventional machine learning (NB, LR, Linear SVC, DT, and ADB), deep neural networks (DNN, CNN, LSTM, GRU, BiLSTM, and hybrid CNN-BiLSTM), and transformer-based (BERT and RoBERTa) models were used for both sentiment polarity and emotion classification from users’ tweets on the created dataset.

The experimental results showed that BiLSTM with FastText pre-trained word embeddings and an attention layer outperforms all other deep learning and conventional machine learning-based models in our dataset for both sentiment polarity and emotion classification. It yielded an F1 score of 70.92% for sentiment polarity classification and an F1 score of 54.85% for the multiclass emotion classification task.

In addition, we investigated the effect of pre-trained word embeddings such as Glove, Glove Twitter, and FastText on deep neural networks. It has been demonstrated that for the BiLSTM architecture, the FastText pre-trained word embeddings provide the best results for the task of sentiment and emotion classification.

We also investigated the effect of increasing the size of the training data for deep neural networks and conventional machine learning. We demonstrated that the performance of deep neural networks continues to increase as they are trained with more data, for both the sentiment and emotion classification tasks. These findings are in line with the results reported in Ng (2017).

Furthermore, we explored the effects of having multiple classes on classification performance. The study has confirmed that multiclass classification is difficult and associated with several challenges that dropped the accuracy from about 73% (weighted average for binary classifications) to 41.4% (multiclass classification on six basic emotions). These results are also in line with the findings presented in Bouazizi and Ohtsuki (2019). The findings demonstrate that there is a strong correlation between emojis and emotion annotations in tweets and our method used for automatic labeling was suitable for some emotions such as anger, fear, joy, and sadness, but less able to distinguish others such as surprise and disgust.

In future work, we will focus on developing a more efficient collection approach that addresses the class imbalance issue during the data collection phase. We will also focus on introducing heuristics or approaches to further clean the dataset of irrelevant tweets and to introduce a neutral emotion class. Additionally, experimenting with newer and larger deep learning architectures and pre-trained word embedding models would be interesting to investigate further. Furthermore, exploring unsupervised or weakly supervised learning paradigms would also be of interest in the future.