1 Introduction

Disasters, from naturally occurring events (e.g., earthquakes, hurricanes, floods) to human-caused incidents (e.g., terrorist attacks), are typically unexpected and overwhelming. These events disrupt the mental health and well-being of affected communities (Kristia et al. 2020; Makwana 2019). Past research shows that post-traumatic stress disorder (PTSD) is one of the most serious psychological effects on individuals who survive a disaster (Harada et al. 2015; Neria et al. 2008). With respect to earthquakes, survivors are more likely to develop clinical symptoms (e.g., changes in eating and sleeping) and cognitive impairments (Beaglehole et al. 2019; Harada et al. 2015; Kemp et al. 2011). Earthquake survivors also experience emotions and psychological states such as depression, helplessness, hopelessness and a sense of being overwhelmed (Kemp et al. 2011), as well as insecurity, uncertainty, loss of trust in scientific information, and continuous hyper-vigilance (Beaglehole et al. 2015; Gluckman 2011). The psychological impact of earthquakes can be disparate and inequitable: children, older adults, women, those with higher exposure (e.g., disaster workers) or trauma (e.g., loss of family, displacement), and those with certain preconditions (e.g., existing psychiatric disorders, lower socioeconomic status) or behaviours (e.g., an avoidance coping style) are more susceptible to negative outcomes (Beaglehole et al. 2019; Carr et al. 1995; Harada et al. 2015; Ticehurst et al. 1996).

During significant disaster events, people often turn to social media platforms to express their emotions and reactions because these platforms provide a readily accessible and immediate outlet for sharing feelings with a wide audience (Bird et al. 2012; Veer et al. 2016). Platforms such as Twitter, Facebook and Instagram allow users to post updates, photos, and videos in real time, enabling them to voice their emotions as events unfold. Emoticons, hashtags, and trending topics become popular tools for conveying emotions succinctly and connecting with others who share similar sentiments. Social media's interactive nature fosters a sense of community, enabling individuals to find solace, support, and validation in knowing that others are experiencing similar emotions during these important moments (Aldrich and Meyer 2015; Taylor et al. 2012). The ability to express emotions on social media not only allows individuals to process their feelings but also contributes to the collective narrative of the event, shaping the broader public discourse and memory of the occurrence (Wengenmeir 2016).

Just as social media platforms are used by individual users to express emotion, they also enable researchers to examine those emotions to determine patterns, trends, and shifts in collective emotions over time. AI has contributed significantly to solving human and societal problems across various fields. One notable area is natural language processing, which employs computational and linguistic techniques to help computers understand human-generated text (Acheampong et al. 2020). By leveraging machine learning algorithms and natural language processing techniques, AI can sift through large amounts of data from social media platforms to interpret the emotional states of the individuals who posted the content. Emotion identification represents a natural progression of sentiment analysis, offering a more detailed and nuanced model. While sentiment analysis captures only the positive or negative sentiment of a given text, emotion identification offers finer granularity by classifying the text into distinct emotions such as fear, happiness and anger. However, this field has yet to achieve the same level of success and widespread adoption as sentiment analysis because of the linguistic complexities involved in expressing emotions (Seyeditabari et al. 2018). Nevertheless, going beyond negative and positive sentiment can enhance various applications, including the use of these analyses to inform response strategies during natural disaster events.

Emotion identification can be used to explore the vast pool of posts and extract valuable insights about public sentiment during important events. When applied to disaster response, this analysis can provide a deeper understanding of how people within affected communities are responding emotionally to the immediate aftermath and the recovery periods of a disaster. This knowledge can help inform the development of support systems, interventions, and communication strategies targeted to these emotional needs. By harnessing the power of social media emotions, we can gain valuable knowledge that helps us navigate and respond to important events with greater empathy, accuracy, and effectiveness.

In this study we investigated how different types of emotions were expressed, and how they evolved over time, by individuals affected by the Christchurch earthquakes of 2010 and 2011. We chose Twitter (now X) as the social media platform for this study. Twitter is useful because of its character limit (140, raised to 280 in 2017), which encourages short and concise messages that can include emoticons, GIFs and other means of expressing emotions. In the time period covered by this project it was actively used both to cope with disasters and to research disaster response (Bird et al. 2012; Jung 2012; Mandel et al. 2012). By 2013, approximately 500 million tweets were sent every day (Footnote 1). Although Twitter has one of the lowest user rates of social media platforms globally, it has been actively used by researchers of emotion and other aspects of social behaviour (Footnote 2).

We conducted a preliminary analysis of tweets from 2010 to 2019 collected using the #eqnz hashtag and other earthquake-related keywords. Using natural language processing and machine learning, we classified the collected tweets into six classes of emotion (anger, fear, grateful, humour, sympathy and worry). We then analysed the classified tweets to examine the emotional patterns expressed over the ten-year period after the Christchurch earthquakes. Our analysis indicates a rise in the proportion of tweets expressing fear and worry over the span of 2010 to 2019. This suggests that these seismic events have had a lasting effect on the community, as evidenced by an increased sense of fear and worry whenever earthquakes occur.

2 Capturing emotions using online social media

Emotions have long been a subject of extensive study within the fields of psychology and behavioural sciences due to their significant role in human nature (Izard 1977). According to Plutchik (1980), there are eight basic emotions in four opposing pairs: (i) joy-sadness, (ii) anger-fear, (iii) trust-disgust and (iv) anticipation-surprise. Ekman (1992) defines the basic emotions as anger, disgust, fear, joy, sadness and surprise, while Izard (1977) defines them as anger, contempt, disgust, distress, fear, guilt, interest, joy, shame, and surprise. Psychologists use various techniques, such as analysing facial expressions, heart rate, and pupil dilation, to identify and understand emotions. With the rise of social media, many individuals now express their emotions through written text.

This connection between emotions and text is crucial when attempting to map textual data onto an emotion space. Identifying emotion in text is essentially a content-based classification problem that combines concepts from Natural Language Processing (NLP) and Machine Learning (ML). In text mining, emotion detection is closely related to sentiment analysis. Sentiment analysis refers to the process of classifying a given text as positive, neutral or negative, while emotion analysis refers to more differentiated affective states such as happiness, sadness, anger and disgust (Munezero et al. 2014). Sentiment analysis is the broad domain for defining polarities, but each polarity can be further divided into emotions, which is useful for identifying the exact state of feeling rather than just labelling a text as negative or positive.

Emotion detection has been successfully applied to a diverse range of texts, including suicide notes (Desmet and Hoste 2014; Pestian et al. 2008) and computer–human spoken tutoring dialogues (Litman and Forbes-Riley 2004). Emotion detection is not confined to English text; it has also been applied to other languages such as Hinglish (Sasidhar et al. 2020), Chinese (Xu et al. 2015), Korean (Do and Choi 2015) and Arabic (Abdullah et al. 2020). Accurate emotion detection has numerous applications and benefits, including aiding psychologists in better assisting their patients (Desmet and Hoste 2014; Pestian et al. 2008), gauging public sentiment during disasters (Bird et al. 2012; Jung 2012), and understanding consumer behaviour to enhance brand reputation and sales (Onan 2021).

With regard to tweets on Twitter, emotion detection algorithms can successfully predict states such as depression (De Choudhury et al. 2021), fear and hope (Wang and Wei 2020) and sarcasm (Davidov et al. 2010; González-Ibáñez et al. 2011). When emotion detection algorithms are applied to tweets at a community level, trends in public emotion can be discovered (Hasan et al. 2017). For instance, gross community happiness can be tracked (Quercia et al. 2012), political sentiment can be monitored, and election results can be predicted (Bermingham and Smeaton 2011).

3 Techniques for developing emotion detection algorithms

Various machine learning techniques have been employed to identify emotions from text. A sequence-based convolutional neural network (CNN) with an attention mechanism was used to detect six types of emotion (happiness, sadness, surprise, disgust, fear and anger) plus a neutral class from the text of a TV show's transcript (Shrivastava et al. 2019). A CNN is a type of deep learning model that is widely used for image and text classification. The corpus was manually annotated by expert English annotators with moderate agreement. The proposed model achieved 80.99% accuracy and performed better than the baseline Long Short-Term Memory (LSTM) and Random Forest classifiers.

Relating emotions in social media messages to physical space enables the mapping of emotional responses to places and situations. Guthier et al. (2014) developed a system that detects emotions from geo-tagged tweets using a neural network model and then visualises them on a map, generating global emotion maps and one focused on the city of Chicago. In similar work, Hasan et al. (2019) developed and evaluated a supervised learning system to automatically classify emotion in the text stream of Twitter messages. They named the system Emotex, with a companion system, EmotexStream, to classify live streams of text-based tweets. The tweets were first filtered to remove those with no emotion, and Emotex was then used to classify the remaining tweets into the fine-grained classes happy-active, happy-inactive, unhappy-active and unhappy-inactive, with a reported F1-Measure of 90.0. Using the real-time capacity of EmotexStream, they were able to classify live streams of tweets to measure and analyse public emotion related to the death of Eric Garner, an African American man in New York who was killed by an officer from the New York City Police Department.

4 The use of social media platforms during disasters

The advent of social media and social networking has transformed the way people interact online, allowing them to share information, stay updated, connect with others, seek entertainment and express emotions. Researchers have recognized the potential of social media data for analysis, particularly in disasters (Bird et al. 2012; Jung 2012; Mandel et al. 2012). Because a large volume of messages is generated on social media during crises, an automated classifier that can identify vital information (e.g., for situation awareness; Verma et al. 2021) can benefit the general public. Furthermore, individuals in affected communities use social media to access important information quickly. Bird et al. (2012) explored the use of Facebook during floods in Queensland and Victoria. Their findings revealed that most respondents relied on community-initiated Facebook groups to access information about their local communities and to communicate with family and friends during the floods. Similarly, Jung (2012) conducted a survey on social media usage during the 2011 Great East Japan Earthquake, finding that various platforms, such as Facebook, Twitter, and Mixi, were used to acquire information and check on the safety of others.

In addition to sharing information, social media users also use these platforms to share emotions in response to being affected by disasters. This trend of sharing emotions dates back to the 2000s, as evidenced by local citizens in China using the Tianya online forum to express their emotions after the 2008 Sichuan earthquake (Wang et al. 2009). In that study, the authors classified the discussion threads into four major roles: information-related, opinion-related, action-related and emotion-related. Fourteen percent of the sampled threads were emotion-related, and the emotions expressed included sorrow, anger, empathy and pride.

5 Emotion detection algorithms from disaster-related contexts

A number of previous studies have used Twitter data to detect emotions during a disaster or crisis. Mandel et al. (2012) examined tweets posted during Hurricane Irene between August 18 and August 31, 2011. A total of 66,000 tweets were collected, analysed, and classified into “concerned” and “unconcerned” messages using three classifiers: logistic regression, Naïve-Bayes and Decision Tree. The logistic regression classifier achieved an accuracy of 84.27% and was used to determine that region and gender were demographic determinants of the “concerned” messages. Choudrie (2021) collected and analysed over 2 million tweets related to COVID-19 posted between February and June 2020. The tweets were classified into 8 classes of emotion (hate, relief, enthusiasm, depressed, sadness, worry, surprise and anger) using transfer learning with the Robustly Optimized BERT Pretraining Approach (RoBERTa), with an accuracy of 80.33%. In another COVID-19 related study, Kabir and Madria (2021) not only developed a neural network model to detect emotions from tweets at fine-grained labels, but also created a custom Q&A RoBERTa model to extract the phrases in each tweet that are primarily responsible for the corresponding emotion. Based on this classifier, they performed a historical emotion analysis showing that negative emotions increased during the pandemic in certain states. These studies demonstrate that social media analysis is a useful and viable tool to complement traditional survey methods for understanding public perception during a disaster or crisis.

Comments on short videos about the “Zhengzhou flood”, which occurred in the summer of 2021 in Henan, China, were categorised into clusters, and a sentiment value was derived for each cluster (Wang et al. 2024). The authors employed a combination of machine learning and complex network analysis to process the comments. They found that these short videos were useful and provided emotional support for those who were traumatised by the flood. In another study, Karimiziarani and Moradkhani (2023) investigated public tweets related to Hurricane Ian, which hit the states of Florida and South Carolina in late September 2022. NLP was used to classify the sentiment of over 20 million tweets as well as to group them into various humanitarian topics. This study shows that social media can be used to assist emergency responders and disaster managers in reducing the adverse effects of such disasters. Twitter data was also used to understand how social media analytics can aid government authorities in Australian States and Territories in assessing the impacts of natural disasters (Yigitcanlar et al. 2022). The study provides authorities with a novel method to analyse the geographical distribution and frequency of different disasters, as well as their associated damages, using geo-tweet analysis. Contreras et al. (2022) employed sentiment analysis of Twitter data to evaluate post-disaster recovery in the aftermath of the L’Aquila earthquake, which occurred in central Italy on April 6, 2009. A total of 4349 tweets posted between 4 and 10 April 2019 were analysed. The authors found a higher percentage of negative tweets than of positive and neutral tweets. Additionally, they discovered that even after a decade, reconstruction was still ongoing and public criticism of the recovery process continued.

Aside from aiding community response and recovery, social media data can be used to predict disaster-related outcomes. For instance, Kanhabua and Nejdl (2013) investigated whether the temporal diversity of tweets could serve as an indicator of real-world infectious disease outbreaks. Their findings indicated that results varied significantly across different outbreaks, reflecting the distinct characteristics of each outbreak event (i.e., severity, duration). In another study, tweets collected from August 2009 to January 2010 were used to measure the spatio-temporal sentiment towards a new influenza A(H1N1) vaccine (Salathé and Khandelwal 2011). Information flowed more often between users who shared the same sentiment, and most communities were dominated by either positive or negative sentiment about vaccines; communities dominated by negative sentiment had a greater likelihood of disease outbreaks. Therefore, analysing social media data can be an efficient way to identify target areas for public health intervention efforts and to evaluate their effectiveness.

In this study, we examine the emotional patterns exhibited in “earthquake tweets” posted by individuals affected by the Christchurch earthquakes between 2010 and 2019. We use machine learning techniques to classify these tweets into six classes of emotion: anger, fear, grateful, humour, sympathy and worry. The classification of these tweets enables us to examine the emotional patterns they express over the ten-year period after the Christchurch earthquakes.

6 Methods

In this section, we describe in some detail the proposed emotion identification system, which categorises earthquake tweets into six classes of emotion: anger, fear, grateful, humour, sympathy and worry.

6.1 Context

New Zealand lies at the intersection of two major tectonic plates, the Pacific Plate and the Australian Plate (Footnote 3), making it prone to frequent earthquakes. While the majority of these tremors are too small to be felt, some can be immensely powerful and can cause significant destruction. In 2010 and 2011 a sequence of earthquakes struck the Canterbury region, beginning with a powerful magnitude 7.1 quake in the town of Darfield, which is 35 km west of Christchurch, the largest and most populated city in the region. Despite widespread damage in Christchurch, the early morning timing of this earthquake meant that no lives were lost and only a small number of people sustained injuries. Aftershocks rocked the area for several years, but the most significant happened at lunchtime on February 22, 2011 in the volcanic hills to the south of Christchurch city. Although its magnitude was only 6.1, its location and peculiarly strong ground motion caused extensive damage to buildings and infrastructure in Christchurch and the loss of 185 lives. The impact of this earthquake has endured over time, leaving lasting emotional and psychological changes in the communities that were affected (Potter et al. 2015).

Following the Canterbury earthquakes, individuals experienced a range of challenging psychological effects, including cellular fatigue, anxiety, depression, and trauma (Beaglehole et al. 2015). Many people exhibited signs of exhaustion, felt disconnected from others, and suffered from various psychological disorders in the aftermath. Regrettably, some of those affected did not seek or receive professional help, and this was often influenced by their level of exposure to the disaster. Those who sought help were supported through free counselling, increased primary health care services and other social support services (Beaglehole et al. 2015).

Other significant earthquakes followed the 2011 Canterbury earthquakes (Footnote 4) up to 2019. These include the Lake Grassmere earthquake on 16 August 2013, Gisborne on 17 November 2014, Arthur’s Pass on 6 January 2015, Christchurch on 14 February 2016 and Kaikoura on 14 November 2016.

6.2 The classification process

A brief overview of the classification process is shown in Fig. 1. Each step is described in more detail in the following subsections.

Fig. 1 The tweet classification process

6.3 Twitter data collection

We collected a total of 312,297 earthquake tweets sent between 4 September 2010 and 31 December 2019 using the hashtag #eqnz and a list of keywords (“Christchurch earthquake”, “chch eqnz”, “nzquake”, “NZ earthquake”, “quake”). The hashtag #eqnz was first introduced during the 2010 Darfield earthquake. Approximately 96% of the collected tweets were in English, so we excluded non-English tweets from our analysis. Working exclusively with an English corpus also eliminates the need to translate non-English tweets and convert them into romanised text. We also excluded retweets to reduce bias, as retweets tend to amplify popular content, which might skew the overall emotional distribution of the dataset; in addition, we wanted to focus on content generated by the original authors. After this removal process, 273,698 tweets remained. Figure 2 shows the distribution of tweets by year. The largest number of tweets was observed in 2011, when two major earthquakes occurred (February and June), followed by 2016 (the Kaikoura earthquake). There were also a considerable number of tweets in 2010, 2012 and 2013. This suggests that the number of tweets increases every time there is a major earthquake.

Fig. 2 Total number of raw tweets by year

The number of unique users for each year is shown in Fig. 3. The highest number of users was observed in 2011, with 17,018 unique users, and the lowest in 2019, with 849 unique users. On average, each user tweeted 10 times in a year during earthquake occurrences. This large number of earthquake tweets and unique users gives us some level of confidence in the dataset that we are analysing.

Fig. 3 Total number of unique users by year

6.4 Data preprocessing

Prior to categorising the tweets into six emotional classes, we conducted data preprocessing to eliminate URLs, hashtags, and mentions; expand contractions; convert words to lowercase; strip punctuation; replace slang; and address elongated words. Table 1 provides illustrations of sample tweets both before and after the application of these preprocessing measures. In the final step, we used the Python NLTK toolkit to spell check and correct misspelled words. In this study, we decided to exclude punctuation, emoticons and emojis from the analysis, as we wanted to focus on the content of the text and to allow the classifier to concentrate on the crucial features of each tweet.
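The preprocessing steps listed above could be sketched roughly as follows. This is only an illustrative sketch, not the pipeline used in the study: the function name `preprocess`, the tiny `CONTRACTIONS` and `SLANG` lookup tables and the regular expressions are all assumptions made for the example, and the NLTK-based spell-correction step is omitted.

```python
import re

# Hypothetical, deliberately tiny lookup tables; the contraction and slang
# lists actually used in the study are not given in the text.
CONTRACTIONS = {"can't": "cannot", "i'm": "i am", "it's": "it is", "don't": "do not"}
SLANG = {"omg": "oh my god", "u": "you", "pls": "please"}


def preprocess(tweet: str) -> str:
    """Apply the cleaning steps in one plausible order (an assumption)."""
    text = tweet.lower()                                  # lowercase
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)    # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)                  # remove mentions and hashtags
    text = text.replace("’", "'")                         # normalise curly apostrophes
    for short, full in CONTRACTIONS.items():              # expand contractions
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    # Elongated words: collapse a letter repeated 3+ times down to 2
    # ("sooooo" -> "soo"); a later spell-check step would finish the repair.
    text = re.sub(r"([a-z])\1{2,}", r"\1\1", text)
    text = re.sub(r"[^\w\s]", " ", text)                  # strip punctuation
    tokens = [SLANG.get(tok, tok) for tok in text.split()]  # replace slang
    return " ".join(tokens)


print(preprocess("OMG that was sooooo scary!!! Can't sleep @mate #eqnz http://t.co/abc"))
```

Contractions are expanded before punctuation is stripped so that the apostrophe is still present when the lookup is applied; the ordering of the remaining steps is a design choice of this sketch.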

Table 1 Pre-processing tweets

6.5 Binary classification

The emotion classification process was done in two stages. We observed that the tweets are diverse in nature: they include tweets from news providers, government authorities, telecommunication companies and community organisations as well as tweets from individuals. The first step in the classification process was to remove tweets that carry no emotion, such as those posted to provide information and updates. We utilised a Convolutional Neural Network (CNN) with FastText word embedding to perform binary classification on the tweets. The CNN consists of one convolutional layer with a global max pooling layer, the output of which is fed to two dense layers. We used binary cross-entropy as the loss function and the Adam optimiser to minimise the loss. The model was trained using a balanced dataset of 17,000 tweets with and without emotion. An example of a tweet that carries no emotion is “QUAKE: Mag 3.3, Friday, December 31 2010 at 10:43 am (NZDT), Within 5 km of Christchurch”. In contrast, a tweet infused with emotion is: “Well, that almost gave me a darn heart attack”. The model achieved an accuracy of 92.21%. Only tweets with emotion were used as input to the second stage of classification.
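To make the loss function used in this first stage concrete, the following is a minimal sketch of binary cross-entropy in plain Python. It is an illustration of the standard definition rather than the Keras implementation used in the study; the function name and the clipping constant `eps` are assumptions of the example.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities.

    y_true: list of 0/1 labels (1 = tweet carries emotion, 0 = no emotion).
    y_prob: list of predicted probabilities that each tweet carries emotion.
    eps:    clipping constant (an assumption) that avoids log(0).
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# A confident, correct prediction is penalised less than a hesitant one.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
print(binary_cross_entropy([1, 0], [0.6, 0.4]))
```

An optimiser such as Adam adjusts the network weights to reduce this quantity over the training tweets.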

After the first stage, we obtained a total of 114,788 tweets with emotion. Figure 4 shows a summary of the tweet dataset.

Fig. 4 Summary of tweets dataset

6.6 Emotion classification

In the second stage, we classified the tweets labelled as containing emotion into six classes of emotion. In this classification process, we considered four traditional machine learning techniques (Support Vector Machine (SVM), Naive-Bayes, Random Forest and Logistic Regression) and three deep learning models: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) and BiLSTM.

CNNs are a specialised type of neural network created to analyse grid-like data, including images or sequences (Chua 1998). They use convolutional layers to extract local patterns and hierarchical representations, making them particularly suited to tasks involving such data structures. When applied to emotion identification in text, CNNs can capture essential features and patterns within textual information. LSTM networks are a specific class of Recurrent Neural Networks (RNNs) known for their ability to capture long-range dependencies within sequential data (Hochreiter and Schmidhuber 1997). Their strength lies in effectively modelling the context and temporal connections between words, making them highly suitable for text analysis tasks. The primary distinction between LSTM and BiLSTM lies in their approach to sequential data processing. In a BiLSTM, the input sequence is processed in both the forward and backward directions, with a distinct LSTM layer for each direction (Siami-Namini et al. 2019). This enables the model to gather information not just from the past (preceding words) but also from the future (succeeding words) within the input sequence. By incorporating both past and future contexts, a BiLSTM has the potential to acquire a more complete understanding of the input sequence.

Word embedding plays a vital role in deep learning-based text classification, offering several benefits such as capturing semantic relationships, reducing dimensionality, and enabling transfer learning (Onan 2021). By representing words as dense vectors in a continuous vector space, word embedding techniques equip the model with the ability to comprehend word meanings and contextual nuances, thus improving its generalization capabilities even with limited labelled data. The contextual information embedded within word vectors aids the model in understanding text variations and subtleties, leading to enhanced classification performance. Furthermore, pre-trained word embeddings provide valuable transfer learning by leveraging knowledge acquired from extensive corpora, thereby improving model initialization and handling of out-of-vocabulary (OOV) words.

Word2Vec and FastText are two widely adopted word embedding techniques. Word2Vec employs neural networks, such as CBOW (Continuous Bag of Words) or Skip-gram models, to acquire word embeddings and capture meaningful associations among words (Rong 2014). Operating at the word level, Word2Vec offers faster training speeds; however, it has limitations in effectively dealing with out-of-vocabulary words. On the contrary, FastText expands upon the capabilities of Word2Vec by incorporating subword information through character n-grams (Bojanowski et al. 2016). This extension empowers FastText to effectively handle unseen words, capture morphological details, and efficiently handle larger vocabularies. While training FastText models may take longer, they provide robust representations for both known and unknown words. In this study, we utilized FastText embeddings due to their efficacy in handling out-of-vocabulary (OOV) words commonly found in tweets, where slang, abbreviations, and misspellings are prevalent. FastText's ability to leverage subword information enables it to accurately capture the meaning and context of these OOV words.
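The subword idea can be illustrated with a short sketch that extracts the character n-grams of a word. This is an illustration only, not the FastText library itself: the function name `char_ngrams` is hypothetical, the `<` and `>` boundary markers and the 3 to 6 n-gram range follow the description in Bojanowski et al. (2016), and the n-gram settings used in this study are not stated in the text.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word, FastText style.

    The word is wrapped in '<' and '>' so that prefixes and suffixes are
    distinguishable from word-internal n-grams; the wrapped word itself is
    also kept as one extra unit.
    """
    token = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    if token not in grams:
        grams.append(token)
    return grams


# With n = 3 only, "where" yields <wh, whe, her, ere, re> plus <where>.
print(char_ngrams("where", 3, 3))

# A word and an unseen variant share many n-grams, which is what lets a
# subword model build a vector for the variant from known pieces.
shared = set(char_ngrams("quake")) & set(char_ngrams("quakes"))
print(sorted(shared))
```

In a subword model a word vector is composed from the vectors of such n-grams, so a misspelling or abbreviation that was never seen in training still receives a representation built from the pieces it shares with known words.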

The objective of this stage was to compare these seven models in terms of their classification accuracy, precision, recall and F1-score. The best-performing model was then used to classify the tweets retained by the binary classification process. In the next subsections, we describe the architecture of the three deep learning models for classifying tweets into six classes of emotion.

6.6.1 CNN

We implemented the CNN model, which consisted of five layers, using the Keras library. The first layer employed FastText in an embedding layer to map the input text data into a continuous vector space. This enabled the model to capture semantic information effectively.

The second layer consisted of a 1D Convolutional Layer (Conv1D) with 256 filters and a kernel size of 3. By applying convolution operations, this layer focused on extracting local patterns from the input data. The Rectified Linear Unit (ReLU) activation function was utilized within this layer.

The third layer was a Global Max Pooling layer (GlobalMaxPooling1D), responsible for selecting the maximum value from each feature map generated by the previous layer. This reduced the dimensionality of the data while retaining the most crucial features.

The fourth layer was a fully connected layer (Dense) with 256 units and the ReLU activation function. This layer performed non-linear transformations on the data, enabling the model to learn complex patterns.

The fifth layer acted as the output layer, also a Dense layer, with 6 units representing the number of classes in the classification task. The softmax activation function was applied to calculate the probabilities for each class, ensuring they summed up to 1.

During the training process, the model employed categorical-crossentropy as the loss function, Adam as the optimizer, and accuracy as the metric for evaluating its performance.
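The core operations of the layers described above (convolution with ReLU, global max pooling and softmax) can be traced on toy numbers with the following plain-Python sketch. It is not the Keras model used in the study: the embedding and dense layers are omitted, biases are left out, and the two-dimensional "embeddings" and the two hand-written filters are assumptions chosen so that the arithmetic is easy to follow.

```python
import math


def conv1d_relu(seq, filters, kernel_size=3):
    """1D convolution over a token sequence followed by ReLU.

    seq:     list of T embedding vectors, each of length D.
    filters: list of F filters, each a kernel_size x D weight matrix.
    Returns F feature maps, each of length T - kernel_size + 1.
    """
    dim = len(seq[0])
    maps = []
    for w in filters:
        fmap = []
        for t in range(len(seq) - kernel_size + 1):
            s = sum(w[k][d] * seq[t + k][d]
                    for k in range(kernel_size) for d in range(dim))
            fmap.append(max(0.0, s))  # ReLU
        maps.append(fmap)
    return maps


def global_max_pool(maps):
    """Keep only the largest activation of each feature map."""
    return [max(fmap) for fmap in maps]


def softmax(logits):
    """Turn scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy input: 5 tokens with 2-dimensional embeddings, and 2 filters.
seq = [[1, 0], [0, 1], [1, 1], [0, 0], [2, 1]]
filters = [
    [[1, 0], [1, 0], [1, 0]],     # sums the first dimension over 3 tokens
    [[0, -1], [0, -1], [0, -1]],  # always negative here, so ReLU zeroes it
]
pooled = global_max_pool(conv1d_relu(seq, filters))
print(pooled)
print(softmax(pooled))
```

In the model described above there are 256 such filters, and the pooled vector passes through a 256-unit dense layer before the 6-unit softmax output; here the softmax is applied directly to the pooled values purely to show the normalisation.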

6.6.2 LSTM and BiLSTM

The LSTM model consists of an embedding layer, an LSTM layer, and an output layer, leveraging FastText word embeddings similar to the CNN model. After the embedding layer, there is an LSTM layer with an output size of 128, a dropout rate of 0.2, and a recurrent dropout rate of 0.2 to prevent overfitting.

The output layer is a dense layer with 6 units, representing the number of classes in the classification task. The softmax activation function is applied to calculate the probabilities for each class, ensuring their summation equals 1.

The BiLSTM architecture shares a similar structure with the LSTM model but incorporates bidirectional processing. This means that the input sequence is processed in both the forward and backward directions, allowing the model to capture information from both ends of the sequence.
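The idea of bidirectional processing can be illustrated independently of the LSTM cell itself. In the sketch below, a simple running sum stands in for the recurrent cell (an assumption made purely for readability; a real LSTM cell has gates and learned weights), and the function names are hypothetical.

```python
def run_recurrent(step, inputs, h0=0.0):
    """Apply a recurrent step function left to right, collecting every state."""
    states, h = [], h0
    for x in inputs:
        h = step(h, x)
        states.append(h)
    return states


def bidirectional(step_fwd, step_bwd, inputs, h0=0.0):
    """Pair each position's forward state with its backward state.

    The forward pass summarises everything up to and including position t;
    the backward pass summarises everything from position t to the end.
    """
    fwd = run_recurrent(step_fwd, inputs, h0)
    bwd = run_recurrent(step_bwd, list(reversed(inputs)), h0)[::-1]
    return list(zip(fwd, bwd))


def running_sum(h, x):
    """Toy stand-in for an LSTM cell: the state is the sum seen so far."""
    return h + x


print(bidirectional(running_sum, running_sum, [1, 2, 3]))
```

Each position ends up with one summary of the preceding inputs and one of the succeeding inputs, which is the property that lets a BiLSTM use both past and future context.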

Both the LSTM and BiLSTM models were trained using categorical-crossentropy loss, optimized with Adam, and evaluated based on accuracy.

6.7 Labelling the tweets

Based on our analysis of the collected tweets, we categorised them into six classes of emotion: anger, fear, grateful, humour, sympathy and worry. This classification is a slight variation on the six basic emotions defined by Ekman: happiness, sadness, disgust, fear, surprise, and anger (Ekman 1992). In the collected tweets, we noted the absence of “happiness” and “surprise” tweets. Tweets that might otherwise be labelled “happiness” were better described as “grateful” or “humour” tweets. We considered “disgust” tweets as “anger” tweets, while “sadness” tweets were described more precisely as “sympathy” and “worry”. As the aftermath of these major earthquakes was devastating, we felt that these six classes were more appropriate. Table 2 shows example tweets for the six classes of emotions. Within the humour tweets, other emotions were also observed, such as sarcasm (“Thanks very much for the belated Xmas gift Mother nature”) and coping (“Go back to bed or crack into the emergency baked beans hmmm”). To train our classifier, we manually annotated 1000 tweets for each category. The annotations were performed by two academic staff (the second and third authors), with a Cohen's kappa of 0.715, which indicates substantial agreement between the two raters.
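For readers unfamiliar with the agreement statistic, the sketch below shows how Cohen's kappa is computed from two raters' labels: observed agreement is corrected for the agreement expected by chance. The eight labels are invented for illustration and are not the study's annotations.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # p_o: share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: agreement expected if each rater labelled at random
    # according to their own label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations for eight tweets.
rater_a = ["fear", "fear", "worry", "worry", "humour", "humour", "anger", "anger"]
rater_b = ["fear", "worry", "worry", "worry", "humour", "anger", "anger", "anger"]
kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

In this toy case the raters agree on 6 of 8 tweets (0.75) while chance agreement is 0.25, so kappa is lower than the raw agreement rate.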

Table 2 Tweets classified into six classes of emotions

7 Results

In this section, we discuss the results obtained by all seven classifiers. The performance of each model is measured using four metrics: precision, recall, F1-score and accuracy. Precision measures the proportion of correctly predicted positive instances (true positives) out of all instances predicted as positive. Recall measures the proportion of correctly predicted positive instances (true positives) out of all actual positive instances. The F1-score balances precision and recall: it is their harmonic mean, providing a single value that considers both false positives and false negatives. Finally, accuracy is a straightforward metric that measures the ratio of correctly predicted instances (both true positives and true negatives) to the total number of instances (Alpaydin 2010). As our training data is a balanced dataset, accuracy can be used as the main performance indicator.
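The definitions above can be made concrete with a short sketch that derives per-class precision, recall and F1-score, plus overall accuracy, from true and predicted labels. In the multi-class setting, accuracy is taken here as the share of tweets that receive the correct label. The six labels are invented for illustration.

```python
from typing import Dict, List


def per_class_metrics(y_true: List[str], y_pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 for each class, treating it as the positive class."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Share of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels for six tweets.
y_true = ["fear", "fear", "worry", "worry", "humour", "humour"]
y_pred = ["fear", "worry", "worry", "worry", "humour", "fear"]
report = per_class_metrics(y_true, y_pred)
acc = accuracy(y_true, y_pred)
for label, scores in report.items():
    print(label, {name: round(value, 2) for name, value in scores.items()})
print("accuracy", round(acc, 2))
```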

7.1 Emotion classification

We performed the tweet classification using four traditional machine learning classifiers and three deep learning models, and compared their performance in terms of precision, recall, F1-score and accuracy, as shown in Table 3. CNN with FastText embedding achieved the highest precision (0.86), recall (0.86), F1-score (0.86) and accuracy (86.00%). This is followed by BiLSTM + FastText with an accuracy of 84.00% and SVM with an accuracy of 83.33%. The worst performer is Naïve Bayes, with an accuracy of 73.83%.

Table 3 Performance of Various Machine Learning Techniques

Table 4 shows the performance metrics for CNN + FastText, which recorded the highest accuracy in the classification process. It achieved the highest F1-score when classifying sympathy, followed by grateful, worry and anger. This classifier recorded its lowest precision and F1-score for humour. This is because humour tweets contain a variety of expressions, which might include coping (“now it’s wine o’clock”), sarcasm (“Three aftershocks already today. I hope I don't spill my drink later”), and funny remarks (“I'm blaming #eqnz for everything lol”). On the other hand, sympathy-oriented tweets display a more direct pattern, often utilizing phrases like "thinking of," "thoughts," "sorry," and "poor". As a result, the sympathy class achieves the highest F1-score of 0.90.

Table 4 Performance of CNN + FastText for each class of emotion

The normalised confusion matrix for the classification of the six classes of emotion is shown in Fig. 5. The CNN + FastText model correctly predicted sympathy and fear 89% of the time, grateful 87% of the time, worry 86% of the time, humour 83% of the time and anger 82% of the time. Anger is misclassified as humour 7% of the time and humour is misclassified as anger 5% of the time, which suggests that anger and humour tweets are closely related.

Fig. 5 Normalised confusion matrix for the six classes of emotions
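A row-normalised confusion matrix of the kind reported above can be computed as in the following sketch, where each row (true class) is divided by its total so that the diagonal gives the share of that class predicted correctly. Row normalisation is assumed here because the reported percentages are read per true class; the two-class labels are invented for illustration.

```python
from typing import List, Sequence


def normalised_confusion_matrix(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> List[List[float]]:
    """Rows are true classes, columns are predicted classes; each row sums to 1."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A class with no true instances yields a row of zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical two-class example: four anger tweets, four humour tweets.
labels = ["anger", "humour"]
y_true = ["anger"] * 4 + ["humour"] * 4
y_pred = ["anger", "anger", "anger", "humour", "humour", "humour", "humour", "anger"]
matrix = normalised_confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```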

7.2 Tweets analysis and discussion

As the CNN + FastText classifier achieved the highest performance in all four metrics, we used this model to classify all tweets from 2010 to 2019. Once all tweets had been classified into their classes of emotions, we analysed them to identify patterns of emotions over the years.

7.2.1 Word cloud

Figure 6 shows the word clouds for the 2011, 2016 and 2019 tweets. We chose 2011 because two major earthquakes occurred in that year, and 2016 because of the Kaikoura earthquake. We compared these with the word cloud for 2019, when there were no major earthquakes.

Fig. 6 Word cloud for 2011, 2016 and 2019 tweets

During the 2011 earthquakes, “Christchurch” was the most used word, along with “people”, “thought”, “chch”, “okay”, “thank” and “now”. This is as expected, as the earthquake occurred in Christchurch and many were tweeting to report that they were safe (“okay”). This finding is similar to Jung (2012) and Bird, Ling and Haynes (2012), where those who were affected used social media as a platform to inform and update friends and family that they were safe. In 2016, the most popular words were “earthquake”, “okay”, “stay”, “safe”, “aftershock” and “new zealand”. The word “kaikoura” also appears in the word cloud, as the epicentre of the 2016 earthquake was in Kaikoura. For 2019, “earthquake” is still the most used word, but new words appear, such as “shake”, “felt”, “feel”, “first” and “still”. Despite the absence of significant earthquakes in 2019, the terms "earthquake," "quake," and "okay" remain prominent. This indicates that individuals continue to tweet, even in response to minor seismic events.

7.2.2 Tweets with Emotion (2010–2019)

Figure 7 illustrates the distribution of tweets across the six emotion categories from 2010 to 2019. The figure shows a pattern of heightened Twitter activity coinciding with earthquakes. The highest number of tweets is observed in 2011, specifically around the times of the February and June earthquakes. A large number of tweets is also observed in 2010 (Darfield, Canterbury earthquake), 2012, 2013 (Lake Grassmere earthquake) and 2016 (Kaikoura earthquake). There was no major earthquake recorded in 2012; the elevated tweet numbers during that year were primarily driven by continued discussion of, and reactions to, the 2011 earthquake. This suggests that many people were impacted by the 2010 and 2011 earthquakes and continued tweeting about them every time an aftershock or minor earthquake occurred. These findings indicate that individuals directly affected by these seismic events turned to Twitter to articulate their emotions, potentially utilising it as a coping mechanism to alleviate feelings of distress, apprehension, and unease. These findings corroborate studies conducted by Bird et al. (2012), Jung (2012), Mandel et al. (2012) and Wang and Wei (2020). Similar emotional responses have been documented in prior studies on post-earthquake emotions, including Kemp et al. (2011), Beaglehole et al. (2015) and Gluckman (2011).

Fig. 7 Distribution of tweets with emotion

Figure 8 illustrates the fluctuations in the emotional categories expressed by Twitter users in response to the earthquakes. The percentage for each emotion is calculated by dividing the number of tweets in that class of emotion by the total number of tweets for that year. The years marked by significant earthquakes are indicated, and emotions exhibiting comparable trends are grouped together. Figure 8a shows that fear and humour tweets were more prevalent and that the percentages for both increased steadily over time. In Fig. 8b, a rise in both grateful and sympathy tweets is observed between 2010 and 2011, followed by a gradual decline over the decade. However, there is an upsurge in the percentage of sympathy tweets during the 2016 Kaikoura earthquake. The percentages of anger and worry tweets are shown in Fig. 8c; both remained consistent from 2010 onwards and gradually diminished after the 2016 earthquake. This analysis shows that all six emotions persist over an extended period, suggesting that these earthquakes have lasting impacts on the affected community.

Fig. 8 Emotions over the 10-year period
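The yearly percentages plotted in Fig. 8 follow from a simple count-and-divide step, sketched below. The `(year, emotion)` record layout and the sample records are assumptions for illustration; the study's actual data format is not described.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def yearly_emotion_percentages(
    records: Iterable[Tuple[int, str]]
) -> Dict[int, Dict[str, float]]:
    """For each year, the percentage of that year's tweets in each emotion class."""
    per_year = defaultdict(Counter)
    for year, emotion in records:
        per_year[year][emotion] += 1
    result = {}
    for year, counts in per_year.items():
        total = sum(counts.values())  # all classified tweets in that year
        result[year] = {emotion: 100.0 * n / total for emotion, n in counts.items()}
    return result


# Hypothetical classified tweets as (year, predicted emotion) pairs.
records = [
    (2010, "humour"), (2010, "fear"), (2010, "humour"), (2010, "worry"),
    (2011, "sympathy"), (2011, "sympathy"), (2011, "grateful"), (2011, "fear"),
]
percentages = yearly_emotion_percentages(records)
for year in sorted(percentages):
    print(year, percentages[year])
```

Normalising within each year makes years with very different tweet volumes comparable, which is why the trends are reported as percentages rather than raw counts.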

It can be observed that, when the initial significant earthquake struck in 2010, approximately 25% of the tweets were humorous in nature. This observation implies that due to this being the first major earthquake occurrence in Canterbury, people may not have been taking the situation seriously. Furthermore, the aftermath of the 2010 earthquake resulted in comparatively milder consequences, with no reported loss of life. Some of the humorous tweets during this period included phrases like “bf made milo on the hob, still no power, a bit over the aftershocks”, “I think I need coffee”, “laughing out loud, written in their sleep” and “piggedy wiggedy jiggedy, cannot breathe, I am laughing so much”. In addition to humour, fear, worry, and gratitude were also notable themes within these tweets, each constituting more than 15% of the total. Conversely, the least represented emotions were anger (11%) and sympathy (9%).

Of the tweets from 2011, 22% expressed sympathy. This is an understandable response, given that the earthquake on February 22, 2011, was particularly severe, with its epicentre situated close to the city centre. This event resulted in extensive destruction across Christchurch, claiming the lives of 185 individuals. Notably, a substantial number of tweets in 2011 also carried tones of humour and gratitude. These expressions of gratitude revolved around the relief of surviving the earthquakes, while the humour-infused tweets served as coping mechanisms. Examples of grateful tweets included statements such as “My family, my girlfriend and I are all fine. No injuries as far as I have seen”, “bloody glad I finished work when I did”, “Good to see your lil face pop up on the timeline, means you are okay” and “good news is, family and friends are okay”. Worry tweets contributed 16% of the tweets, followed by fear (12%), with anger tweets the least common at 11%. Interestingly, fear tweets contributed only 12% of the total, which suggests that these Twitter users were more occupied with sympathy, coping and being grateful during this challenging period.

Between 2011 and 2015, there was a notable escalation in fear-related tweets, surging from 12% to 29%. This suggests that the 2011 earthquake may have instilled greater fear and left an imprint on the Twitter user community. A similar trend is evident in worry-themed tweets, which rose from 16% to 19%. With each successive earthquake, the emotions of fear and concern invariably resurfaced. Conversely, the percentage of humour-infused tweets exhibited a marginal decline, shifting from 18% to 17%. Meanwhile, anger tweets saw a modest increase from 11% to 13%. There was a reduction in the expression of gratitude, as reflected by a decrease in grateful tweets from 18% to 8%. Similarly, there was a decline in sympathy tweets from 22% to 8%.

During the 2016 Kaikoura earthquake, a distinct pattern emerges in the emotional content of tweets. Worry and fear, constituting 22% and 20% of the tweets respectively, stand out as the predominant emotions. These are followed by sympathy and humour, comprising 17% and 14% of the tweets respectively. Anger and grateful tweets contribute 13% and 12% respectively.

Furthermore, a noticeable trend is the increase in the percentage of fear-related tweets, which grew from 12% in 2011 to 20% in 2016. A parallel trend can be observed for anger, rising from 11% in 2011 to 15% in 2016, as well as for worry, which saw a rise from 18% in 2011 to 22% in 2016.

Despite the absence of major earthquakes between 2017 and 2019, there was a notable rise in the proportion of fear-related and humour-infused tweets. This trend suggests that the Twitter community was becoming accustomed to the concept of frequent earthquakes, which, although not severe enough to significantly impact the community, continued to evoke these emotions. Conversely, there was a decrease in the percentages of sympathy, worry, anger, and grateful tweets.

Figure 9 shows the monthly distribution of emotions throughout 2011. In February, the primary emotions expressed in the tweets were sympathy, grateful and worry. The dynamics shifted slightly in June (when another significant earthquake hit Christchurch), with sympathy and worry still prevailing, but a majority of tweets adopting a humorous tone. The presence of humour in these tweets potentially signifies that those individuals affected by the three major earthquakes within a nine-month span (September 2010, February 2011, and June 2011) utilised social media as a coping mechanism.

Fig. 9 Emotion by month in 2011

For a deeper analysis, Fig. 10 shows the daily tweet activity from 20 to 27 February 2011. No tweets were recorded on 20 and 21 February 2011, indicating the absence of apparent earthquake-related events. However, on 22 February 2011, the day of the earthquake, over 3000 sympathy tweets were recorded. These were followed by approximately 1500 tweets expressing worry and around 1000 tweets conveying gratitude. The same pattern was exhibited after the earthquake. Two days after the earthquake, there were more humour tweets than worry tweets, and this continued for the next four days.

Fig. 10 Emotion by days between 20 and 27 Feb 2011

During the 2016 Kaikoura earthquake, 42% of the tweets conveyed feelings of fear and worry, in stark contrast to the patterns observed in the 2011 tweets. Of the remaining tweets, 17% exhibited sympathy and 14% carried humour, while the rest were divided between anger and grateful tweets. Figure 11 provides an overview of tweet distribution per month in the last quarter of 2016, clearly highlighting that most of the tweets conveyed sentiments of worry, sympathy, and fear. Fear tweets were still prominent one month after the earthquake. A similar trend is observed on the day of the earthquake itself, where the primary emotions identified were worry, sympathy, and fear (see Fig. 12). No tweets were observed in the days leading up to the earthquake.

Fig. 11 Emotion by month in 2016

Fig. 12 Emotion by days between 12 and 16 Nov 2016

To summarise, an incremental rise in the proportion of fear and humour tweets is observed over the period 2010–2019. This trend aligns with previous research by Kemp et al. (2011), Beaglehole et al. (2015), Gluckman (2011) and Wang and Liu (2012), which indicates that individuals affected by such events grapple with feelings of insecurity, ongoing uncertainty, and heightened vigilance. Humorous tweets may serve as a coping mechanism for those impacted, potentially alleviating the weight of perceived stress and fostering positive emotional states during challenging circumstances (Simione and Gnagnarella 2023).

Instances of grateful and sympathetic tweets increase notably with the occurrence of major earthquakes. Grateful tweets reflect a sense of thankfulness that no significant issues arise during the earthquake. Sympathetic tweets underscore the community's compassion, demonstrating their concern for one another’s well-being. Notably, there is a gradual increase in the percentage of tweets expressing anger and worry from 2010 to 2016. This suggests that the lingering effects of earthquakes not only give rise to feelings of uncertainty but also evoke frustration and irritation. The unpredictable nature of such natural disasters profoundly impacts individuals' well-being.

8 Conclusion and future work

This study explores the emotional trends within six distinct emotional categories (anger, fear, humour, grateful, sympathy, and worry) evident in “earthquake tweets” shared by individuals impacted by earthquakes occurring between 2010 and 2019. These collected tweets were then categorised into the aforementioned emotional classes, and subsequent analysis was conducted to uncover emotional patterns prevalent among those directly affected.

To classify the earthquake-related tweets, we employed a CNN + FastText classifier. This classifier recorded the highest classification accuracy at 86%, coupled with an F1-score of 0.86. Comparative evaluations against LSTM, BiLSTM, and traditional machine learning techniques substantiated its superior performance in accurately categorising these tweets.

The analysis revealed a gradual increase in the proportion of fear-laden tweets over the span of 2010–2019. Furthermore, an increase in the percentage of fear and worry tweets was observed after the 2011 earthquakes. This observation suggests that these seismic events left a lasting impact on the community to the extent that whenever an earthquake occurs, there is a heightened sense of fear and worry in the community. There was also a noticeable uptick in the percentage of humorous tweets. This could potentially indicate that the community is gaining a better understanding of the situation and actively seeking positive ways to cope with the circumstances at hand.

The study underscores the difficulty of identifying emotions. Emotions are often ambiguous and subjective, and vary greatly with individual perspectives, cultural backgrounds and context. This complexity challenges our classifier’s ability to accurately interpret and classify emotions. Moreover, emotions can be expressed in a multitude of ways, including subtle nuances, sarcasm, irony and metaphors. These intricacies present challenges for automated systems in accurately understanding emotions in text. In our experiment, humorous tweets often included elements of sarcasm and coping. Emotions are heavily influenced by the context in which they occur, so understanding the context of a piece of text is crucial to accurately interpreting the emotions expressed within it. Additionally, a text can express multiple emotions, and identifying the dominant emotion may require evaluating the intensity of each individual emotion.

This study highlights the potential of utilising deep learning to analyse social media activity during natural disasters, enabling the detection of public emotions. This, in turn, helps authorities deliver strategic and targeted support to affected communities, based on analysis of emotion, to aid in recovery from natural disasters. For instance, public health promotion around coping and processing emotions can be provided to communities with a high prevalence of anger and fear sentiments. Communities with a high prevalence of worry may require more resources directed to hazard mitigation and emergency preparedness. Social media analyses of humour and sympathy sentiments may also be used to monitor the resilience of a community during and after natural disasters. An additional avenue of exploration lies in analysing the geographical origins of these tweets, which could aid in understanding the source of various sentiments. Such an analytical approach would be valuable for social workers, as it could assist in identifying regions requiring supplementary support, particularly for individuals grappling with heightened levels of fear and worry. Unfortunately, the geolocation data of the tweets is currently unavailable due to the deactivation of Twitter's location feature. However, it might be feasible to deduce approximate locations based on the profiles of Twitter users.

An extension of this work could involve real-time emotion detection based on tweets, culminating in the creation of a heat map showcasing the different types of emotions experienced by the community during a disaster. Such an expansion would further enhance our understanding of emotional dynamics in the aftermath of calamitous events. We have also confined the classification to conventional machine learning and deep learning techniques. It would be useful to explore classification using transformers and large language models for improved accuracy. It was also observed that some of the words used in the tweets are closely related, such as “feel” and “felt”, and “thinking” and “think”. As future work, we could group these words using stemming, lemmatization or synonyms.

In this study, we opted to exclude emojis and emoticons from our analysis. However, considering these symbols could prove beneficial, as many Twitter users tend to use them to convey their emotions. Our current approach involves categorising each tweet into a singular emotion class. Nonetheless, it's important to note that tweets can encompass multiple emotions. For instance, a tweet like "Good night chch, Geez that was a frightening jolt just as I started to type. See you all safe and sound in the morning" encapsulates both "fear" and "worry," even though our current classifier designates it as "fear." To address this kind of multifaceted classification, an exploration into multi-label classification methods is warranted such as the work by Yang et al. (2024). This adaptation would enable us to precisely capture the coexistence of multiple emotions within a single tweet, leading to a more accurate classification of emotions.
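One simple way to move from single-label to multi-label output is sketched below: each class receives an independent sigmoid score, and every class whose score passes a threshold is reported. This is a generic illustration and not the method of Yang et al. (2024); the logit values, the 0.5 threshold and the class order are assumptions.

```python
import math
from typing import List

# Assumed class order (alphabetical).
CLASSES = ["anger", "fear", "grateful", "humour", "sympathy", "worry"]


def sigmoid(z: float) -> float:
    """Independent per-class score in (0, 1); scores need not sum to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def multi_label(logits: List[float], threshold: float = 0.5) -> List[str]:
    """Return every class whose score reaches the threshold.

    If no class reaches it, fall back to the single highest-scoring class
    so that each tweet still receives at least one label.
    """
    scores = {label: sigmoid(z) for label, z in zip(CLASSES, logits)}
    chosen = [label for label, score in scores.items() if score >= threshold]
    if not chosen:
        chosen = [max(scores, key=scores.get)]
    return chosen


# Hypothetical scores for a tweet that mixes fear and worry.
labels = multi_label([-2.0, 1.5, -1.0, -3.0, -0.5, 0.8])
print(labels)
```

Unlike softmax, the sigmoid scores do not compete with one another, so a tweet such as the "frightening jolt" example could be assigned both "fear" and "worry".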