1 Introduction

Globally, heat is a public health concern (Capon et al. 2019; Ebi et al. 2021; Jay et al. 2021) and “one of the most underappreciated hazards of climate change” (Nature Editorial 2021). Extreme heat has led to the deaths of 30 million people in the last three decades (Vicedo-Cabrera et al. 2021). The 2003 European heat wave, for example, killed 70,000 people (Robine et al. 2008). Heat affects both rural and urban populations, which are increasingly exposed to morbidity and mortality from extreme heat and related heat illnesses (Hatvani-Kovacs et al. 2016; Wouters et al. 2017; Zander et al. 2018). Temperatures in some regions of the world are already exceeding life-threatening thresholds during heat waves and eventually such places will become uninhabitable (Horton et al. 2021). Apart from killing people, often the elderly and those with chronic illnesses, heat decreases well-being through mild symptoms such as headaches and fatigue (Zander et al. 2017), reduces productivity (Zander et al. 2015; Andrews et al. 2018) and disrupts comfort and everyday life. Heat stress can also increase violence and crime (Stevens et al. 2019).

Given that average temperatures are predicted to be 2.4 degrees higher than pre-industrial levels by 2100, almost a full degree above the Paris limit (Climate Action Tracker 2021), and heat waves are set to become more frequent and longer (IPCC 2021), there is increasing recognition of the need for research on how to increase adaptation to the direct effects of heat on human society. Adaptation to heat includes changes in individual behaviour (e.g. drinking, resting, shifting activities to cooler times of the day), improving infrastructure at both household level and in public places, and adjusting institutional policies (Turek-Hankins et al. 2021). While there is research on how people are learning to cope with increased heat (Bakhsh et al. 2018; Jay et al. 2021; Zander et al. 2021), adaptation to heat is ultimately limited by the physiological responses of the human organism (Sherwood and Huber 2010; Asseng et al. 2021). One adaptation, therefore, is migration to cooler places to avoid negative health implications and a decline in well-being (Cattaneo and Peri 2016; Zander and Garnett 2020).

Notwithstanding the impacts of heat at a local level, it is unclear how many people suffer from heat globally. Studies on heat adaptation are limited and mostly confined to developed countries where heat is mostly treated as a health issue, particularly in urban areas (Turek-Hankins et al. 2021). Moreover, data are scarce since people do not necessarily report to medical doctors or hospitals when they only feel mild symptoms. To overcome the data shortage, researchers have used data from ambulance call-outs (Guo 2017) or occupational accidents (Varghese et al. 2019) in order to assess the relation between heat-related illnesses and ambient temperatures, but this has also required that heat-related illness be sufficiently severe to warrant medical intervention. A limited number of studies have collected primary data on heat adaptation through surveys (e.g. Hatvani-Kovacs et al. 2016; Rauf et al. 2017; Zander et al. 2018; Beckmann et al. 2021).

In this study, we aimed to explore the potential utility of a large and hitherto neglected dataset, freely available social media data, for communications about heat waves. One of the most frequently used social media platforms by researchers is Twitter (Lupton 2014). Twitter provides a certain percentage of all tweets, the messages sent on Twitter, including associated meta-data, free of charge to be used for non-commercialised research via a freely accessible API. We collected Twitter data to explore (1) what kind of useful information social media data can provide from communications about heat waves, (2) what we can learn from Twitter conversations about how people are affected by and cope with heat waves, and (3) who engages and to what extent in conversations about heat waves.

Social media is now the most common means for disseminating information rapidly about disasters (Abedin and Babar 2018). Increasingly, the public turns to social media during crises to share the latest information, while agencies (governmental, non-governmental or non-institutional) use social media for formal communication of updated information, warnings and specific advice or give directions to the public and stakeholders such as volunteers (Bird et al. 2012; Abedin and Babar 2018). Many studies have shown that the number of social media messages from people close to events increases substantially immediately before and during their occurrence. This suggests that social media use, and in particular Twitter messages, are reliable indicators of public awareness (Bird et al. 2012; Kryvasheyeu et al. 2016; Wang et al. 2016; Silver and Andrey 2019). Globally, researchers have taken advantage of the increasing volume of social data provided by the public to improve understanding of how people respond to (Wang et al. 2016; Reynard and Shirgaokar 2019; Martín et al. 2020) and recover from (Jamali et al. 2020) sudden-onset disasters such as floods, bushfires, earthquakes and cyclones/hurricanes. Social media has also been used for early detection and tracking of such events (Wang et al. 2016) and to assess the level of damage afterwards (Kryvasheyeu et al. 2016).

Our paper contributes to extending the applications of disaster-related Twitter studies to heat waves which, although often deadly, are rarely perceived to be as severe and destructive as some other natural hazards. To the best of our knowledge, social media analysis has been employed to explore how people are impacted by heat waves in only a handful of country-specific case studies (Japan: Murakami et al. 2020; India: Cecinati et al. 2019; Italy: Grasso et al. 2017; and the US: Jung and Uejio 2017; Moore et al. 2019). Our paper differs from this existing body of literature in that we collected and analysed Twitter data globally over a period of four months. We also used an innovative method of topic modelling designed for short text such as tweets, a Dirichlet multinomial mixture model (GSDMM), which extends the commonly applied latent Dirichlet allocation (LDA) model.

2 Data and methods

2.1 Twitter data collection

The data in this research were collected using snscrape, a pre-built scraper for social networking services in Python. The TwitterSearchScraper method of snscrape’s Twitter module (JustAnotherArchivist 2020) allows users to retrieve historical tweets based on a set of keywords. Here, we searched for historical heat-related tweets applying two inclusion criteria. Firstly, the tweets must contain any of the following ten keywords: ‘heat wave’, ‘heatwave’, ‘sweltering heat’, ‘unbearable heat’, ‘torrid heat’, ‘heat warning’, ‘heat watch’, ‘heatstroke’, ‘heat stroke’ and ‘extreme temperature’. Secondly, these tweets must have been published over a period of four months from January through to April of 2022. The reason we chose this four-month time frame was that the global temperature during January–April 2022 was the fifth highest in the 143-year record (NOAA 2022; Figure S1 in Supplementary Materials) with above-average temperatures observed in Asia, South America, the Atlantic and Pacific Oceans and Australia.
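The two inclusion criteria can be written as a simple predicate. The following is a minimal sketch in plain Python and not the snscrape query used in the study; the function name `matches_criteria` and the case-insensitive substring matching are assumptions made for illustration, and the matching behaviour of Twitter search itself may differ.

```python
from datetime import date

# The ten search keywords listed in the text.
KEYWORDS = (
    "heat wave", "heatwave", "sweltering heat", "unbearable heat",
    "torrid heat", "heat warning", "heat watch", "heatstroke",
    "heat stroke", "extreme temperature",
)

# Study window: January through April 2022 (inclusive).
START, END = date(2022, 1, 1), date(2022, 4, 30)


def matches_criteria(text: str, posted: date) -> bool:
    """Return True if a tweet satisfies both inclusion criteria.

    Hypothetical helper: substring matching is an assumption.
    """
    lowered = text.lower()
    has_keyword = any(keyword in lowered for keyword in KEYWORDS)
    in_window = START <= posted <= END
    return has_keyword and in_window
```

For example, `matches_criteria("Another heatwave in Perth", date(2022, 1, 20))` satisfies both criteria, whereas the same text dated in May 2022 would be excluded.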

Among all information available in the Tweet object of TwitterSearchScraper, we selected 13 variables to download (Table S1 in Supplementary Materials). The data collection process resulted in a raw data set containing 62,920 rows of data which corresponded to 62,920 tweets that matched the two criteria described above. Figure 1 outlines the data extraction, processing, analysis and visualisation processes.

Fig. 1 Data collection, processing and analysis

2.2 Data processing

Before any data analysis and topic modelling were performed, data processing was required to transform the raw data into an appropriate format. We applied common standards for processing Twitter text data with some adjustments to suit our data set (Xue et al. 2020). The detailed steps are provided in the supplementary materials.

In order to remove noise from the data set, tweets with irrelevant content were discarded after an initial analysis. We found that entertainment was a frequently occurring topic which we removed by filtering the words ‘film’, ‘music’ and ‘movie’. This step removed another 1302 tweets, which brought the total number of tweets removed during data processing to 12,656 tweets. The final data set contained 50,264 tweets with a total of 351,677 words which were then used as inputs for the topic modelling.
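The entertainment filter described above can be sketched as follows. This is an illustrative example only: the helper name `remove_noise` and the whole-word, case-insensitive matching are assumptions, since the exact matching rule is given in the supplementary materials rather than here. The final lines simply check the tweet counts reported in the text.

```python
import re

NOISE_WORDS = ("film", "music", "movie")

# Whole-word, case-insensitive match (an assumption, not the study's rule):
# "movie" is caught, "movies" or "filming" are not.
NOISE_PATTERN = re.compile(
    r"\b(" + "|".join(NOISE_WORDS) + r")\b", re.IGNORECASE
)


def remove_noise(tweets):
    """Keep only tweets that mention none of the noise words."""
    return [text for text in tweets if not NOISE_PATTERN.search(text)]


# Consistency check on the counts reported in the text.
RAW_TWEETS = 62_920
REMOVED_TWEETS = 12_656
FINAL_TWEETS = RAW_TWEETS - REMOVED_TWEETS  # 50,264 tweets remain
```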

2.3 Topic modelling and coherence score

Topic modelling is a popular and important unsupervised machine learning and natural language processing (NLP) algorithm. It is a tool designed to analyse and extract hidden topics from large datasets, in this case, Twitter tweets, which cannot easily be read and analysed objectively without a machine. Topic models classify articles according to their key topics and word frequencies and thereby reveal hidden themes (Chang et al. 2009). Articles within each topic have similar words that occur frequently together (Blei et al. 2003). However, the researchers’ knowledge and experience are also needed to interpret the model outcome into meaningful themes (Karami et al. 2018).

There are different approaches to topic modelling. A common topic modelling approach is latent Dirichlet allocation (LDA), an unsupervised machine learning technique that generates topics based on the pattern of occurrence or co-occurrence of words in a document (Jacobi et al. 2016). LDA, however, does not perform well when the texts are very short, as is the case with tweets, due to the lack of information on co-occurrence of words (Qiang et al. 2020). We therefore used the Gibbs sampling algorithm for the Dirichlet multinomial mixture (GSDMM) proposed in Yin and Wang (2014). Not only does GSDMM achieve a good balance in both completeness and homogeneity, but the model also obtains better clustering results in short text topic modelling tasks compared to LDA and other traditional topic modelling techniques (Agarwal et al. 2020; Qiang et al. 2020).

The GSDMM topic modelling algorithm has four parameters, namely k, \(\alpha\), \(\beta\), and the number of iterations. Unlike other topic models, the k in GSDMM represents the maximum number of topics available during the clustering process, and it is set larger than the true number of topics. \(\alpha\) and \(\beta\) are two real numbers with values set between 0 and 1. \(\alpha\) relates to the probability that a document will be assigned to a new topic, while \(\beta\) relates to the probability that a document will be assigned to a topic with similar words (Yin and Wang 2014). The performance of the GSDMM model is stable after only five iterations (Yin and Wang 2014), but the model gives the best coherence score with fifty iterations (Omurca et al. 2021), which we used here. The implementation of the GSDMM model in Python can be found at a public GitHub repository named rwalk/gsdmm (JustAnotherArchivist 2020).
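To make the role of the four parameters concrete, the following is a minimal, self-contained sketch of a collapsed Gibbs sampler for the Dirichlet multinomial mixture. It is an illustrative re-implementation intended to follow the conditional probability described by Yin and Wang (2014), not the rwalk/gsdmm code used in this study; the function name `gsdmm`, its argument names and its default values are assumptions.

```python
import math
import random
from collections import Counter


def gsdmm(docs, k=8, alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Cluster tokenised short documents; returns one label per document."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    n_docs = len(docs)

    m_z = [0] * k                         # number of documents per cluster
    n_z = [0] * k                         # number of words per cluster
    n_zw = [Counter() for _ in range(k)]  # word counts per cluster

    def update(doc, z, sign):
        """Add (sign=+1) or remove (sign=-1) a document from cluster z."""
        m_z[z] += sign
        n_z[z] += sign * len(doc)
        for word in doc:
            n_zw[z][word] += sign

    # Initialise by assigning every document to a random cluster.
    labels = [rng.randrange(k) for _ in docs]
    for doc, z in zip(docs, labels):
        update(doc, z, +1)

    def log_score(doc, z):
        """Unnormalised log-probability of assigning doc to cluster z."""
        # alpha favours or penalises moving to sparsely populated clusters.
        score = math.log(m_z[z] + alpha) - math.log(n_docs - 1 + k * alpha)
        # beta controls how strongly shared words attract a document.
        for word, freq in Counter(doc).items():
            for j in range(freq):
                score += math.log(n_zw[z][word] + beta + j)
        for i in range(len(doc)):
            score -= math.log(n_z[z] + vocab_size * beta + i)
        return score

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            update(doc, labels[d], -1)    # take the document out
            logs = [log_score(doc, z) for z in range(k)]
            top = max(logs)               # subtract max for stability
            weights = [math.exp(s - top) for s in logs]
            labels[d] = rng.choices(range(k), weights=weights)[0]
            update(doc, labels[d], +1)    # put it back in the sampled cluster
    return labels
```

Because k is only an upper bound, some clusters typically end up empty after sampling; the number of non-empty clusters, `len(set(labels))`, is the number of topics actually found.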

Unlike α, β and the number of iterations, setting the number of topics (k) for GSDMM is less straightforward, since the true number of topics is unknown. To determine the number of topics for the GSDMM model, the coherence score for fifteen different values of k, starting with k = 4 and ending with k = 18, was calculated by analysing the frequency of the hundred and fifty most used words and phrases (see Röder et al. 2015). With each value of k, the GSDMM model was run ten times, and the coherence score for each run was used to compute an average.
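The averaging step can be sketched as below. This is a minimal illustration that assumes the coherence scores of the repeated runs have already been computed; the helper names `average_coherence` and `best_k` are hypothetical, and any scores passed in an example are placeholders rather than results from this study.

```python
from statistics import mean

# Candidate values of k considered in the text: k = 4, 5, ..., 18.
CANDIDATE_KS = range(4, 19)


def average_coherence(scores_by_k):
    """Map each k to the mean coherence score over its repeated runs."""
    return {k: mean(scores) for k, scores in scores_by_k.items()}


def best_k(scores_by_k):
    """Return the k with the highest average coherence score."""
    averages = average_coherence(scores_by_k)
    return max(averages, key=averages.get)
```

As the results show, the k with the highest average is only a starting point: the resulting clusters still need to be inspected for semantic overlap before a final number of topics is chosen.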

We visualised the outcomes of the GSDMM topic model with word clouds and used different metrics, including the number of tweets, retweets, likes, replies, as well as the number of uniform resource locators (URLs) included and the number of tweets which used hashtags, to compare the different topics. To assign the most likely topic to each tweet, we used the pre-built MovieGroupProcess’s choose_best_label function.
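Once each tweet has a topic label, the comparison metrics listed above reduce to a grouped aggregation. The sketch below assumes each tweet is a dictionary with the hypothetical keys `topic`, `retweets`, `likes`, `replies`, `urls` and `hashtags`; these names are illustrative and do not correspond to the variables in Table S1.

```python
from collections import defaultdict


def summarise_topics(tweets):
    """Aggregate engagement metrics per topic.

    Each tweet is a dict with hypothetical keys: 'topic', 'retweets',
    'likes', 'replies', 'urls' (list) and 'hashtags' (list).
    """
    summary = defaultdict(lambda: {
        "tweets": 0, "retweets": 0, "likes": 0, "replies": 0,
        "urls": 0, "tweets_with_hashtags": 0,
    })
    for tweet in tweets:
        row = summary[tweet["topic"]]
        row["tweets"] += 1
        row["retweets"] += tweet["retweets"]
        row["likes"] += tweet["likes"]
        row["replies"] += tweet["replies"]
        row["urls"] += len(tweet["urls"])              # number of URLs included
        row["tweets_with_hashtags"] += bool(tweet["hashtags"])
    return dict(summary)
```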

2.4 Measuring the level of activity, popularity and influence of Twitter accounts

We applied a range of different metrics to measure the popularity, influence, and activity of different Twitter accounts. The most straightforward metric to measure activity is the TweetRank metric (Nagmoti et al. 2010), which is defined by the number of tweets posted by a Twitter account over a period of time. Measuring the popularity of Twitter accounts was done using the in-degree metric (Cha et al. 2010), as well as the FollowerRank (FR) (Nagmoti et al. 2010), which is the normalised version of the traditional in-degree metric. The FR of an account is defined by the number of its followers divided by the sum of the number of its followers and the number of its followees. The FR ranges from 0 to 1. Accounts with a FR close to 0 are not considered popular, while accounts with a FR close to 1 are extremely popular (Primo et al. 2021).
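The FollowerRank definition above translates directly into code. The following is a minimal sketch; the function name `follower_rank` and the choice to return 0 for an account with neither followers nor followees are assumptions.

```python
def follower_rank(followers: int, followees: int) -> float:
    """FollowerRank: followers / (followers + followees), between 0 and 1."""
    total = followers + followees
    if total == 0:
        return 0.0  # assumption: an account with no connections gets FR = 0
    return followers / total
```

Because FR is a ratio, two accounts of very different size can share the same value: `follower_rank(900, 100)` and `follower_rank(9000, 1000)` both equal 0.9, which is why FR is best read alongside the raw follower count.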

Compared to measuring activity and popularity, measuring the degree of influence of a Twitter account is more challenging since there is no unified definition for an influential user. There is, however, a large number of possible measures to estimate influence (Riquelme and González-Cantergiani 2016). We opted for three metrics proposed by Cha et al. (2010) to measure the level of influence of a Twitter account. These three metrics are indegree influence, which considers the number of followers of a Twitter account; retweet influence, which considers the number of retweets a Twitter account received; and mention influence, which considers the number of times a Twitter account was mentioned by others.
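The three influence metrics can be computed per account with simple counters. The sketch below is illustrative only: the function name `influence_metrics` and the tweet keys `user`, `retweets` and `mentions` are hypothetical, and follower counts are assumed to be supplied separately as a dictionary.

```python
from collections import Counter


def influence_metrics(tweets, followers):
    """Return indegree, retweet and mention influence for each account.

    tweets: iterable of dicts with hypothetical keys 'user', 'retweets'
            and 'mentions' (list of mentioned account names).
    followers: dict mapping account name to follower count.
    """
    retweet = Counter()
    mention = Counter()
    for tweet in tweets:
        retweet[tweet["user"]] += tweet["retweets"]  # retweets received
        for mentioned in tweet["mentions"]:
            mention[mentioned] += 1                  # times mentioned by others
    accounts = set(followers) | set(retweet) | set(mention)
    return {
        account: {
            "indegree": followers.get(account, 0),
            "retweet": retweet[account],
            "mention": mention[account],
        }
        for account in accounts
    }
```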

Apart from the aforementioned metrics, we also measured the number of likes, previously termed ‘favourites’, and the number of replies a tweet received. Liking a tweet is a means to show acknowledgement and agreement, while replying to a tweet is a way for Twitter users to join a conversation and voice their opinions on the topic of discussion (Zhang et al. 2018). Hence, the number of likes and the number of replies can also be used to measure how popular and influential a Twitter account is.

3 Results

3.1 Description of Twitter activity and engagement

3.1.1 Number of heat or heat wave-related tweets

Overall, the number of heat or heat wave-related tweets increased by almost 30% from January to April 2022 (Fig. 2). Compared to January 2022, the number of tweets dropped slightly in February 2022, which can be attributed to the lower-than-average temperatures observed in many regions in the world (NOAA 2022). The number of tweets was highest in April 2022, reaching 15,467.

Fig. 2 Number of tweets per month, from January 2022 to April 2022

The number of heat or heat wave-related tweets corresponded well to the occurrence of several heat waves during this time period. In the second week of January (between 10/01/2022 and 16/01/2022), a heat wave hit several countries in South America, including Argentina (Earth Observatory 2022). This led to a steep increase in the number of tweets containing the words ‘argentina’, ‘buenos’, ‘buenos aires’ or ‘south america’ compared to the week before the heat wave hit the country. In the third week of January (between 17/01/2022 and 23/01/2022), Western Australia experienced a week-long heat wave with “severe-intensity to extreme-intensity” (Bureau of Meteorology 2022). As a result, the number of tweets containing words such as ‘australia’, ‘perth’ or ‘western australia’ also increased significantly. In the second week of February (between 07/02/2022 and 13/02/2022), a heat wave hit several cities in California, which led to an increase in the number of tweets containing the words ‘california’, ‘san francisco’, ‘sacramento’, ‘los angeles’ or ‘san diego’ from five in week 5 to 167 in week 6.

In early March 2022, a heat wave hit several South Asian countries, including Pakistan and India. There was an increase in the number of tweets with the words ‘pakistan’, ‘india’ or ‘asia’ from the first week of March (week 9), and this number peaked in the last week of April (between 25/04/2022 and 01/05/2022) when India and Pakistan experienced record high temperatures. Compared to the heat waves in other regions earlier in 2022, the heat wave in South Asia gained higher media attention (e.g. The Guardian 2022) and the number of tweets with the words ‘pakistan’, ‘india’ or ‘asia’ peaked at a much higher number of 292.
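The weekly counts of tweets mentioning a region can be sketched as follows. This is a minimal example that assumes tweets are available as (date, text) pairs and that weeks are numbered as ISO weeks, which matches the week numbers and date ranges quoted above; the helper name `weekly_counts` and the substring matching are assumptions.

```python
from collections import Counter
from datetime import date


def weekly_counts(tweets, terms):
    """Count tweets per ISO week whose text contains any of the terms.

    tweets: iterable of (posted_date, text) pairs.
    terms: lower-case place names or phrases to search for.
    """
    counts = Counter()
    for posted, text in tweets:
        lowered = text.lower()
        if any(term in lowered for term in terms):
            week = posted.isocalendar()[1]  # ISO week number
            counts[week] += 1
    return counts
```

Under ISO numbering, 1 and 2 January 2022 fall in the last week of 2021, so a full analysis would need to key on the ISO year as well as the week.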

3.1.2 Activity, popularity and influence of Twitter accounts

The ten most active Twitter accounts were personal accounts belonging to individual users (Table 1). The account with the highest number of heat or heat wave-related tweets was ‘heatstroke_x’, who posted 595 tweets over the four-month period. Despite the high number of tweets posted, ‘heatstroke_x’ received minimal engagement from their followers as illustrated through the number of replies, retweets and likes. Other accounts such as ‘climateguyw’ (120 tweets posted), and ‘wsl’ (117 tweets posted) had higher numbers of followers engaging with their tweets (Table 1).

Table 1 The ten accounts with the highest number of tweets

Unlike the ten most active accounts, the ten Twitter accounts with the most followers belonged to large organisations. Eight out of the ten accounts with the most followers belonged to news outlets and broadcasting corporations, one account belonged to a sports association (NBA: National Basketball Association), and one to a government official (rashtrapatibhvn). Most of these accounts were from the UK, USA and India (Table 2), three countries with a large English-speaking population. However, despite having many followers, the heat or heat wave-related tweets posted by these accounts did not receive a lot of engagement, as shown by the low number of retweets and likes (Table 2). All these ten accounts were very popular (FR values close to 1.0), meaning that they had many more followers than followees. Although there was no significant difference in the FR values across the ten accounts, the number of followers varied greatly. The two Twitter accounts ‘TheEconomist’ and ‘BBCNews’, for example, had the same FR value, but ‘TheEconomist’ had 83% more followers than ‘BBCNews’ (Table 2).

Table 2 The ten accounts with the most followers

The ten accounts with the most retweets were also personal accounts administered by individual Twitter users (Table 3). ‘PlatformAdam’, a Twitter account managed by a company that specialises in developing environmental data products, had the second-highest number of retweets, despite having posted only one relevant tweet. ‘PlatformAdam’ received 3301 retweets for their tweet posted on April 29 that read: “The current extreme #heatwave in #Pakistan and #India as seen today, on the fourth intense hot day, by #Copernicus #Sentinel3 LST (Land Surface Temperature, not Air!). LST collected on April 29 shows max value exceeding 62 °C/143°F. Gaps due to cloud/snow/nodata. #ClimateEmergency”.

Table 3 The ten accounts with the highest retweet counts

Another notable Twitter account in the top ten was ‘khanthefatima’ with a tweet posted on April 21 about the heat wave in India, reading: “Almost every journalist going to report on Jahangirpuri is returning unwell. The heat is, with no exaggeration, unbearable. Now think of all those who lost their homes today, many of whom would be fasting. Parched and empty stomach, the roof over their head has also been stolen”. This tweet received many likes and replies (Table 3).

3.2 Topic modelling development

The α and β parameters for the GSDMM model were set at 0.1 and 0.01, respectively (see Sect. 2.3). To determine the number of topics (k), we estimated multiple GSDMM models. The plot of the average coherence scores for fifteen different values of k showed no major variance (Figure S2 in Supplementary Materials). The GSDMM model achieved the highest average coherence score of 0.469 with k = 6, closely followed by a 0.462 coherence score with k = 12. However, upon further inspection, we discovered many semantic similarities between the six clusters of words generated by GSDMM with k = 6. As a result, five topics were selected for our final GSDMM model.

3.3 Topic analysis and labelling

To label each topic, we investigated the one hundred most frequently used words within it. Topic 1 was the largest cluster with 16,930 tweets (34%), followed by topic 2 with 10,299 tweets (20%), and then topics 3, 4 and 5 (Fig. 3). Four of the five topics (66% of all tweets) could be assigned to communications about climate-related heat, while tweets belonging to topic 1 could not.

Fig. 3 Topic distribution and labelling

In topic 1, ‘watch’ and ‘heat’ were the two most frequently used words. Words such as ‘game’, ‘play’ and ‘team’ being the third, fourth and fifth most frequently used words suggested that this topic was about sports events, and it was labelled accordingly. Words such as ‘basketball’, along with ‘buck’, ‘bull’, ‘hawk’ and ‘raptor’, names of major basketball teams in the NBA, indicated that many tweets in this topic were about basketball. We could not detect a reference to temperatures, and ‘heat’ was mostly used in relation to a basketball team called ‘Miami Heat’.

Topic 2 was the second largest cluster with 10,299 tweets (20%). Similar to topic 1, ‘heat’ and ‘watch’ were among the most frequently used words in this cluster. However, unlike tweets related to sport events (topic 1), tweets in topic 2 also included words such as ‘unbearable’, ‘sweltering’, ‘feel’ and ‘die’, indicating that people talk about how the heat strained them. Other frequently used words included ‘sleep’, ‘work’, ‘body’, ‘sweat’, ‘heatstroke’, ‘people’ and ‘life’, which indicated that the tweets in this topic were about the impacts of heat on the health and well-being of people. Hence, the topic was named ‘Health impacts’. Moreover, words such as ‘wear’ and ‘drink’ suggested people talking about actions needed to keep healthy during extreme heat.

In topic 3 (17%), ‘extreme’, ‘heat’ and ‘temperature’ were the three words with the highest frequencies. Words such as ‘climate’, ‘climate_change’, ‘wildfire’, ‘drought’, ‘flood’, ‘water’ and ‘record’ were also frequently used, suggesting that this topic consisted of tweets about extreme temperature and weather conditions attributable to climate change. The topic was therefore named ‘Extreme weather and climate change’. This topic had the highest number of retweets, and more URLs and hashtags were included in tweets in this topic than in other topics.

Topic 4 (16%) consisted of many frequently used heat wave-related words and adjectives indicating feelings and frustration, including ‘sweltering_heat’, ‘unbearable’, ‘fuck’, ‘unbearable_heat’ and ‘hot’. Combined with other frequently used words such as ‘work’, ‘school’, ‘kid’, ‘family’, ‘hospital’, and ‘heatstroke’, we assumed that this cluster mainly consisted of tweets expressing opinions on how extreme temperatures and heat waves impact the daily lives of people, as compared to their health which was the main theme of topic 2. We also found the words ‘air_conditione’, ‘cool’, ‘stay’, and ‘electricity’ in this topic, indicating cooling strategies. The word ‘car’ is among the top 20 words, and ‘child’ among the top 50, indicating that users talk about extreme heat in cars and the consequences. We labelled this topic ‘Social impacts’.

Topic 5, the smallest cluster (13%), contained many words that also appeared in topics 2 and 4. However, this topic also included the words ‘warn’ and ‘warning’ more frequently than in the other topics. Other words such as ‘check’, ‘know’, ‘think’, ‘listen’ and ‘feel’ made us label this topic ‘Perceptions and warning’. The word clouds in Fig. 4 show the most frequently used words in tweets belonging to each of the five topics.

Fig. 4 Word clouds showing most frequent words in each topic

4 Discussion

4.1 What we learned from the Twitter communications

Our exploratory study is the first of its kind to analyse global Twitter communication about heat waves. We aimed to explore (1) whether social media communications data can provide useful information about heat waves, (2) what we can learn from Twitter conversations about how people are affected by and cope with heat waves, and (3) who engages with conversations about heat waves, and to what extent. By addressing these three aims, six lessons emerged that are worth highlighting.

First of all, the intensity of Twitter activity corresponds well with occurrences of major heat waves, such as those identified in California, Argentina, Western Australia and South Asia. Previous studies have shown that the number of social media messages from people close to events increases substantially immediately before and during their occurrence, suggesting that social media conversations are reliable indicators of natural hazard occurrences, in particular of sudden hazards such as floods (Bird et al. 2012), tornados (Silver and Andrey 2019), or fires (Wang et al. 2016). Our study is among the very few, besides Grasso et al. (2017), to establish this relationship between occurrence and increasing Twitter activity related to heat, which is often referred to as a slow onset hazard (Zander et al. 2019; Oppermann et al. 2021). The increasing number of Twitter messages during heat waves can also be interpreted as an indicator of public awareness, which relates to the second lesson learned.

Second, we learnt little from the Twitter conversations about people’s behaviour during hot days and heat waves. Among the most frequent 100 words in each topic, we identified only a few words that appeared to indicate adaptation, coping, and heat relief strategies. Tweets belonging to topic 4 (‘Social impacts’) included words such as ‘air_conditione’, ‘cool’ and ‘stay’, which may indicate people choose to stay in artificially cooled rooms on hot days, one of the main heat relief strategies in Australia (Zander et al. 2021), the USA (Lee and Shaman 2020) and many emerging countries, including in Asia (He et al. 2021). The word ‘electricity’ was also used frequently within that topic, suggesting that people may converse about high energy use in this context (see, e.g. Davis and Gertler 2015). The topic about the health impacts that frequently included words such as ‘drink’ and ‘wear’ also indicates some awareness about hydration during hot days, and the role of appropriate clothing. People seem to talk more about how they feel than what they might do to cope with heat. This expression of feelings can be interpreted as a consciousness of heat and heat waves as a potential threat, an awareness that can heighten people’s resilience since people who are aware of a threat are more likely to take adaptation actions, as shown for sudden onset hazards such as floods (Kellens et al. 2011; Liu et al. 2018) and also for heat waves (Zografos et al. 2016; Esplin et al. 2019; Howe et al. 2019). While we may have gathered more information about adaptation and coping strategies had we included additional search keywords such as ‘prepare’, ‘plans’ and other heat relief measures (see Huang and Xiao 2015), such words should have emerged in the topic modelling had they been present.

Third, we detected little fake news or account activities by users who disseminate false information about climate change and related heat, as can sometimes happen on social media (Allcott and Gentzkow 2017). The most prolific accounts belonged to individual scientists or scientific organisations, such as climateguyw or PlatformAdam, the official Twitter account of the residence of the President of India (rashtrapatibhvn), and large media corporations such as the ABC, Fox News, The Economist, and Reuters. This suggests that people rely strongly on scientific information and traditional news channels during extreme weather events (Freeman et al. 2018) and that these channels are not having to compete with misinformation.

Fourth, we found that no emergency agencies were among the ten most prolific Twitter accounts, whether in terms of the number of tweets, retweets or likes. For sudden hazards such as floods and fires, accounts managed by government emergency services are used by many people, being considered a trusted source of warnings and updated information (Bird et al. 2012; Abedin and Babar 2018). Government emergency services did not feature highly among tweets posted before or during heat waves. This might be because government emergency and disaster management authorities do not consider heat, a slow-onset hazard, as their responsibility. For people affected by heat who prefer to receive alerts and information through official government Twitter accounts, rather than media channels or personal accounts, this may represent a gap in government services.

Fifth, we encountered some ‘noise’ in the collected data. After an initial analysis, we realised that a large number of tweets mentioned a movie named ‘Heatwave’ or a song named ‘Heat Waves’, neither of which was relevant to climate-related heat or heat waves. We therefore decided to clean the dataset by deleting all tweets referring to movies, films and music. Although this might distort the true picture of what people talk about on Twitter, we deemed this the correct approach in the light of our aims. Although we chose our search keywords carefully, this type of noise is unavoidable when collecting global Twitter data and we recommend multiple rounds of cleaning before starting topic modelling. Once the data had been cleaned, useful information and lessons emerged, as discussed here, even though we were left with one topic (sport events; 34% of all tweets) unrelated to climate-related heat and heat waves.

Sixth, when comparing different metrics of popularity and activity, we uncovered surprising details about the engagement of users. We found that the content and the timing of a tweet had significant effects on the likelihood of it being retweeted, as discussed in more detail in the next section. This knowledge can be useful for emergency authorities and organisations wanting to disseminate information about extreme heat situations rapidly and effectively to a large number of people.

4.2 Factors impacting the retweetability of tweets

We learnt five lessons by applying metrics that can identify the characteristics of the Twitter accounts that disseminate heat or heat wave-related information most rapidly. First, echoing Cha et al. (2010), we found that the number of followers is a poor indicator of an account’s influence. The ten accounts with the most followers had less engagement from their followers than accounts belonging to individual users with fewer followers.

Second, we found that tweets may have similar content but can trigger different levels of engagement. For example, a tweet about extreme temperatures in India and Pakistan posted on 29 April by PlatformAdam was retweeted 9.22 times more and had 17 times more replies than a tweet with similar content posted a day later by LicypriyaK, an account with almost 24 times as many followers as PlatformAdam (167,000 compared to 7100 followers). This difference appears to have been due to the timing of the tweet: the earlier a tweet is posted about an imminent heat wave, the more likely it is to be retweeted. Given the importance of information timeliness in disaster communication (Taylor et al. 2007; Son et al. 2019), this finding was not surprising. Users retweeting the earliest tweets providing timely updates and alerts might be perceived as credible sources of information and thus generate higher engagement among followers.

Third, tweets about extreme weather and climate change (topic 3), which also included weather updates, were more likely to be shared than tweets about social and health impacts, despite there being fewer of them. Tweets discussing the health and social impacts of heat or heat waves attracted the attention and gained approval from Twitter users (indicated by having more ‘likes’ and ‘replies’) but failed to become viral. This may mean that Twitter users consider tweets about ‘Extreme weather and climate change’ to be interesting, relevant and thus worth sharing. This insight is consistent with Abdullah et al. (2015) and Lee and Yu (2020), who show that content relevance impacts retweetability. Furthermore, from manual inspection of the content of the tweets in each topic, we found that numbers and prepositions were used more frequently in tweets that shared information about extreme weather. Hence, the frequency of the use of numbers, quantifiers and prepositions (Lee and Yu 2020) could also have impacted the retweetability of heat or heat wave-related tweets. Since retweeting allows for quick dissemination of information about significant events (Zhao et al. 2011), it is important to understand what factors might influence the retweeting behaviour of Twitter users. We can therefore apply this understanding to improve the effectiveness of disaster communication, especially during heat waves.

Fourth, besides the differences in content relevance, we also found that a higher proportion of tweets on the ‘Extreme weather and climate change’ topic included URLs and hashtags than tweets on other topics (Table 4). This result is consistent with studies by Suh et al. (2010) and Bruns and Stieglitz (2012), who concluded that the inclusion of URLs or hashtags in tweets positively affects the retweetability of disaster tweets.

Table 4 Number of tweets, retweets, likes, replies and links by topic
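
As an illustration only, the per-topic shares of tweets containing URLs and hashtags could be computed along the following lines. This is a minimal sketch and not the procedure used in this study: the function name, the regular expressions, the topic labels and the sample tweets are all hypothetical.

```python
import re
from collections import defaultdict

# Simple patterns, assumed here for illustration; real tweet parsing may need more care.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")

def share_with_links_and_tags(tweets):
    """Return {topic: (share with a URL, share with a hashtag)} for (topic, text) pairs."""
    totals = defaultdict(int)
    with_url = defaultdict(int)
    with_tag = defaultdict(int)
    for topic, text in tweets:
        totals[topic] += 1
        with_url[topic] += bool(URL_RE.search(text))
        with_tag[topic] += bool(HASHTAG_RE.search(text))
    return {t: (with_url[t] / totals[t], with_tag[t] / totals[t]) for t in totals}

# Hypothetical sample data: (topic label, tweet text).
sample = [
    ("weather", "Record 47C forecast for Friday #heatwave https://example.org/forecast"),
    ("weather", "Heat warning extended to the weekend #heatwave"),
    ("health", "Could not sleep at all in this heat"),
    ("health", "Drink water and check on older neighbours #heatwave"),
]
shares = share_with_links_and_tags(sample)
for topic, (url_share, tag_share) in shares.items():
    print(f"{topic}: {url_share:.0%} with URL, {tag_share:.0%} with hashtag")
```

Comparing shares rather than raw counts keeps topics of different sizes comparable.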

Fifth, our analysis of tweet engagement suggests that the social network of the author of a heat or heat wave-related tweet also affects retweetability. We found that PlatformAdam’s tweet was retweeted by Karl_Lauterbach, Germany’s Federal Minister of Health, who has 1 million Twitter followers, as well as by cathmckenna, the former Canadian Climate Minister, who has nearly 182,000 followers. A tweet with the same content posted by LicypriyaK on the following day, however, was not retweeted by any notable Twitter account. By retweeting, a user signals agreement with the content and its general credibility (Son et al. 2020). Therefore, in addition to being the first account to raise awareness about the extreme temperatures in India on 29 April 2022, receiving approval from influencers in their social network might be the reason PlatformAdam’s tweet had such high engagement counts. This is consistent with the results in Xu and Yang (2012), who suggested that the social relationships of a Twitter user are crucial to the likelihood of their tweets being retweeted.

4.3 Study limitations

Our study has three limitations. The first relates to the limited scope of data collection, because we only searched for tweets. Social media users employing other channels such as Facebook or Instagram to talk about heat or heat waves in the relevant time horizon were not included. However, we are confident that we captured many relevant conversations since Twitter is the outlet used most frequently by people seeking and communicating information in disaster situations (Simon et al. 2015), although Facebook has greater potential for conveying detail (Anikeeva et al. 2015). Also related to the limited representativeness of the data collected is the fact that we only searched tweets written in English. This means that users from populous English-speaking countries, including the USA, UK and India, are overrepresented. This is a common problem in research on social media use, with very few studies exploring tweets in languages other than English (e.g. in German: Netzel et al. 2021).

The second limitation is related to the timeframe over which the data for our study were collected. The data presented in this study covered a four-month period from January to April 2022. The months after this have also seen higher-than-average temperatures relative to the same period in previous years, particularly in Europe. We suggest that tweets be collected for analysis over years, not months, to reduce the impact of single non-climate change-related events (such as the launch of a movie called ‘Heatwave’ in India, or specific sporting events, as was the case in our study).

The third limitation is related to the GSDMM topic model. At the core of GSDMM is the maximum number of clusters (topics), KMax, assumed to be present in the entire text corpus. While the GSDMM model can perform well with a KMax higher than the true number of clusters (Yin and Wang 2014), it is challenging to find an appropriate KMax when the true number of clusters is unknown (Yin and Wang 2016). In our paper, we assumed that the value of KMax was 18 and then evaluated the performance of the GSDMM topic model with different values of k ranging from 4 to 18. However, the value we selected for KMax may be lower than the true number of clusters, in which case the optimal k would lie outside the evaluation range.
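
To make the role of KMax concrete, the following is a minimal sketch of a collapsed Gibbs sampler in the spirit of GSDMM (Yin and Wang 2014). It is not the implementation used in this study: the function name, the hyperparameter values (alpha, beta, number of iterations) and the toy documents are assumptions chosen for illustration. The sampler can leave clusters empty but can never use more than k_max of them, which is why a KMax set too low cannot be corrected by the model itself.

```python
import math
import random
from collections import Counter

def gsdmm(docs, k_max=8, alpha=0.1, beta=0.1, n_iters=15, seed=0):
    """Cluster tokenised short texts; returns one cluster label per document."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_counts = [Counter(doc) for doc in docs]

    m = [0] * k_max                          # number of documents in each cluster
    n = [0] * k_max                          # number of word tokens in each cluster
    nw = [Counter() for _ in range(k_max)]   # per-cluster word counts

    def move(d, z, sign):
        """Add (sign=+1) or remove (sign=-1) document d from cluster z."""
        m[z] += sign
        n[z] += sign * len(docs[d])
        for w, c in doc_counts[d].items():
            nw[z][w] += sign * c

    # Random initial assignment to one of the k_max clusters.
    labels = []
    for d in range(len(docs)):
        z = rng.randrange(k_max)
        labels.append(z)
        move(d, z, +1)

    for _ in range(n_iters):
        for d in range(len(docs)):
            move(d, labels[d], -1)
            log_p = []
            for z in range(k_max):
                # Cluster popularity term (constant denominator omitted).
                lp = math.log(m[z] + alpha)
                # Word similarity term, computed in log space to avoid underflow.
                for w, c in doc_counts[d].items():
                    for j in range(c):
                        lp += math.log(nw[z][w] + beta + j)
                for i in range(len(docs[d])):
                    lp -= math.log(n[z] + vocab_size * beta + i)
                log_p.append(lp)
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]
            z = rng.choices(range(k_max), weights=weights)[0]
            labels[d] = z
            move(d, z, +1)
    return labels

# Hypothetical toy corpus of tokenised short texts.
docs = [
    ["heat", "wave", "record", "temperature"],
    ["record", "temperature", "forecast", "heat"],
    ["heat", "stroke", "hospital", "water"],
    ["drink", "water", "heat", "stroke"],
    ["basketball", "heat", "game", "win"],
    ["game", "tonight", "heat", "basketball"],
]
labels = gsdmm(docs, k_max=8)
print("non-empty clusters:", len(set(labels)), "of at most 8")
```

Running the sampler for several values of k_max and comparing the number of non-empty clusters is one way to check whether the chosen upper bound is being reached.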

5 Conclusions

This study contributes to disaster communication research by analysing social media communications (tweets) to improve understanding of the global impacts of, and responses to, extreme heat and heat waves. Our study, one of the few to investigate social media use during extreme heat, analysed 62,920 tweets from January to April 2022 and found a strong correspondence between the number of tweets in each week and major heat wave occurrences in Argentina, Australia, the USA and South Asia. This was consistent with previous findings that social media use is a good indicator of the intensity of natural hazards. An unsupervised text analysis (topic modelling) identified five topics. Four of these topics and most tweets (66%) related to climate-related extreme heat and heat waves. Approximately one-third of tweets (34%) were about sporting events, mainly basketball games in the USA, and could not clearly be attributed to communications about climatic events. The four relevant emerging topics were ‘Health impacts’ (20%), ‘Extreme weather and climate change’ (17%), ‘Social impacts’ (16%) and ‘Perceptions and warning’ (13%). Our analysis suggests that many Twitter users are aware of heat or heat waves and that social media can help to disseminate information about extreme temperatures and weather updates. Many users employ Twitter to share how they feel about unbearable heat, and Twitter communications were also used, albeit to a lesser extent, to talk about how to cope with and find relief from heat stress. We also identified the circumstances under which tweets are likely to be retweeted, including the role of timeliness and social networks, which can help when organisations need to disseminate information in an emergency.