1 Introduction

Typhoons are one of the major natural hazards in China. In 2019, typhoons caused about CNY 58.9 billion in direct economic losses in China, accounting for 18% of the annual direct economic losses caused by natural hazards (UNDRR 2020). When a typhoon occurs, people in the disaster-stricken areas often share personal experiences and views on the disaster events or the relief processes on social media (Tang et al. 2021). The popularity and content of disaster-related topics on social media reflect public perception of the disaster risk. In recent years, scholars have begun to use social media data in disaster impact assessment (Butler 2013; Kryvasheyeu et al. 2016; Baranowski et al. 2020; Eyre et al. 2020) and risk perception (Marshall 2019; Rahmanti et al. 2021), and have pointed out that there exist variations in the disaster impact and risk perceptions of different individuals. Fan et al. (2020) found uneven spatial distribution of social media attention, although the hazard intensity of Hurricane Harvey was similar in some areas of the United States. Zhao et al. (2019) proposed that victims tend to share information, while the nondisaster-affected are more likely to express their feelings and opinions. Xiao et al. (2018) found that Twitter information was concentrated in hard-hit areas, landmark public places, and areas with easy access to the Internet. Macias (2016) indicated that variations in environmental risk perception existed among different race and ethnic groups.

Understanding disaster risk perceptions in a disaster-related context is crucial, because attitudes and risk preference shape how individuals, groups, and public- and private-sector organizations adopt risk mitigation practices (Gotham et al. 2018), how they develop post-disaster rebuilding activities (Albrecht et al. 2021), and how they plan for future disasters (Harman et al. 2021). Inevitably, different groups vary in their responses to disasters (Lee et al. 2015). Social media data can be a valuable source for researching disaster risk perception, supporting more accurate and effective disaster education and risk communication. However, a comprehensive framework for the application of social media data in typhoon risk perception research has not yet been formed.

In this study, we established a novel framework based on the regional disaster system theory (Shi 2005) to comprehensively assess social media data for the analysis of risk perception, in which the hazard, the surrounding social and geographical environments, and the affected population groups jointly affect typhoon risk perception. We took the Sina Weibo posts during Super Typhoon Lekima as an example to verify our model. Sina Weibo is a popular Chinese microblog that is similar to Twitter (Yao et al. 2021). Super Typhoon Lekima was the ninth named storm of the 2019 Pacific tropical cyclone season (Bao et al. 2020). At about 1:45 on 10 August 2019, it made landfall in the coastal area of Wenling City, Zhejiang Province, and moved into the Yellow Sea via Jiangsu Province. At around 20:50 on 11 August, it made a second landfall in Qingdao City, Shandong Province, and then moved into the Bohai Sea and weakened continuously thereafter. At around 8:00 on 13 August, the National Meteorological Center downgraded Lekima to a tropical depression. Lekima was the fifth strongest typhoon to make landfall in China since 1949. It brought extreme rainfall and a wide range of strong, long-duration winds. In eastern China, particularly in Zhejiang, Anhui, Jiangsu, and Shandong Provinces, urban and rural waterlogging, river floods, flash floods, and landslides occurred to varying degrees. Lekima caused CNY 51.53 billion in direct economic losses. According to incomplete statistics, there were nearly 300 Lekima-related hashtags on Weibo, read over 10 billion times and attracting over 1.7 million comments (Zhang et al. 2020). Lekima is a typical typhoon event with a wide impact area, high disaster loss, and strong public concern.

We used multiple machine learning models to extract hidden key information including spatial location, crowd sentiment, and rainfall intensity from Weibo. Spatial-temporal comparisons of multiple heterogeneous data were conducted based on three aspects: hazards, the disaster-formative environment, and hazard-affected elements especially humans. Combined with the three-dimensional risk perception cube, we investigated the feasibility of using social media data for typhoon disaster risk perception analysis and crowd sensitivity variation. Our goal was to provide directions for planners and policymakers on how to develop more effective typhoon risk communication strategies and risk reduction policies.

2 Methodology

This section outlines the methodology that was employed in the study. The specific issues discussed include the regional disaster system theory, research data collection, and some algorithms and processes used for extracting hidden information in Weibo texts.

2.1 Regional Disaster System Theory

The concept of regional disaster system theory was first introduced in 1991 (Shi 1991) and elaborated further several times (Shi 1996, 2002, 2005, 2009). It is widely used to analyze natural hazard-related disasters (Jia et al. 2016) and technological disasters (Yang et al. 2020) in China. The regional disaster system theory considers that the intensity of hazards \(H\), the stability or sensitivity of the disaster-formative environment \(S\), and the vulnerability of hazard-affected elements \(V\) together constitute the functional structure of a regional disaster system \(\it {D}_{\text{f}}\). That is,

$$\it {D}_{\text{f}}=H\cap S\cap V$$

According to this theory, we proposed a corresponding framework for the application of social media data in typhoon risk perception research (Fig. 1). The typhoon risk perception \({R}_{\mathrm{p}}\) is a result of the combined effects of the hazard and disaster \(H\) (hazard intensity and disaster damages), the surrounding social and geographical environment \(S\) (socioeconomic stability, geographical environment sensitivity), and the affected population groups \(V\) (gender variation, residency variation). That is,

$${R}_{\mathrm{p}}=H\cap S\cap V$$

Fig. 1 A framework for the application of social media data in typhoon risk perception research based on regional disaster system theory

We included three adjustments according to the characteristics of risk perception: (1) For the general public, it is difficult to accurately distinguish between hazards and disasters, so we discussed both together; (2) We divided the environment into two parts: social environment and geographical environment; and (3) The public shares their perceived typhoon risk on social media platforms, so we regarded population groups as the most important hazard-affected elements and analyzed the gender and residency variations of those groups. Next, we explored the feasibility (red colored text in Fig. 1) and crowd sensitivity (purple colored text) based on risk perception-oriented regional disaster system theory. Here, the feasibility analysis of using social media data demonstrates the viability of the proposed framework, and the crowd sensitivity analysis highlights the variation of typhoon risk perception.

2.2 Research Data

We used the Sina Weibo API to collect original Weibo texts and user attribute data, taking “Lekima (利奇马) | Typhoon (台风) | Storm (暴雨)” as the keywords, and 10 min as the minimum retrieval time unit. The collection period was from 8 August to 14 August 2019. As mentioned above, Typhoon Lekima first made landfall on 10 August and was downgraded to a tropical depression on 13 August. Before the typhoon landed, meteorological agencies issued early warnings, and the emergency management agencies communicated disaster prevention and preparedness news on Weibo. Therefore, the public paid attention to the typhoon trend in advance. As the typhoon approached, their concerns continued to rise. After the typhoon event, disaster relief in some areas was still in progress, and the public remained concerned about the disaster impact. Therefore, in order to show the whole process of change in public attention to Typhoon Lekima on Weibo, we chose 8–14 August as our research time span, which slightly exceeded the impact period of Typhoon Lekima in China. After eliminating duplicates, 69,936 out of 342,984 Weibo texts were identified with location information, based on which we conducted further study. The data fields include the unique identifier of the Weibo post, publishing time, text content, Weibo location (may be empty), user registration position, user gender, and so on.

According to the tropical cyclone landfall records in China from the China Meteorological Administration (CMA) Tropical Cyclone Data Center (Footnote 1), Guangdong (264), Hainan (165), Taiwan (149), Fujian (130), and Zhejiang (51) were the provinces where typhoons landed more than 50 times between 1949 and 2019. We designated these five provinces as typhoon-prone areas, and other regions of China as areas not prone to typhoons.
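The designation rule above can be written down as a short sketch. The landfall counts are the ones quoted in the text; the function name `is_typhoon_prone` and the treatment of unlisted provinces (counted as zero landfalls, which is a simplifying assumption and not a statement about their real records) are illustrative only.

```python
# Landfall counts for 1949-2019 as quoted in the text. Provinces that the
# text does not list are absent here; treating them as 0 is an assumption
# made only so the threshold rule can be applied to any province name.
LANDFALLS_1949_2019 = {
    "Guangdong": 264,
    "Hainan": 165,
    "Taiwan": 149,
    "Fujian": 130,
    "Zhejiang": 51,
}

PRONE_THRESHOLD = 50  # "more than 50 times" in the text


def is_typhoon_prone(province: str) -> bool:
    """Return True if the province exceeds the landfall threshold."""
    return LANDFALLS_1949_2019.get(province, 0) > PRONE_THRESHOLD


# Sorted list of the provinces designated as typhoon-prone.
prone = sorted(p for p in LANDFALLS_1949_2019 if is_typhoon_prone(p))
```

Note that Zhejiang, with 51 recorded landfalls, only just passes the strict "more than 50" criterion, so the grouping is sensitive to where the threshold is set.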

This study also used multisource data for model building and comparative analysis, mainly including: Typhoon Lekima best track data (Footnote 2; from the CMA Tropical Cyclone Data Center); weather station observational data (Footnote 3; from the National Climate Center); disaster data of some provinces and cities (from the National Disaster Reduction Center of China; Footnote 4); Digital Elevation Model (DEM) 90 m data of Zhejiang Province (Footnote 5; from the Resource and Environment Science and Data Center); and data on gross domestic product (GDP), population, and grain sown area of some provinces and cities (from the National Bureau of Statistics, Footnote 6, and the Zhejiang Provincial Bureau of Statistics, Footnote 7).

2.3 Hidden Information Extraction in Weibo Texts

Weibo texts contain abundant hidden information. In addition to the directly obtained Weibo data, we extracted the rainfall intensities, sentiments, spatial information, and keywords from the texts. We constructed several models including the text sentiment analysis model and the rainfall intensity classification model by using deep learning techniques. By comparing diverse data, we analyzed the feasibility of using social media data for typhoon risk perception analysis and crowd sensitivity variation based on the three aspects of the regional disaster system theory (Fig. 2). First, we compared the spatiotemporal distribution of Lekima-related Weibo posts with the hazard intensity data, disaster damage statistics, and socioeconomic data, which verified the feasibility of using the Weibo data for risk perception analysis. Then, the differences in crowd perception sensitivity were analyzed from the perspectives of gender, residency, and the environment.

Fig. 2 Technology roadmap for Weibo data and other data sources used in this typhoon risk perception study

2.3.1 Rainfall Intensity Classification Based on BiLSTM Algorithm

This study used Bidirectional Long Short-Term Memory (BiLSTM) networks (Cai et al. 2021; Li et al. 2021) to build a rainfall intensity classification model to map the relationship between Weibo text and near-time local rainfall intensity (Fig. 3). For each Weibo text with location information, we matched the latest observation data from nearby weather stations, and converted the hourly precipitation into light rain, moderate rain, heavy rain, and torrential rain (and above) according to the national standard Grade of Precipitation (GB/T 28592-2012; Footnote 8). Here, 28,065 Weibo texts were used to create a sample dictionary. Among them, 11,671 were labeled as light rain, 6,562 as moderate rain, 6,370 as heavy rain, and 3,462 as torrential rain (and above). Then, a BiLSTM-based rainfall intensity classification model was built.
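The labeling step can be sketched as a simple binning of hourly precipitation into the four grades. The cut points below are placeholders chosen for illustration and are not taken from GB/T 28592-2012; the function name `grade_rainfall` and the handling of zero precipitation (assigned to the lowest grade, because the text lists only four classes) are likewise assumptions. The label counts are the ones reported above.

```python
from bisect import bisect_right

GRADES = [
    "light rain",
    "moderate rain",
    "heavy rain",
    "torrential rain (and above)",
]

# PLACEHOLDER cut points in mm per hour. They are NOT the values of
# GB/T 28592-2012; substitute the thresholds from the standard before use.
HYPOTHETICAL_HOURLY_CUTS_MM = [2.5, 8.0, 16.0]

# Label counts of the sample dictionary as reported in the text.
LABEL_COUNTS = dict(zip(GRADES, [11671, 6562, 6370, 3462]))


def grade_rainfall(mm_per_hour, cuts=HYPOTHETICAL_HOURLY_CUTS_MM):
    """Map an hourly precipitation amount to one of the four grades.

    A value equal to a cut point is assigned to the higher grade.
    """
    if mm_per_hour < 0:
        raise ValueError("precipitation cannot be negative")
    return GRADES[bisect_right(cuts, mm_per_hour)]


# Share of each label in the sample dictionary, useful for spotting imbalance.
total = sum(LABEL_COUNTS.values())
label_share = {grade: count / total for grade, count in LABEL_COUNTS.items()}
```

The shares show that light rain is the largest class and torrential rain the smallest, which is worth keeping in mind when reading an overall accuracy figure for the classifier.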

Fig. 3 BiLSTM-based rainfall intensity classification model framework

Long Short-Term Memory (LSTM) is a variant of the recurrent neural network (RNN) that overcomes the vanishing and exploding gradient problems of traditional RNN models (Hochreiter and Schmidhuber 1997). With the gate structure of LSTM, the model can selectively store context information. The basic unit of the LSTM architecture is a memory block, which includes a memory cell (denoted as \(m\)) and three adaptive multiplication gates (that is, an input gate \(i\), a forget gate \(f\), and an output gate \(o\)). Formally, the calculation operations of updating the LSTM unit at time \(t\) are:

$${i}_{t}=\sigma ({W}_{i}{X}_{t}+{U}_{i}{h}_{t-1}+{b}_{i})$$
$${f}_{t}=\sigma ({W}_{f}{X}_{t}+{U}_{f}{h}_{t-1}+{b}_{f})$$
$${o}_{t}=\sigma ({W}_{o}{X}_{t}+{U}_{o}{h}_{t-1}+{b}_{o})$$
$${\tilde{m}}_{t}=\tanh ({W}_{m}{X}_{t}+{U}_{m}{h}_{t-1}+{b}_{m})$$
$${m}_{t}={i}_{t}\odot {\tilde{m}}_{t}+{f}_{t}\odot {m}_{t-1}$$
$${h}_{t}={o}_{t}\odot \tanh ({m}_{t})$$

where \({X}_{t}\) and \({h}_{t}\) represent the input vector and hidden state respectively at time \(t\). \(\sigma\) is the elementwise sigmoid function, and \(\odot\) is the elementwise product. \({W}_{i}, {W}_{f},{W}_{o},{W}_{m}\) are the weight matrices for the input vector, \({U}_{i}, {U}_{f},{U}_{o},{U}_{m}\) are the weight matrices for the hidden state, and \({b}_{i}, {b}_{f},{b}_{o},{b}_{m}\) denote the bias vectors. However, LSTM only considers the information from the past, ignoring future information. To efficiently use contextual information, we can use BiLSTM. BiLSTM uses a forward LSTM and a backward LSTM for each sequence to obtain two separate hidden states: \(\overrightarrow{{h}_{t}},\overleftarrow{{h}_{t}}\), and then the final output at time \(t\) is formed by concatenating these two hidden states:

$${h}_{t}=[\overrightarrow{{h}_{t}},\overleftarrow{{h}_{t}}]$$

Therefore, the final output of the BiLSTM layer for the input sentence \(S\) can be represented by \(H={\{{h}_{t}\}}_{t=1}^{n}\), where \(H\in {\mathbb{R}}^{n\times d}\), and \(d\) is the layer size of the BiLSTM layer.
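The LSTM update equations and the bidirectional concatenation can be traced in a small, dependency-free sketch. It is not the PaddlePaddle implementation used in the study: the helper names (`affine`, `lstm_step`, `run_lstm`, `bilstm`, `random_params`), the random weights, and the toy dimensions are all assumptions made for illustration. In this sketch each output vector has twice the per-direction hidden size, because the forward and backward states are concatenated.

```python
import math
import random


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def affine(W, x, U, h, b):
    """Compute W x + U h + b for list-of-lists matrices W and U."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def lstm_step(x, h_prev, m_prev, p):
    """One LSTM update; p maps gate name -> (W, U, b)."""
    def pre(name):
        W, U, b = p[name]
        return affine(W, x, U, h_prev, b)

    i = sigmoid(pre("i"))                         # input gate
    f = sigmoid(pre("f"))                         # forget gate
    o = sigmoid(pre("o"))                         # output gate
    m_tilde = [math.tanh(v) for v in pre("m")]    # candidate memory
    m = [it * mt + ft * mp for it, mt, ft, mp in zip(i, m_tilde, f, m_prev)]
    h = [ot * math.tanh(mt) for ot, mt in zip(o, m)]
    return h, m


def run_lstm(xs, p, hidden):
    """Run one direction over the sequence, starting from zero states."""
    h, m = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x in xs:
        h, m = lstm_step(x, h, m, p)
        outputs.append(h)
    return outputs


def bilstm(xs, p_fwd, p_bwd, hidden):
    """Concatenate forward and backward hidden states at each time step."""
    fwd = run_lstm(xs, p_fwd, hidden)
    bwd = run_lstm(xs[::-1], p_bwd, hidden)[::-1]  # re-align to time order
    return [f + b for f, b in zip(fwd, bwd)]


def random_params(input_dim, hidden, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {g: (mat(hidden, input_dim), mat(hidden, hidden), [0.0] * hidden)
            for g in "ifom"}


rng = random.Random(0)
xs = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]  # 5 steps, 4 features
H = bilstm(xs, random_params(4, 3, rng), random_params(4, 3, rng), hidden=3)
```

Here `H` holds one vector per input position, so a classifier head (for example, pooling followed by a softmax over the four rainfall grades) could be attached on top of it.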

BiLSTM is suitable for modeling time series data and can capture contextual information in texts (Siami-Namini et al. 2019). It is one of the widely accepted methods in natural language processing (NLP) tasks. In this study, the rainfall intensity classifier is based on BiLSTM in PaddlePaddle (PArallel Distributed Deep LEarning, Baidu’s open-source deep learning framework; Footnote 9). PaddlePaddle outperforms other frameworks in Chinese NLP tasks with the help of the powerful Baidu Chinese search database (Ma et al. 2019). The training dataset is the self-built dictionary mentioned above. The loss function is cross entropy. The optimizer is the adaptive gradient (Adagrad) algorithm. The learning rate is 0.002. The batch size is 128. Section 3.3.1 discusses whether the model error caused by the subjective description of netizens has a gender bias.

2.3.2 Text Sentiment Analysis Based on BiLSTM Algorithm

Sentiment analysis is the process of interpreting the meaning or polarity of larger text units (sentences, paragraphs, articles) through the semantic composition of smaller elements. It is essentially a task of text classification or regression. Specifically, language is regarded as a digital signal, and words are represented by a word vector algorithm. Then, the traditional machine learning algorithms (such as Bayesian, random forest) or deep learning neural network algorithms are used to train vector features in the text, so as to realize the classification or regression of the text sentiment polarity (de Diego et al. 2018).

Our text sentiment analysis model was based on Senta (Footnote 10), a Chinese sentiment analysis tool of PaddlePaddle. The model was trained based on the BiLSTM algorithm and the existing SentiCorp, a Chinese sentence-level sentiment classification dataset. The model outputs a sentiment polarity category for each Weibo text and the corresponding sentiment intensity \(x\in [0,1]\). Here, the closer the \(x\) value is to 0, the stronger the negative sentiment is. The model analyzes public attitudes towards disaster occurrence and relief processes. Section 3.3.1 discusses whether the sentiment of netizens has a gender bias.

2.3.3 Spatial Information Extraction

This study involved three kinds of spatial information: user registration positions, Weibo locations, and spatial information mentioned in the text. We got the registered province and city directly from the user attributes, and used the inverse geocoding interface of Baidu Map API to convert coordinates of Weibo location into corresponding administrative divisions. Extracting spatial information from text includes toponym recognition and toponym resolution (Hu et al. 2019).

Toponym recognition refers to extracting words representing place names from text. Due to the common improper grammar and expression, misspelling, and irregular alias abbreviations in social media text, toponym recognition is a major challenge in NLP tasks (Sagcan and Senkul 2015). The smaller the granularity of spatial information extraction, the greater the error. We identified the terms of provinces, cities, and districts in the Weibo text, and completed the geographical location information according to China’s three-level administrative divisions.

Toponym resolution is the conversion of the structured address nouns into coordinates. We then used the geocoding interface of Baidu Map API to achieve toponym resolution. Through the toponym recognition and resolution from Weibo texts, we established the spatial position in which netizens were interested.
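Toponym recognition with hierarchy completion can be illustrated with a gazetteer lookup. This is a deliberately simplified sketch, not the procedure used in the study: the four-entry `GAZETTEER`, the substring matching, and the rule of keeping the most specific match are assumptions, and the resolution step through the Baidu Map geocoding interface is not reproduced here.

```python
# Hypothetical miniature gazetteer: short name -> (province, city, district).
# A real gazetteer would cover all three administrative levels of China.
GAZETTEER = {
    "浙江": ("浙江省", None, None),
    "台州": ("浙江省", "台州市", None),
    "温岭": ("浙江省", "台州市", "温岭市"),
    "青岛": ("山东省", "青岛市", None),
}


def recognize_toponym(text):
    """Return the completed (province, city, district) of the most specific
    place name found in the text, or None if no known name occurs."""
    hits = [levels for name, levels in GAZETTEER.items() if name in text]
    if not hits:
        return None
    # "Most specific" means the entry with the most filled-in levels.
    return max(hits, key=lambda levels: sum(lv is not None for lv in levels))


example = recognize_toponym("台风在温岭登陆，浙江多地积水")
```

In the example, both the province and the county-level city are mentioned, and the lookup keeps the finer one while filling in its parent city and province. Plain substring matching would struggle with the aliases and misspellings mentioned above, which is why errors grow as the spatial granularity gets finer.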

2.3.4 Text Keyword Extraction Based on Latent Dirichlet Allocation (LDA) Algorithm

Text keyword extraction is to dismantle long text into several keywords for rough screening and clustering of disaster-related text. Common text keyword extraction methods include those based on statistical features, graphic models, and topic models (Onan et al. 2016). The statistical feature-based methods, such as the TF-IDF algorithm, rely heavily on a high-quality corpus related to the target sample set for training. The graphic model-based methods, such as the PageRank algorithm, have high computational complexity and are not widely used. The topic model-based methods, such as the Latent Dirichlet Allocation (LDA) algorithm (Jelodar et al. 2018), are statistical models that extract keywords from implicit semantic structures of documents in an unsupervised way. The LDA algorithm maps words into a semantic space by fitting the word-document-subject distribution based on the analysis of word cooccurrence information.

LDA assumes that each document can be expressed as the probability distribution of latent topics, and that topic distribution in all documents shares a common Dirichlet prior. Each latent topic in the LDA model is also represented as a probability distribution over words. The word distributions of topics share a common Dirichlet prior as well. Given a corpus \(D\) consisting of \(M\) documents, document \(d\) having \({N}_{d}\) words \((d\in \{1,\dots ,M\})\), and the number of topics \(T\), LDA models \(D\) according to the following generative process (Blei et al. 2003): (1) Choose a multinomial distribution \({\varphi }_{t}\) for each topic \(t\) \((t\in \{1,\dots ,T\})\) from a Dirichlet distribution with parameter \(\beta\); (2) Choose a multinomial distribution \({\theta }_{d}\) for each document \(d\) \((d\in \{1,\dots ,M\})\) from a Dirichlet distribution with parameter \(\alpha\); and (3) For each word \({w}_{n}\) \((n\in \{1,\dots ,{N}_{d}\})\) in document \(d\), select a topic \({z}_{n}\) from \({\theta }_{d}\), and then a word \({w}_{n}\) from \({\varphi }_{{z}_{n}}\).

In the above generative process, words in documents are the only observed variables while others are latent variables (\(\varphi , \theta\)) and hyper-parameters (\(\alpha , \beta\)). The probability of observed data \(D\) was calculated and obtained from a corpus as follows:

$$p(D|\alpha ,\beta )=\prod_{d=1}^{M}\int p({\theta }_{d}|\alpha )\left(\prod_{n=1}^{{N}_{d}}\sum_{{z}_{dn}}p({z}_{dn}|{\theta }_{d})p({w}_{dn}|{z}_{dn},\beta)\right)d{\theta }_{d}$$

LDA is a well-established tool for inferring latent topic distributions in a large corpus. To extract the hot topics from netizens, we used an LDA algorithm to explore the deep meaning of texts, that is, the topic of the texts. Specifically, we used lda_webpage (Footnote 11), a Baidu self-built web page dataset, to create a dictionary, and used an LDA algorithm to extract up to 10 keywords from each Weibo text, together with their similarity to the topics of the lda_webpage dataset.
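The LDA generative process described in this section can be simulated directly, which makes the roles of \(\alpha\), \(\beta\), \(\varphi\), and \(\theta\) concrete. This sketch only samples synthetic documents; it does not perform inference and is unrelated to the lda_webpage model used in the study. The vocabulary, the hyper-parameter values, and the function names are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocab, n_topics, doc_lengths, alpha, beta, seed=0):
    """Sample topic-word distributions and documents following the LDA process."""
    rng = random.Random(seed)
    # Step 1: one word distribution phi_t per topic, drawn from Dirichlet(beta).
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    corpus = []
    for n_d in doc_lengths:
        # Step 2: one topic distribution theta_d per document, from Dirichlet(alpha).
        theta = sample_dirichlet([alpha] * n_topics, rng)
        doc = []
        for _ in range(n_d):
            # Step 3: pick a topic z_n from theta_d, then a word from phi_{z_n}.
            z = rng.choices(range(n_topics), weights=theta)[0]
            doc.append(rng.choices(vocab, weights=phi[z])[0])
        corpus.append(doc)
    return phi, corpus


# Hypothetical vocabulary and settings, chosen only for illustration.
vocab = ["typhoon", "flood", "hotel", "railway", "rain"]
phi, corpus = generate_corpus(vocab, n_topics=2, doc_lengths=[5, 8, 4],
                              alpha=0.5, beta=0.5)
```

Fitting LDA runs this process in reverse: given only the words, it estimates the \(\varphi\) and \(\theta\) that make the observed corpus most probable under the likelihood above.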

3 Results

This section presents the findings based on the risk perception-oriented regional disaster system theory, including the degree of hazard and disaster, the social and geographical environments, and the variation of population groups that jointly affect the typhoon risk perception.

3.1 The Degree of Hazard and Disaster

This section verifies the feasibility of using social media data for typhoon risk perception analysis according to the high correlation between Weibo posts and hazard intensity as well as elevated disaster damage.

3.1.1 Large Number of Weibo Posts in the Most Typhoon-Stricken Areas

Based on Weibo locations, we analyzed the spatial distribution (Fig. 4) and spatiotemporal evolution of Weibo locations (Fig. 5). The results suggest that there is reasonable consistency between public concern and hazard intensity.

Fig. 4 Spatial distribution of typhoon Lekima-related Weibo posts

Fig. 5 Changes in daily Weibo amount rankings in the major affected areas of Lekima

From the overall spatial distribution of Weibo locations (Fig. 4), we found that Weibo locations are consistent with the spatial distribution of typhoon impact. From 8 to 13 August 2019, the overall point density distribution of Lekima-related Weibo posts indicates that most of the Weibo locations are concentrated in the southeast coastal area. The number of Weibo posts in typhoon-stricken areas is significantly larger than posts in other areas.

From the changes of Weibo posts’ amount rankings in some typhoon-stricken regions (Fig. 5), we found a high conformity between the spatiotemporal evolution of Weibo posts’ popularity and that of Typhoon Lekima. The netizens in Guangdong, Fujian, Zhejiang, Shanghai, Anhui, Jiangsu, Shandong, Hebei, and Liaoning paid high attention to this typhoon event. Before Lekima landed, netizens in Guangdong, Fujian, and Zhejiang paid close attention to the typhoon event, which may be related to local weather warnings and safety publicity. With the northward movement of the rain belt, the number of Weibo posts from the provinces and cities on the typhoon tracks also increased, and the heat of Lekima-related Weibo in the southeastern region remained high, especially in Zhejiang Province. Strongly typhoon-stricken areas have many Weibo posts before, during, and after the disaster.

3.1.2 High Spatial Matching between Weibo Posts and Severe Disaster Impacts

We analyzed the correlation between the number of Lekima-related Weibo posts and disaster damage statistics and socioeconomic data (Table 1). These data covered nine provincial-level regions (eight provinces and one municipality): Zhejiang, Shanghai, Jiangsu, Shandong, Anhui, Fujian, Hebei, Liaoning, and Jilin. The number of Weibo posts has the highest correlation with the affected population (r = 0.8940, p = 0.0011), followed by direct economic losses (r = 0.8920, p = 0.0012) and the number of houses moderately damaged (r = 0.8791, p = 0.0018). We concluded that the number of Weibo posts is highly correlated with disaster damage statistics.
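A correlation of this kind can be computed as sketched below, assuming that r denotes the Pearson coefficient (the text does not name the coefficient). The function names and the toy input are illustrative; no values from Table 1 are reproduced apart from the reported r = 0.8940 with nine regions, which is used to show how r and the sample size translate into a t statistic.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Toy data (hypothetical): a perfectly linear relationship gives r = 1.
r_toy = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])

# Reported coefficient for the affected population, nine regions.
t_reported = t_statistic(0.8940, 9)
```

With only nine regions, the test has seven degrees of freedom, so even a coefficient close to 0.9 rests on a small sample and should be read together with its p-value.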

Table 1 Correlation between the Weibo posts and disaster damage and socioeconomic statistics in nine provinces (municipality) of eastern China

3.2 Social and Geographical Environments

This section investigates the impact of social and geographical environments on public risk perception at the provincial and county level, and also verifies the feasibility of using social media data for risk perception analysis.

3.2.1 No Clear Correlation between Weibo Posts and Socioeconomic Factors

Figure 4 shows that the Lekima-related Weibo posts were concentrated in the typhoon-affected areas. Table 1 shows that, at the provincial level, some socioeconomic factors, such as population, GDP, and grain sown area, are weakly correlated with the number of Weibo posts, which indicates that spatial heterogeneity of the population and economic factors may have little effect on the Lekima-related Weibo posts.

We took Zhejiang Province, one of the most severely affected provinces, as an example to compare the number of Weibo posts and economic indicators at the county level. Zhejiang had the densest concentration of geolocated Weibo posts, with a total of 20,946. The Moran’s I index of 0.76 for the Weibo post locations indicates clear spatial clustering. We integrated the main socioeconomic indicators in 2019 at the city and county levels in Zhejiang Province, and obtained 62 valid records after data processing. Similarly, we found no significant relationship between the number of Weibo posts and these indicators, including land area, resident population, and per capita GDP. Thus, we speculated that the spatial distribution of disaster-related Weibo posts was not affected by socioeconomic factors.
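Global Moran's I, the clustering index cited above, can be computed as in the following sketch. The spatial weights used in the study are not stated, so the binary adjacency matrix, the four-unit toy layout, and the post counts below are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed on n spatial units.

    weights[i][j] is the spatial weight between units i and j
    (0 on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / w_sum) * (cross / sum(d * d for d in dev))


# Four units in a row; neighbors share a border (hypothetical layout).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([10, 9, 2, 1], chain)     # similar values sit together
alternating = morans_i([10, 1, 10, 1], chain)  # neighbors are dissimilar
```

Positive values indicate that neighboring units have similar post counts, and negative values indicate a checkerboard pattern, so an index of 0.76 points to strong clustering.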

3.2.2 Higher Geographical Environment Sensitivity, Higher Attention

The spatial distribution of Lekima-related Weibo posts in Zhejiang Province differed from the typhoon track (Fig. 6). It was noted that Weibo users living at lower altitudes or in areas with larger proportions of waterbodies were more concerned about typhoon disasters (shown as red dots in Fig. 6). The words “flooded (被淹, 积水)” and “flood (洪涝, 洪水)” appeared frequently, possibly because people in the low altitude areas or along the rivers generally paid more attention to flood risk. This indicates that public perception was consistent with the risk in their neighborhoods.

Fig. 6 Spatial distribution of Lekima-related Weibo posts in Zhejiang Province, China. DEM digital elevation model

3.3 Variation among Population Groups

In this section, we take gender and residency as examples to illustrate the typhoon risk perception of different population groups.

3.3.1 Gender Variation

During Typhoon Lekima, the proportion of Weibo posts by male and female users was about 1:2 (24,140 by males, 45,343 by females, and 452 without user gender information). Compared with the data from the Weibo User Development Report 2020 (Footnote 12), in which the ratio of male to female users was 54.6:45.4, the difference in the male-to-female ratio indicates that female users were more willing to voice their concerns about this typhoon disaster on Weibo than were male users. One possible reason is that females are more vulnerable to disasters (Ginige et al. 2009), and they tend to be the primary caregivers of families. They are possibly more sensitive about disasters also because they care more about the disaster event’s livelihood impact. This finding is consistent with previous studies in which females are more sensitive to negative events on social media than males (Zheng et al. 2019; Aak et al. 2020).

The overall accuracy of the rainfall intensity classification model fluctuated around 75%, and the cross entropy loss is around 0.9 (Fig. 7). Further, we tested the rainfall intensity classification model and the text sentiment analysis model for the two gender groups, respectively (Table 2). We found that males showed slightly higher negativity in their Weibo posts than females, while females’ hazard descriptions were slightly closer to the reported meteorological observation data. The univariate variance analysis results show that the negative sentiment expression (F = 28.32794447, P = 1.02722E−07) and the hazard intensity description (F = 488.4877355, P = 1.0528E−07) of the two genders were significantly different (passed the significance test of 1% level). This is a direction worth studying in the field of typhoon disaster risk perception.
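The univariate variance analysis mentioned above corresponds to a one-way ANOVA, whose F statistic can be computed as sketched below. The two groups of sentiment scores are invented for illustration and do not come from Table 2; the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical sentiment intensities for two groups of posts.
group_a = [0.1, 0.2, 0.3]
group_b = [0.2, 0.3, 0.4]
f_value = one_way_anova_f([group_a, group_b])
```

With tens of thousands of posts per group, even a small difference in means can yield a large F and a very small p-value, so the size of the difference should be considered alongside its significance.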

Fig. 7 BiLSTM-based rainfall intensity classification model performance

Table 2 Gender differences in text sentiment analysis and rainfall intensity classification models

3.3.2 Residency Variation

We obtained the user locations, user registration positions, and the text-mentioned positions by the spatial information extraction described in Sect. 2.3.3. Then we counted the number of nonempty data and the same-in-pair data (Table 3). We found that the proportion of provincial consistency is larger (all over 50%), and the proportion of municipal consistency is slightly lower.

Table 3 Comparison among locations, registration positions, and text mentioned positions

Table 3 represents two statistical values of nonempty data and the same-in-pair data in the comparison among locations, registration positions, and text mentioned positions. Of Weibo positioning provinces, 76.37% are the same as those mentioned in the text (16,390 out of 21,461), and this is 72.24% at the city level (9,589 out of 13,274). These two ratios illustrate that most users are concerned about the disaster information in their neighborhoods. For the Weibo texts whose locations and text-mentioned positions are inconsistent, most of the contents are related to the status of a specific extreme disaster-stricken area or travel destination.

We defined a user as a “resident” if the user registration city is the same as the user location. Lekima-related Weibo locations cover 324 cities, totaling 63,133 posts. Among them, 36,793 Weibo posts contain user registration positions and 19,593 were from residents, which account for 53.25%. The number of Weibo posts in different regions varied greatly, so we took the natural logarithm of the number of Weibo posts as the Weibo popularity. The average popularity of residents’ Weibo posts was 2.27 and that of nonresidents’ Weibo posts was 2.04. We compared the difference of residents’ and nonresidents’ Lekima-related Weibo posts at the city level. Considering the three cases of all areas, typhoon-prone areas, and areas not prone to typhoons respectively, the univariate variance analysis results of the two residency groups failed the significance test. It shows that the difference in Weibo popularity between residents and nonresidents was not obvious in this event. Similar results appeared in some urban flood studies (Wang et al. 2020). Thus, we suggest that the distinction between locals and nonlocals may not be critical in the analysis of public responses towards typhoons on Weibo.
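The resident definition and the log-transformed popularity can be sketched as follows. The post records are invented, and computing popularity per city as the natural logarithm of that city's post count is an assumption about how the city-level comparison was set up; only the final share check uses figures reported above.

```python
import math
from collections import Counter

# Hypothetical posts: (post location city, user registration city).
posts = [
    ("Taizhou", "Taizhou"),
    ("Taizhou", "Taizhou"),
    ("Taizhou", "Beijing"),
    ("Qingdao", "Qingdao"),
    ("Qingdao", "Shanghai"),
    ("Qingdao", "Shanghai"),
    ("Qingdao", None),  # no registration position: left out of both groups
]


def popularity_by_city(records):
    """Return (resident, nonresident) popularity per city as ln(post count)."""
    residents, nonresidents = Counter(), Counter()
    for location, registration in records:
        if registration is None:
            continue
        if registration == location:
            residents[location] += 1      # registration city matches location
        else:
            nonresidents[location] += 1

    def to_log(counts):
        return {city: math.log(n) for city, n in counts.items()}

    return to_log(residents), to_log(nonresidents)


resident_pop, nonresident_pop = popularity_by_city(posts)

# Share of resident posts reported in the text: 19,593 of 36,793.
resident_share = round(19593 / 36793 * 100, 2)
```

The logarithm compresses the large gap between cities with thousands of posts and cities with a handful, which keeps a few very active cities from dominating the comparison between the two groups.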

We extracted the keywords for Weibo posts located in Zhejiang Province. Then we found that nonresidents used more words such as “hotel (酒店),” “outage (停运),” “high-speed railway (高铁),” and “on the road (路上)” than residents, indicating that nonresidents pay more attention to travel. This typhoon greatly disrupted travel-related activities, so nonresidents with travel needs were more sensitive to its impact. This result is consistent with the behavior of the public, which shows the feasibility of using Weibo data in disaster assessment.

4 Discussion

This section summarizes the dynamic characteristics of social media-based typhoon risk perception, then discusses some implications of the results for typhoon risk communication and management, and gives future research directions based on the limitations of this work.

4.1 Dynamic Typhoon Risk Perception

We attempted to understand how people perceive typhoon risks based on regional disaster system theory. Our case study of Typhoon Lekima found that the public risk perception in typhoon disasters has dynamic characteristics in different times, regions, and populations.

  1. (1)

    Public risk perception is closely related to the degree of hazard and disaster. The overall spatiotemporal distribution of Lekima-related Weibo posts is significantly clustered and highly correlated with typhoon processes (Fig. 4). The provinces and cities that Weibo users are located in are consistent with the daily typhoon-affected areas, and the number of Weibo posts generated in these areas is significantly higher than that in other areas (Fig. 5). Zhejiang and Shandong Provinces are where Lekima landed. These two provinces had the highest Weibo post activities. The number of Weibo posts has a high correlation to the disaster loss data (Table 1).

(2) Geographical rather than social variables better explain the spatial heterogeneity of Weibo interest. Population density, GDP, and other socioeconomic factors are not the main influencing factors of the spatial distribution of Lekima-related Weibo posts (Table 1). In previous studies, some researchers found that the number of social media users in a region is highly correlated with the level of socioeconomic development (Khare et al. 2020). Our results suggest that this conclusion may not be universal. Therefore, we suggest that before using social media data for disaster analysis, scholars should first examine the relationship between social media data and socioeconomic indicators. More importantly, our results show that the Weibo data closely reflect the impact of typhoon disasters.

(3) The disaster-related social media data from different groups show biases. From the perspective of gender, female users’ overall response to disasters is more sensitive and objective. First, the number of Weibo posts shows that females are more active than males in online discussions related to typhoon disasters. Second, the errors in the text sentiment analysis and rainfall intensity classification differ by gender. Males show slightly more negative sentiment than females, and females’ descriptions of rainfall intensity are closer to observations (Table 2). From the perspective of geographical environment, people living in typhoon-affected areas at low altitudes or near waterbodies pay more attention to typhoon disasters (Fig. 6). This indicates that the perception of Weibo users is consistent with local risks, and further demonstrates the feasibility of retrieving disaster risk information from Weibo data. From the perspective of residency, most people are concerned about their local disasters (Table 3). Residents and nonresidents do not differ significantly in their attention to typhoon disasters, but nonresidents are generally more concerned about the travel impact caused by typhoon rains. It is worth noting that user registrations, positions mentioned in the text, and Weibo locations usually do not match exactly, yet most existing social media-based natural hazard and disaster research has used only one or two of these sets of spatial information. In practice, one location data source is often substituted for another because of data acquisition difficulty, for example by treating the user registration as the Weibo location. However, our study shows that these biases cannot be neglected.

This study revealed the crowd sensitivity variations of social media with regard to typhoon perception, which are reflected in three aspects: loss possibility, gender sensitivity, and travel needs. The confirmed feasibility of using social media data indicates the potential of this new data source in risk perception research.

4.2 Implications

Social media platforms have a wide range of users, produce and disseminate information quickly, and are not subject to geographical constraints (Ilieva and McPhearson 2018). It is feasible to obtain disaster information and analyze public perception based on Weibo data. Social media conveys public disaster risk perception and can reflect the occurrence and evolution of a disaster at multiple spatiotemporal scales. Based on the previous discussion, we verified the feasibility of using social media in typhoon disaster risk perception analysis, and emphasized that crowd sensitivity variation cannot be ignored in risk communication. On this basis, we suggest risk management strategies that governments can implement to improve risk perception and reduce the negative consequences of typhoons.

The typhoon disaster risk perception based on social media data indicates that disaster emergency management should be strengthened by a richer understanding of the differences among targeted public groups. Gender, residency, and geographical environment tend to be important sociodemographic predictors of risk perceptions. First, the higher sensitivity and accuracy of females’ typhoon risk perception suggest that scholars and policymakers should consider a more important role for females in disaster risk reduction, especially in early warning and evacuation. Social media platforms could also promote more targeted disaster information for male users to increase their disaster awareness. Second, residency makes no obvious difference to the attention paid to disasters, but nonresidents are less aware of local vulnerability (Ushiyama et al. 2009), so they are a relatively high-risk group. Moreover, nonresidents with travel needs were more sensitive to typhoon impacts. Social media platforms could supplement and publicize more local environmental information and updated travel-related disaster news for nonresidents. Third, exposure affects public risk perception. People in areas with high exposure to typhoons are more concerned about the typhoon and secondary disasters such as floods. Disaster information for this group could be updated more frequently. We advocate individualizing disaster publicity and risk communications to improve the effectiveness of emergency management.

4.3 Future Research Directions

Several extensions to the methodology are anticipated in future work. In this work, the social media data come only from Sina Weibo during Typhoon Lekima. Subsequent studies could consider data from other social media platforms and disaster cases, which would make the model performance more robust.

We also recognize that the current rainfall intensity classification method does not directly map social media text to manually labeled hazard or disaster intensity. We use discrete meteorological observation data as the classification label. Few Chinese terminology thesauri, social media corpora, or sentiment analysis dictionaries exist in the field of typhoon disasters or emergency management, and directly using existing datasets from other fields may not yield high classification accuracy. Therefore, researchers have mostly collected social media texts from several event cases and labeled them manually, which is time-consuming and laborious. Models built this way are also susceptible to event-specific hazard vocabulary, such as typhoon names, which can lead to over-fitting. Based on the principle of space-time proximity, our model directly matches Weibo text to meteorological observation data in batches without considering the influence of individual vocabulary weights in the text, which makes the classification efficient and avoids subjective errors in manual labeling. However, this method may not directly capture the relationship between independent and dependent variables.
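The space-time proximity matching can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study. The record layout (dictionaries with `lat`, `lon`, `time`, and `rain_mm` keys), the function names, and the default thresholds of 50 km and 3 hours are all hypothetical.

```python
import math
from datetime import timedelta


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def match_rainfall(post, observations, max_km=50.0, max_gap=timedelta(hours=3)):
    """Return the rainfall (mm) of the observation nearest to a post, or None.

    Only observations within `max_gap` of the post time and within `max_km`
    of the post location are considered. Among those, the spatially nearest
    is chosen, with the smaller time gap breaking ties.
    """
    candidates = []
    for obs in observations:
        gap = abs(obs["time"] - post["time"])
        if gap > max_gap:
            continue
        dist = haversine_km(post["lat"], post["lon"], obs["lat"], obs["lon"])
        if dist <= max_km:
            candidates.append((dist, gap, obs))
    if not candidates:
        return None
    nearest = min(candidates, key=lambda c: (c[0], c[1]))
    return nearest[2]["rain_mm"]
```

The matched rainfall value can then be binned into intensity classes to serve as the label for each post. Posts without any nearby observation return `None` and would be excluded from training in this sketch.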

The proposed method is a comprehensive approach to analyzing the feasibility and crowd sensitivity variation of social media in typhoon risk perception. The formation of crowd sensitivity on social media is a multidimensional issue, and crowd sensitivity varies among population groups. Due to the limited fields available in the Weibo API, this study extracts information from the fields of user gender, user registration, Weibo location, text content, and so on. Future research can further analyze the mechanism of Weibo information transmission, public opinion control measures on social media, and the causes of subjective errors among different genders.

5 Conclusion

Based on the regional disaster system theory, this study proposed a new framework for analyzing the feasibility and crowd sensitivity variation of using social media data in typhoon risk perception. A comprehensive assessment was achieved using multisource data and machine learning techniques.

In this study, we obtained nearly 70,000 Weibo texts with location and user attribute information during Super Typhoon Lekima through the Sina Weibo API, extracted spatial information from each text by combining toponym recognition technology and Baidu Map API geocoding, and constructed several models for keyword extraction, rainfall intensity classification, and sentiment analysis based on LDA and BiLSTM algorithms. We used various machine learning techniques to extract hidden information from Weibo texts, overcoming the limitation of insufficient information channels.

Our study is the first attempt to understand the linkages among typhoon risk perceptions of Weibo users, gender differences, residency vulnerability, and geographical environment sensitivity. These findings highlight the need to individualize disaster information dissemination rather than emphasize general broadcasting. Targeted disaster publicity and communications with localized risk information will be powerful tools to improve emergency management and implement disaster risk reduction strategies. This study provides evidence and technical support for the effective utilization of typhoon-related social media information in disaster risk perception research.