Predicting gasoline shortage during disasters using social media
Abstract
Shortage of gasoline is a common phenomenon during the onset of forecasted disasters like hurricanes. Predicting future gasoline shortage can guide agencies in pushing supplies to the correct regions and mitigating the shortage. We demonstrate how to incorporate social media data into gasoline supply decision making. We develop a systematic approach to examine social media posts like tweets and sense future gasoline shortage. We build a four-stage shortage prediction methodology. In the first stage, we filter out tweets related to gasoline. In the second stage, we use an SVM-based tweet classifier to classify tweets about gasoline shortage, using unigrams and topics identified using topic modeling techniques as our features. In the third stage, we predict the number of future tweets about gasoline shortage using a hybrid loss function, which is built to combine ARIMA and Poisson regression methods. In the fourth stage, we employ Poisson regression to predict the shortage using the number of tweets predicted in the third stage. To validate the methodology, we develop a case study that predicts the shortage of gasoline using tweets generated in Florida during the onset and post-landfall period of Hurricane Irma. We compare the predictions to the ground truth about gasoline shortage during Irma, and the results are highly accurate based on commonly used error estimates.
Keywords
Social media analytics · Gasoline shortage prediction modeling · Disaster management · Hybrid loss function · Hurricane Irma

1 Introduction
Top five disasters by tweet distribution in five metropolitan areas in 2015
Disaster type  Num. of tweets  Percentage (%)  Related keywords 

Earthquake  114,428  53.91  haiti, nepal, america, ene, california, julian california, wnw, japan, ssw, united states, earthquake, ese, magnitude, italy 
Hurricane  29,098  13.71  wind, united states, atlantic ocean, alcohol, europe, work, rain, hit, school, america, people, storm 
Drought  17,114  8.06  california, louisiana, lips, sacrifice, last year, COP21, poor people, lush, boreal forests, syria, ethiopia, maryland 
Tornado  14,398  6.78  western, working, stay, phone, california, area warning, basement, work, county, canada, shelter, rotation 
Recently, social media is transforming the way people communicate, not only in daily life, but also during disasters. There is a surge in the usage of social media during an emergency in the affected regions, and many people are willing to share disaster information through social media. One piece of evidence is given in Table 1, which contains the top five disasters (by type) discussed in tweets. We collected more than 180,000 tweets discussing these top five disaster types from four major US metropolitan areas during the year 2015. The public uses social media to communicate, seek information, raise concerns and express sentiments, and responders use it to plan and communicate important messages to the public (Lachlan et al. 2014; Liu et al. 2016; Panagiotopoulos et al. 2016; van Gorp et al. 2015). As a result, there is a keen interest in employing social media for disaster management. For example, social media has been used to build a mass communication channel, in order to inform large numbers of stakeholders at once (Ki and Nekmat 2014; Stříteskỳ et al. 2015; Utz et al. 2013). Social media can also aid decision support systems and emergency management processes by utilizing the enormous amounts of real-time data it generates (Gaynor et al. 2005; Boulos et al. 2011). Consequently, multiple social media data analysis techniques have been developed in the context of a disaster, ranging from tools for event detection, prediction and warning; impact assessment; situation awareness; disaster tracking; and response planning.
“The shelters are full, there is no gas. Tornados could happen, and storm surge is predicted. So what are people supposed to do? Irma”
“Insane..95 percent of Florida trying to leave at one time. Roads r slammed. No gas. No hotels available. Scared to see my neighborhood after irma”
“Gas stations out of gas, water shelves empty, stores and airports closed. Stocked up on food and wine, waiting on irma”
The natural question that arises in such a scenario is whether Twitter posts can be used to sense the current shortage and forecast future shortages. There are two main challenges related to this question:

Challenge C1—how to identify tweets about shortage Social media data, especially from Twitter, are difficult to process and classify, as they are unstructured, noisy and contain a plethora of information (a large number of tweets). Also, a single tweet contains a maximum of 140 characters, is informal and contains abbreviations and spelling mistakes. Interpreting the semantics of such a short message and classifying it is a hard problem. There are methods in the literature that have classified tweets generated during crises into caution/advice, information source, people, casualties and damage (Imran et al. 2013), pre-disaster or post-disaster (tweet4act) (Chowdhury et al. 2013), tweets reporting casualty or damage (Tweedr) (Ashktorab et al. 2014), and information, preparation and movement (Stowe et al. 2016). However, classifying tweets for a specific problem like identifying gasoline shortage has never been done. Identifying important features for this classification task is a novel and unique question. If these issues are resolved, then one can identify tweets about gasoline shortage and treat them as sensors for shortage.

Challenge C2—forecasting the spatiotemporal shortage from tweets The spatiotemporal distribution of tweets about shortage is not equivalent to the spatiotemporal shortage distribution. Spatial and temporal lag between the origin of the shortage and the tweet about shortage is an uncertain quantity. This makes forecasting shortages using tweets a challenging problem.

To address these challenges, this paper makes the following contributions:

Building of a classifier that identifies tweets about gasoline shortage from the corpus of all the tweets generated in the affected area.

Discovering that the arrival of tweets about gasoline shortage follows a Poisson distribution.

Developing a hybrid loss function method (HLF) that forecasts the number of tweets about gasoline shortage.

Developing a four-stage gasoline shortage prediction methodology which takes the tweets generated on a day in an affected city as input and generates the number of stations that will be out of gas on the next day as the output.

Model validation with a case study based on Hurricane Irma, which contains around 1 million tweets, hurricane path data and ground truth about gasoline shortage.
2 Related work
There is a surge in the use of social media during crises, as stated in Sect. 1. Social media Web sites like Facebook and Twitter have started playing a major role in disaster management, such as after the Japan tsunami in 2011 (Kaigo 2012) and US Hurricane Sandy in 2012 (Hughes et al. 2014). Other Internet-based social applications like Waze (Waze 2017) and GasBuddy (Gasbuddy 2017b) have also set up special-purpose services to allow individuals to participate and report the availability of various resources (e.g., gas stations) via the Web or smartphones. These services were used by a large population after Hurricanes Irma and Sandy. Apart from these cases of direct applications of social media during disasters, surveys by Imran et al. (2013) and Nazer et al. (2017) provide evidence that there has also been an uptick in research interest in the development of social media data analysis techniques in the context of disaster management. These techniques can be categorized into: (1) data extraction and filtering, (2) event detection and impact assessment and (3) response planning and relief delivery (Nazer et al. 2017). Some papers address multiple issues that cross these boundaries. However, for ease of reading, we have categorized them as indicated above.
Data extraction and filtering Social media data are gigantic, noisy and unreliable. To obtain the posts that contain relevant information, posts are extracted either on the basis of important keywords (Imran et al. 2013; Starbird and Stamberger 2010; Olteanu et al. 2014) or using geolocation (Morstatter et al. 2014; Cheng et al. 2010; Han et al. 2013; Schulz et al. 2013). The posts extracted using the aforementioned techniques often contain rumors and spam and cannot be trusted, as pointed out by Mendoza et al. (2010). Although rumor and spam detection is a hard problem, a few methods have been successful in particular cases in recent times (Gupta et al. 2013; Sampson et al. 2015).
Event detection and impact assessment Twitter has been shown to have potential for earthquake detection and to act as an early warning system by using tweets as sensors (Sakaki et al. 2010; Faulkner et al. 2011). Apart from this, there are some event detection and impact assessment methods that include sentiment analysis methods (Beigi et al. 2016; Caragea et al. 2014) and language change methods (Atefeh and Khreich 2015; Cordeiro and Gama 2016).
Response planning and relief delivery Response planning requires situation awareness, for which there are classifiers that classify posts into caution/advice, information source, people, casualties and damage (Imran et al. 2013), pre-disaster or post-disaster (tweet4act) (Chowdhury et al. 2013), tweets reporting casualty or damage (Tweedr) (Ashktorab et al. 2014), information, preparation, movement, etc. (Stowe et al. 2016), or into user-defined categories (AIDR) (Imran et al. 2014). For response delivery, there are crowdsourcing communities like Digital Volunteers (translation of posts, geotagging, building maps of damaged regions) and OpenStreetMap [OSM, for volunteers to build maps for response, used effectively in the Haiti earthquake (Zook et al. 2010)]. When it comes to data-driven relief delivery tools, there is Ushahidi (2017), a platform that maps information from different sources like Twitter, RSS feeds, SMSs and manual comma-separated files onto a single map of the affected area. AIDR (Imran et al. 2014) is an end-to-end data pipeline that extracts and classifies tweets for responders to assess and respond to the situation on the ground. TweetTracker is a system that tracks, analyzes and understands tweets related to specific topics. It has many functionalities and can use data from multiple social media Web sites. However, it has a special module for disaster relief: it detects request-for-help tweets using a classifier based on n-grams and tweet metadata and shows the geolocation of a tweet on the map if available (Kumar et al. 2011).
Aside from the papers cited above, we identified a paper by Gu et al. (2014) which is closest to our work, and hence we present it separately. Their paper develops a methodology for sensing demands of essential commodities like food, water and gasoline using data extrapolation in participatory sensing applications. Participatory sensing technologies include sources that measure the state of a point of interest and report it at a later time (e.g., on getting access to WiFi). Their paper argues that data extrapolation algorithms that rely predominantly on spatial correlations or predominantly on temporal correlations tend not to work consistently well, as the relative importance weights of temporal versus spatial correlations change significantly between periods of calm and periods of change after a disaster. Therefore, they develop a hybrid prediction algorithm combining spatial and temporal prediction methods, which predicts the status of point-of-interest (POI) sites when the collected data are incomplete. Their methodology combines spatial and temporal extrapolation methods for shortage prediction. We tackle this issue by combining temporal extrapolations with predictions using other factors related to the disaster (like hurricane path and days from arrival) to improve the accuracy of shortage prediction. For this, we fuse the ARIMA method with the Poisson regression method using a hybrid loss function. As far as application is concerned, their methodology makes fine-grained predictions at the level of a POI, while our methodology is suitable for making predictions at a city level.
As evident, the literature on social media data-driven disaster management is plentiful. However, there is limited literature that proposes using social media to assess the demand for and shortage of essential commodities among the affected population during a disaster. Our work addresses this research gap. Our methodology provides a means to assess the shortage of commodities and can be used to prepare, pre-position and redirect supplies before a disaster.
3 Data description
Summary statistics of tweet data
Summary statistic  Values 

Number of tweets collected  1,048,575 
Number of unique Twitter users  111,801 
Period of data collection  September 6–15, 2017 
Date of Irma landfall in Florida  September 9, 2017 
Number of tweets prior to Irma landfall in Florida  456,530 
Number of tweets during Irma in Florida  151,792 
Number of tweets post Irma in Florida  440,253 
Number of gas-related tweets before landfall  2,805 
Gas shortage and hurricane prediction for different cities in Florida
City  Date  Proportion of gas stations without gas  On hurricane path  Inside 3day cone  Inside 5day cone  Days to arrival  Watch/warning  Wind speeds (mph) 

Gainesville  09/07/17  0.58  y  n  y  4  n  175 
Jacksonville  09/08/17  0.31  n  y  y  3  n  155 
Miami  09/07/17  0.42  y  y  y  3  Watch  175 
Orlando  09/08/17  0.35  y  y  y  3  Watch  155 
Tallahassee  09/08/17  0.46  n  n  y  3  n  155 
Tampa  09/06/17  0.3  n  n  y  5  n  185 
Naples  09/07/17  0.54  n  y  y  3  Watch  175 
4 Methodology
4.1 Stage 1: tweet filtering and creation of tweet corpus and document–term matrix
Stage 1 has two steps, tweet filtering to generate gasolinerelated tweets and creation of a tweet corpus and a document–term matrix.
4.1.1 Tweet filtering
A small sample of the document–term matrix
Terms  

Docs  can  gas  get  got  hurricaneirma  irma  just  line  station  water 
1195  0  2  0  0  0  0  0  0  2  0 
1433  0  2  0  0  0  1  0  0  0  0 
267  0  2  1  1  0  0  0  1  0  0 
272  1  1  0  0  0  0  0  0  0  1 
298  0  0  0  0  0  0  0  0  0  0 
408  0  1  0  1  0  0  1  0  0  0 
443  0  1  0  1  0  0  1  0  0  0 
556  0  1  0  0  0  1  0  0  0  1 
680  0  1  0  0  0  0  2  0  1  0 
901  0  1  0  0  0  1  1  0  0  1 
Total  1  12  1  3  0  3  5  1  3  3 
4.1.2 Corpus and document–term matrix generation
For any text mining application, there is a need for a framework for managing and manipulating heterogeneous text documents (in our case, tweets). The conceptual entity which provides this functionality is a text corpus, which is a collection of the text documents being analyzed. According to Meyer et al., “It represents a collection of text documents and can be interpreted as a database for texts. Its elements are TextDocuments holding the actual texts and local metadata” (Meyer et al. 2008). In our application, a text corpus (tweet corpus) is created using the tm package in R (Feinerer 2008). We note that, in our case, one tweet is equivalent to one text document of the corpus. From the corpus, stopwords (common words which have little or no value in classification, e.g., “the,” “and” and “a”), curse words and numbers are removed. Words are reduced to their stem words, e.g., words like “going” and “gone” are converted to “go.” Next, a document–term matrix is exported from the tweet corpus. Table 4 shows a sample from the document–term matrix of our case study. Document IDs (Tweet IDs) represent the rows, and terms/words represent the columns. The matrix elements are term frequencies. For instance, the term “gas” has been used twice in the tweet with Tweet ID \(=\) 1195.
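The paper builds the corpus and document–term matrix in R with the tm package; purely as an illustration of the same idea, here is a minimal pure-Python sketch (tokenization, a small illustrative stopword list, no stemming, and term-frequency counting over hypothetical tweets):

```python
from collections import Counter

# Illustrative subset of a stopword list; the tm package ships a much larger one.
STOPWORDS = {"the", "and", "a", "is", "at", "of", "to", "in"}

def build_document_term_matrix(tweets):
    """Tokenize tweets, drop stopwords and numbers, and count term frequencies.

    Returns (vocabulary, matrix) where matrix[i][j] is the frequency of
    vocabulary[j] in tweets[i].
    """
    docs = []
    for text in tweets:
        tokens = [t.strip(".,!?\"'").lower() for t in text.split()]
        tokens = [t for t in tokens if t and t not in STOPWORDS and not t.isdigit()]
        docs.append(Counter(tokens))
    vocabulary = sorted(set().union(*docs)) if docs else []
    matrix = [[doc[term] for term in vocabulary] for doc in docs]
    return vocabulary, matrix

# Hypothetical tweets for illustration only.
vocab, dtm = build_document_term_matrix([
    "No gas at the gas station",
    "Long line at the station before Irma",
])
```

As in Table 4, each row of `dtm` corresponds to one tweet and each column to one surviving term.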
4.2 Stage 2: classification to identify gasoline shortage tweets
To classify the tweets as “gasoline shortage,” first, they are manually annotated. Next, an SVM classifier is trained on a training set that classifies tweets using two kinds of features: unigrams and abstract topics. In the following subsections, we describe unigrams and latent topics in depth.
4.2.1 Finding important unigrams
In the field of computational linguistics, an n-gram is a sequence of n items (phonemes, syllables, letters, words or base pairs, depending on the application) from a given sample of text or speech. An n-gram of size one is called a unigram. In the case of classification of text or documents, generally, words are treated as unigrams. In our application of tweet classification, we consider a word of the tweet as a unigram. We use unigrams as features as they have predictive power for the classification task. For instance, if a tweet contains words like “gas,” “gasoline,” “shortage,” “no,” “long” and “line,” it is likely the tweet is about “gasoline shortage.”
However, not all the unigrams present in a tweet have predictive power. Hence, it is important to remove the less important terms in a tweet. There are two known ways in text mining to do this: using the measure of term frequency (tf) or the measure of term frequency–inverse document frequency (tf–idf). The number of times a term occurs in a document/tweet is called its tf. The elements of the document–term matrix in Table 4 are tfs. On the other hand, inverse document frequency (idf) is a measure of how much information a word provides, i.e., whether it is common or rare across all documents. It is the logarithmically scaled inverse fraction of the documents that contain the word (obtained by dividing the total number of documents by the number of documents containing the term). Thus, a high tf–idf value is reached by a high term frequency (in the given document) and a low document frequency of the term in the whole collection of documents. In our case study, we found that using tf provided better classification accuracy than tf–idf. This is because terms like “gas,” which were very common across all documents, have small tf–idf values.
We now present the process of filtering using tf. Suppose we define important unigrams as words that have been used at least ten times in the tweet corpus. This value “10” is our threshold. Suppose Table 4 shows the document–term matrix we exported from the corpus. Each element in the matrix is the tf of the term (represented by the column) in the tweet (represented by the row). The last row in Table 4 shows the total usage of each term in the matrix (sum of term frequencies). Recall that we chose the threshold to be 10. Now, “gas” is the only important unigram, as it is used 12 times in the matrix. However, if we reduce the threshold to 3, the important unigrams are “got”, “gas”, “irma”, “just”, “station” and “water.” As the threshold increases, the number of important unigrams decreases.
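The threshold-based filtering just described can be sketched as follows; the vocabulary and matrix below reproduce a subset of the Table 4 sample (dropping the all-zero row and two low-frequency columns), so a threshold of 10 keeps only “gas,” while a threshold of 3 also keeps “got,” “irma,” “just,” “station” and “water”:

```python
def important_unigrams(vocabulary, matrix, threshold):
    """Keep terms whose total frequency across the corpus meets the threshold."""
    totals = [sum(row[j] for row in matrix) for j in range(len(vocabulary))]
    return [term for term, total in zip(vocabulary, totals) if total >= threshold]

# Columns mirror part of the Table 4 sample; rows are tweets, entries are tfs.
vocab = ["can", "gas", "get", "got", "irma", "just", "station", "water"]
dtm = [
    [0, 2, 0, 0, 0, 0, 2, 0],  # tweet 1195
    [0, 2, 0, 0, 1, 0, 0, 0],  # tweet 1433
    [0, 2, 1, 1, 0, 0, 0, 0],  # tweet 267
    [1, 1, 0, 0, 0, 0, 0, 1],  # tweet 272
    [0, 1, 0, 1, 0, 1, 0, 0],  # tweet 408
    [0, 1, 0, 1, 0, 1, 0, 0],  # tweet 443
    [0, 1, 0, 0, 1, 0, 0, 1],  # tweet 556
    [0, 1, 0, 0, 0, 2, 1, 0],  # tweet 680
    [0, 1, 0, 0, 1, 1, 0, 1],  # tweet 901
]
```

Calling `important_unigrams(vocab, dtm, 10)` returns `["gas"]`, matching the worked example in the text.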
4.2.2 Finding important topics
The other set of features is the abstract topics that exist in the “gasoline shortage” tweet corpus. For example, in multiple instances, people were tweeting “to inquire which gasoline stations had gas.” In another topic, people were “complaining about long lines at the gas stations.” These topics, if identified, could have high predictive power and could be used as features for classifying tweets about gasoline shortage. Therefore, in our methodology, we use topic models to identify hidden and abstract topics in our tweets.
An empirical comparison by Lee et al. shows the advantages and limitations of four different topic models, namely latent semantic analysis (LSA), probabilistic latent semantic analysis (PLSA), latent Dirichlet allocation (LDA) and correlated topic models (CTM) (Lee et al. 2010). Their comparison showed that LDA and CTM outperformed the other two techniques. LSA works well for unique and distinctive topics, and PLSA works well in identifying a single topic in a document. Therefore, we modeled our tweets using LDA and CTM. LDA is a Bayesian mixture model which assumes that topics are not correlated (Blei et al. 2003). CTM eliminates the correlation assumption in LDA (Blei et al. 2007). We used the R package “topicmodels” for the implementation of LDA and CTM (Hornik and Grün 2011).
Since LDA and CTM are Bayesian models, we need to use Bayesian inference for parameter estimation. For estimation of CTM parameters, the package “topicmodels” uses the variational expectation maximization (VEM) algorithm, while for LDA both VEM and Gibbs sampling are available (Hornik and Grün 2011). This package currently provides an interface to the code for fitting an LDA model and a CTM with the VEM algorithm as implemented by Hoffman et al. (2010) and to the code for fitting an LDA topic model with Gibbs sampling written by Phan et al. (2008). Both VEM and Gibbs sampling provide approximate estimates. VEM is a deterministic method that converges faster but has higher bias in its estimates (Wainwright et al. 2008). Gibbs sampling is a Markov chain Monte Carlo sampling method (stochastic) which is computationally expensive, but its bias and variance approach zero as more samples are drawn (Geman and Geman 1987). The parameter \(\alpha \) in the LDA model is estimated by default in both the VEM and Gibbs sampling methods of the topicmodels package. Its starting value is kept at 50/k, where k is the number of topics, as suggested by Griffiths et al. (2004). However, there is an option of fixing the value of \(\alpha \) at 50/k.
We therefore considered four topic models:

 1. LDA 1 (estimation using VEM),
 2. LDA 2 (estimation using VEM with a fixed \(\alpha \) parameter),
 3. LDA 3 (estimation using Gibbs sampling),
 4. CTM.
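The paper fits these models with the R topicmodels package; purely for intuition, the sketch below is a minimal pure-Python collapsed Gibbs sampler for LDA (the estimation strategy behind LDA 3) run on a tiny hypothetical gasoline-tweet corpus. It is an illustrative toy, not the implementation used in the study:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=7):
    """Collapsed Gibbs sampler for LDA on tokenized documents.

    Returns the topic assigned to each token and the per-topic word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]               # doc-topic counts
    n_kw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                                # topic totals
    z = []                                              # assignment per token
    for d, doc in enumerate(docs):                      # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
        z.append(zd)
    for _ in range(iters):                              # Gibbs sweeps
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                             # remove current token
                n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + V * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                             # add it back
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    return z, n_kw

# Hypothetical tokenized tweets echoing the topics in Table "CTM".
docs = [
    ["gas", "station", "no", "gas", "line"],
    ["gas", "price", "high", "price"],
    ["station", "line", "wait", "line"],
]
assignments, topic_words = lda_gibbs(docs, n_topics=2)
```

The sampler repeatedly reassigns each token to a topic with probability proportional to \((n_{dk}+\alpha)(n_{kw}+\beta)/(n_k+V\beta)\), the standard collapsed Gibbs update.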
4.2.3 Model selection
In the previous two subsections, we explained how to filter out important unigrams and topics. In this subsection, we explain how model selection was performed (i.e., selecting the best set of unigrams and topics for classification). We used the standard technique in which we divided our tweet data into training and testing datasets. We trained multiple models with different sets of unigrams and topics and measured their performance using the F1 score. The F1 score is the harmonic mean of precision and recall. In binary classification, “precision” is the number of true positives divided by the total number of true and false positives. In our application, it measures the fraction of tweets classified as gasoline shortage tweets that were correctly classified. “Recall”, also known as “sensitivity,” is defined as the number of true positives divided by the sum of true positives and false negatives. In our application, it measures the fraction of tweets correctly classified out of the tweets which were originally about gasoline shortage. In most classification problems, there is often a trade-off between “precision” and “recall”: when one tries to increase precision, recall decreases, and vice versa. Since the F1 score is the harmonic mean of both, it achieves a high value only when both “precision” and “recall” are reasonably high. Therefore, in our case study in Sect. 5, we compare the F1 scores of different classifiers with different sets of unigrams and topics to find the best classifier for our application.
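The selection criterion above can be made concrete with a small helper computing precision, recall and the F1 score from labeled examples (the labels below are hypothetical):

```python
def precision_recall_f1(true_labels, predicted_labels, positive="shortage"):
    """Compute precision, recall and their harmonic mean (the F1 score)."""
    tp = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p == positive)
    fp = sum(1 for t, p in zip(true_labels, predicted_labels)
             if t != positive and p == positive)
    fn = sum(1 for t, p in zip(true_labels, predicted_labels)
             if t == positive and p != positive)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical annotations vs. classifier output.
truth = ["shortage", "shortage", "other", "shortage", "other"]
pred = ["shortage", "other", "other", "shortage", "shortage"]
p, r, f = precision_recall_f1(truth, pred)
```

Here two of three predicted positives are correct and two of three true positives are found, so precision, recall and F1 all equal 2/3.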
4.3 Stage 3: forecasting gasoline shortage tweets using a hybrid loss function (HLF)
In this stage, we aggregate the number of tweets about gasoline shortage for each city. A Poisson regression analysis using the gasoline shortage tweets identified in the previous stage shows that the number of tweets about shortage (in a day, in a city) is a good predictor of the amount of shortage, i.e., the number of stations out of gasoline. Table 8 in our case study shows that the number of tweets, along with other variables, is a statistically significant predictor of the number of stations out of gas. We could use this methodology to predict future shortage if we could forecast the number of tweets about gasoline shortage. This motivates the need to forecast the number of tweets about gasoline shortage.
We explored three methods to forecast the tweets, namely (a) a Poisson regression model, (b) time-series models like ARIMA and SARIMA and (c) a regression model with a hybrid loss function that we developed to combine the properties and results of the Poisson regression and time-series models. In the following subsections, we explain the motivation and details of the three methods. The results of the model selection (for each type of method) on the Irma data are discussed in Sect. 5. Furthermore, in Sect. 4.3.4, we describe the model selection methodology for selecting the best procedure among the three.
4.3.1 Poisson regression model
We analyzed the hourly and daily arrival of tweets in all the cities and found the distributions to be Poisson. The details of this analysis are given in Sect. 5. Thus, Poisson regression was a candidate method. The results in Table 8 confirm that the “number of tweets” and other variables (about the hurricane path) can be used to predict the number of stations out of gas. This motivated us to explore whether the number of tweets on the next day could be predicted using Poisson regression and variables like the number of stations out of gas. We detail these results in Sect. 5.
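As a sketch of the Poisson regression idea, the snippet below fits a single-predictor model \(\log \lambda_i = \beta_0 + \beta_1 x_i\) by Newton's method on the Poisson log-likelihood. The predictor and synthetic counts are illustrative only; the paper's model also includes hurricane path variables:

```python
import math

def fit_poisson_regression(x, y, iters=50):
    """Newton's method for a single-predictor Poisson regression,
    log(lambda_i) = b0 + b1 * x_i (a minimal sketch, not the paper's full model)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iters):
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - li for yi, li in zip(y, lam))                # score vector
        g1 = sum((yi - li) * xi for yi, li, xi in zip(y, lam, x))
        h00 = sum(lam)                                             # information matrix
        h01 = sum(li * xi for li, xi in zip(lam, x))
        h11 = sum(li * xi * xi for li, xi in zip(lam, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det                          # Newton step
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Synthetic data: counts generated from lambda = exp(0.5 + 0.2 * x), where x
# stands in for a predictor such as the number of stations out of gas.
x = list(range(10))
y = [round(math.exp(0.5 + 0.2 * xi)) for xi in x]
b0, b1 = fit_poisson_regression(x, y)
```

On this near-exact synthetic data, the fitted coefficients land close to the generating values (0.5, 0.2).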
4.3.2 Timeseries models
Natural candidates for forecasting gasoline shortage tweets are time-series models like ARIMA and SARIMA (Brockwell et al. 2002). We now provide some background for these models. The autoregressive integrated moving average (ARIMA) model is a generalization of the autoregressive moving average (ARMA) model. The difference is that ARMA is used when the time series is known to be stationary, whereas ARIMA is used when the data show evidence of non-stationarity. ARMA models provide a description of a (weakly) stationary stochastic process in terms of two polynomials, one for the autoregression (AR) and the other for the moving average (MA) (Box et al. 2015). ARIMA involves an initial differencing step to eliminate the non-stationarity (Brockwell et al. 2002). We used the augmented Dickey–Fuller test (Said and Dickey 1984) to test stationarity (see Sect. 5). Our results showed that the time series of the number of tweets (per hour) for a number of cities in Florida were stationary. For other cities, a differencing operation made the series stationary. Therefore, ARIMA models were suitable for modeling the time series of the tweets. The Miami data also needed a seasonality adjustment (24-h seasonality), and hence SARIMA, a version of ARIMA that models seasonality, was employed to analyze and fit the Miami data.
We fit the time-series models in four steps:

 1. Model identification We ensured that the variables/differenced variables were stationary using the augmented Dickey–Fuller test. Seasonality was identified if present (seasonally differencing the series, if necessary). Plots of the autocorrelation and partial autocorrelation functions of the time series were used to decide which components to use among autoregressive, moving average, differencing and seasonality.
 2. Model selection Multiple ARIMA models were fit to the time series, and the best model was selected using the Akaike information criterion (AIC).
 3. Parameter estimation This was done using maximum likelihood estimation.
 4. Model checking We checked whether the selected model conforms to the properties of a stationary univariate series. In particular, we checked that the residuals are uncorrelated and normally distributed using their autocorrelation functions for several lags. We further checked that they were uncorrelated using the Ljung–Box test.
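Two ingredients of the steps above, differencing to remove non-stationarity and the sample autocorrelation function used for order identification, can be sketched in pure Python (the paper performs these steps with standard R time-series tooling; the trend series below is illustrative):

```python
def difference(series, lag=1):
    """First (or seasonal, for lag > 1) differencing to remove non-stationarity."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

def autocorrelation(series, lag):
    """Sample autocorrelation at a given lag (used to pick AR/MA orders)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((s - mean) ** 2 for s in series)
    cov = sum((series[i] - mean) * (series[i - lag] - mean) for i in range(lag, n))
    return cov / var

# A deterministic upward trend: non-stationary, but its first difference is constant.
trend = [2 * t for t in range(20)]
diffed = difference(trend)
```

A slowly decaying autocorrelation function, as this trend exhibits at lag 1, is the classic signal that differencing is needed before fitting an ARMA model.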
4.3.3 The hybrid loss function model
Our results from both the Poisson regression and time-series models showed that there was room for improvement. Poisson regression captures variations due to the number of stations out of gas and hurricane path variables. Time-series models capture variation in the form of temporal covariances. Even though there is overlap in the variance explained by the two methods, it is possible that a combination of the two could explain a greater amount of the variance in the tweet data. This motivated us to combine the two methods. In the literature, we found multiple instances where ARIMA models were combined with other predictive algorithms. They have been combined with linear regression (Xu et al. 2016), a variety of neural networks (Zhang 2003; Tseng et al. 2002; Cadenas and Rivera 2010) and support vector machines (Pai and Lin 2005; Nie et al. 2012; Zhu and Wei 2013; Ni et al. 2017). However, there is no literature that builds a hybrid of ARIMA and Poisson regression. The application that comes closest to our work is the combination of SARIMA and SVM regression by Ni et al., which forecasts subway passenger flow using tweets (Ni et al. 2017).
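The paper's exact hybrid loss function is not reproduced in this section; purely as a schematic illustration of combining the two forecasters, the sketch below assumes the combined forecast is a convex combination w · ARIMA + (1 − w) · Poisson and picks the weight w that minimizes squared error on a validation window. This simplification is ours, not the paper's HLF:

```python
def hybrid_forecast_weight(arima_pred, poisson_pred, actual):
    """Choose the mixing weight w in [0, 1] for w * arima + (1 - w) * poisson
    by minimizing validation sum of squared errors.

    Closed form: w* = <a - p, y - p> / <a - p, a - p>, clipped to [0, 1].
    """
    d = [a - p for a, p in zip(arima_pred, poisson_pred)]
    r = [y - p for y, p in zip(actual, poisson_pred)]
    denom = sum(di * di for di in d)
    if denom == 0:
        return 0.5  # the two forecasters agree; any weight is equivalent
    w = sum(di * ri for di, ri in zip(d, r)) / denom
    return min(1.0, max(0.0, w))

# Hypothetical validation window where the truth sits between the two forecasts.
arima = [10.0, 12.0, 15.0, 18.0]
poiss = [8.0, 10.0, 11.0, 14.0]
truth = [9.0, 11.0, 13.0, 16.0]
w = hybrid_forecast_weight(arima, poiss, truth)
combined = [w * a + (1 - w) * p for a, p in zip(arima, poiss)]
```

In this toy example the truth lies exactly halfway between the two component forecasts, so the fitted weight is 0.5 and the combined forecast matches the truth.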
4.3.4 Model selection
To select the best model for a city, the performance of the selected Poisson regression model, ARIMA model and HLF model is measured on a testing dataset. The measures of mean absolute percentage error (MAPE) and root mean squared error (RMSE) are compared. It must be noted that the training and testing data used in model selection of individual methods do not overlap with the testing data for the model selection between the three methods.
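The two selection criteria can be computed as follows (the actual/predicted series are hypothetical):

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error (%)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]
```

With these numbers the absolute percentage errors are 20%, 10% and 0%, so the MAPE is 10% and the RMSE is \(\sqrt{8/3}\).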
4.4 Stage 4: prediction of the gasoline shortage using the forecasted tweets
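As stated in the abstract, stage 4 feeds the forecasted tweet count into the fitted Poisson regression to predict the next day's shortage. A minimal sketch, with purely hypothetical coefficients (not fitted values from the paper):

```python
import math

def predict_stations_out(tweets_forecast, b0, b1):
    """Stage-4 sketch: map the forecasted number of shortage tweets to the
    expected number of stations out of gas via the fitted Poisson mean."""
    return math.exp(b0 + b1 * tweets_forecast)

# Hypothetical coefficients for illustration only.
b0, b1 = 2.0, 0.05
expected_stations = predict_stations_out(30, b0, b1)
```

Because the Poisson mean is log-linear, each additional forecasted tweet multiplies the expected number of stations out of gas by \(e^{\beta_1}\).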
5 Case study
In this section, we describe the application of our methodology to predict the gasoline shortage in Florida during Hurricane Irma. While the landfall of Irma in Florida happened on September 10, 2017, the shortage of gasoline in Florida was observed in multiple cities in the period September 6–15, 2017 (i.e., during onset and beyond landfall). The Web site GasBuddy has data about the percentage of gas stations out of gasoline on all these dates in all major cities of Florida (Gasbuddy 2017a). We use this data as ground truth about shortage to validate our findings.
We accessed more than 1 million tweets from Florida during this period; the details of the tweet data are presented in Sect. 3. The National Hurricane Center (National Hurricane Centre 2017) Web site provided the data about the hurricane path, which we used as features and predictors in our model.
In stage 1, we filtered out gasoline-related tweets from the corpus of 1 million tweets. For this, we combined the tweets curated from hashtag and tweet search to filter down to 4070 relevant gasoline-related tweets. The hashtags and words that we found using the regular expressions included gasoline, gas, gasinmiami, gaspricefixing, gasstation, gasservice, gastateparks, gasshortage, gasoil, gastation, gaswaste, nogas, outofgas, findgas. We also applied the data cleaning procedure described in Sect. 4.1.
Perplexity measure of different topic models for different number of topics on test data
Number of topics  Perplexity (CTM)  Perplexity (LDA 1)  Perplexity (LDA 2)  Perplexity (LDA 3) 

2  2.70E\(+\)36  618,944.7  620,294.5  656.9755 
4  2.60E\(+\)36  621,846.1  631,295.4  619.1762 
5  2.57E\(+\)36  622,815.2  637,068  609.0368 
8  2.51E\(+\)36  625,236.7  655,194  591.0772 
10  2.48E\(+\)36  626,552.5  667,867.9  583.622 
12  2.45E\(+\)36  627,872.7  680,960.5  578.05 
15  2.41E\(+\)36  629,781.3  701,214.6  580.2036 
20  2.39E\(+\)36  632,543.1  638,136.2  584.5208 
40  2.34E\(+\)36  639,615.3  663,342.7  627.7097 
50  2.31E\(+\)36  639,811.1  685,070.5  651.7323 
100  2.23E\(+\)36  659,971.7  800,349.3  787.987 
Topics identified by topic modeling techniques in the gas shortage tweet corpus
CTM  

Topic 1  Topic 2  Topic 3  Topic 4  Topic 5 
gas  station  gas  gas  gas 
cannot  gas  no  station  price 
find  need  station  wait  high 
know  hurricaneirma  line  line  got 
irma  close  miami  irma  irma 
Performance of SVM using topics and unigrams (varied word frequency threshold, number of topics \(=\) 5, training/testing \(=\) 70/30)
Word frequency  Number of words  Precision  Recall  F score 

5  937  0.941  0.714  0.811 
6  797  0.961  0.788  0.866 
7  710  0.950  0.762  0.846 
10  519  0.969  0.771  0.859 
20  282  0.963  0.767  0.854 
50  109  0.972  0.775  0.862 
100  38  0.972  0.779  0.865 
350  5  0.985  0.789  0.876 
Performance of SVM using topics and unigrams (varied number of topics, word frequency threshold \(=\) 350, training/testing \(=\) 70/30)
Number of topics  Precision  Recall  F score 

2  0.963  0.767  0.854 
4  0.983  0.761  0.858 
5  0.985  0.789  0.877 
6  0.950  0.790  0.863 
10  0.972  0.779  0.865 
12  0.975  0.713  0.824 
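The stage 2 classifier combines unigram counts with topic proportions as features for an SVM. A minimal scikit-learn sketch of this feature construction follows; the eight tweets and their labels are invented, and the real pipeline used five topics and a word-frequency threshold rather than these toy settings.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Invented labeled examples: 1 = gasoline shortage tweet, 0 = other
texts = [
    "cannot find gas anywhere in miami", "station out of gas long line",
    "no gas before irma hits", "gas shortage everywhere waiting",
    "stay safe during the storm", "schools closed for the hurricane",
    "wind and rain all night", "evacuation routes are crowded",
]
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])

vec = CountVectorizer()
X_uni = vec.fit_transform(texts)                       # unigram count features
lda = LatentDirichletAllocation(n_components=2, random_state=0)
X_top = lda.fit_transform(X_uni)                       # topic-proportion features
X = hstack([X_uni, X_top])                             # combined feature matrix

clf = SVC(kernel="linear").fit(X, labels)
print(clf.predict(X))
```

On real data one would evaluate precision, recall and F score on a held-out 30% split, as in the tables above.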
For studying the temporal dynamics of tweet arrival at the city level, we chose eight major cities in Florida, namely Tampa, Orlando, Jacksonville, Miami, Gainesville, Tallahassee, Naples and West Palm Beach. For each city, we grouped all the gasoline shortage tweets that arrived within the same hour for the period of September 6–15, 2017. Figure 5 shows the frequency distribution histogram of the number of hourly tweets about gasoline shortage in the eight cities. We tested whether the arrival of the tweets followed a Poisson distribution. For this, we calculated the mean number of tweets arriving in an hour for each city. Using these values as arrival rates \(\lambda \), a Poisson probability distribution was generated for each city. Next, for each city, we performed three goodness-of-fit tests of the distribution of the number of tweets per hour against a sample generated from the Poisson distribution: the Chi-square test (Chisq), the Kolmogorov–Smirnov test (KS) and the Cramér–von Mises criterion (VM). We performed the same tests on the distribution of tweets per day for each city. In the Chi-square test, the p values were simulated by the Monte Carlo simulation method of Hope (1968), which is advised for small reference sets. In the Kolmogorov–Smirnov test, the p values are approximated, as exact p values are not available for the two-sample case when the test is one-sided or in the presence of ties (Conover 1971). In addition to the eight cities, we also modeled the arrival of gasoline shortage tweets for the state of Florida as a whole.
Table 9 shows the results of the three goodness-of-fit tests for each city and for the state of Florida. Assuming a significance level of 0.05, the Chi-square test fails to reject the null hypothesis for all the cities, i.e., our observations and the samples from the Poisson distribution follow the same distribution. Similarly, by the Cramér–von Mises criterion, the arrival of the tweets follows a Poisson distribution. The KS test shows that Orlando, Tallahassee, Jacksonville, Gainesville and West Palm Beach follow a Poisson distribution (as the test fails to reject the null hypothesis). Therefore, we conclude that the arrival of tweets follows a Poisson distribution. The same conclusion is drawn for the daily arrival of tweets for each city from Table 10.
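The hourly goodness-of-fit checks can be sketched with SciPy's two-sample tests. The counts below are synthetic stand-ins drawn at Tampa's reported rate (\(\lambda \approx 1.28\)); SciPy provides the KS and Cramér–von Mises tests directly, while the Monte Carlo Chi-square test of Hope (1968) would need R's chisq.test(simulate.p.value = TRUE) or a hand-rolled simulation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-in for Tampa's hourly tweet counts (lambda from Table 9)
observed = rng.poisson(lam=1.28, size=240)
# Sample-generated Poisson reference with the observed mean as arrival rate
reference = rng.poisson(lam=observed.mean(), size=240)

ks = stats.ks_2samp(observed, reference)
vm = stats.cramervonmises_2samp(observed, reference)
print(f"KS p={ks.pvalue:.3f}, VM p={vm.pvalue:.3f}")
```

Failing to reject the null hypothesis at the 0.05 level supports the Poisson arrival assumption used in the later stages.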
Goodnessoffit tests for arrival of tweets (hourly)
City  Lambda  Chisq p value  VM p value  KS test p value 

Tampa  1.281915  0.1894  0.000554025  0.0006129 
Miami  3.356383  0.3073  0.000729358  2.22E−16 
Orlando  0.7765957  0.1064  8.96E−05  0.09341 
Tallahassee  0.2925532  0.3513  6.06E−09  0.5038 
Jacksonville  0.2340426  0.4963  4.86E−09  1 
Gainesville  0.4787234  0.1764  1.42E−06  0.6744 
West Palm Beach  1.303191  0.2044  9.10E−04  0.01205 
Naples  0.462766  0.2094  1.40E−06  0.5038 
Florida  10.23936  0.1989  0.000347264  3.33E−15 
Goodnessoffit tests for arrival of tweets (daily)
City  Chisq p value  VM p value  KS test p value 

Tampa  0.2705  0.1220301  0.6994 
Miami  0.2425  1.00E−01  3.36E−02 
Orlando  0.1426  1.42E−01  0.3364 
Tallahassee  0.3532  1.34E−01  0.6272 
Jacksonville  0.3406  1.31E−01  0.6994 
Gainesville  0.297  1.36E−01  0.6272 
West Palm Beach  0.3675  1.22E−01  0.27 
Naples  0.2578  1.49E−01  0.6272 
Florida  0.2425  0.1001568  3.36E−02 
As described in Sect. 4.3, in stage 3, we explore three methods for forecasting tweets about gasoline shortage: Poisson regression, time-series models, and an HLF method that combines properties of Poisson regression and time-series models. First, we fit models of each kind using the methodologies described in Sects. 4.3.1–4.3.3 on data for the period September 6–9, 2017. To find the best model among the three, we forecasted tweets about gasoline shortage for the period September 10–15, 2017, and compared the forecasts to the ground truth.
Different timeseries models for Miami tweets data and their AIC values
Model  AIC  Model  AIC 

ARIMA(1,0,0) \(=\) AR(1)  1048.89  ARIMA(3,1,0)  1024.11 
ARIMA(2,0,0) \(=\) AR(2)  1031.46  ARIMA(3,1,1)  1023.77 
ARIMA(0,0,1) \(=\) MA(1)  1117.8  ARIMA((3,1,1),(1,0,0)) period \(=\) 24  1012.12 
ARIMA(0,0,2) \(=\) MA(2)  1095.47  ARIMA(4,1,2)  1004.52 
ARIMA(3,0,0) \(=\) AR(3)  1029.65  ARIMA((4,1,3),(1,0,0)) period \(=\) 25  1005.82 
ARIMA(2,1,0)  1025.76  ARIMA((4,1,3),(1,0,0)) period \(=\) 25  1002.79 
ARIMA(2,1,1)  1022.19  ARIMA((4,1,2),(1,0,0)) period \(=\) 25  1001.93 
Timeseries models selected for various cities of Florida
City  Model selected 

Miami  ARIMA((4,1,2),(1,0,0)) 
Naples  ARIMA(0,0,2) 
Jacksonville  ARIMA(0,0,4) 
Tampa  ARIMA(3,1,0) 
Gainesville  ARIMA(0,1,2) 
West Palm Beach  ARIMA(3,0,0) 
Tallahassee  ARIMA(1,0,0) 
Results of Poisson regression model with the best fit to forecast tweets about gasoline shortage (lag \(=\) 1 day)
Predictor  Estimate  SE  z value  p value  Sig 

(Intercept)  \(-\) 5.632856917  1.342732525  \(-\) 4.195069988  2.73E−05  *** 
Gas shortage  10.05553126  1.316275539  7.639381701  2.18E−14  *** 
Number of gas stations  0.006732827  0.000395052  17.04290458  3.95E−65  *** 
On hurricane path  0.679578601  0.142605366  4.765449025  1.88E−06  *** 
Inside 3-day cone  1.308414331  0.143248048  9.133906853  6.61E−20  *** 
Days to arrival  \(-\) 1.71048558  0.148740094  \(-\) 11.49982855  1.32E−30  *** 
Watches/warning  \(-\) 5.763048725  0.418491761  \(-\) 13.77099686  3.81E−43  *** 
Watches/warning  \(-\) 3.698064077  0.265035222  \(-\) 13.95310424  3.01E−44  *** 
Wind speeds  0.058446979  0.007923946  7.375993819  1.63E−13  *** 
Results of Poisson regression model with the best fit to forecast tweets about gasoline shortage (lag \(=\) 2 days)
Predictor  Estimate  SE  z value  p value  Sig 

(Intercept)  1.88E\(+\)00  1.342733  1.48E−01  < 2E−16  *** 
Gas shortage  7.88E−08  1.70E−07  4.63E−01  6.43E−01  
Number of gas stations  8.65E−06  2.06E−04  0.042  9.67E−01  
On hurricane path  7.39E−01  9.88E−02  7.4809  7.46E−14  *** 
Inside 3-day cone  1.49E\(+\)00  4.16E−01  3.571  3.55E−04  *** 
Days to arrival  \(-\) 3.23E−01  4.51E−02  \(-\) 7.162  7.95E−13  *** 
Watches/warning  \(-\) 1.54E\(+\)00  2.01E−01  \(-\) 7.681  1.58E−14  *** 
Watches/warning  \(-\) 1.87E\(+\)01  1.72E−01  \(-\) 10.895  < 2E−16  *** 
Wind speeds  5.44E−03  2.69E−03  2.021  4.33E−02  * 
Next, we estimated the best HLF model using the gradient descent algorithm. The features selected are those of the model selected by the Poisson regression method, which is described in Table 13. The data from September 6–9 were used as training data (\(X_\mathrm{train}\) and \(Y_\mathrm{train}\)). Data from Naples and Miami for September 10 were used as testing data (\(X_\mathrm{test}\)). \(Y_\mathrm{ts}\) is determined from the ARIMA model predictions for Miami and Naples for September 10. \({\varLambda }_{1} = {\varLambda }_{2} = 1\) achieves the smallest RMSE value on the test data. The gradient descent algorithm converges to an optimum fastest at a learning rate \(\omega _{1} = \omega _{2} = 10^{-5}\).
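The HLF estimation can be illustrated with a hand-rolled gradient descent. The sketch below assumes one plausible form of the hybrid loss, a Poisson negative log-likelihood on the training data plus a squared-error penalty tying test-period predictions to the ARIMA forecasts (the precise loss is defined in Sect. 4.3.3); the features, sizes and learning rate are invented toy stand-ins for the Table 13 predictors.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 3))          # toy stand-in for Table 13 features
beta_true = np.array([0.5, -0.3, 0.2])
y_train = rng.poisson(np.exp(X_train @ beta_true))   # toy training counts
X_test = rng.normal(size=(5, 3))
y_ts = np.exp(X_test @ beta_true)           # stand-in for the ARIMA forecasts

lam1 = lam2 = 1.0                           # the weights Lambda_1 = Lambda_2 = 1

def hlf(beta):
    """Hybrid loss: Poisson NLL on training data + squared error to ARIMA."""
    mu_tr, mu_te = np.exp(X_train @ beta), np.exp(X_test @ beta)
    poisson_nll = np.sum(mu_tr - y_train * (X_train @ beta))
    return lam1 * poisson_nll + lam2 * np.sum((mu_te - y_ts) ** 2)

beta, lr = np.zeros(3), 1e-3
for _ in range(5000):
    mu_tr, mu_te = np.exp(X_train @ beta), np.exp(X_test @ beta)
    # Gradient of the two loss terms with respect to beta
    grad = lam1 * X_train.T @ (mu_tr - y_train) \
         + lam2 * 2 * X_test.T @ ((mu_te - y_ts) * mu_te)
    beta -= lr * grad
print(np.round(beta, 2))
```

In practice the weights and learning rate are tuned on held-out RMSE, as described above.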
In stage 4, we predict the number of stations out of gasoline the next day by using a Poisson regression (for the period September 13–15, 2017, for the eight cities). To find the model with the best fit, we use the cross-validation technique described in Sect. 4.4. Data from September 6–12 are used as training data, and data from September 13–15 are used as the testing set. Table 15 shows the estimates of the Poisson regression model that had the best fit and achieved the lowest MAPE and RMSE on the test set. It tabulates the maximum likelihood estimates of the regression coefficients and the standard error, z score and p value from the z test. All predictors are statistically significant. Null deviance \(=\) 51,961.01, residual deviance \(=\) 689.17 and pseudo-\(R^{2} = 0.987\) show the model is a very good fit. On the test data, MAPE \(=\) 0.31 and RMSE \(=\) 9.13 are achieved. Figure 11 shows the comparison of the predictions on the test set with the ground truth.
Results of Poisson regression model with the best fit to predict gasoline shortage (lag \(=\) 0 day)
Predictor  Estimate  SE  z value  p value  Sig 

(Intercept)  4.255101001  0.056026652  75.94780113  0  *** 
Population  \(-\) 1.18E−06  1.35E−07  \(-\) 8.700698039  3.30E−18  *** 
Number of gas stations  0.00310295  0.000121362  25.56764236  3.50E−144  *** 
Number of tweets  0.002997528  0.000281172  10.66083059  1.55E−26  *** 
Days to arrival  \(-\) 0.137963866  0.020294006  \(-\) 6.798256761  1.06E−11  *** 
Warning  \(-\) 0.20750846  0.049436483  \(-\) 4.19747623  2.70E−05  *** 
Results of Poisson regression model with the best fit to predict gasoline shortage without tweets information (lag \(=\) 0 day)
Predictor  Estimate  SE  z value  p value  Sig 

(Intercept)  3.75E\(+\)00  2.73E−02  137.37  < 2E−16  *** 
Population  2.64E−07  5.28E−08  5.002  5.67E−07  *** 
Number of gas stations  0.00310295  0.000121362  25.56764236  3.50E−144  *** 
Days to arrival  0.2374  2.22E−02  10.714  < 2E−16  *** 
Warning  \(-\) 1.01E−02  6.78E−04  \(-\) 14.854  < 2E−16  *** 
Results of Poisson regression model with the best fit to predict gasoline shortage (lag \(=\) 1 day)
Predictor  Estimate  SE  z value  p value  Sig 

(Intercept)  3.64E\(+\)00  3.03E−02  120.205  < 2E−16  *** 
Population  \(-\) 3.56E−07  6.00E−08  \(-\) 5.932  2.99E−09  *** 
Number of gas stations  2.83E−03  5.85E−05  48.46  < 2E−16  *** 
Number of tweets  2.20E−03  2.95E−04  7.442  9.90E−14  *** 
Days to arrival  \(-\) 2.26E−01  2.16E−02  \(-\) 10.43  < 2E−16  *** 
Warning  8.76E−03  7.57E−04  11.565  < 2E−16  *** 
6 Using social media data to drive decision-making models in the gas shortage domain
In our paper, we develop a coarse-grained prediction of gasoline shortage, in that it predicts only the proportion of stations without gas in each city. If we had access to ground-truth data at the individual gas station level, a finer-grained prediction model that predicts gasoline shortage at individual gas stations could be validated. In the rest of this section, we assume that gasoline shortage predictions are available at the individual gas station level and outline the development of two key decision-making models, one related to the supply of gasoline (by authorities) and the other related to the search for gasoline (by individuals).
6.1 Supply of gasoline
Analysis of Twitter data can yield either probabilistic or deterministic inference of individual gas station shortages, depending on which set of analytical methods is used. If gasoline shortage at individual stations is known on a probabilistic basis, the resultant vehicle routing problem (VRP) can be modeled using a prize-collection methodology, where the prize for visiting a gas station is larger if it has a higher likelihood of a shortage. It could also be modeled using stochastic programming methods as applied to VRP approaches. If the gasoline shortage is known on a deterministic basis, the vehicle routing can be modeled using traditional VRP approaches.
6.2 Search for gasoline
An individual searching for gasoline presents an interesting modeling situation, because they start with some level of gasoline and have limited travel capability during the search. Also, there can be significant waiting lines at a gas station that has gasoline, and the consumer has to decide whether to wait (if they run out of gas while waiting, they can simply push their car forward in line) or travel to another gas station to seek gas. This can be modeled using a dynamic programming framework, in which each gas station can be in a variety of states with known probability. One of these states is the “no gas” state, and the other states all have “gas” but different amounts of waiting time to obtain it.
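As a toy illustration of this search problem (all numbers invented), one can compare visit orders over a small set of stations, each empty with some probability and otherwise imposing a known wait; the full version would be a dynamic program over station states and remaining fuel.

```python
import itertools

# Toy instance: name -> (prob_empty, wait_minutes); all numbers invented
stations = {
    "A": (0.6, 30),
    "B": (0.2, 90),
}
travel = {("A", "B"): 15, ("B", "A"): 15}   # driving time between stations

def expected_time(order):
    """Expected minutes to refuel when visiting stations in the given order."""
    t, p_reach, total = 0.0, 1.0, 0.0
    for k, s in enumerate(order):
        p_empty, wait = stations[s]
        # Refuel here with probability p_reach * (1 - p_empty)
        total += p_reach * (1 - p_empty) * (t + wait)
        p_reach *= p_empty
        if k + 1 < len(order):
            t += travel[(s, order[k + 1])]
    return total + p_reach * t              # all stations empty: time spent driving

best = min(itertools.permutations(stations), key=expected_time)
print(best, round(expected_time(best), 1))  # → ('A', 'B') 64.2
```

Here visiting the closer, riskier station first is worthwhile because its wait is short; a dynamic program would extend this by tracking fuel level and queue-state transitions.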
In our upcoming work, we are building a model that can estimate the probability of gasoline shortage at an individual gas station using the spatiotemporal distribution of gasoline shortage tweets. This will provide needed data for both the models for supply of gasoline and search for gasoline.
7 Conclusions and future research
Our methodology helps answer two major questions: first, can social media be used to predict gasoline shortage during disasters and, second, what is a good methodology for making such a prediction. People tweet and use social media during emergencies. Hence, we believe this methodology can be generalized to other applications, such as predicting shortages of other commodities during forecastable emergencies. Our methodology produces very accurate results for the case of gasoline shortage during Hurricane Irma in Florida in 2017. In particular, the HLF method predicts the number of future gasoline shortage tweets with high accuracy. ARIMA models successfully capture the time-related covariance in the number of tweets, while the Poisson regression captures the variation in the number of tweets due to gasoline shortage and other variables that cause panic. The HLF model successfully combines these two properties and hence achieves more accurate results. For the gasoline shortage prediction, our model achieves MAPE \(=\) 0.31 and RMSE \(=\) 9.13.
There are several fruitful directions for future research. Our first direction stems from the recognition that, although the F1 score of the classification model is reasonably good, the recall values could be improved by decreasing the relatively high number of false negatives. We believe this can be achieved by further analysis of the stage 2 (classification) model. Our second direction stems from the fact that our method makes a coarse-grained prediction of gasoline shortage, in that it predicts only the proportion of stations without gas in each city. If we had access to ground-truth data at the individual gas station level, a finer-grained prediction model that predicts gasoline shortage at individual gas stations could be validated. Therefore, in our upcoming work, we are building a model that can estimate the probability of gasoline shortage at an individual gas station using the spatiotemporal distribution of gasoline shortage tweets. Our third direction relies on successful completion of the second task: once future shortage data are available at the individual gas station level, they can be fed into a decision-making model for gasoline delivery to gas stations to ensure adequate supply where it is needed. This would likely be a vehicle routing type of formulation.
Acknowledgements
The authors would like to thank two anonymous referees who provided detailed comments that significantly enhanced our paper.
Funding
Funding was provided by National Science Foundation (Grant No. 1663101).
References
 Ashktorab Z, Brown C, Nandi M, Culotta A (2014) Tweedr: mining twitter to inform disaster response. In: ISCRAM 2014 conference proceedings—11th international conference on information systems for crisis response and management (May), pp 354–358. https://doi.org/10.1145/1835449.1835643, http://www.scopus.com/inward/record.url?eid=2s2.084905845531&partnerID=40&md5=ee57e6c3d9498b083428cdae67d83396
 Atefeh F, Khreich W (2015) A survey of techniques for event detection in twitter. Comput Intell 31(1):132–164
 Beigi G, Hu X, Maciejewski R, Liu H (2016) An overview of sentiment analysis in social media and its applications in disaster relief. In: Pedrycz W, Chen SM (eds) Sentiment analysis and ontology engineering. Studies in Computational Intelligence, vol 639. Springer, Cham, pp 313–340. https://doi.org/10.1007/9783319303192_13
 Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3(Jan):993–1022
 Blei DM, Lafferty JD et al (2007) A correlated topic model of science. Ann Appl Stat 1(1):17–35
 Boulos MNK, Resch B, Crowley DN, Breslin JG, Sohn G, Burtner R, Pike WA, Jezierski E, Chuang KYS (2011) Crowdsourcing, citizen sensing and sensor web technologies for public and environmental health surveillance and crisis management: trends, OGC standards and application examples. Int J Health Geogr 10(1):67
 Box GE, Jenkins GM, Reinsel GC, Ljung GM (2015) Time series analysis: forecasting and control. Wiley, Hoboken
 Brockwell PJ, Davis RA, Calder MV (2002) Introduction to time series and forecasting, vol 2. Springer, Berlin
 Cadenas E, Rivera W (2010) Wind speed forecasting in three different regions of Mexico, using a hybrid ARIMA–ANN model. Renew Energy 35(12):2732–2738
 Caragea C, Squicciarini A, Stehle S, Neppalli K, Tapia A (2014) Mapping moods: geo-mapped sentiment analysis during hurricane sandy. In: ISCRAM 2014 conference proceedings—11th international conference on information systems for crisis response and management (May), pp 642–651. http://www.iscram.org/legacy/ISCRAM2014/papers/p29.pdf
 Cheng Z, Caverlee J, Lee K (2010) You are where you tweet: a content-based approach to geolocating twitter users. In: Proceedings of the 19th ACM international conference on information and knowledge management. ACM, pp 759–768
 Chowdhury R, Chowdhury SR, Castillo C (2013) Tweet4act: using incident-specific profiles for classifying crisis-related messages. In: Proceedings of the 10th international ISCRAM conference (May), pp 834–839
 Conover WJ (1971) Practical nonparametric statistics. Wiley, New York, pp 295–301
 Cordeiro M, Gama J (2016) Online social networks event detection: a survey. In: Solving large scale learning tasks. Challenges and algorithms. Springer, Cham, pp 1–41. https://doi.org/10.1007/9783319417066_1
 Faulkner M, Olson M, Chandy R, Krause J, Chandy KM, Krause A (2011) The next big one: detecting earthquakes and other rare events from community-based sensors. In: 2011 10th international conference on information processing in sensor networks (IPSN). IEEE, pp 13–24
 Fdot (2017) Hurricane IRMA report by Florida department of transportation. http://www.fdot.gov/info/CO/news/newsreleases/020118_FDOTFuelReport.pdf
 Feinerer I (2008) An introduction to text mining in R. Newslett R Proj 8/2:19
 Fessenden H (2017) Price gouging. https://www.richmondfed.org//media/richmondfedorg/publications/research/econ_focus/2017/q4/jargon_alert.pdf
 Flood R (2017) Express UK website. https://www.express.co.uk/news/weather/850222/HurricaneIrmapathdestructionUSAFloridapanicbuyingstorm
 Gasbuddy (2017b) https://tracker.gasbuddy.com/?q=Buffalo,%20NY
 Gaynor M, Seltzer M, Moulton S, Freedman J (2005) A dynamic, data-driven, decision support system for emergency medical services. In: International conference on computational science. Springer, pp 703–711
 Geman S, Geman D (1987) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. In: Readings in computer vision. Elsevier, pp 564–584
 Griffiths TL, Steyvers M (2004) Finding scientific topics. Proc Natl Acad Sci 101(suppl 1):5228–5235
 Gu S, Pan C, Liu H, Li S, Hu S, Su L, Wang S, Wang D, Amin T, Govindan R, et al (2014) Data extrapolation in social sensing for disaster response. In: 2014 IEEE international conference on distributed computing in sensor systems (DCOSS). IEEE, pp 119–126
 Gupta A, Lamba H, Kumaraguru P, Joshi A (2013) Faking sandy: characterizing and identifying fake images on twitter during hurricane sandy. In: Proceedings of the 22nd international conference on World Wide Web. ACM, pp 729–736
 Han B, Cook P, Baldwin T (2013) A stacking-based approach to twitter user geolocation prediction. In: Proceedings of the 51st annual meeting of the association for computational linguistics: system demonstrations, pp 7–12
 Hoffman M, Bach FR, Blei DM (2010) Online learning for latent dirichlet allocation. In: Advances in neural information processing systems, pp 856–864
 Hope AC (1968) A simplified Monte Carlo significance test procedure. J R Stat Soc: Ser B (Methodological) 30(3):582–598
 Hornik K, Grün B (2011) topicmodels: an R package for fitting topic models. J Stat Softw 40(13):1–30
 Hughes AL, St Denis LA, Palen L, Anderson KM (2014) Online public communications by police & fire services during the 2012 hurricane sandy. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 1505–1514
 Imran M, Elbassuoni S, Castillo C, Diaz F, Meier P (2013) Practical extraction of disaster-relevant information from social media. In: Proceedings of the 22nd international conference on World Wide Web. ACM, pp 1021–1024
 Imran M, Castillo C, Lucas J, Meier P, Vieweg S (2014) AIDR: Artificial intelligence for disaster response. In: Proceedings of the companion publication of the 23rd international conference on World Wide Web companion (October), pp 159–162. https://doi.org/10.1145/2567948.2577034. https://mimran.me/papers/imran_castillo_lucas_meier_vieweg_www2014.pdf
 Kaigo M (2012) Social media usage during disasters and social capital: Twitter and the great East Japan earthquake. Keio Commun Rev 34(1):19–35
 Ki EJ, Nekmat E (2014) Situational crisis communication and interactivity: usage and effectiveness of Facebook for crisis management by fortune 500 companies. Comput Hum Behav 35:140–147
 Kumar S, Barbier G, Abbasi MA, Liu H (2011) Tweettracker: an analysis tool for humanitarian and disaster relief. In: Fifth international AAAI conference on weblogs and social media
 Lachlan KA, Spence PR, Lin X (2014) Expressions of risk awareness and concern through Twitter: on the utility of using the medium as an indication of audience needs. Comput Hum Behav 35:554–559. https://doi.org/10.1016/j.chb.2014.02.029
 Lee S, Song J, Kim Y (2010) An empirical comparison of four text mining methods. J Comput Inf Syst 51(1):1–10
 Liu BF, Fraustino JD, Jin Y (2016) Social media use during disasters: how information form and source influence intended behavioral responses. Commun Res 43(5):626–646. https://doi.org/10.1177/0093650214565917
 Mendoza M, Poblete B, Castillo C (2010) Twitter under crisis: can we trust what we RT?. In: Proceedings of the first workshop on social media analytics. ACM, pp 71–79
 Meyer D, Hornik K, Feinerer I (2008) Text mining infrastructure in R. J Stat Softw 25(5):1–54
 Morstatter F, Lubold N, PonBarry H, Pfeffer J, Liu H (2014) Finding eyewitness tweets during crises. arXiv:1403.1773
 National Hurricane Centre (2017) National hurricane centre website. https://www.nhc.noaa.gov
 Nazer TH, Xue G, Ji Y, Liu H (2017) Intelligent disaster response via social media analysis: a survey. ACM SIGKDD Explor Newsl 19(1):46–59
 Ni M, He Q, Gao J (2017) Forecasting the subway passenger flow under event occurrences with social media. IEEE Trans Intell Transp Syst 18(6):1623–1632
 Nie H, Liu G, Liu X, Wang Y (2012) Hybrid of ARIMA and SVMs for short-term load forecasting. Energy Procedia 16:1455–1460
 Olteanu A, Castillo C, Diaz F, Vieweg S (2014) CrisisLex: a lexicon for collecting and filtering microblogged communications in crises. In: Proceedings of the 8th international conference on weblogs and social media, p 376. http://www.aaai.org/ocs/index.php/ICWSM/ICWSM14/paper/download/8091/8138
 Pai PF, Lin CS (2005) A hybrid ARIMA and support vector machines model in stock price forecasting. Omega 33(6):497–505
 Panagiotopoulos P, Barnett J, Bigdeli AZ, Sams S (2016) Social media in emergency management: Twitter as a tool for communicating risks to the public. Technol Forecast Soc Change 111:86–96. https://doi.org/10.1016/j.techfore.2016.06.010
 Phan XH, Nguyen LM, Horiguchi S (2008) Learning to classify short and sparse text & web with hidden topics from large-scale data collections. In: Proceedings of the 17th international conference on World Wide Web. ACM, pp 91–100
 Said SE, Dickey DA (1984) Testing for unit roots in autoregressive-moving average models of unknown order. Biometrika 71(3):599–607
 Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes twitter users: real-time event detection by social sensors. In: Proceedings of the 19th international conference on World Wide Web. ACM, pp 851–860
 Sampson J, Morstatter F, Zafarani R, Liu H (2015) Real-time crisis mapping using language distribution. In: 2015 IEEE international conference on data mining workshop (ICDMW). IEEE, pp 1648–1651
 Schulz A, Hadjakos A, Paulheim H, Nachtwey J, Mühlhäuser M (2013) A multi-indicator approach for geolocalization of tweets. In: Seventh international AAAI conference on weblogs and social media, pp 573–582
 Starbird K, Stamberger J (2010) Tweak the tweet: leveraging microblogging proliferation with a prescriptive syntax to support citizen reporting. In: Proceedings of the 7th international ISCRAM conference, information systems for crisis response and management Seattle, WA, vol 1, pp 1–5
 Stowe K, Paul MJ, Palmer M, Palen L, Anderson K (2016) Identifying and categorizing disaster-related tweets. In: Proceedings of the fourth international workshop on natural language processing for social media, pp 1–6
 Stříteskỳ V, Stránská A, Drábik P (2015) Crisis communication on facebook. Studia Commercialia Bratislavensia 8(29):103–111
 Tien Nguyen D, Mannai KAA, Joty S, Sajjad H, Imran M, Mitra P (2016) Rapid classification of crisis-related data on social networks using convolutional neural networks. arXiv:1608.03902
 Tseng FM, Yu HC, Tzeng GH (2002) Combining neural network model with seasonal time series ARIMA model. Technol Forecast Soc Change 69(1):71–87
 Ushahidi (2017) Ushahidi. https://www.ushahidi.com
 Utz S, Schultz F, Glocka S (2013) Crisis communication online: how medium, crisis type and emotions affected public reactions in the Fukushima Daiichi nuclear disaster. Public Relat Rev 39(1):40–46
 van Gorp A, Pogrebnyakov N, Maldonado E (2015) Just keep tweeting: emergency responder’s social media use before and during emergencies. In: Proceedings of the 23rd European conference on information systems (ECIS 2015), pp 1–15. https://doi.org/10.18151/7217512
 Wainwright MJ, Jordan MI et al (2008) Graphical models, exponential families, and variational inference. Found Trends Mach Learn 1(1–2):1–305
 Waze (2017) Waze. https://www.waze.com
 Xu Q, Tsui KL, Jiang W, Guo H (2016) A hybrid approach for forecasting patient visits in emergency department. Qual Reliab Eng Int 32(8):2751–2759
 Zhang GP (2003) Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50:159–175
 Zhu B, Wei Y (2013) Carbon price forecasting with a novel hybrid ARIMA and least squares support vector machines methodology. Omega 41(3):517–524
 Zook M, Graham M, Shelton T, Gorman S (2010) Volunteered geographic information and crowdsourcing disaster relief: a case study of the Haitian earthquake. World Med Health Policy 2(2):7–33