1 Introduction

The use of social media as a tool for disaster management is being explored by developers, researchers, government agencies, and businesses. A disaster-affected area requires both cautionary and disciplinary measures (Sushil 2017). Dai et al. (1994) first suggested the need for a computerized decision-making system during emergencies. Today, information and communication technology (ICT) is widely used for relief activities during the different phases of a disaster (Kabra and Ramesh 2015). Twitter plays a major role in informing people, acquiring their status information, and gathering information on rescue activities during both natural disasters (tsunamis, floods) and man-made disasters (terrorist attacks, food contamination) (Al-Saggaf and Simmons 2015; Gaspar et al. 2016; Heverin and Zach 2012; Oh et al. 2013).

Social media platforms can be used efficiently for supply chain management by professionals, organizations, and retailers (Chae 2015; Mishra and Singh 2016; Papadopoulos et al. 2017). Social networks such as Twitter and Facebook allow users to post updates on the social activities they undertake (Mishra et al. 2016). Twitter provides a space where both officials and ordinary citizens can post their experiences and advice regarding disasters (Macias et al. 2009; Neubaum et al. 2014; Palen et al. 2010), which makes it a popular choice for disaster management. Considerable research is under way to make this platform more suitable for disaster management. However, as Comfort et al. (2012) suggest, a more systematic study of social media is needed to improve public response. Turoff et al. (2013) share this view and have appealed to the research community to devise methods that improve citizen engagement during emergencies. Quick and accurate responses from leaders during a disaster may boost their personal political standing (Ulku et al. 2015). Several agencies, such as BMKG in Indonesia, actively provide updates and warnings to the public through Twitter. Social media is also used by various agencies to coordinate rescue efforts and help victims.

Twitter is a microblogging service where users post brief text messages, photographs, and audio clips. Because the messages are short, users post frequently and regularly check for updates from others. Twitter updates cover social events such as parties, cricket matches, and political campaigns, as well as disastrous events such as storms, heavy rainfall, earthquakes, and traffic jams. A substantial body of work (Atefeh and Khreich 2015) exists on detecting both social and disastrous events from Twitter messages. Most disastrous event detection systems are confined to deciding whether a tweet is related to a disaster or not, based on its textual content. The related tweets are then used to warn people and inform them about precautionary measures (Sakaki et al. 2010, 2013). These tweets are also used to study the tweeting behavior of users during disasters. We view Twitter not only as an awareness platform, but also as a place where people can ask for help during a disaster. Tweets asking for help need to be separated from other disaster-related tweets; they can then be used to guide rescue personnel.

To help a victim in need, one needs his/her exact location, which is another important issue in emergency situations. Distribution centers play a big role in helping victims. Burkart et al. (2016) propose a multi-objective location-routing model to minimize the cost of opening a distribution center for relief routing. Real-time location estimation plays a big role in logistics, stockpiling, and medical supply planning (Duhamel et al. 2016; Lei et al. 2015; Paul and Hariharan 2012; Ozdamar et al. 2004). The growing number of location-based social networks provide spatiotemporal data with substantial potential to increase situational awareness and enhance both planning and investigation (Chae et al. 2014). The analysis by Cheng et al. (2010) shows that only 26% of users mention their location at a city level or below; the rest mostly give a country name, or even meaningless words such as Wonderland. According to Cheng et al. (2010), only 0.42% of tweets are geo-tagged, although Morstatter et al. (2013) found that about 3.17% of tweets are geo-tagged. These analyses reveal that Twitter has limited applicability as a location-based sensing system.

The rise of mobile Internet use in the last couple of years has significantly increased the number of mobile Twitter users. According to a report by IAMAI (2016), India would have 371 million mobile Internet users by the end of 2016. The same report highlights that 39% of rural users use social media, whereas in urban areas this percentage is much higher. Mobile Twitter users can switch geo-tagging on and off as they prefer. The battery power of smartphones plays a significant role here, as the global positioning system (GPS) consumes a significant amount of battery power, and users prefer switching off GPS to save power. On the other hand, applications such as taxi-hiring services and e-commerce sites such as flipkart.com require GPS to work properly. Analysis of mobile Twitter users thus shows some tweets with geo-tags and others without. During emergencies, people want to preserve the battery power of their phones; hence, geo-tagged tweets will be very few on such occasions.

Fig. 1 Sample tweet asking for help during floods

Fig. 2 Another sample tweet asking for help during floods

India is a multilingual country, where English is the main language of communication on social media websites. However, users of these sites also use their regional languages (Fig. 2). Hence, event detection in the Indian context also needs to handle variations in the language used.

The major contribution of this paper is a tweet classification system that classifies tweets into high and low priority. High priority tweets are those that ask for help, such as food, shelter, or medicine, during a disaster. Two sample high priority tweets are shown in Figs. 1 and 2. The tweet in Fig. 2 is written in the Latin script, but the words are in Hindi; its translation is, “Mr. @narendramodi, heavy floods in Chhapra Bihar, please arrange for administrative help, people here are very worried.” Low priority tweets convey information related to a disaster, such as “Rescue team has done a good job.” An example is shown in Fig. 3, where a user thanks Twitter for its help during a disaster. The other contribution of this paper is location prediction for high priority tweets when geo-tagging information is missing. To predict location, we use the historical geo-tagged tweets of the specific user to build a Markov chain. The low priority tweets are analyzed to find the spread of the disaster; they may also be used to evaluate the performance of different agencies during a disaster.

Fig. 3 A sample tweet thanking Twitter for help during floods

The rest of this paper is organized as follows. Section 2 reviews the existing literature. Our proposed work and algorithms are presented in Sect. 3. The results are documented in Sect. 4. Section 5 discusses the work presented in this paper. Theoretical contributions are listed in Sect. 6. Implications for practice are listed in Sect. 7. We conclude this paper in Sect. 8, with some future research directions.

2 Related works

Both academia and industry have started to explore Twitter as a tool for disaster management. Steiger et al. (2015) conducted a comprehensive review of Twitter-related research papers and found that about 46% of them dealt with event detection and 13% with location estimation. Around 27% of all the papers they discussed were related to event detection in emergency situations.

Studies such as Sakaki et al. (2010, 2013), Earle et al. (2011), and Lin et al. (2016) focused on tweets associated with natural disasters such as earthquakes and extreme weather conditions. Sakaki et al. (2010, 2013) developed an earthquake reporting system in Japan using Twitter messages. Their system detected 93% of earthquakes of seismic intensity 3 or more, as reported by the Japan Meteorological Agency (JMA). They used simple linguistic features, such as word count and the context of target-event words, to train an SVM-based classifier for detecting earthquakes, and employed a particle filter to predict the location of the detected event. The system sent notifications to the public much faster than the JMA broadcast announcements after sensing an earthquake. Earle et al. (2011) also proposed an earthquake detection algorithm that relied solely on Twitter data. They constructed a tweet-frequency time series, called a tweetgram, from tweets containing the keyword earthquake. The tweetgram showed large peaks correlated with the origin times of earthquakes. They reported that their system found 48 globally distributed earthquakes with only two false triggers in 5 months of data, and that it was faster than some seismographic detection, as 75% of the events were detected within 2 min of their origin time. Lin et al. (2016) compared the content and frequency of communication on Twitter and Weibo during extreme weather events. They compared Twitter retweets with Weibo reposts and listed the similarities and dissimilarities of the two platforms in reposting behavior and post content attributes.

On the other hand, studies such as Li et al. (2012), Imran et al. (2013), and Laylavi et al. (2016a) focused on ranking and classification techniques to identify tweets on a priority basis. Li et al. (2012) proposed a system that used tweets to detect and analyze crime- and disaster-related events, such as shootings, car accidents, and tornadoes. Their system was able to detect new events, rank them according to importance, and find spatial and temporal patterns for the detected events. Imran et al. (2013) extracted relevant information from tweets to find informative tweets that contribute to situational awareness. Their approach used text classification techniques to map tweets related to an emergency situation to different types of emergency-related information. However, very little attention was given to assessing and classifying Twitter messages based on their level of informativeness and relatedness to a specific type of event. Laylavi et al. (2016a) proposed a method for detecting event-specific informative tweets related to a storm event. They used term frequency analysis and a relationship scoring function to define event-related term classes, and assigned each tweet an event-relatedness score. The results were compared against a manually annotated dataset to evaluate performance; about 87% of event-related tweets were classified accurately.

Other studies, such as Zhou and Chen (2014) and Kwon and Kang (2016), relied on time series when utilizing tweets to identify events. Zhou and Chen (2014) proposed a graphical model, called location-time constrained topic (LTT), representing the content, time, and location of tweets. Every tweet is represented as a probability distribution over a set of topics, and the distance between the distributions of two messages defines the similarity measure. They demonstrated the effectiveness and efficiency of their approach through extensive experiments. Kwon and Kang (2016) quantified the risk level of disaster occurrences in Seoul by analyzing tweet text. The usage frequency of the keyword flood, the inclusion of disaster-sign words, and the degree of adverbs present in tweets were used to quantify the risk levels. They also proposed tools to visualize these risk levels based on tweet locations with the help of a time series.

Some studies, such as Zhang et al. (2015) and Laylavi et al. (2016b), looked at user profiles to better understand the origin of tweets. Zhang et al. (2015) detected burst words from micro-blogging text streams using term co-occurrence and user social-relation information. They proposed a spread model based on the analysis of both event content and user profiles. Their system distinguished users’ contributions based on their status/position and interest in the predicted event, and predicted the future popularity of an event from its historical popularity data. Laylavi et al. (2016b) introduced a multi-elemental location inference method to predict the location of tweets by exploiting the textual content, user profile location, and place labeling. Three granularity levels of location name classes were defined to look up location references from the location-associated elements, and each tweet was assigned the location of the finest available granularity. They reported that 87% of their tweets were successfully geo-located, with a mean distance error of 12.2 km and a median distance error of 4.5 km.

Weiler et al. (2016) evaluated task-based performance measures and the runtime behavior of state-of-the-art event detection techniques for Twitter. They implemented all available event detection techniques on a data stream management system to measure run-time performance, proposed several new task-based performance measures for event detection techniques, and showed through extensive experiments that their measures were sound and discriminating.

Location inference is the retrieval of location information from Twitter data. So far, it has received little attention in Twitter data research. In fact, a number of studies involving Twitter have collected only geo-tagged tweets and analyzed them in different domains, such as public health (Paul and Dredze 2011), societal events (Ciulla et al. 2012), political elections (Skoric et al. 2012), tourist spots (Oku et al. 2014), and earthquakes (Sakaki et al. 2010). However, Cheng et al. (2010) reported that only 0.42% of tweets are geo-tagged, whereas Morstatter et al. (2013) reported around 3.17%. This number is so small that it becomes necessary to devise methods to extract location information from the publicly available components of tweets alone. Researchers have employed machine learning, statistical, probabilistic, and natural language processing techniques to estimate location from tweets (Ajao et al. 2015). Most works have used the geographical references in tweets to determine location in the absence of geo-tagging. These geographical references are either “location indicative words” (LIWs), such as local dialectal terms (e.g., yinz) and place names (e.g., Portland) (Bo et al. 2012), or gazetteer terms.

Eisenstein et al. (2010) employed a rather unique approach to identify tweet locations. They presented a model that identifies words with high regional affinity, geographically coherent linguistic regions, and the relationship between regional and topic variation. They found that high-level topics such as sports and entertainment are spoken about differently in each geographic region, revealing topic-specific regional distinctions, and used these distinctions to geo-locate users based on their tweets. Performance was measured with error metrics, namely the mean and median distance between the predicted and true locations in km; the median distance error of their model was reported as 494 km. Cheng et al. (2010) followed a similar approach: they analyzed the content of geo-tagged tweets and calculated statistics for the most frequently used words in each city, using a lattice-based neighborhood-smoothing model to refine a user’s location estimate. Han et al. (2014) presented a geo-location prediction platform that detects and analyzes LIWs. They proposed several feature-selection methods for identifying LIWs, and analyzed the impact of non-geo-tagged data, the influence of language, and the complementary geographical information in user metadata. Their method obtained a median prediction error of 209 km.

Watanabe et al. (2011) presented a real-time local-event detection system called Jasmine, which geo-tags events automatically by identifying their location. It estimates the degree of association between a place name and a real-world location; for instance, Times Square in a document may refer to Times Square in New York. By identifying such place names, they assigned geo-location information to non-geo-tagged documents. Graham et al. (2014) explored the accuracy of various language detection methods on tweets by identifying common sources of errors. They also comprehensively studied different kinds of location information within tweets, such as profile location, device location, and time zone, and proposed methods to map and measure the geo-linguistic contours of people’s information trails on Twitter. Hecht et al. (2011) studied users’ profile locations extensively and found that 34% of users did not provide real location information. However, by analyzing a user’s tweets, their country and state can be determined with decent accuracy using simple machine learning techniques. Hiruta et al. (2012) proposed a method to detect and classify tweets into different categories based on the possible correlation of user profile locations, using both textual content and geo-tagging.

Wing and Baldridge (2011) represented the earth’s surface with a discrete grid and used simple supervised methods to find the location of a unit of text, such as a word, phrase, or document, based only on its content. They obtained a median error of 479 km and a mean error of 967 km for Twitter. Dalvi et al. (2012) presented a model to locate users based on indirect spatial references found in tweets, using restaurants as the target object for their study. Schulz et al. (2013) presented a technique to determine the location from which a tweet originated. They detected spatial indicators in the text message and in the user profile; the area referred to by each spatial indicator is determined and represented by a weighted polygon, with weights set by an optimization algorithm that considers the reported uncertainty of the spatial indicators. Geo-localization is done by intersecting and stacking the 3D polygons over each other. They reported that their method located 92% of tweets with a median accuracy below 30 km and predicted users’ residence locations with a median accuracy below 5.1 km.

Minot et al. (2015) proposed a method for estimating the home location of users based on the content of their posts and their social connections on Twitter, achieving an accuracy of 77% within 10 km compared to techniques using only social connections. In a similar effort, Rodrigues et al. (2016) proposed a method to infer the spatial location of Twitter users from tweet text and the friendship network. They built a friendship network graph with geographical labels and Twitter texts, and used a Markov chain Monte Carlo simulation technique to learn the posterior probability distribution of the geographical labels. The method showed promising results, with little sensitivity to parameters and high precision on a large dataset of Twitter users. Duong-Trung et al. (2016) developed a generative content-based regression model via matrix factorization to tackle the near-real-time geo-location prediction problem. They showed that real-time geo-location prediction is possible without concatenating individual tweets, building a regression model on the real-valued latitudes and longitudes that proved better than existing techniques.

For a quick overview, the references related to event detection and location estimation are summarized in Tables 1 and 2, respectively.

Table 1 Event detection literature on Twitter
Table 2 Location estimation literature on Twitter

3 Methodology

The overall structure of the proposed system is given in Fig. 4. The system consists of the following modules, which are described in the subsequent sections.

  • Data collection

  • Data pre-processing

  • Event classification

  • Location estimation

3.1 Data collection

In order to train and validate our model, sufficient tweets related to an event are needed, and they should reflect a realistic scenario of that event. We used the Twitter API to capture live tweets related to floods in the southern and eastern states of India. Data collection was done using Twitter’s streaming API with the tweepy Python library. Tweets were collected during November–December 2015 for the Chennai floods (south India), and during July–August 2016 for the Bihar floods. A total of 32,400 tweets were collected with the keywords “flood”, “water”, and “Baarh”. The collected tweets were in English, Hindi, and some other regional languages; for this study, we concentrated only on tweets in English and Hindi. A minimal collection sketch is given below.
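The following sketch assumes the tweepy StreamListener interface that was current at the time of the study (tweepy < 4.0); the credentials, output file, and error-handling policy are illustrative placeholders, not the authors’ actual setup:

```python
import json
import tweepy

# Placeholder credentials; real keys are issued by Twitter's developer portal.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")

class FloodListener(tweepy.StreamListener):
    """Appends every incoming tweet matching the tracked keywords to a file."""
    def on_status(self, status):
        with open("flood_tweets.jsonl", "a") as f:
            f.write(json.dumps(status._json) + "\n")

    def on_error(self, status_code):
        # Returning False disconnects the stream on rate-limit errors (420).
        return status_code != 420

stream = tweepy.Stream(auth=auth, listener=FloodListener())
# Keywords used in the paper for the Chennai and Bihar floods.
stream.filter(track=["flood", "water", "Baarh"])
```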

One major problem with data collected from Twitter is that it may contain many irrelevant tweets, such as advertisements. There are also many spammers, known as ‘spambots’, sending huge numbers of tweets. Identifying spammers is a difficult task, and a number of researchers (Benevenuto et al. 2010; Gayo-Avello 2013; Li and Du 2014; Yardi et al. 2010) are working on this issue. In our case, spamming does not pose a significant problem, as we collected tweets originating from mobile phones only. The rationale is that hand-held devices are personal devices and are hardly used for mass tweet dissemination. To filter out tweets coming from hand-held devices, the source field of the tweets is used. To further reduce the effect of spambots, only tweets from users whose ratio of followers to followees is less than one were stored, as suggested by Gayo-Avello (2013) and Li and Du (2014). A sketch of both filters follows.
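A sketch of the two filters, assuming the raw tweet JSON delivered by the streaming API; the set of mobile client names is illustrative, not the authors’ exact list:

```python
import re

# Example mobile client names; the exact strings in the "source" field
# vary by client and app version.
MOBILE_SOURCES = {"Twitter for Android", "Twitter for iPhone", "Twitter for iPad"}

def keep_tweet(tweet):
    """Keep tweets sent from hand-held devices by users whose
    followers/following ratio is below one (spam heuristic)."""
    # The raw "source" field is an HTML anchor,
    # e.g. '<a href="...">Twitter for Android</a>'; strip the tags first.
    source = re.sub(r"<[^>]+>", "", tweet["source"])
    if source not in MOBILE_SOURCES:
        return False
    user = tweet["user"]
    if user["friends_count"] == 0:  # guard against division by zero
        return False
    return user["followers_count"] / user["friends_count"] < 1
```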

Fig. 4 System architecture for event detection of tweets

3.2 Data pre-processing

Tweets contain different types of noise and redundancy, such as emoticons, user mentions, and Internet links. Proper data pre-processing is needed before these tweets can be used for any meaningful purpose. The following steps were used to clean the tweets for this study. If a tweet contained “RT”, it was deleted, as it was not originally created by the sender and did not qualify for our analysis. Internet links (starting with http://) were deleted from the tweets; such links to photos, videos, news items, or maps may carry useful details about an incident, but since we focus on the textual content of tweets, they were ignored. Unwanted multiple dots were removed, multiple spaces were merged into one, and all non-ASCII characters were deleted. Stop words were removed, as they do not convey meaningful information. The textual content was then converted to lowercase, as Uysal and Gunal (2014) showed that lowercase conversion is an effective pre-processing step. As a last step, text in Hindi was translated to English. A sketch of this pipeline is given below.
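A minimal sketch of the cleaning steps, assuming NLTK’s English stop-word list; the Hindi-to-English translation step is omitted, as it relies on an external service:

```python
import re
from nltk.corpus import stopwords  # requires nltk.download("stopwords")

STOP_WORDS = set(stopwords.words("english"))

def preprocess(text):
    """Cleaning pipeline described above; returns None for retweets."""
    if "RT" in text.split():                        # drop retweets
        return None
    text = re.sub(r"http\S+", "", text)             # strip Internet links
    text = re.sub(r"\.{2,}", " ", text)             # remove runs of dots
    text = text.encode("ascii", "ignore").decode()  # drop non-ASCII characters
    text = re.sub(r"\s+", " ", text).strip()        # merge multiple spaces
    text = text.lower()                             # lowercase conversion
    tokens = [w for w in text.split() if w not in STOP_WORDS]  # stop words
    return " ".join(tokens)
```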

3.3 Event classification

Hashtags and keywords in tweets help us extract tweets related to a target event. However, some of those tweets may convey only general information, such as “Floods have become a regular occurrence in Bihar”. This tweet refers to floods, which may be the target event, but it does not convey a real-time report of the event. Hence, we build a machine learning based system to categorize tweets into high and low priority classes. Tweets asking for help regarding food, shelter, or medicine are put in the high priority class, whereas tweets carrying general information, such as “50 people rescued from Bhagalpur”, are put in the low priority class. The pre-processed tweets are manually annotated with class information; manual annotation is needed because tweets do not contain any class information, and the class label is required to both train and test the system. The following features are extracted from the pre-processed tweet texts (a feature-extraction sketch follows the list).

  • The number of words denoted by (w)

  • Verb in the tweet denoted by (verb)

  • Number of verbs denoted by (v)

  • Position of query word denoted by (pos)

  • Word before query word denoted by (before)

  • Word after query word denoted by (after)
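A hedged sketch of this feature extraction, assuming NLTK’s tokenizer and POS tagger for verb detection and “flood” as an example query word (the paper does not name the tagger it used):

```python
import nltk  # requires nltk.download("punkt") and nltk.download("averaged_perceptron_tagger")

def extract_features(text, query_word="flood"):
    """Builds the six textual features described above for one tweet."""
    tokens = nltk.word_tokenize(text)
    tags = nltk.pos_tag(tokens)
    verbs = [w for w, t in tags if t.startswith("VB")]
    pos = tokens.index(query_word) if query_word in tokens else -1
    return {
        "w": len(tokens),                            # number of words
        "verb": verbs[0] if verbs else "",           # (first) verb in the tweet
        "v": len(verbs),                             # number of verbs
        "pos": pos,                                  # position of query word
        "before": tokens[pos - 1] if pos > 0 else "",                        # word before
        "after": tokens[pos + 1] if 0 <= pos < len(tokens) - 1 else "",      # word after
    }
```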

The features, along with the class label obtained by manual annotation, are given as input to the classifier. We applied three popular classification algorithms: (i) support vector machine (SVM), (ii) gradient boosting, and (iii) random forest. The results of SVM were very poor compared to the other two algorithms, and hence are not reported here. The results of the other two algorithms are reported in Sect. 4. The classifiers are implemented in Python using the scikit-learn machine learning package.
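A minimal training sketch, assuming the feature dictionaries are vectorized with scikit-learn’s DictVectorizer (string features are one-hot encoded) and a 75/25 train/test split; hyperparameters are left at their defaults, which may differ from the authors’ settings:

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def train_and_report(records, labels):
    """records: list of feature dicts (see extract_features);
    labels: manual annotations, 0 = high priority, 1 = low priority."""
    X = DictVectorizer(sparse=False).fit_transform(records)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.25, random_state=42)
    for clf in (GradientBoostingClassifier(), RandomForestClassifier()):
        clf.fit(X_tr, y_tr)
        print(clf.__class__.__name__)
        print(classification_report(y_te, clf.predict(X_te)))
```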

3.4 Location estimation

The location of users who have tweeted asking for help is determined. During a disaster, close relatives or friends also tweet asking for help on behalf of their dear ones, giving addresses or names. The system detects such tweets and extracts the location information (address) given in them. The first step in the location prediction phase is to find whether the tweet refers to the person tweeting or to someone else. If the tweet refers to someone else, the geo-tag in the tweet does not help locate the referred person. The system then finds the referred user’s Twitter handle mentioned in the tweet text, and the Markov chain technique for finding the user location is applied.
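Extracting the referred user’s handle can be done with a simple regular expression, as in the sketch below; deciding whether the tweet is about the author or a third party requires additional linguistic cues not shown here:

```python
import re

def referred_handles(text):
    """Extract Twitter handles mentioned in the tweet text, e.g. a relative
    tweeting '@someone is stranded in Chhapra' on behalf of a victim."""
    return re.findall(r"@(\w{1,15})", text)  # handles are at most 15 characters
```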

If a user posting a high priority tweet has geo-tagging enabled, the tweet is simply forwarded to the rescue team. If the user has tweeted without geo-tagging, the system extracts that user’s historical tweets for the last 7 days and derives spatiotemporal sequences from them. The rationale behind using historical tweets is that most user activity is confined to a very limited area close to the user’s home location (Cho et al. 2011). Most users visit locations such as their home, workplace, shopping markets, and friends’ places on a regular basis. Hence, it can be assumed that for most users the activity area is small, and users with large activity areas represent only a small fraction of all Twitter users. Given the historical locations of a user, a Markov model can therefore be established to predict the user’s current location. The presence of a user at a specific location over time is a stochastic process and can be modeled naturally by a Markov chain.

Let \(\{X_t, t = 0, 1, 2, \ldots\}\) be a stochastic process that takes on a finite or countable number of possible values. \(X_t = i\) denotes that the process is in state \(i\) at time \(t\). A process in state \(i\) may move to another state \(j\) with a fixed probability \(P_{ij}\). A process is called a Markov process or Markov chain if it satisfies the Markov property of “memorylessness”. Mathematically, this is expressed in Eq. 1

$$P(X_{t+1} = j \mid X_t = i,\, X_{t-1} = i_{t-1},\, X_{t-2} = i_{t-2},\, \ldots,\, X_0 = i_0) = P(X_{t+1} = j \mid X_t = i) = P_{ij}$$
(1)

for all states \(i_0, i_1, i_2, \ldots, i_{t-1}, i, j\) and all \(t \ge 0\).

The one-step transition probability matrix \(P\) is given in Eq. 2

$$P = \begin{bmatrix} P_{00} & P_{01} & P_{02} & \cdots & P_{0j} & \cdots \\ P_{10} & P_{11} & P_{12} & \cdots & P_{1j} & \cdots \\ P_{20} & P_{21} & P_{22} & \cdots & P_{2j} & \cdots \\ \vdots & \vdots & \vdots & & \vdots & \\ P_{i0} & P_{i1} & P_{i2} & \cdots & P_{ij} & \cdots \\ \vdots & \vdots & \vdots & & \vdots & \end{bmatrix}$$
(2)

where \(P_{ij} \ge 0\) for all \(i, j \ge 0\), and \(\sum_{j=0}^{\infty} P_{ij} = 1\) for \(i = 0, 1, 2, \ldots\)

The Markov chain for a user is shown in Fig. 5, where the circles represent the user’s locations and the arcs represent transitions from one location to another. The values on the arcs are the probabilities with which the user moves from one state to another. As shown in Fig. 5, a user currently at location \(L_1\) stays at \(L_1\) with probability 0.57, and moves to locations \(L_2\), \(L_3\), \(L_6\), and \(L_7\) with probabilities 0.22, 0.07, 0.07, and 0.07, respectively. A user at location \(L_2\) at time \(t\) is represented by the vector \(L_t\) (Eq. 3).

$$L_t = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
(3)

While building the Markov chain, we grouped all locations within 1 km of each other into one location. The size of the matrix \(P\) is \(N \times N\), where \(N\) is the number of distinct locations from which the user has tweeted. The next location \(L_{t+1}\) is predicted by multiplying the current location vector \(L_t\) with \(P\), as given in Eq. 4.

$$L_{t+1} = L_t \times P$$
(4)

where \(L_t\) is a vector of size \(N\) over the historical locations from which the user has tweeted, and \(P\) is the transition matrix of size \(N \times N\). For the user of Fig. 5, the next location of a user currently at \(L_2\) is predicted using Eq. 5, and the vector shown in Eq. 6 is the result.

Once the model is built from a user’s historical locations, it can be used to predict the user’s next location using Eq. 4, even if the user has not turned on geo-location. Results of the proposed system are elaborated in the following section. User locations are predicted only for users posting high priority tweets. Low priority tweets are stored in the database and further analyzed for user behavior during disasters and for rescue teams’ performance analysis.

$$L_{t+1} = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \times \begin{bmatrix} 0.57 & 0.22 & 0.07 & 0 & 0 & 0.07 & 0.07 & 0 \\ 0.4 & 0 & 0.2 & 0 & 0.2 & 0 & 0.2 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0.25 & 0 & 0 & 0 & 0 & 0 & 0.5 & 0.25 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
(5)
$$L_{t+1} = \begin{bmatrix} 0.4 & 0 & 0.2 & 0 & 0.2 & 0 & 0.2 & 0 \end{bmatrix}$$
(6)
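As a quick numerical check of Eqs. 4–6, the sketch below multiplies the current-location vector by the transition matrix of Eq. 5 using NumPy (an illustration, not part of the original system description):

```python
import numpy as np

# Transition matrix from Eq. 5 (the example user of Fig. 5).
P = np.array([
    [0.57, 0.22, 0.07, 0,    0,    0.07, 0.07, 0   ],
    [0.4,  0,    0.2,  0,    0.2,  0,    0.2,  0   ],
    [0,    0,    0,    1,    0,    0,    0,    0   ],
    [1,    0,    0,    0,    0,    0,    0,    0   ],
    [1,    0,    0,    0,    0,    0,    0,    0   ],
    [0,    1,    0,    0,    0,    0,    0,    0   ],
    [0.25, 0,    0,    0,    0,    0,    0.5,  0.25],
    [0,    1,    0,    0,    0,    0,    0,    0   ],
])

L_t = np.zeros(8)
L_t[1] = 1                   # user currently at location L2 (index 1), Eq. 3

L_next = L_t @ P             # Eq. 4: L_{t+1} = L_t x P
print(L_next)                # [0.4 0. 0.2 0. 0.2 0. 0.2 0.]  -- matches Eq. 6
print(int(L_next.argmax()))  # 0 -> L1 is the most probable next location
```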
Fig. 5 Markov chain for a specific user
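The paper does not specify how the 1-km grouping and the estimation of \(P\) are implemented; one straightforward realization is a greedy grouping by haversine distance followed by maximum-likelihood transition counts, sketched below:

```python
import math
import numpy as np

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(a))

def build_transition_matrix(coords, radius_km=1.0):
    """coords: time-ordered (lat, lon) pairs from the 7-day geo-tag history.
    Groups geo-tags within radius_km into one state, then counts transitions."""
    centers, states = [], []
    for c in coords:
        for i, center in enumerate(centers):
            if haversine_km(c, center) <= radius_km:
                states.append(i)
                break
        else:                                # no existing state within 1 km
            centers.append(c)
            states.append(len(centers) - 1)
    n = len(centers)
    P = np.zeros((n, n))
    for a, b in zip(states, states[1:]):     # count one-step transitions
        P[a, b] += 1
    row_sums = P.sum(axis=1, keepdims=True)  # normalize rows to probabilities
    P = np.divide(P, row_sums, out=np.zeros_like(P), where=row_sums > 0)
    return centers, P
```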

4 Results

The performance of the classifiers is measured with popular metrics: precision, recall, F1-score, and the receiver operating characteristic (ROC) curve. The high priority and low priority classes are represented here as class 0 and class 1, respectively.

Precision is the fraction of samples classified as true that actually belong to the true class. In our case, it measures how many of the tweets classified as high priority are actually high priority, and vice versa. Mathematically, it is given in Eq. 7

$$\begin{aligned} Precision=\frac{T_p }{T_p +F_p } \end{aligned}$$
(7)

where \(T_p\) is the number of true class samples classified as true, and \(F_p\) is the number of false class samples classified as true.

Recall is the fraction of true class samples that are classified as true. For our case, it measures how many high priority tweets are actually predicted as high priority, and vice versa. Mathematically, it is given in Eq. 8

$$\begin{aligned} Recall=\frac{T_p }{T_p +F_n } \end{aligned}$$
(8)

where \(F_n\) is the number of true class samples classified as false.

The F1 score is the harmonic mean of precision and recall, bundling both into one simple index. Mathematically, it is given in Eq. 9

$$\begin{aligned} F1~Score=2 \times \frac{Precision\times Recall}{Precision+Recall} \end{aligned}$$
(9)

The ROC curve plots the true positive rate against the false positive rate of the classifier for different thresholds. The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the classifier. Davis and Goadrich (2006) describe the relationship between precision, recall, and the ROC curve in excellent detail. These measures can be computed directly with scikit-learn, as sketched below.
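A minimal evaluation sketch, assuming y_true holds the manual annotations, y_pred the predicted classes, and y_score the classifier’s probability for class 1:

```python
from sklearn.metrics import precision_recall_fscore_support, roc_curve, auc

def evaluate(y_true, y_pred, y_score):
    """Per-class precision/recall/F1 (Eqs. 7-9) and the area under the ROC curve."""
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, labels=[0, 1])
    print("class 0 (high priority): P=%.2f R=%.2f F1=%.2f" % (p[0], r[0], f1[0]))
    print("class 1 (low priority):  P=%.2f R=%.2f F1=%.2f" % (p[1], r[1], f1[1]))
    fpr, tpr, _ = roc_curve(y_true, y_score)  # points of the ROC curve
    print("AUC = %.2f" % auc(fpr, tpr))
```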

Table 3 Precision, recall and F1 scores using gradient boosting classifier
Table 4 Precision, recall and F1 scores using random forest classifier

The classification results, in terms of precision, recall, and F1-score, are presented in Table 3 for the gradient boosting classifier. The classifier offers a promising result, but not a very good one. The results for the random forest algorithm are presented in Table 4; they are better than those of gradient boosting, as the precision and recall of class 0 (high priority), our target class, are higher. We want to optimize the precision and recall of class 0. The confusion matrices show how many samples are classified into each class (Figs. 6 and 7). Out of the 38 samples used for testing, six class 0 (high priority) samples were classified as class 1 (low priority) by the gradient boosting algorithm (Fig. 6), whereas the random forest algorithm misclassified four class 0 (high priority) tweets as class 1 (low priority). The ROC curves for both classifiers, shown in Figs. 8 and 9, confirm that the random forest classifier outperforms the gradient boosting classifier.

Fig. 6 Confusion matrix for gradient boosting classifier

Fig. 7 Confusion matrix for high/low priority class for random forest classifier

Fig. 8 ROC for gradient boosting classifier

The performance of location prediction is measured as the percentage of successful predictions (exact location predictions) out of total predictions. The location prediction algorithm is applied to users with high priority tweets. 100 users were chosen for location prediction, and 7 days of their tweeting location histories were maintained; 7 days were chosen because Twitter’s REST API allows fetching tweets only up to 7 days old. The success ratio of the location prediction algorithm is 0.87 (87%). A hedged sketch of the history fetch is shown below.
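One possible way to fetch the history, assuming the tweepy 3.x search endpoint (the standard search API reaches back roughly 7 days); the exact retrieval call used by the authors is not specified:

```python
import tweepy

def seven_day_history(api, screen_name, max_tweets=200):
    """Collect a user's recent tweets via the REST search API; the standard
    tier reaches back about 7 days, motivating the 7-day history window."""
    cursor = tweepy.Cursor(api.search, q="from:" + screen_name, count=100)
    # Geo-tagged statuses (status.coordinates) feed the Markov model.
    return [status for status in cursor.items(max_tweets)]
```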

Fig. 9 ROC for random forest classifier

5 Discussion

The current research attempts to effectively utilize social media for locating users asking for help during a disaster/emergency with an automatic tweet parsing system. The proposed system identifies 81% of users tweeting for help during a flood-related disaster in an Indian context (see the Recall column of Table 4 for class 0). Users tweeting general information related to the disaster are correctly classified in 76% of cases. The objective of this research was to identify high priority users, which the current system accomplishes, as the classification accuracy for the high priority class is higher than for the other class. About 81% of the predicted high priority samples are in fact high priority (see the Precision column of Table 4 for class 0). This confirms that an automated system can use tweets to identify users requiring assistance during a disaster, showing good results for tweets in both English and Hindi. The users needing help were also localized, i.e., their latitudes and longitudes were determined, with 87% accuracy. The location was determined using (i) the address provided in the tweet, (ii) geo-tagging, and (iii) the Markov chain. To the best of our knowledge, no one has previously used historical tweets to predict the current location of a Twitter user. This result also supports the claim by Cho et al. (2011) that most Twitter users have a limited moving zone.

The system picks up tweets from people mentioning the names/addresses of others in need of help, and extracts the location information (address) given in those tweets. The current research enriches Twitter research by determining whether a tweet refers to the person tweeting or to someone else; if it refers to someone else, a geo-tag on the tweet does not help locate the referred person. If the referred user’s Twitter handle is mentioned in the tweet, our Markov chain technique for finding the user location is very helpful. In case the user has no historical geo-tagged tweets from which to infer a location, the system automatically prompts the user for their location. The current system adds a new dimension to social media, allowing it to be used as a tool for public help. Most prior research (Xiao et al. 2015; Huang and Xiao 2015; Carley et al. 2016) on the use of social media has concentrated either on evaluating the fitness of social media as a disaster management tool, or on detecting disaster-related tweets to warn users. Other researchers (Hara 2015) have studied the behavior of Twitter users during disasters. This is probably the first article to use Twitter for social help.

6 Theoretical contributions

The major contribution of this research is the development of a text mining algorithm to detect flood-related tweets in English and Hindi. Another contribution is the classification of these tweets into high and low priority classes to identify tweets needing urgent attention. Six features are extracted from the textual part of the tweet; the lightweight pre-processing and feature extraction allow tweets to be processed as soon as they are collected, and the proposed system needs no extra storage for its computations. A further major contribution is location extraction from a tweet when the location is mentioned in the tweet text and, in the absence of location information, the use of the user’s historical locations to predict their probable location with a Markov chain. The formulation of the Markov chain from a user’s historical locations is one of the highlights of the current research. The location prediction accuracy of the proposed work is also very satisfactory, at 87% (Sect. 4).

7 Implications for practice

It is important for disaster relief organizations, NGOs, and similar bodies to get real-time information about the help victims require. Humans cannot directly scan the streaming tweets and filter the urgent ones, because of the volume and velocity of tweets. The proposed system can separate tweets requiring urgent attention (high priority tweets) from the others. The high priority tweets can then be inspected easily by a human operator, who informs team members about the type of help required by the user sending each tweet, making the job of relief agencies faster and easier. The tweets misclassified by the proposed system can be studied by researchers to identify the reasons behind the misclassification; these reasons can be used to educate the public on how to write tweets during emergency situations. The low priority tweets can be clustered into groups based on their content, helping government agencies and researchers analyze user behavior during different phases of an emergency/disaster. Twitter could be augmented with an interface for asking for help during a disaster; such an interface could intelligently ask users to turn on their location information when asking for help. The location information will help identify the user’s location precisely and quickly, so help can be sent at the earliest opportunity.

8 Conclusions

In this article, we proposed a tweet classification system to identify tweets from disaster victims asking for help. Further, a user’s location is estimated from their old tweets if the location is not mentioned in the current tweet. The system uses the temporal location information of users to build a Markov model, which is used for location inference. In this research, we considered only the textual content of tweets, ignoring any Internet links provided in them. The drawback is that these links may point to websites yielding further information or images of the affected area. Another drawback is that the system will not work for a first-time Twitter user, or for users who have never switched on geo-location; in the current work, we tried to resolve this issue by sending an automatic query asking the user to report her/his location. In future, we will try to infer locations using other techniques, such as users’ friend networks and other social networks, such as Facebook and Tumblr. Another limitation is that the current system works only for flood-related disasters, as it is trained with a flood-related corpus; for any other disaster, the system must be retrained with a corresponding corpus. The current research opens up several new directions for other researchers to explore. The classification accuracy of the system is 81%, which can be enhanced further by considering more parameters. The inclusion of other languages will also enhance the system, as more users are expressing their views in their native languages. Human experts can study the misclassification cases to find their causes. This research can also be used to categorize users based on their movement patterns, which can be useful to other businesses, such as tourism.