Event classification and location prediction from tweets during disasters
Social media is a platform for expressing one’s views in real time. This real-time nature makes social media an attractive tool for disaster management, as both victims and officials can post their problems and solutions in the same place in real time. We investigate Twitter posts during a flood-related disaster and propose an algorithm to identify victims asking for help. The developed system takes tweets as input and categorizes them into high or low priority tweets. For high priority tweets with no location information, the user location is predicted from the user’s historical locations using a Markov model. The system performs well, with a classification accuracy of 81% and a location prediction accuracy of 87%. The present system can be extended for use in other natural disasters, such as earthquakes and tsunamis, as well as man-made disasters such as riots and terrorist attacks. The present system is the first of its kind aimed at helping victims during disasters based on their tweets.
Keywords: Disaster management, Location inference, Geo-tagging, Twitter, Social media
1 Introduction
The use of social media is being explored as a tool for disaster management by developers, researchers, government agencies and businesses. A disaster-affected area requires both cautionary and disciplinary measures (Sushil 2017). Dai et al. (1994) first suggested the need for a computerized decision-making system during emergencies. Nowadays, information and communication technology (ICT) is widely used for relief activities during the different phases of a disaster (Kabra and Ramesh 2015). Twitter plays a major role in informing people, acquiring their status information, and gathering information on the rescue activities taking place during both natural disasters (tsunamis/floods) and man-made disasters (terrorist attacks/food contamination) (Al-Saggaf and Simmons 2015; Gaspar et al. 2016; Heverin and Zach 2012; Oh et al. 2013).
Social media platforms can be efficiently used for supply chain management by professionals, organizations, and retailers for their operations (Chae 2015; Mishra and Singh 2016; Papadopoulos et al. 2017). Social networks like Twitter and Facebook allow users to update information on social activities that they undertake (Mishra et al. 2016). Twitter provides a space where both officials and ordinary people can post their experiences and advice regarding disasters (Macias et al. 2009; Neubaum et al. 2014; Palen et al. 2010), which makes it a popular choice for disaster management. Considerable research is under way to make this platform more suitable for disaster management. However, as suggested by Comfort et al. (2012), a more systematic study of social media is needed to improve public response. Turoff et al. (2013) share this view and have appealed to the research community to devise methods to improve citizen engagement during emergencies. Quick and accurate responses from leaders during a disaster may boost their personal political standing (Ulku et al. 2015). Several agencies, such as BMKG in Indonesia, are actively engaged in providing updates and warnings to the public through Twitter. Social media is also used by various agencies to coordinate rescue efforts and help victims.
Twitter is a microblog where users send brief text messages, photographs and audio clips. Because the messages are short, users send them frequently and check for updates from others. Twitter updates cover social events such as parties, cricket matches, and political campaigns, as well as disastrous events such as storms, heavy rainfall, earthquakes, and traffic jams. A lot of work (Atefeh and Khreich 2015) has been done to detect events, both social and disastrous, from Twitter messages. Most disastrous event detection systems are confined to detecting whether a tweet is related to the disaster or not, based on its textual content. The related tweets are further used to warn and inform people about precautionary measures (Sakaki et al. 2010, 2013). These tweets are also used to study the tweeting behavior of users during disasters. We view Twitter not only as an awareness platform, but also as a place where people can ask for help during a disaster. The tweets asking for help need to be separated from other tweets related to the disaster. These tweets can then be used to guide the rescue personnel.
To help victims in need, their exact location must be known from their tweets, which is another important issue in emergency situations. Distribution centers play a big role in helping victims. Burkart et al. (2016) proposed a multi-objective location-routing model to minimize the cost of opening a distribution center for relief routing. Real-time location estimation plays a big role in logistics, stockpiles, and medical supply planning (Duhamel et al. 2016; Lei et al. 2015; Paul and Hariharan 2012; Ozdamar et al. 2004). The growing number of location-based social networks provides spatiotemporal data that has substantial potential to increase situational awareness and enhance both planning and investigation (Chae et al. 2014). The analysis by Cheng et al. (2010) shows that only 26% of users mention their location at a city level or below, and the remaining entries are mostly a country name, or even words with not much meaning, such as Wonderland. According to Cheng et al. (2010), only 0.42% of tweets have geo-tags, but Morstatter et al. (2013) found that about 3.17% of tweets are geo-tagged. These analyses reveal that Twitter has limited applicability as a location-based sensing system.
India is a multilingual country, where English is used as the main language for communicating on social media websites. However, users of these sites also use their regional languages (Fig. 2). Hence, event detection in the Indian context also needs to identify variations in the language used.
The rest of this paper is organized as follows. Section 2 reviews the existing literature. Our proposed work and algorithms are presented in Sect. 3. The results are documented in Sect. 4. Section 5 discusses the work presented in this paper. Theoretical contributions are listed in Sect. 6. Implications for practice are listed in Sect. 7. We conclude this paper in Sect. 8, with some future research directions.
2 Related works
Both academia and industry have started to explore Twitter as a tool for disaster management. Steiger et al. (2015) conducted a comprehensive review of Twitter-related research papers and found that about 46% of them dealt with event detection and 13% with location estimation. Around 27% of all the papers they discussed were related to event detection in emergency situations.
Studies such as Sakaki et al. (2010, 2013), Earle et al. (2011), and Lin et al. (2016) focused on tweets associated with natural disasters such as earthquakes and extreme weather conditions. Sakaki et al. (2010, 2013) developed an earthquake reporting system in Japan using Twitter messages. Their system was able to detect 93% of the earthquakes (seismic intensity of 3 and more) reported by the Japan Meteorological Agency (JMA). They used simple linguistic features, such as word count and the context of target event words, to train an SVM-based classifier for detecting earthquakes. A particle filter was employed to predict the location of the detected event. The system was much faster than the JMA broadcast announcements in sending notifications to the public after sensing an earthquake. Another study by Earle et al. (2011) also proposed an earthquake detection algorithm that relied solely on Twitter data. They constructed a tweet-frequency time series, called a tweetgram, from tweets containing the keyword “earthquake”. The tweetgram showed large peaks correlated with the origin times of earthquakes. They reported that their system was able to find 48 globally distributed earthquakes with only two false triggers in 5 months of data. Detection by their system was faster than some seismographic detection, as 75% of the events were detected within 2 min of their origin time. Lin et al. (2016) compared the content and frequency of communication on Twitter and Weibo during extreme weather events. Twitter retweets and Weibo reposts were compared, and the similarities and dissimilarities of the two platforms in reposting behavior and post content attributes were listed.
On the other hand, studies such as Li et al. (2012), Imran et al. (2013), and Laylavi et al. (2016a) focused on ranking and classification techniques to identify tweets on a priority basis. Li et al. (2012) proposed a system that used tweets to detect and analyze crime- and disaster-related events, such as shootings, car accidents, and tornadoes. Their system was able to detect new events, rank those events according to their importance, and find spatial and temporal patterns for the detected events. Imran et al. (2013) extracted relevant information from tweets to find informative tweets that contributed to situational awareness. Their approach used text classification techniques to map tweets related to an emergency situation to different types of emergency-related information. However, very little attention was given to the assessment and classification of Twitter messages based on their level of informativeness and relatedness to a specific type of event. Laylavi et al. (2016a) proposed a method for detecting event-specific informative tweets related to a storm event. They used term frequency analysis and a relationship scoring function to define event-related term classes. Each tweet was given an event relatedness score. The results of the proposed system were compared against a manually annotated dataset to evaluate the performance. About 87% of event-related tweets were classified accurately by the proposed system.
Other studies such as Zhou and Chen (2014) and Kwon and Kang (2016) relied on time series when using tweets to identify events. Zhou and Chen (2014) proposed a graphical model representing the content, time, and location of tweets. Their model, called location-time constrained topic (LTT), represents every tweet as a probability distribution over a set of topics. The distance between the distributions of two messages defines the similarity measure. They demonstrated the effectiveness and efficiency of their proposed approach through extensive experiments. Kwon and Kang (2016) quantified the risk level of disaster occurrences in Seoul by analyzing tweet text. The usage frequency of the keyword “flood”, the inclusion of disaster sign words, and the degree of the adverbs present in tweets were used to quantify the risk levels. They also proposed tools to visualize these risk levels based on tweet locations with the help of a time series.
Some studies like Zhang et al. (2015) and Laylavi et al. (2016b) showed more interest in user profiles to better understand the origin of tweets. Zhang et al. (2015) detected burst words from micro-blogging text streams using term co-occurrence information and user social relation information. They proposed a spread model based on the analysis of both event content and user profiles. Their system was able to distinguish users’ contributions based on their status/position and their interest in the predicted event. The future popularity of an event was also predicted using historical event popularity data. Laylavi et al. (2016b) introduced a multi-elemental location inference method to predict the location of tweets by exploiting the textual content, user profile location and place labeling. Three granularity levels of location name classes were defined to look up the location references from the location-associated elements. The location assigned to a tweet is the one at the finest granularity level. They reported that 87% of their tweets were successfully geo-located with a mean distance error of 12.2 km and a median distance error of 4.5 km.
Weiler et al. (2016) evaluated the task-based performance and runtime behavior of state-of-the-art event detection techniques for Twitter. They used a data stream management system to implement the available event detection techniques and measure their run-time performance. They also proposed several new measures of the task-based performance of event detection techniques, and conducted extensive experiments to show that their measures were sound and discriminating.
Location inference is the retrieval of location information from Twitter data. So far, it has received little attention in Twitter data research. In fact, a number of studies involving Twitter have collected only geo-tagged tweets and analyzed those tweets in different domains such as public health (Paul and Dredze 2011), societal events (Ciulla et al. 2012), political elections (Skoric et al. 2012), tourist spots (Oku et al. 2014), and earthquakes (Sakaki et al. 2010). However, Cheng et al. (2010) reported that only 0.42% of tweets are geo-tagged, whereas Morstatter et al. (2013) reported that around 3.17% of tweets are geo-tagged. This number is so small that it becomes necessary to devise methods to extract location information from only the publicly available components of tweets. Researchers have employed different machine learning, statistical, probabilistic and natural language processing techniques to estimate the location from tweets (Ajao et al. 2015). Most works have considered the geographical references used in tweets to determine the location in the absence of geo-tagging. These geographical references are either “location indicative words” (LIWs) such as local dialectal terms (e.g. yinz) and place names (e.g. Portland) (Bo et al. 2012) or gazetteer terms.
Eisenstein et al. (2010) employed a rather unique approach to identify tweet locations. They presented a model that identifies words with high regional affinity, geographically coherent linguistic regions, and the relationship between regional and topic variation. They found that high-level topics such as sports, entertainment, etc. are spoken differently in each geographic region, revealing topic-specific regional distinctions. They used these distinctions to geo-locate users based on their tweets. Performance was measured as error metrics, which are the mean and median distance between the predicted and true location in km. The median distance error of their model was reported at 494 km. Cheng et al. (2010) also followed a similar approach, where they analyzed the content of geo-tagged tweets and calculated statistics for the most frequently used words in each city. They used a lattice-based neighborhood-smoothing model to refine a user’s location estimate. Han et al. (2014) presented a geo-location prediction platform by detecting and analyzing LIWs. They proposed several methods to select a feature for identifying LIWs. They also analyzed the impact of non-geo tagged data, the influence of language, and the complementary geographical information in the user metadata. Their method obtained a median prediction error of 209 km.
Watanabe et al. (2011) presented a real-time local-event detection system called Jasmine, which was able to geo-tag the event automatically by identifying the location. The degree of association of a place name with a location in the real world is estimated. For instance, Times Square in a document may refer to Times Square in New York. They assigned geo-location information to non-geo-tagged documents by identifying such place names. Graham et al. (2014) explored the accuracy of various language detection methods on tweets by identifying common sources of errors. They also conducted a comprehensive study of different kinds of location information within tweets, such as profile location, device location and time zone information. They proposed methods for mapping and measuring the geo-linguistic contours of people’s information trails on Twitter. Hecht et al. (2011) conducted an extensive study of users’ profile locations. They found that 34% of users did not provide real location information. However, by analyzing a user’s tweets, their country and state can be determined with decent accuracy by some simple machine learning techniques. Hiruta et al. (2012) proposed a method to detect and classify tweets based on the possible correlation of user profile locations using both textual content and geo-tagging in different categories.
Wing and Baldridge (2011) represented the earth’s surface with a discrete grid using a unit of text such as a word, phrase, or document. They used simple supervised methods to find the location of a document based only on its text. They obtained a median error of 479 km and a mean error of 967 km for Twitter. Dalvi et al. (2012) presented a model to locate users based on indirect spatial references found in tweets. They used restaurants as the target object for their study. Schulz et al. (2013) presented a technique to determine the location from which a tweet originated. They detected the spatial indicators in the text message and in the user profile. The area referred to by each spatial indicator is determined and represented by a weighted polygon. The weights of the polygon were determined using an optimization algorithm that considers the reported uncertainty of the spatial indicators. The geo-localization is done by intersecting and stacking the 3D polygons over each other. They reported that their method is capable of locating 92% of tweets with a median accuracy of below 30 km, and of predicting the users’ residence location with a median accuracy of below 5.1 km.
Minot et al. (2015) proposed a method for estimating the home location of users based on the content of their posts and their social connections on Twitter. They achieved an accuracy of 77% within 10 km compared to the techniques using only social connections. In a similar effort, Rodrigues et al. (2016) proposed a method to infer the spatial location of Twitter users by using the tweet text and their friendship network. They built a friendship network graph with the geographical labels and Twitter texts. A Markov Chain Monte Carlo simulation technique was used to learn the posterior probability distribution of geographical labels. The method presented promising results with little sensitivity to parameters and high values of precision for a large dataset of Twitter users. Duong-Trung et al. (2016) developed a generative content-based regression model via a matrix factorization technique to tackle the near real-time geo-location prediction problem. They showed that real-time geo-location prediction is possible without concatenation of individual tweets. They built a regression model on the real-valued properties of latitudes and longitudes, which they reported to be better than the existing techniques.
Event detection literature on Twitter
Reference | Approach / method / algorithm / platform / model
Adedoyin-Olowe et al. (2016) | Transaction-based rule change mining
Atefeh and Khreich (2015) | Supervised/unsupervised event detection approach
Boettcher and Lee (2012) | A novel local event detection method: EventRadar
Dong et al. (2015) | Statistical modelling and analysis
Earle et al. (2011) | Short-term and long-term average algorithm
Li et al. (2012) | Twitter-based event detection and analysis system (TEDAS)
Lin et al. (2016) |
Sakaki et al. (2010) | Probabilistic spatio-temporal model
Watanabe et al. (2011) | Automatic geotagging method
Weiler et al. (2016) | Run-time and task-based performance
Zhang et al. (2015) | Linear spread model
Zhou and Chen (2014) | Location-time constrained topic (LTT)
Location estimation literature on Twitter
Reference | Approach / method / algorithm / platform / model
Ajao et al. (2015) | Natural language processing (NLP) techniques, gazetteers, probabilistic and machine learning techniques
Aulov and Halem (2012) |
Cheng et al. (2010) | Content-based user location estimation
Dalvi et al. (2012) | Distance model, a user-level model
Eisenstein et al. (2010) | Multi-level generative model
Graham et al. (2014) | Compact language detection (CLD)
Han et al. (2014) | Feature selection methods to identify location indicative words (LIWs)
Hecht et al. (2011) | Machine learning techniques
Laylavi et al. (2016b) |
Middleton et al. (2014) | Real-time crisis mapping platform
Minot et al. (2015) | Social network-based approach, content-based approach, consensus-based fusing
Murthy and Longwell (2013) |
Rodrigues et al. (2016) | Probabilistic approach, Markov random field probability model, Markov Chain Monte Carlo simulation technique
Sakaki et al. (2010) | Kalman filtering and particle filtering
Schulz et al. (2013) | Multi-indicator method for locating tweet creation and location of the user’s residence
Wing and Baldridge (2011) | Supervision, Kullback–Leibler divergence, Naive Bayes, average cell probability
Nguyen et al. (2016) | Linear regression models
3.1 Data collection
In order to train and validate our model, a sufficient number of tweets related to an event is needed, and these should reflect the realistic scenario of that event. We used the Twitter API to capture live tweets related to floods in the southern and eastern states of India. The data collection was done using the Twitter streaming API with the tweepy Python library. The tweets were collected during November–December 2015 for the Chennai floods (south India), and in July–August 2016 for Bihar. A total of 32,400 tweets were collected with the keywords “flood”, “water”, and “Baarh”. The collected tweets were in English, Hindi, and some other regional languages. For this study, we concentrated only on tweets in English and Hindi.
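The keyword and language selection described above can be illustrated offline, without calling the Twitter API. The following is a minimal sketch, assuming each collected tweet is a dictionary with `text` and `lang` fields; these field names and the helper functions are hypothetical and are not taken from the study.

```python
# Keywords quoted in the text; matching here is case-insensitive.
KEYWORDS = ("flood", "water", "baarh")
# Language codes kept for the study (English and Hindi); the "lang" field is an assumption.
LANGUAGES = {"en", "hi"}


def is_relevant(tweet: dict) -> bool:
    """Return True if the tweet is in a kept language and mentions a keyword."""
    text = tweet.get("text", "").lower()
    return tweet.get("lang") in LANGUAGES and any(k in text for k in KEYWORDS)


def filter_tweets(tweets):
    """Keep only the tweets that pass the keyword and language check."""
    return [t for t in tweets if is_relevant(t)]


sample = [
    {"text": "Heavy FLOOD in Chennai", "lang": "en"},
    {"text": "Nice weather today", "lang": "en"},
    {"text": "Baarh aa gayi", "lang": "hi"},
    {"text": "flood", "lang": "ta"},
]
kept = filter_tweets(sample)
```

In this sample, only the first and third tweets are kept: the second contains no keyword and the fourth is in a language outside the study.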
3.2 Data pre-processing
Tweets contain different types of noise and redundancy, such as emoticons, user mentions, and Internet links. Proper pre-processing is needed before these tweets can be used for any meaningful purpose. The following steps were used to clean the tweets for this study:
- Tweets containing “RT” were deleted, as they were not originally created by the sender and did not qualify for our analysis.
- Internet links (starting with http://) were deleted. Removing links to photos, videos, news items or maps can result in the loss of useful details about the incident, but the links were ignored because we were concentrating on the textual content of the tweets.
- Unwanted multiple dots were removed, and multiple spaces were merged into one.
- All non-ASCII characters were deleted.
- Stop words were removed, as they do not convey any meaningful information.
- The text was converted to lowercase, as Uysal and Gunal (2014) showed that lowercase conversion is an effective pre-processing step.
- As a last step, the text in Hindi was translated to English.
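The cleaning steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the stop-word list is a tiny hypothetical stand-in (the study does not name its list), lowercasing is applied before stop-word removal so that the list can be matched case-insensitively, and the Hindi-to-English translation step is omitted.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the list used in the study).
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "at", "of", "to", "and"}


def clean_tweet(text: str):
    """Return the cleaned tweet text, or None if the tweet is a retweet."""
    # Retweets are discarded entirely.
    if re.search(r"\bRT\b", text):
        return None
    text = re.sub(r"https?://\S+", " ", text)       # remove Internet links
    text = re.sub(r"\.{2,}", " ", text)              # remove runs of dots
    text = text.encode("ascii", "ignore").decode()   # delete non-ASCII characters
    text = text.lower()                              # lowercase conversion
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)                           # split/join also merges spaces


cleaned = clean_tweet("Water is rising...  HELP at T Nagar http://t.co/abc \u2614")
retweet = clean_tweet("RT @user: flood in Chennai")
```

Here `cleaned` is `"water rising help t nagar"` and `retweet` is `None`.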
3.3 Event classification
Six features are extracted from the text of each tweet:
- The number of words, denoted by (w)
- The verb in the tweet, denoted by (verb)
- The number of verbs, denoted by (v)
- The position of the query word, denoted by (pos)
- The word before the query word, denoted by (before)
- The word after the query word, denoted by (after)
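To make the six features concrete, the following sketch computes them for one cleaned tweet. Several choices here are assumptions rather than details from the study: the query words are taken to be the collection keywords, verbs are recognized with a small hypothetical lexicon instead of a part-of-speech tagger, the first verb found is reported as (verb), and positions are counted from zero.

```python
QUERY_WORDS = {"flood", "water", "baarh"}  # assumed to be the collection keywords
# Hypothetical stand-in for a part-of-speech tagger.
VERB_LEXICON = {"help", "need", "rescue", "send", "stuck", "save"}


def extract_features(cleaned_text: str) -> dict:
    """Compute the six features (w, verb, v, pos, before, after) for one tweet."""
    tokens = cleaned_text.split()
    verbs = [t for t in tokens if t in VERB_LEXICON]
    # Zero-based position of the first query word, or -1 if none is present.
    pos = next((i for i, t in enumerate(tokens) if t in QUERY_WORDS), -1)
    return {
        "w": len(tokens),
        "verb": verbs[0] if verbs else "",
        "v": len(verbs),
        "pos": pos,
        "before": tokens[pos - 1] if pos > 0 else "",
        "after": tokens[pos + 1] if 0 <= pos < len(tokens) - 1 else "",
    }


features = extract_features("need help flood water rising")
```

For this example the sketch yields w = 5, verb = "need", v = 2, pos = 2, before = "help" and after = "water". In practice the categorical features would still need to be encoded numerically before being passed to a classifier.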
3.4 Location estimation
The location of the users who have tweeted asking for help is then determined. During a disaster, close relatives or friends also tweet asking for help for their loved ones, giving addresses or names. The system detects such tweets and extracts the location information (address) given in the tweet. The first step in the location prediction phase is to find whether the tweet refers to the person tweeting or to someone else. If the tweet refers to someone else, the geo-tag in the tweet does not help locate the referred person. The system then finds the referred user’s Twitter handle mentioned in the tweet text, and the Markov chain technique of finding the user location is applied.
If users with high priority tweets have posted with geo-tagging, their tweets are simply forwarded to the rescue team. On the other hand, if a user with a high priority tweet has not tweeted with geo-tagging, the system retrieves that user’s historical tweets for the last 7 days and extracts the spatio-temporal sequences from them. The rationale behind using historical tweets is that most user activities are confined to a very limited area, which is close to the user’s home location (Cho et al. 2011). Most users visit locations such as their home, workplace, shopping markets, and friends’ places on a regular basis. Hence, it can be assumed that for most users the activity area is small, and users with large activity areas represent only a small fraction of all Twitter users. So, given the historical locations of a user, a Markov model can be established to predict the current location of the user. The presence of a user at a specific location with respect to time is a stochastic process and can be modeled by a Markov chain.
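A minimal sketch of this idea is a first-order Markov chain over discrete location labels. The state definition, the handling of time, and the function names below are assumptions made for illustration; the text does not specify how the chain was formulated in the study.

```python
from collections import Counter, defaultdict


def build_transitions(history):
    """Estimate first-order transition probabilities from a time-ordered
    list of location labels."""
    counts = defaultdict(Counter)
    for current, nxt in zip(history, history[1:]):
        counts[current][nxt] += 1
    return {
        loc: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for loc, c in counts.items()
    }


def predict_next(transitions, last_location):
    """Return the most probable next location, or None if no transition
    out of the last known location was ever observed."""
    options = transitions.get(last_location)
    if not options:
        return None
    return max(options, key=options.get)


# Hypothetical week of geo-tagged check-ins for one user.
history = ["home", "work", "home", "work", "market", "home", "work"]
transitions = build_transitions(history)
guess = predict_next(transitions, "home")
```

With this history, every observed move out of "home" leads to "work", so `guess` is `"work"`. A user with no geo-tagged history produces an empty transition table, and `predict_next` returns `None`.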
4 Results
The performance of the classifiers is evaluated using popular measures, namely precision, recall, F1-score and the receiver operating characteristic (ROC) curve. The high priority and low priority classes are represented as class 0 and class 1, respectively.
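For reference, precision, recall and F1-score for a single class can be computed from the counts of true positives, false positives and false negatives, as in the sketch below. The counts used in the example are invented for illustration and are not taken from the study's results.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall and F1-score for one class from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative counts only: 8 correct detections, 2 false alarms, 2 misses.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

Recall for class 0 answers "what fraction of the real high priority tweets were found", while precision answers "what fraction of the tweets flagged as high priority really were".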
Precision, recall and F1 scores using gradient boosting classifier
Precision, recall and F1 scores using random forest classifier
5 Discussion
The current research attempts to use social media effectively to locate users asking for help during a disaster or emergency, by means of an automatic tweet parsing system. The system proposed in this study is capable of identifying 81% of the users tweeting for help during a flood-related disaster in an Indian context (see Recall column, Table 4 for class 0). The users tweeting about general information related to the disaster are correctly classified in 76% of cases. The objective of this research was to identify high priority users, which was successfully accomplished by our current system, as the classification accuracy of the high priority class is higher than that of the other class. About 81% of the predicted high priority class samples are in fact high priority (see Precision column, Table 4 for class 0). This confirms that tweets can be used to identify users requiring assistance during a disaster with an automated system, as it shows good results for all tweets in English and Hindi. The users needing help were also localized, i.e. their latitudes and longitudes were determined with 87% accuracy. The location was determined using (i) the address provided in the tweet, (ii) geo-tagging, and (iii) the Markov chain. To the best of our knowledge, no one has used historical tweets to predict the current location of a Twitter user. This result also supports the claim by Cho et al. (2011) that most Twitter users have a limited moving zone.
The system picks up tweets from people mentioning the names or addresses of other people in need of help, and extracts the location information (address) given in those tweets. The current research enriches Twitter research by finding whether the tweet refers to the person tweeting or to someone else. If the tweet refers to someone else, the geo-tag of the tweet does not help in locating the referred person. If the referred user’s Twitter handle is mentioned in the tweet, our Markov chain technique of finding the user location is very helpful. In case the user does not have historical geo-tagged tweets from which to infer their location, the system automatically prompts the user for their location. The current system adds a new dimension to social media, where it can be used as a tool for public help. Most of the prior research (Xiao et al. 2015; Huang and Xiao 2015; Carley et al. 2016) on uses of social media has concentrated on either evaluating the fitness of social media as a disaster management tool, or detecting tweets related to disasters to warn users about disasters. Some other researchers (Hara 2015) have studied the behavior of Twitter users during disasters. This is probably the first article to use it for social help.
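The order in which the location sources are consulted can be summarized as a simple fallback chain. The sketch below is an interpretation of the description above, not the authors' implementation; the function names, the argument names and the `@handle` pattern are hypothetical.

```python
import re


def mentioned_handles(text: str):
    """Return the Twitter handles mentioned in a tweet, without the '@'."""
    return re.findall(r"@(\w+)", text)


def resolve_location(address=None, geo_tag=None, markov_prediction=None,
                     refers_to_other=False):
    """Pick a location source in the order described in the text:
    address in the tweet, then geo-tag, then Markov prediction, then ask."""
    if address:
        return ("address", address)
    # The sender's geo-tag is not useful when the tweet is about someone else.
    if geo_tag and not refers_to_other:
        return ("geo_tag", geo_tag)
    if markov_prediction:
        return ("markov", markov_prediction)
    return ("ask_user", None)


handles = mentioned_handles("Please help @asha_k, stuck on 2nd floor")
own_tweet = resolve_location(geo_tag=(13.08, 80.27))
for_friend = resolve_location(geo_tag=(13.08, 80.27),
                              markov_prediction="T Nagar",
                              refers_to_other=True)
```

In this example `handles` is `["asha_k"]`, `own_tweet` resolves to the sender's geo-tag, and `for_friend` skips the geo-tag and falls back to the Markov prediction for the referred user.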
6 Theoretical contributions
The major contribution of this research is the development of a text mining algorithm to detect flood-related tweets in English and Hindi. Another contribution is the classification of these tweets into high and low priority classes to identify tweets needing urgent attention. Six features are extracted from the textual part of the tweet. The lightweight data pre-processing and feature extraction allow tweets to be pre-processed and their features extracted as soon as they are collected. The proposed system does not need any extra storage to do all the computations. The other major contribution is location extraction from a tweet when the location is mentioned in the tweet text and, in the absence of location information, the use of the user’s historical locations to predict their probable location using a Markov chain. The formulation of the Markov chain using the historical locations of a user is one of the highlights of the current research. The location accuracy of the proposed work is also very satisfactory at 81% on average.
7 Implications for practice
It is important for disaster relief organizations, NGOs and similar bodies to get real-time information about the help required by victims. It is not possible for human beings to directly scan the streaming tweets and filter out the urgent ones, because of the volume and velocity of tweets. The proposed system can help separate the tweets requiring urgent attention (high priority tweets) from other tweets. The high priority tweets can then be easily inspected by a human operator, who can inform the organization’s members about the type of help required by the user sending that tweet. This will make the job of relief agencies faster and easier. Some tweets are misclassified by the proposed system, and researchers can study these to identify the reasons behind the misclassification. The reasons for misclassification can be used to educate the general public on how to write their tweets properly during emergency situations. The low priority tweets can be clustered into groups based on their content. This will help government agencies and researchers to analyze the behavior of users during different phases of an emergency or disaster. Twitter can be augmented to include an interface for asking for help during a disaster. The interface can be made intelligent enough to ask users to turn on their location information while asking for help. The location information will help identify the user location precisely and quickly, so that help can be sent as early as possible.
8 Conclusion
In this article, we proposed a tweet classification system to identify tweets from disaster victims asking for help. Further, the user location is estimated from their old tweets if the location is not mentioned in the current tweet. The system uses temporal location information of the users to build a Markov model, which is used for location inference. In this research, we have only considered the textual content of the tweets to categorize them, ignoring the Internet links (if any) provided in the tweets. The drawback here is that these Internet links may point to websites that could yield further information or images of the affected area. The other drawback of the system is that it will not work for first-time Twitter users, or for users who have never switched on their geo-location. For the current work, we have tried to resolve this issue by sending an automatic query to the user to report their location. In the future, we will try to infer locations using other techniques, such as users’ friend networks, and other social networks such as Facebook and Tumblr. Another limitation of the current system is that it works for flood-related disasters, as the system is trained with a flood-related corpus. For any other disaster, the system has to be trained with the corresponding corpus. The current research opens up several new directions for other researchers to explore. The classification accuracy of the system is 81%, which can be further enhanced by considering more parameters. The inclusion of other languages will further enhance the system, as more users are expressing their views in their native languages. Human experts can study the misclassified cases to find the reasons for such misclassifications. This research can also be used to categorize users based on their movement patterns, which can be used by other businesses such as tourism.
- Adedoyin-Olowe, M., Gaber, M. M., Dancausa, C. M., Stahl, F., & Gomes, J. B. (2016). A rule dynamics approach to event detection in twitter with its application to sports and politics. Expert Systems with Applications, 55, 351–360.
- Ajao, O., Hong, J., & Liu, W. (2015). A survey of location inference techniques on twitter. Journal of Information Science, 41(6), 855–864.
- Al-Saggaf, Y., & Simmons, P. (2015). Social media in saudi arabia: Exploring its use during two natural disasters. Technological Forecasting and Social Change, 95, 3–15.
- Atefeh, F., & Khreich, W. (2015). A survey of techniques for event detection in twitter. Computational Intelligence, 31(1), 132–164.
- Aulov, O., & Halem, M. (2012). Human sensor networks for improved modeling of natural disasters. Proceedings of the IEEE, 100(10), 2812–2823.
- Benevenuto, F., Magno, G., Rodrigues, T., & Almeida, V. (2010). Detecting spammers on twitter. Collaboration, electronic messaging, anti-abuse and spam conference (CEAS), 6, 12.
- Bo, H., Cook, P., & Baldwin, T. (2012). Geolocation prediction in social media data by finding location indicative words. Proceedings of COLING 2012: Technical papers, (pp. 1045–1062).
- Boettcher, A., & Lee, D. (2012). Eventradar: A real-time local event detection scheme using twitter stream. In IEEE International Conference on Green Computing and Communications (GreenCom), 2012, (pp. 358–367).
- Carley, K. M., Malik, M., Landwehr, P. M., Pfeffer, J., & Kowalchuck, M. (2016). Crowd sourcing disaster management: The complex nature of twitter usage in padang Indonesia. Safety Science, 90, 48–61.
- Chae, B. K. (2015). Insights from hashtag# supplychain and twitter analytics: Considering twitter and twitter data for supply chain practice and research. International Journal of Production Economics, 165, 247–259.
- Chae, J., Thom, D., Jang, Y., Kim, S., Ertl, T., & Ebert, D. S. (2014). Public behavior response analysis in disaster events utilizing visual analytics of microblog data. Computers and Graphics, 38, 51–60.
- Cheng, Z., Caverlee, J., & Lee, K. (2010). You are where you tweet: a content based approach to geo-locating twitter users. In 19th ACM international conference on information and knowledge management, (pp. 759–768). ACM.
- Cho, E., Myers, S. A., & Leskovec, J. (2011). Friendship and mobility: user movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, (pp. 1082–1090). ACM.
- Ciulla, F., Mocanu, D., Baronchelli, A., Gonçalves, B., Perra, N., & Vespignani, A. (2012). Beating the news using social media: The case study of american idol. EPJ Data Science, 1(1), 1.
- Comfort, L. K., Waugh, W. L., & Cigler, B. A. (2012). Emergency management research and practice in public administration: Emergence, evolution, expansion, and future directions. Public Administration Review, 72(4), 539–547.
- Dai, J., Wang, S., & Yang, X. (1994). Computerized support systems for emergency decision making. Annals of Operations Research, 51(7), 313–325.
- Dalvi, N., Kumar, R., & Pang, B. (2012). Object matching in tweets with spatial models. In Proceedings of the fifth ACM international conference on Web search and data mining, (pp. 43–52). ACM.
- Davis, J., & Goadrich, M. (2006). The relationship between precision-recall and roc curves. In Proceedings of the 23rd international conference on machine learning, (pp. 233–240). ACM.
- Dong, X., Mavroeidis, D., Calabrese, F., & Frossard, P. (2015). Multiscale event detection in social media. Data Mining and Knowledge Discovery, 29(5), 1374–1405.
- Duhamel, C., Santos, A. C., Brasil, D., Chatelet, E., & Birregah, B. (2016). Connecting a population dynamic model with a multi-period location allocation problem for postdisaster relief operations. Annals of Operations Research, 247(2), 693–713.
- Duong-Trung, N., Schilling, N., & Schmidt-Thieme, L. (2016). Near real-time geolocation prediction in twitter streams via matrix factorization based regression. In Proceedings of the 25th ACM international on conference on information and knowledge management, CIKM ’16, (pp. 1973–1976). New York, USA: ACM.
- Earle, P. S., Bowden, D. C., & Guy, M. (2011). Twitter earthquake detection: Earthquake monitoring in a social world. Annals of Geophysics, 54(6), 708–715.
- Eisenstein, J., O’Connor, B., Smith, N. A., & Xing, E. P. (2010). A latent variable model for geographic lexical variation. In 2010 conference on empirical methods in natural language processing, (pp. 1277–1287). Association for Computational Linguistics.
- Gaspar, R., Pedro, C., Panagiotopoulos, P., & Seibt, B. (2016). Beyond positive or negative: Qualitative sentiment analysis of social media reactions to unexpected stressful events. Computers in Human Behavior, 56, 179–191.
- Gayo-Avello, D. (2013). Nepotistic relationships in twitter and their impact on rank prestige algorithms. Information Processing and Management, 49(6), 1250–1280.
- Graham, M., Hale, S. A., & Gaffney, D. (2014). Where in the world are you? Geolocation and language identification in twitter. The Professional Geographer, 66(4), 568–578.
- Han, B., Cook, P., & Baldwin, T. (2014). Text-based twitter user geolocation prediction. Journal of Artificial Intelligence Research, 49, 451–500.
- Hara, Y. (2015). Behaviour analysis using tweet data and geo-tag data in a natural disaster. Transportation Research Procedia, 11, 399–412.
- Hecht, B., Hong, L., Suh, B., & Chi, E. H. (2011). Tweets from justin bieber’s heart: the dynamics of the location field in user profiles. In Proceedings of the SIGCHI conference on human factors in computing systems, (pp. 237–246). ACM.
- Heverin, T., & Zach, L. (2012). Use of microblogging for collective sense making during violent crises: A study of three campus shootings. Journal of the American Society for Information Science and Technology, 63(1), 34–47.
- Hiruta, S., Yonezawa, T., Jurmu, M., & Tokuda, H. (2012). Detection, classification and visualization of place-triggered geotagged tweets. In Proceedings of the 2012 ACM conference on ubiquitous computing, (pp. 956–963). ACM.
- Huang, Q., & Xiao, Y. (2015). Geographic situational awareness: mining tweets for disaster preparedness, emergency response, impact, and recovery. ISPRS International Journal of Geo-Information, 4(3), 1549–1568.
- IAMAI (2016). Internet and mobile association of india report about mobile twitter user. http://www.iamai.in/media/details/4620
- Imran, M., Elbassuoni, S. M., Castillo, C., Diaz, F., & Meier, P. (2013). Extracting information nuggets from disaster-related messages in social media. Proceedings of ISCRAM. Baden-Baden, Germany.
- Kabra, G., & Ramesh, A. (2015). Analyzing ict issues in humanitarian supply chain management: A sap-lap linkages framework. Global Journal of Flexible Systems Management, 16(2), 157–171.
- Kwon, H. Y., & Kang, Y. O. (2016). Risk analysis and visualization for detecting signs of flood disaster in twitter. Spatial Information Research, 24(2), 127–139.
- Laylavi, F., Rajabifard, A., & Kalantari, M. (2016a). Event relatedness assessment of twitter messages for emergency response. Information Processing and Management, 53(1), 266–280.
- Laylavi, F., Rajabifard, A., & Kalantari, M. (2016b). A multi-element approach to location inference of twitter: A case for emergency response. ISPRS International Journal of Geo-Information, 5(5), 1–16.
- Lei, L., Pinedo, M., Qi, L., Wang, S., & Yang, J. (2015). Personnel scheduling and supplies provisioning in emergency relief operations. Annals of Operations Research, 235(1), 487–515.
- Li, F., & Du, T. C. (2014). Listen to me: Evaluating the influence of micro-blogs. Decision Support Systems, 62, 119–130.
- Li, R., Lei, K. H., Khadiwala, R., & Chang, K. C.-C. (2012). Tedas: A twitter-based event detection and analysis system. In IEEE 28th international conference on data engineering, (pp. 1273–1276). IEEE.
- Lin, X., Lachlan, K. A., & Spence, P. R. (2016). Exploring extreme events on social media: A comparison of user reposting/retweeting behaviors on twitter and weibo. Computers in Human Behavior, 65, 576–581.
- Macias, W., Hilyard, K., & Freimuth, V. (2009). Blog functions as risk and crisis communication during hurricane katrina. Journal of Computer-Mediated Communication, 15(1), 1–31.
- Middleton, S. E., Middleton, L., & Modafferi, S. (2014). Real-time crisis mapping of natural disasters using social media. IEEE Intelligent Systems, 29(2), 9–17.
- Minot, A. S., Heier, A., King, D., Simek, O., & Stanisha, N. (2015). Searching for twitter posts by location. In Proceedings of the 2015 international conference on the theory of information retrieval, (pp. 357–360). ACM.
- Mishra, D., Gunasekaran, A., Childe, S. J., Papadopoulos, T., Dubey, R., & Wamba, S. (2016). Vision, applications and future challenges of Internet of Things: A bibliometric study of the recent literature. Industrial Management and Data Systems, 116(7), 1331–1355.
- Morstatter, F., Pfeffer, J., Liu, H., & Carley, K. M. (2013). Is the sample good enough? comparing data from twitter’s streaming api with twitter’s firehose. arXiv preprint arXiv:1306.5204.
- Murthy, D., & Longwell, S. A. (2013). Twitter and disasters: The uses of twitter during the 2010 pakistan floods. Information, Communication and Society, 16(6), 837–855.
- Neubaum, G., Rosner, L., Rosenthal-von der Pütten, A. M., & Krämer, N. C. (2014). Psychosocial functions of social media usage in a disaster situation: A multi-methodological approach. Computers in Human Behavior, 34, 28–38.
- Nguyen, Q. C., Kath, S., Meng, H.-W., Li, D., Smith, K. R., VanDerslice, J. A., et al. (2016). Leveraging geotagged twitter data to examine neighborhood happiness, diet, and physical activity. Applied Geography, 73, 77–88.
- Oh, O., Agrawal, M., & Rao, H. R. (2013). Community intelligence and social media services: A rumor theoretic analysis of tweets during social crises. MIS Quarterly, 37(2), 407–426.
- Oku, K., Ueno, K., & Hattori, F. (2014). Mapping geotagged tweets to tourist spots for recommender systems. In IIAI 3rd international conference on advanced applied informatics (IIAI-AAI), 2014, (pp. 789–794). IEEE.
- O’Leary, D. E. (2015). Twitter mining for discovery, prediction and causality: Applications and methodologies. Intelligent Systems in Accounting, Finance and Management, 22(3), 227–247.
- Ozdamar, L., Ekinci, E., & Küçükyazici, B. (2004). Emergency logistics planning in natural disasters. Annals of Operations Research, 129(1), 217–245.
- Palen, L., Vieweg, S., & Anderson, K. M. (2010). Supporting everyday analysts in safety- and time-critical situations. The Information Society, 27(1), 52–62.
- Papadopoulos, T., Gunasekaran, A., Dubey, R., Altay, N., Childe, S. J., & Fosso Wamba, S. (2017). The role of big data in explaining disaster resilience in supply chains for sustainability. Journal of Cleaner Production, 142, 1108–1118.
- Paul, J. A., & Hariharan, G. (2012). Location-allocation planning of stockpiles for effective disaster mitigation. Annals of Operations Research, 196(1), 469–490.
- Paul, M. J., & Dredze, M. (2011). You are what you tweet: Analyzing twitter for public health. ICWSM, 20, 265–272.
- Rodrigues, E., Assunçao, R., Pappa, G. L., Renno, D., & Meira, W, Jr. (2016). Exploring multiple evidence to infer users location in twitter. Neurocomputing, 171, 30–38.
- Sakaki, T., Okazaki, M., & Matsuo, Y. (2010). Earthquake shakes twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on world wide web, (pp. 851–860). ACM.
- Sakaki, T., Okazaki, M., & Matsuo, Y. (2013). Tweet analysis for real-time event detection and earthquake reporting system development. IEEE Transactions on Knowledge and Data Engineering, 25(4), 919–931.
- Schulz, A., Hadjakos, A., Paulheim, H., Nachtwey, J., & Muhlhauser, M. (2013). A multi-indicator approach for geolocalization of tweets. In Seventh international AAAI conference on weblogs and social media, (pp. 573–582).
- Skoric, M., Poor, N., Achananuparp, P., Lim, E.-P., & Jiang, J. (2012). Tweets and votes: A study of the 2011 singapore general election. In 45th Hawaii international conference on system science (HICSS), (pp. 2583–2591). IEEE.
- Steiger, E., Albuquerque, J. P., & Zipf, A. (2015). An advanced systematic literature review on spatiotemporal analyses of twitter data. Transactions in GIS, 19(6), 809–834.
- Sushil (2017). Theory building using sap-lap linkages: an application in the context of disaster management. Annals of Operations Research, 1–26. doi: 10.1007/s10479-017-2425-3.
- Turoff, M., Hiltz, S. R., Banuls, V. A., & Van Den Eede, G. (2013). Multiple perspectives on planning for emergencies: An introduction to the special issue on planning and foresight for emergency preparedness and management. Technological Forecasting and Social Change, 80(9), 1647–1656.
- Ulku, M. A., Bell, K. M., & Wilson, S. G. (2015). Modeling the impact of donor behavior on humanitarian aid operations. Annals of Operations Research, 230(1), 153–168.
- Uysal, A. K., & Gunal, S. (2014). The impact of preprocessing on text classification. Information Processing and Management, 50(1), 104–112.
- Watanabe, K., Ochi, M., Okabe, M., & Onai, R. (2011). Jasmine: a real-time local event detection system based on geolocation information propagated to microblogs. In Proceedings of the 20th ACM international conference on information and knowledge management, (pp. 2541–2544). ACM.
- Weiler, A., Grossniklaus, M., & Scholl, M. H. (2016). An evaluation of the run time and task-based performance of event detection techniques for twitter. Information Systems, 62, 207–219.
- Wing, B. P., & Baldridge, J. (2011). Simple supervised document geolocation with geodesic grids. In 49th Annual meeting of the association for computational linguistics: Human language technologies-volume 1, (pp. 955–964). Association for Computational Linguistics.
- Xiao, Y., Huang, Q., & Wu, K. (2015). Understanding social media data for disaster management. Natural Hazards, 79(3), 1663–1679.
- Yardi, S., Romero, D., Schoenebeck, G., et al. (2010). Detecting spam in a twitter network. First Monday, 15(1).
- Zhang, X., Chen, X., Chen, Y., Wang, S., Li, Z., & Xia, J. (2015). Event detection and popularity prediction in microblogging. Neurocomputing, 149, 1469–1480.
- Zhou, X., & Chen, L. (2014). Event detection over twitter social media streams. The VLDB Journal, 23(3), 381–400.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.