When exploring how Big Data can support the inter-organizational collaborations that articulate responders’ actions and decisions, [10] highlights that this is a challenge on both (i) a social layer, with the necessity to support the inter-organizational collaborations (among the crisis responders) at a macro level, and, at the same time, (ii) a technological layer, with the lack of methodology to deal with unknown sources of data that need to be “understood” before they can be exploited by decision-makers. Benaben et al. [10] notably conclude that, in disaster management, being able to use and “understand” data is required to support coordination. Hence, focusing on the use of social media data, the following presents a specific study of existing methodologies to exploit and interpret this kind of data in such a context.
Yin et al. [7] have developed a tool to mine and extract data from microblogs related to a crisis in near real time. This tool allows users, whether in a crisis decision-making or general public role, to visualize information about the incident and its potential impact. The tool is based on burst detection, i.e. the identification of numerous messages with similar wording or coming from the same area, combined with automated data collection. Sakaki et al. use Twitter data flows to detect earthquakes in real time [11]. They built a reporting tool that embeds two parts: the detection of tweets that are about earthquakes and a service dedicated to location detection. [12] provide a four-step approach to take advantage of tweets in crisis situations: (i) burst detection, (ii) text classification to find tweets about the status of infrastructures, (iii) online clustering and (iv) geotagging of the tweets.
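To fix ideas on the first of these steps, the sketch below shows a minimal, volume-based burst detector: tweets are bucketed into fixed time windows and a window is flagged when its volume deviates strongly from the observed mean. The window size, threshold and purely volume-based criterion are illustrative assumptions, not the actual detectors implemented in [7] or [12].

```python
from collections import Counter
from datetime import datetime
from statistics import mean, stdev

# Illustrative burst detector (assumed parameters, not the method of [7] or [12]).
WINDOW_SECONDS = 300      # 5-minute buckets (assumption)
THRESHOLD_SIGMAS = 3.0    # how far above the mean counts as a burst (assumption)

def detect_bursts(timestamps):
    """timestamps: iterable of datetime objects, one per collected tweet."""
    # Count tweets per fixed-size time window.
    counts = Counter(int(ts.timestamp()) // WINDOW_SECONDS for ts in timestamps)
    windows = sorted(counts)
    volumes = [counts[w] for w in windows]
    if len(volumes) < 2:
        return []
    mu, sigma = mean(volumes), stdev(volumes)
    # A window is a burst when its tweet volume is unusually high.
    return [datetime.fromtimestamp(w * WINDOW_SECONDS)
            for w, v in zip(windows, volumes)
            if v > mu + THRESHOLD_SIGMAS * sigma]
```

In a real pipeline, the flagged windows would then feed the subsequent classification, clustering and geotagging steps.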
Overall, these solutions are mainly about providing decision-makers with visualization tools. As such, even though they help the users, the resulting outputs remain at a data level of abstraction, i.e. the users still need to put them into context by themselves to obtain usable information for their decision-making process.
Many works have based their approach on classification problems, i.e. labelling each tweet with one or more predefined classes. Among them, [13, 14] propose tweet datasets covering many crisis cases. In the first case, the tweets have been classified as informative or not thanks to a crowdsourcing service. In the second case, as a first step, tweets have been classified as on-topic or off-topic thanks to crowdsourcing, and the authors have used this training dataset to extract specific terms related to crises (approximately 380 terms). In the same vein, [15] built a lexicon, EMTerms 1.0, of more than 7200 terms directly related to 23 specific crisis categories. Both lexicons are available online. Matching the occurrences of lexicon keywords can produce good enough results, provided that the lexicons were built on suitable use cases (i.e. if they are built on hurricane and earthquake cases, the results are likely to be poor in the case of a terror attack). However, extending lexicons with new cases is time-consuming and results are not guaranteed, because it is rather hard to detect the terms that would make the lexicons too specific.
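As a hedged illustration of such keyword-occurrence classification, the snippet below labels a tweet with the categories whose lexicon terms it contains. The categories, terms and threshold are placeholders for the sake of the example; they are not taken from the CrisisLex or EMTerms 1.0 lexicons.

```python
import re

# Toy lexicon mapping crisis categories to indicative terms
# (placeholder entries, not the actual CrisisLex / EMTerms 1.0 content).
LEXICON = {
    "infrastructure": {"bridge", "road", "power", "outage", "collapsed"},
    "casualties": {"injured", "victims", "casualties", "dead"},
    "donations": {"donate", "volunteers", "shelter", "relief"},
}
MIN_HITS = 1  # assumed threshold: at least one matching term per category

def classify_with_lexicon(tweet: str) -> list[str]:
    """Return the categories whose lexicon terms appear in the tweet."""
    tokens = set(re.findall(r"[a-z']+", tweet.lower()))
    return [category for category, terms in LEXICON.items()
            if len(tokens & terms) >= MIN_HITS]

# Example: classify_with_lexicon("Power outage near the collapsed bridge")
# -> ["infrastructure"]
```

The weakness discussed above is visible even in this toy version: every new crisis type requires adding and validating terms by hand, and overly specific terms degrade the lexicon on other cases.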
In Natural Language Processing (NLP), many methods can be used to classify a corpus of documents (in our case a set of tweets, each belonging to a specific class). Among them, the Convolutional Neural Network model presented in [16] adapts an Artificial Neural Network to the analysis of n-grams. [17] provide comparative results of tweet classification over several methods: Support Vector Machines, Artificial Neural Networks (ANN) and Convolutional Neural Networks (CNN), using unigrams, unigrams + bigrams, and unigrams + bigrams + trigrams. As a result, they show that in their two-class (on-topic or off-topic) classification case, CNNs are the most suitable.
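The sketch below gives a minimal example of this kind of CNN, in the spirit of [16, 17]: parallel one-dimensional convolutions with kernel sizes 1, 2 and 3 act as unigram, bigram and trigram feature detectors over word embeddings, and a sigmoid output mirrors the two-class (on-topic / off-topic) setting. The vocabulary size, embedding dimension, filter counts and dropout rate are arbitrary assumptions, not the exact architectures evaluated in these works.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Assumed hyper-parameters for illustration only.
VOCAB_SIZE = 20000   # vocabulary size
MAX_LEN = 50         # maximum tweet length (tokens)
EMBED_DIM = 100      # word-embedding dimension

def build_cnn_classifier() -> tf.keras.Model:
    inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
    x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)
    # Parallel convolutions: kernel sizes 1, 2 and 3 act as
    # unigram, bigram and trigram feature detectors.
    branches = []
    for kernel_size in (1, 2, 3):
        branch = layers.Conv1D(128, kernel_size, activation="relu")(x)
        branch = layers.GlobalMaxPooling1D()(branch)
        branches.append(branch)
    merged = layers.Concatenate()(branches)
    merged = layers.Dropout(0.5)(merged)
    # Single sigmoid unit: probability that the tweet is on-topic.
    outputs = layers.Dense(1, activation="sigmoid")(merged)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Such a model is trained on tokenized, integer-encoded tweets padded to MAX_LEN, using a labelled dataset such as those of [13, 14] as training material.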
Finally, this state of the art shows that, in order to “go up” in abstraction (i.e. instantiate the concepts into the model), the current systems fail to take into account the interactions among the tweets, which could be enabled through a deeper semantic analysis.