1 Introduction

Emergency events include natural disasters such as earthquakes, cyclones, floods, fires, and epidemics, as well as man-made disasters such as terror attacks, riots, and socio-political movements. In recent times, such events have unfortunately become recurring scenarios. During an emergency event, one of the primary challenges is to obtain relevant and trustworthy ‘situational information’ about the event. In today’s world, Online Social Media (OSM), such as Twitter, Facebook and WhatsApp, have become important sources of real-time information related to emergency events. People at the site of the event, as well as elsewhere, can quickly post relevant information on such sites, whose use has increased exponentially due to the ubiquity of smartphones and mobile Internet. In such scenarios, the Information Systems (IS) community can play a vital role by providing methodologies and systems for collecting, aggregating, and analyzing situational information in real time, to assist in emergency relief operations as well as in emergency preparedness (e.g., cyclone and tsunami warning systems, surveillance systems, etc.).

Effective exploitation of the content posted on OSM requires robust real-time information processing methods. There are several challenges associated with extracting situational information from social media (see Imran et al. 2015; Nazer et al. 2017 for surveys of these challenges). A primary challenge is that valuable situational (or actionable) information is often obscured by large amounts of conversational content (Rudra et al. 2015). Further, the prevalence of rumors and fake information on social media makes it difficult to separate trustworthy situational information from rumors (Qazvinian et al. 2011; Zubiaga et al. 2018). Aggregating information from multiple OSM and other online/offline resources (Roy et al. 2018) is another interesting problem that demands research attention, as is the real-time management and summarization of dynamic content streams. The informal vocabulary and brevity of social media posts also make it harder to retrieve useful information (Roy et al. 2017). Addressing the code-mixed vocabulary of OSM content (Ganguly et al. 2016) is also an important problem in multilingual countries and in countries where the native language is not English. Section 2 discusses some of these research challenges in further detail.

Given the aforementioned challenges, a special issue on “Exploitation of Social Media for Emergency Relief and Preparedness” was prepared for the journal Information Systems Frontiers. The objective of this special issue is to report current research examining various aspects of effective information extraction and exploitation from social media, for emergency relief as well as emergency preparedness. The special issue encouraged submissions of high-quality research papers related to (though not limited to) the problems mentioned above, and aimed to bring together diverse research communities, such as Information Retrieval, Data Mining and Machine Learning, Natural Language Processing, Computational Social Science, and Human Computer Interaction, that can potentially contribute towards building Information Systems that utilize social media for emergency relief and preparedness.

2 Research challenges in using social media during emergencies

This section describes some research challenges (RCs) in utilizing online social media during emergency situations. The reader is referred to (Imran et al. 2015; Nazer et al. 2017) for more details on these challenges.

2.1 RC1: Identifying important situational information

Most of the information posted on social media during an emergency event is conversational in nature (e.g., sympathy for the victims of the disaster), while only a small fraction actually provides situational information. Hence, situational information needs to be extracted from the message stream. Even within situational information, different types of information are useful for different stakeholders. For instance, during a disease outbreak, people who are already affected by the disease need to know about treatments, people who are not yet affected need to know about symptoms and preventive measures, and monitoring organisations need information on the areas into which the disease is spreading. Hence, classifiers and Information Retrieval methodologies are needed to distinguish among different types of information or to extract particular types of information, so that relevant information can be routed to the appropriate stakeholders.

2.2 RC2: Extracting needs and availabilities of various types of resources, and matching needs with appropriate availabilities

An important sub-category of situational information, critical for coordinating relief efforts, is information about what resources are needed and what resources are available in the disaster-affected area. Such critical information has been observed to be posted on social media platforms like Twitter. For instance, Table 1 shows examples of tweets informing about needs and availabilities of resources, posted during the 2015 Nepal earthquake. Algorithms need to be developed to extract such critical information from social media posts.

Table 1 Examples of tweets informing about need and availability of resources, during the 2015 Nepal earthquake

After identifying needs and availabilities, another important task is to automatically suggest appropriate matches. For instance, the tweets shown adjacent to each other in Table 1 are good matches, since they inform about needs and availabilities of the same resources. Basu et al. (2018) have suggested a few methods for automatically matching need-tweets with availability-tweets in disaster situations.

Note that needs can be mentioned explicitly or expressed covertly, as in ‘people are staying outside. #Kathmandu. #Nepal #earthquake’. In the latter case, it is difficult to infer that tents or shelter are what is being sought; hence, identifying, understanding, and matching such posts is challenging.
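To make the matching task concrete, the following Python sketch pairs need-tweets with availability-tweets by the overlap of resource terms they mention. It is not the method of Basu et al. (2018); the resource lexicon `RESOURCE_TERMS`, the helper names, and the use of Jaccard similarity are illustrative assumptions.

```python
from __future__ import annotations

import re

# Hypothetical resource lexicon (an assumption for illustration); a real system
# would need a much larger, curated list or a learned model covering synonyms.
RESOURCE_TERMS = {"water", "food", "tents", "blankets", "medicines", "blood"}


def resources(text: str) -> set[str]:
    """Return the lexicon terms mentioned in a tweet (case-insensitive)."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return tokens & RESOURCE_TERMS


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_needs(needs: list[str], availabilities: list[str]) -> list[tuple[str, str | None, float]]:
    """Pair each need-tweet with the availability-tweet sharing the most resources.

    Returns (need, best_availability_or_None, score) triples; a need that
    shares no resource term with any availability gets None and 0.0.
    """
    matches = []
    for need in needs:
        need_res = resources(need)
        best, best_score = None, 0.0
        for avail in availabilities:
            score = jaccard(need_res, resources(avail))
            if score > best_score:
                best, best_score = avail, score
        matches.append((need, best, best_score))
    return matches
```

For example, with the made-up needs ‘Urgent need of tents and blankets in Gorkha’ and ‘Drinking water needed at the hospital’ and the made-up availabilities ‘We can supply bottled water to anyone in need’ and ‘500 tents and blankets available for distribution’, each need is paired with the availability that mentions the same resources. A covert need such as ‘people are staying outside’ mentions no lexicon term and receives no match, which illustrates why such posts are hard for keyword-based approaches.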

2.3 RC3: Summarization of social media content streams

During an emergency event, information is posted so rapidly that human responders cannot go through all of it. Hence, real-time algorithms are needed for timeline summarization, and as the event evolves, the summaries also need to evolve over time. Several summarization algorithms exist, including some designed specifically for summarizing tweet streams during emergency events, e.g., (Rudra et al. 2015). However, the evaluation of these algorithms needs to be revisited. Summarization algorithms are traditionally evaluated using ROUGE scores (based on unigram/bigram overlap with gold-standard summaries), but these measures are not sufficient for timeline summarization methods. Nugget-based or cluster-based evaluation methods have recently been shown to be more effective, but they require considerable annotation effort (Baruah et al. 2017).
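As a reference point for the discussion of evaluation, the sketch below computes ROUGE-N recall in its commonly used form: the clipped count of reference n-grams that also appear in the candidate summary, divided by the total number of reference n-grams. It is deliberately simplified (lower-casing and whitespace tokenization only, no stemming or stopword handling), so its scores will not exactly match those of the official ROUGE toolkit.

```python
from collections import Counter


def ngrams(tokens: list, n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, references: list, n: int = 1) -> float:
    """Simplified ROUGE-N recall of a candidate summary against gold-standard summaries."""
    cand = ngrams(candidate.lower().split(), n)
    overlap, total = 0, 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Clipping: each reference n-gram is credited at most as many times
        # as it occurs in the candidate.
        overlap += sum(min(count, cand[g]) for g, count in ref_counts.items())
        total += sum(ref_counts.values())
    return overlap / total if total else 0.0
```

For instance, the candidate ‘roads blocked near kathmandu’ scored against the reference ‘roads near kathmandu are blocked’ obtains 0.8 for unigrams but only 0.25 for bigrams. Nothing in this computation depends on when a piece of information is reported, so overlap scores alone say little about the timeliness that timeline summarization requires.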

2.4 RC4: Combining information from multiple sources

Situational information is extremely critical during emergency situations, so all possible information sources need to be tapped. Hence, multiple information sources, such as news reports, social media (Twitter, Facebook, etc.), and SMS or WhatsApp messages from mobile phones, should be utilized together. The challenge in incorporating different information sources is that they may use different vocabularies. For instance, news reports are written formally, while social media posts are often written informally: whereas a news report on the 2015 Nepal earthquake would mention ‘Kathmandu’, researchers observed that several tweets used the abbreviation ‘KTM’ for the same city. Hence, intelligent algorithms are needed to deal with the varying vocabulary of different information sources. Roy et al. (2018) take an initial step in this direction, using a neural network model to construct a common embedding space from the vocabularies of three information sources: Facebook, Twitter, and WhatsApp.

2.5 RC5: Guarding against misinformation and other types of harmful content

During times of disaster, there is widespread panic and tension among the people. Not only the victims but also the volunteers remain under stress, which allows misinformation and rumours to seep into the network (Mondal et al. 2018). Detecting such misinformation and rumours is challenging, since at such times even well-known and otherwise reliable users can unwittingly post rumours. In fact, combining information from multiple sources (RC4) might be a good way of identifying misinformation.

Another type of harmful content that is often posted during emergency events is communal content that targets particular religious or social groups (Rudra et al. 2018b). Surprisingly, such communal content is posted during man-made emergencies (e.g., terror attacks) as well as natural disasters (e.g., floods and earthquakes). Methods must be developed to detect such content and then deal with it effectively. For instance, Rudra et al. (2018b) propose using anti-communal content to counter the effects of communal content.

2.6 RC6: Adding support for non-English and code-mixed data

During emergency events in multilingual societies, a significant amount of information is posted in regional languages such as Hindi. In fact, it has been observed that social media posts in regional languages often contain significant situational information that is either absent from the English posts or appears earlier than in the English posts (Rudra et al. 2015; Basu et al. 2017a, b). Additionally, there is a large amount of code-mixed data, where the same post contains words from multiple languages. Traditional Natural Language Processing and Information Retrieval techniques are not likely to perform well on such data. One option is to first translate all content into a single language (most commonly, English) and then apply traditional English NLP techniques. However, the performance of this approach depends heavily on the accuracy of the translation. Hence, methodologies based on word embeddings have recently been explored to deal effectively with multilingual and code-mixed content.

2.7 RC7: Identifying important images posted on social media

In addition to textual information, many users post informative images on social media, either to convey the magnitude of devastation or to show specific locations where aid is required. Along with the text, such images can also convey critical situational information (Alam et al. 2017). There has been very little research on utilising the images posted on social media for disaster relief. New methods are needed to identify images that provide situational information (e.g., infrastructure damage), and then to utilise such images in various ways to help relief efforts.

The principal difficulty in addressing the research challenges stated above is the informal nature of content posted on OSM. Crowdsourced content posted on OSM often contains informal language, arbitrary abbreviations, emoticons, different spellings of the same word (e.g., ‘gurudwara’ and ‘gurdwara’), multilingual and code-mixed content, and so on. As a result, traditional Natural Language Processing (NLP) and Information Retrieval (IR) methods, which are primarily meant for formal monolingual text, do not work well on such informal content. Neural network/Deep Learning (DL) methodologies have recently been found to be effective in such applications. However, DL techniques also have limitations; e.g., they require huge amounts of training data that may be expensive to produce. Additionally, training DL models can take a long time (even days), which may not be affordable during an emergency. Hence, a combination of traditional NLP/IR techniques and DL techniques may be a more practical approach.
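As an example of the kind of lightweight traditional technique that can complement DL methods, spelling variants such as ‘gurudwara’ and ‘gurdwara’ can be grouped by Levenshtein (edit) distance. The sketch below is a minimal illustration; the greedy grouping strategy, the function names, and the threshold `max_dist=1` are arbitrary assumptions, and a small threshold can still merge unrelated short words.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def group_variants(words: list[str], max_dist: int = 1) -> list[list[str]]:
    """Greedily attach each word to the first group whose first member is within max_dist."""
    groups: list[list[str]] = []
    for w in words:
        for g in groups:
            if levenshtein(w, g[0]) <= max_dist:
                g.append(w)
                break
        else:  # no close group found: start a new one
            groups.append([w])
    return groups
```

Here `group_variants(['gurudwara', 'gurdwara', 'ktm', 'kathmandu'])` places the two spellings of ‘gurudwara’ together but keeps ‘ktm’ and ‘kathmandu’ apart, since their edit distance is large; abbreviations of this kind call for approaches such as the embedding-based methods mentioned under RC4.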

3 Papers in the special issue

For this special issue, high-quality research papers that had neither been published previously nor were under consideration for publication in any other journal or conference were invited. High-quality survey papers were also invited. Extended versions of previously published papers were also welcome, provided the submissions contained at least 40% new material with respect to the previously published versions.

Eighteen (18) papers were submitted. After peer review by a competent Program Committee (most submissions went through major or minor revisions), nine (9) papers were accepted for inclusion in the special issue. The accepted papers cover several aspects of utilizing social media for crisis informatics. All but one of the accepted papers focus on post-disaster analysis (as opposed to disaster preparedness). The contributions of the accepted papers are briefly described below.

Disaster preparedness

Nemeskey and Kornai (2018) presented the only accepted paper on disaster preparedness. They report on the ‘ahead of time’ preparation of a vocabulary for social media messages posted during disaster events. Such a vocabulary is important because it contains typical keywords, curated specifically for emergency situations, that can help in finding actionable information during disasters even in the absence of expert or crowdsourced knowledge. Starting with some manually selected seed keywords, the vocabulary was expanded automatically from a given collection of documents containing emergency information, using lexical and semantic matching techniques. This method was successful in retrieving important keywords when evaluated against standard emergency vocabularies such as CrisisLex.
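The general idea of growing a vocabulary from seed keywords can be illustrated with simple document-level co-occurrence counting. This is only one possible lexical approach and is not the technique of Nemeskey and Kornai (2018); the function name, the tiny stopword list, and the thresholds `top_k` and `min_count` are assumptions made for illustration.

```python
from __future__ import annotations

import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); real systems use fuller lists.
STOPWORDS = {"the", "a", "of", "in", "and", "to", "is", "are", "for", "on", "at"}


def expand_vocabulary(seeds: set[str], documents: list[str],
                      top_k: int = 5, min_count: int = 2) -> list[str]:
    """Return up to top_k non-seed words that co-occur with a seed word
    in at least min_count documents, most frequent first."""
    cooc = Counter()
    for doc in documents:
        tokens = set(re.findall(r"[a-z]+", doc.lower())) - STOPWORDS
        if tokens & seeds:
            # Count each candidate once per document (document frequency).
            cooc.update(tokens - seeds)
    return [w for w, c in cooc.most_common(top_k) if c >= min_count]
```

For example, given the seed set {‘flood’} and the made-up documents ‘flood water rising rescue boats needed’, ‘rescue teams reach flood hit village’, ‘heavy rain causes flood rescue underway’ and ‘cricket match tonight’, the function returns [‘rescue’], the only non-seed word that co-occurs with the seed in at least two documents. A semantic component (e.g., word embeddings) would be needed to capture related terms that rarely co-occur with the seeds.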

Among the studies on post-disaster information analysis, most of the papers either proposed machine learning techniques for different tasks, or used “big data” frameworks to develop information systems that would be useful in a post-disaster situation.

Application of machine learning

Bandyopadhyay et al. (2018) propose a word-embedding based ad hoc Information Retrieval system that outperforms a conventional term-matching based IR model. They also show that word embeddings trained on the disaster-specific SMERP 2017 dataset are more effective for this task than word embeddings trained on the large social media collection provided for the TREC 2011 Microblog track. Rudra et al. (2018a) develop a classifier that leverages low-level lexical features to distinguish between different categories of disease-related tweets posted during two recent outbreaks, Ebola and MERS. They also propose effective techniques for summarizing the classified messages. Palshikar et al. (2018) propose self-learning algorithms that use minimal supervision to construct a simple bag-of-words model of information expressed in news about various natural disasters. They show empirically that the proposed model outperforms many state-of-the-art semi-supervised learning algorithms. They also present an online algorithm that learns and automatically adjusts the weights of the initial word model. Mondal et al. (2018) address rumor detection on tweets at an early stage in the aftermath of a disaster. To this end, they present a probabilistic model based on important features of rumor propagation, which yields better rumor detection performance than relevant baselines on tweets collected during a disaster event.
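A common baseline for embedding-based retrieval represents the query and each document by the average of their word vectors and ranks documents by cosine similarity. The sketch below illustrates that baseline only; it is not the system of Bandyopadhyay et al. (2018), and the three-dimensional vectors in `EMBEDDINGS` are made up for illustration, whereas real systems learn much higher-dimensional vectors from a corpus.

```python
from __future__ import annotations

import math

# Hypothetical toy embeddings, for illustration only.
EMBEDDINGS = {
    "shelter": [0.9, 0.1, 0.0],
    "tents":   [0.8, 0.2, 0.1],
    "relief":  [0.7, 0.3, 0.0],
    "cricket": [0.0, 0.1, 0.9],
    "match":   [0.1, 0.0, 0.8],
}


def mean_vector(text: str) -> list[float] | None:
    """Average the vectors of known words; None if no word has a vector."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query: str, docs: list[str]) -> list[tuple[str, float]]:
    """Rank documents by cosine similarity of averaged embeddings to the query."""
    q = mean_vector(query)
    scored = []
    for d in docs:
        v = mean_vector(d)
        score = cosine(q, v) if q is not None and v is not None else 0.0
        scored.append((d, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For the query ‘shelter’, a term-matching model would give zero score to both ‘tents relief’ and ‘cricket match’, since neither contains the query word, whereas `rank('shelter', ['cricket match', 'tents relief'])` places ‘tents relief’ first because its averaged vector points in a similar direction to the query vector.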

Design of information systems

Two of the papers, viz. Troudi et al. (2018) and Avvenuti et al. (2018), use “big data” frameworks to develop useful information systems. Troudi et al. (2018) report a new mashup-based method for event detection from social media using the Hadoop framework. They attempt bilingual event detection for English and French, and their proposed setup offers a multidimensional visualization by combining different multimedia components. Avvenuti et al. (2018), on the other hand, present a Big Data crisis mapping system capable of quickly collecting and analyzing social media data. They apply a classification-based technique using word embeddings and geo-tagging to identify actionable information from tweets collected during two natural disasters in Italy.

Understanding social media posts during emergency events

The remaining two papers (Smith et al. 2018; Hong et al. 2018) aim to better understand what is posted in the aftermath of disaster events. Smith et al. (2018) present a post-disaster study of the languages used in tweets posted during such periods and the underlying biases found in them; in particular, the authors discuss regional sentiment bias during crises. They present a multilingual study over three languages for two events and report interesting observations; e.g., during the 2016 Paris terrorist attacks, 16% more negative comments were written in English than in French, even though the event took place in France. Hong et al. (2018) design a semi-automatic framework to identify the specific topics discussed in the communications of citizens and local governments during 18 snowstorms in the State of Maryland, US. Their study aims to help local governments identify citizens’ information needs and decide what kind of information to deliver under certain conditions during natural disasters.

Thus, there is significant diversity among the papers included in the special issue, reflecting the diverse challenges that need to be addressed in this domain.

4 Forums related to using social media for emergency informatics

The First Workshop on Exploitation of Social Media for Emergency Relief and Preparedness (SMERP 2017), co-located with the ECIR 2017 conference and on a similar theme, acted as a precursor to this special issue. SMERP 2017 had two tracks: a Peer-review Track and a Data Challenge Track (Ghosh et al. 2017). The Peer-review Track invited submissions on a theme aligned with this special issue, while the Data Challenge Track asked participants to submit solutions for two tasks (text retrieval and text summarization) on a given dataset. Almost all the teams that submitted to the SMERP 2017 Peer-review Track also submitted to this special issue. Among the teams whose papers had been accepted at SMERP 2017, two (Palshikar et al. 2018; Nemeskey and Kornai 2018) also had papers accepted in this special issue. The encouraging number of submissions to this special issue led to the second edition of the workshop, SMERP 2018, which was co-located with The Web Conference (WWW 2018). SMERP 2018 added a new theme, multi-modal and multi-view information retrieval, to the scope of the SMERP 2017 workshop.

Other important forums for researchers on this topic include the conference series on Information Systems for Crisis Response and Management (ISCRAM). Another relevant forum is the workshop series on “Social Web for Disaster Management”, whose 2018 edition was co-located with the WSDM 2018 conference and had the theme ‘collective sensing, trust, and resilience in global crises’.

Additionally, several shared tasks have recently been organized on specific problems pertaining to the effective use of social media during emergencies. Examples include the track “Information Retrieval from Microblogs during Disasters” (IRMiDis) (Ghosh and Ghosh 2016), which has been organized with the Annual Meeting of the Forum for Information Retrieval Evaluation (FIRE) since 2016. These shared tasks have released datasets that the research community can use to develop methods addressing some practical challenges in utilizing social media for post-disaster relief operations (Ghosh and Ghosh 2016; Basu et al. 2017a).

5 Conclusion and future directions

This special issue on Exploitation of Social Media for Emergency Relief and Preparedness contains several insightful papers. It is hoped that these papers will initiate important discussions and lead to impactful, practical ideas in the near future.

Several challenges remain to be addressed in the domain of utilizing social media for crisis informatics. One such challenge is to combine information from multiple online and offline sources, and information of different modalities (e.g., text and images, posts in different languages), for more effective coordination of relief activities. Another very pertinent challenge is to identify information that is actually ‘actionable’ in a post-disaster scenario; in fact, the notion of ‘actionability’ itself remains to be defined. The organizers of this special issue hope that the research community will address these questions in the near future.