Synonyms

Deep learning; Flood detection; Machine learning; Metadata

Definitions

Water is one of the vital substances for life and covers about two-thirds of the earth's surface. Water management is critically important for the survival of living beings, and floods are among the most common causes of large-scale water-related loss and damage. Conventional methods of disaster management have improved to a large extent; likewise, flood prediction, localization, and recovery management mechanisms have been upgraded from manual reporting to advanced sensor-based intelligent systems. Additionally, social media big data streams have emerged as a major source of information flow for ordinary events as well as for emergency situations. Millions of images, along with text streams, audio clips, and videos, are uploaded from different corners of the world. Data from social media is then processed to gain various socioeconomic benefits, including weather prediction, disaster awareness, and so on. The aim of this chapter is to discuss and evaluate state-of-the-art computer vision and natural language processing techniques for flood detection using social media big data streams.

Overview

This chapter provides a comprehensive review of the state-of-the-art natural language processing (NLP), machine learning (ML), and deep learning (DL) technologies for flood detection. It also summarizes the performance of several ensemble techniques in which floods are predicted using a combination of metadata and visual data.

Introduction

Hydroinformatics helps to effectively manage the water resources of the world by utilizing information and communication technology (Abbott et al. 1991). Conventional methods of hydroinformatics have changed entirely with technological advancement. Traditionally, gauging stations were used for flood detection by examining the water level. Floods are usually caused by overflowing rivers, rainfall exceeding the capacity of the available drainage system, sudden breaches of flood defenses, or the unexpected release of water from hydropower dams (Jongman et al. 2015). Where protective measures are weak, floods may create dangerous situations for living beings, and valuable economic resources are also heavily affected. Strong defensive measures depend directly on information regarding the flood. A strong flood response system performs its function on the basis of data collected from different sources, including weather forecasts, social media content, and satellite systems, as shown in Fig. 1. This information helps to indicate the time, location, causes, and impact of the flood. A flood management system makes preparations in response to any possibility of flooding; the response includes relief during the event, while recovery starts after the end of the emergency, to overcome the situation (Jiang et al. 2013; Chen and Han 2016; Jongman et al. 2015). On the basis of these information resources, the flood management system carries out the responsibilities of preparation, response, and recovery, as shown in Fig. 1.

Flood Detection Using Social Media Big Data Streams, Fig. 1

Flood management system

The omnipresent nature of the Internet has provided users with a pathway for social interactivity. Moreover, social media offers its users rapid interaction and effortless sharing of text, images, audio, and video, as well as any combination of these. Various social media giants compete to attract the maximum number of users, and Facebook tops the list: as per an official announcement, Facebook reached 2 billion users by June 2017. Big data collected from social media can be used for the well-being of society and for various kinds of situational awareness analysis. In particular, people use social media during emergency situations like storms and floods to spread awareness among people in the surrounding vicinity, the authorities, and humanitarian organizations so that the situation may be tackled appropriately. Along with data from social media, satellites are also used for the prediction and detection of flooding events (Pohl et al. 2012, 2016; Fohringer et al. 2015).

A massive quantity of disaster-related information is regularly collected from social media and satellites. These huge datasets require advanced storage and processing technologies. The term "big data" denotes complex and huge datasets that cannot be captured, stored, and processed by conventional computing technologies and therefore call for techniques such as big data analytics and cloud computing (Shamsi et al. 2013). Big data techniques discover patterns by exploring huge amounts of data, which motivates their usage in disaster management systems, especially in flood management (Eilander et al. 2016; Chen and Han 2016). The main aim of this chapter is to discuss the existing challenges for flood detection using social media big data streams. This chapter will also provide a comprehensive review of the state-of-the-art natural language processing (NLP), machine learning (ML), and deep learning (DL) technologies for flood detection.

Literature Review

The worst situations are faced by countries with scarce resources, which are unable to handle floods properly; consequently, humanitarian organizations step in to handle the situation. Moreover, countries with a large area also face trouble protecting large regions from flooding events, and there is a lack of proper intercommunication mechanisms between countries for sharing experience and knowledge regarding flood management. Flooding events in Pakistan and the Philippines have been analyzed to explore the timing of flood occurrence, the mapping of flood locations, and the understanding of flood impact (Jongman et al. 2015). Data were gathered from disaster-response organizations, Global Flood Detection System (GFDS) satellite signals, and information from Twitter accounts. It was observed that GFDS satellite information produces better results for larger floods but is less appropriate for floods of shorter duration. It also has the drawback of producing errors when measuring in coastal areas, so it is more suitable for delta regions or islands. Moreover, Twitter information analysis for flood exploration can produce fruitful results for floods of any size. However, it faces the challenge of finding the exact location of an event. Additionally, the Twitter population is limited in rural areas, which hinders accurate and timely information (Jongman et al. 2015).

In an experimental research study (Imran et al. 2016), a huge corpus of 52 million Twitter messages was gathered during the period 2013-2015. The dataset was collected from different parts of the world during 19 different crisis incidents, including earthquakes, floods, landslides, typhoons, volcanoes, and airline accidents. Artificial Intelligence for Disaster Response (AIDR), an open-source platform, was used for the collection and classification of the messages; the platform also provides an easy method to collect tweets from specific regions with particular keywords. Messages are categorized according to their contents as injured and dead people, missing people, damaged infrastructure, volunteer services, donations, sympathy, and emotional support. Before classification, stop-words and uniform resource locators (URLs) are removed from the data, as both play little or no role in classification. Moreover, the Lovins stemmer is used, which removes the derivational part of a word so that, for example, "flooding" is replaced by "flood." Unigram and bigram words are defined as features, and information gain is used to select the top 1000 most effective features. After feature selection, multiclass categorization is performed using support vector machine (SVM), random forest (RF), and Naïve Bayes (NB) algorithms. The trained models are evaluated by tenfold cross-validation, and the largest word embedding corpus on the topic of crisis management has been developed (Imran et al. 2016).
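
A minimal sketch of this kind of pipeline is shown below, using scikit-learn. The toy tweets, labels, and hyperparameters are illustrative assumptions; the study itself used the AIDR platform and the Lovins stemmer, for which no standard scikit-learn implementation exists, so URL stripping and the built-in stop-word list stand in for the full preprocessing.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

def strip_urls(tweet):
    """Remove URLs and lowercase; stop-words are handled by the vectorizer."""
    return re.sub(r"https?://\S+", "", tweet).lower()

# Hypothetical toy data; the actual corpus contained ~52 million tweets.
tweets = [
    "Flood waters rising near the bridge, roads closed http://t.co/abc",
    "Massive flooding reported downtown, homes under water",
    "Lovely sunny day at the beach",
    "Enjoying coffee and a good book this morning",
]
labels = ["flood", "flood", "not_flood", "not_flood"]

pipeline = Pipeline([
    # Unigram and bigram features, with English stop-words removed
    ("vect", CountVectorizer(preprocessor=strip_urls,
                             ngram_range=(1, 2), stop_words="english")),
    # Keep the top-k features by information gain (mutual information here);
    # the study kept the top 1000
    ("select", SelectKBest(mutual_info_classif, k=10)),
    # Linear SVM; the study also evaluated random forest and Naive Bayes
    ("clf", LinearSVC()),
])

# The study used tenfold cross-validation; cv=2 fits this tiny toy set
print(cross_val_score(pipeline, tweets, labels, cv=2).mean())
```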

Another study (Nguyen et al. 2016) explored stochastic gradient descent and proposed an online algorithm using a deep neural network (DNN). This algorithm aims to identify and classify significant tweets. For identification, the informative effectiveness of each tweet is investigated, yielding two categories tagged as "informative" and "non-informative." Informative tweets contain indications of the category of destruction caused by the disaster. Furthermore, the tweets are categorized according to their contents: tweets indicating loss of infrastructure are placed in one category, while tweets with information regarding deceased and injured persons are placed in a separate category. A convolutional neural network (CNN)-based model is trained and tested on six classes, including "donation and volunteering," "sympathy and support," and so on (Nguyen et al. 2016). Some public datasets for disaster events are shown in Table 1, and a minimal sketch of such a CNN model follows the table.

Flood Detection Using Social Media Big Data Streams, Table 1 Some public datasets for disaster events
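
The following PyTorch sketch shows a convolutional neural network for tweet classification in the spirit of the CNN-based model described above. The vocabulary size, embedding dimension, filter widths, and six-class output are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class TweetCNN(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=100, num_classes=6):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Convolutions over 3-, 4-, and 5-word windows of the embedded tweet
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, 100, kernel_size=k) for k in (3, 4, 5)])
        self.fc = nn.Linear(3 * 100, num_classes)

    def forward(self, token_ids):           # (batch, seq_len)
        x = self.embedding(token_ids)       # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)               # (batch, embed_dim, seq_len)
        # Max-pool each feature map over time, then concatenate
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # class logits

model = TweetCNN()
dummy_batch = torch.randint(0, 20000, (8, 30))  # 8 tweets of 30 token ids
print(model(dummy_batch).shape)                 # torch.Size([8, 6])
```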

The severity of floods differs across the world, and some countries are heavily affected; in Malaysia, for example, floods cause 40% of the country's total disaster damage. The flood-prone region of Terengganu, Malaysia, where a flood event took place in November 2009, has been mapped by implementing a support vector machine (SVM). Flood locations were detected by texture analysis, and 181 flood-affected locations were recorded, of which 70% were selected as the training set and the remaining 30% as the testing set. The presence or absence of flooding is verified by binary classification with the help of various mathematical functions called kernels; this approach used sigmoid (SIG), radial basis function (RBF), linear (LN), and polynomial (PL) kernel functions. When results were compared with the popular frequency ratio method, SVM with linear and polynomial kernels achieved success rates of 84.31% and 81.89%, respectively, against 72.48% for the frequency ratio method (Tehrany et al. 2015).
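
A brief, hedged sketch of such a kernel comparison is given below, using scikit-learn with synthetic data standing in for the texture features of the 181 mapped locations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary data standing in for flood-conditioning features
X, y = make_classification(n_samples=181, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.70, random_state=0)  # 70% training, 30% testing

# The four kernel functions evaluated in the study
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print("%-8s accuracy: %.3f" % (kernel, clf.score(X_test, y_test)))
```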

Efficient Natural Language Processing (NLP) Techniques for Flood Detection Using Social Media Big Data Text Streams

Natural language processing (NLP) plays a vital role in advancing artificial intelligence (AI) research in many practical areas of life. There are several challenges in the NLP realm still to be resolved, such as natural language generation, natural language understanding, dialog systems, and speech generation and recognition systems, to name a few. The major fields that support NLP research are computer science, artificial intelligence, and computational linguistics. On the Internet, 80% of user-generated content is in textual form, and the potential of this huge knowledge resource is still to be explored. The Internet, with its recent advancement in real-time interactive user- and social-group-generated textual content and readership, offers new potential applications in the area of real-time event-based information processing and management. NLP-based applications generally have a preprocessing pipeline that turns textual features into more meaningful units; this can involve parsing lexemes from text, stemming/lemmatization, and statistical processing or counting of features. During the US presidential election of 2016, Twitter proved to be the largest and most powerful source of breaking news, with 40 million election-related tweets posted by 10 PM on election day. Social media platforms, including Twitter, have been used and analyzed for use in disasters and emergency situations. Kaminska and Rutten (2014) identified the main uses of social media across the four pillars of the disaster management cycle: prevention and mitigation, preparedness, response, and recovery. The following are the three main areas where these platforms can be very helpful (Dufty et al. 2016):

  • public information

  • situational awareness

  • community empowerment and engagement

Natural language processing (NLP) is a significant research area that aims to achieve computational language processing similar to that of human beings. Various tasks may be accomplished through NLP, including translation of text from one language to another, paraphrasing, summarization, information retrieval, and so on. Nowadays, social media content is a huge source of information, and appropriate processing can reveal valuable outcomes. The text available on social media is in raw format and needs preprocessing. Various techniques are used for preprocessing, including stemming and lemmatization. Stemming is the process of finding the root or main word from a derived word; for example, if stemming is applied to the words "flooding" and "storming," they are reverted to their root words "flood" and "storm," respectively. Lemmatization removes the unnecessary part of a word and provides a root word that must be a dictionary word with proper meaning (Bramer 2007; Eilander et al. 2016).
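
The following small sketch contrasts the two techniques using NLTK, with the "flooding"/"storming" example from the text. The Porter stemmer is used here as a widely available stand-in, and the WordNet data must be downloaded once via nltk.download("wordnet").

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ("flooding", "storming", "studies"):
    # Stemming strips suffixes heuristically; the result may not be a word
    print(word, "->", stemmer.stem(word), "(stem)")
    # Lemmatization maps to a dictionary form, here treating words as verbs
    print(word, "->", lemmatizer.lemmatize(word, pos="v"), "(lemma)")
```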

Challenges in NLP

Social Media Challenges

The information available as social media content is unreliable in various respects. The creators of the information are not experts in the field, and this inexperience may result in wrong judgments. Moreover, there are no restrictions on the usage of slang: the word "flood," for example, may be used to indicate a huge quantity of people or things arriving at some place. Additionally, information mentioned on social media may describe past experiences rather than current situations (Eilander et al. 2016).

Challenges of Using Tweets for Disaster Response System

Twitter is an important part of social media, where a user may upload text of up to 140 characters. Using data collected from tweets for crisis management faces various challenges due to the informal and restrictive nature of Twitter. A major drawback is unintentional spelling mistakes due to user negligence. Additionally, owing to the character limit, users may intentionally use formal or informal abbreviations, such as "Govt." or "gov" instead of "government" and "2morrow" instead of "tomorrow." Sometimes, to fit a whole idea into the restricted space, a user removes the spaces between words and writes them together, as in "floodingEvent" instead of "flooding event," which creates ambiguity (Imran et al. 2016; Nguyen et al. 2016).
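
A minimal sketch of tweet normalization addressing these issues is shown below; the abbreviation table and the camelCase-splitting rule are illustrative assumptions, not a complete solution.

```python
import re

# Small illustrative sample of informal abbreviations, not exhaustive
ABBREVIATIONS = {"govt": "government", "gov": "government",
                 "2morrow": "tomorrow", "pls": "please"}

def normalize_tweet(text):
    # Split run-together camelCase words: "floodingEvent" -> "flooding Event"
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    # Expand known informal abbreviations, token by token
    tokens = [ABBREVIATIONS.get(tok.lower().strip("."), tok)
              for tok in text.split()]
    return " ".join(tokens)

print(normalize_tweet("Govt. warns of floodingEvent 2morrow"))
# -> "government warns of flooding Event tomorrow"
```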

Machine Learning and NLP

Naïve Bayes (NB) and Multinomial Naïve Bayes

The Naïve Bayes algorithm is highly simple and efficient. It uses probability theory to predict the class of a given instance, under the assumption that the attributes of the instances are independent of each other (Bramer 2007). Text classification considers each word appearing in a document as a separate attribute, so Naïve Bayes is an appropriate option for processing documents. Initially, the multivariate Bernoulli model of the Naïve Bayes classifier was proposed. It represents each document as a binary vector in which each element indicates the presence or absence of a particular word in that document. This model is applicable to text classification with a constant number of attributes, but it does not count the frequency of words appearing in the document. To address this issue, multinomial Naïve Bayes (MNB) was introduced, which counts the frequency of each word appearing in the document (Jiang et al. 2013).
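
The following sketch contrasts the two variants with scikit-learn on invented toy documents: BernoulliNB internally binarizes the counts to presence/absence, whereas MultinomialNB uses the repeated occurrences of words such as "flood."

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

docs = ["flood flood flood in the city", "sunny day in the park",
        "river flood warning issued", "pleasant sunny weather today"]
labels = [1, 0, 1, 0]  # 1 = flood-related, 0 = not

counts = CountVectorizer().fit_transform(docs)  # word-frequency vectors

# BernoulliNB sees only presence/absence; MultinomialNB sees frequencies
for model in (BernoulliNB(), MultinomialNB()):
    model.fit(counts, labels)
    print(type(model).__name__, model.predict(counts))
```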

Deep Neural Network (DNN)

In contrast to manual feature extraction, a deep neural network (DNN) automatically extracts abstract features from the data, which helps in the construction of effective classifiers. The network is usually trained using online algorithms, a favorable feature for disastrous situations. The adaptive nature of DNNs makes them attractive for crisis management: a model can be trained in advance and still classify newly arriving datasets well, which is ideal for a crisis situation where real-time data needs rapid processing (Nguyen et al. 2016). Figure 2 shows the flow diagram for metadata feature processing using TF-IDF. Naïve Bayes and support vector machines are the most popular classifiers in NLP systems.

Flood Detection Using Social Media Big Data Streams, Fig. 2

Data flow diagram for metadata feature processing using TF-IDF

State-of-the-Art NLP Systems for Flood Detection Using Big Data Streams

Recently, various novel systems have been proposed to detect floods through text analysis of social media big data streams (Bischke et al. 2017; Hanif et al. 2017; Ahmad et al. 2017b,a; Avgerinakis et al. 2017; Nogueira 2017). Most of these approaches are evaluated on the standard public dataset from the Multimedia Satellite Task at MediaEval 2017. This dataset consists of the metadata of images uploaded by social media users, which may or may not contain evidence of flooding. The metadata contains various fields, including a description of the individual image, user tags, the title, the date on which the image was uploaded, and the device with which it was captured. The main objective is to extract and fuse the content of events present in satellite imagery and social media (Bischke et al. 2017). A diverse range of techniques has been reported, including metadata features such as term frequency-inverse document frequency (TF-IDF) (Bischke et al. 2017; Hanif et al. 2017) and word/text embeddings learned from a large collection of titles, descriptions, and user tags (Tkachenko et al. 2017). Among these, traditional features like TF-IDF are quite useful, since term frequency counts the number of occurrences of each semantically relevant concept ("flood," "river," "damage") in the given user tags. Table 2 summarizes the performance of several NLP-based techniques.

Flood Detection Using Social Media Big Data Streams, Table 2 Some state-of-the-art NLP techniques for flood detection using social media text data stream
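
The sketch below illustrates TF-IDF weighting over such image metadata with scikit-learn; the metadata strings are invented examples, not records from the MediaEval dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented title/tag strings standing in for per-image metadata
metadata = ["flood river damage bridge collapsed",
            "sunset mountains hiking trail",
            "flooded street after heavy rain storm"]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(metadata)  # one row per image

# Inspect weights for flood-related vocabulary in the first record
vocab = vectorizer.vocabulary_
for term in ("flood", "river", "damage"):
    print(term, tfidf[0, vocab[term]])
```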

Flood Detection from Social Media Visual Big Data Stream: Machine Learning and Deep Learning Techniques

Visual images available on social media can be used effectively by a flood management system. Figure 3 shows some sample images from the MediaEval 2017 satellite task. Over the years, various computer vision techniques have been investigated to extract local/global visual features from these images (Bischke et al. 2017). These features include AutoColorCorrelogram, EdgeHistogram, Color and Edge Directivity Descriptor (CEDD), Color Layout, Fuzzy Color and Texture Histogram (FCTH), Joint Composite Descriptor (JCD), Gabor, ScalableColor, Tamura, SIFT, Colour SIFT, and local binary patterns (LBP). Machine learning techniques such as support vector machine (SVM), decision tree (DT), and kernel discriminant analysis using spectral regression (SR-KDA) are then used to distinguish between flood and non-flood images.

Flood Detection Using Social Media Big Data Streams, Fig. 3

Some images from MediaEval 2017 satellite task (Bischke et al. 2017)
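
A hedged sketch of one classical pipeline from the list above, local binary pattern (LBP) histograms fed to an SVM, is given below; random arrays stand in for real flood and non-flood photographs.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(gray_image, points=8, radius=1):
    """Uniform LBP codes summarized as a normalized histogram."""
    codes = local_binary_pattern(gray_image, points, radius, method="uniform")
    hist, _ = np.histogram(codes, bins=points + 2, range=(0, points + 2),
                           density=True)
    return hist

rng = np.random.default_rng(0)
# 8-bit random arrays standing in for grayscale photographs
images = rng.integers(0, 256, size=(20, 64, 64), dtype=np.uint8)
labels = rng.integers(0, 2, size=20)  # 1 = flood, 0 = non-flood

features = np.array([lbp_histogram(img) for img in images])
clf = SVC(kernel="rbf").fit(features, labels)
print(clf.predict(features[:5]))
```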

Recently, deep learning has been quite successful in image classification tasks. Deep learning is an advanced field of machine learning that mimics the human brain, building artificial neural networks for the execution of tasks. It passes the input through a hierarchy of multiple layers to produce increasingly meaningful representations. In contrast to classical machine learning, features are automatically extracted from the provided input, which saves a great deal of human effort (LeCun et al. 2015). Because features are extracted and processed in real time according to the required task, more computation is needed, which is why graphical processing units (GPUs) are used for efficient results. For improved accuracy, neural networks require large amounts of data so that the relevant features can be learned reliably. Deep learning has applications in many fields, including natural language processing (NLP), computer vision, audio processing, and so on. Especially in computer vision, deep learning has produced exceptional improvements along various dimensions, including image segmentation, compression, localization, transformation, sharpening, completion, colorization, and prediction. Table 3 summarizes the performance of several ML- and DL-based techniques.

Flood Detection Using Social Media Big Data Streams, Table 3 Some state-of-the-art machine learning and deep learning techniques for flood detection using social media visual data stream

Transfer Learning

The effectiveness of a deep network is directly related to the availability of training data: a greater amount of training data yields more accurate output. However, such strong training requires a huge quantity of resources. Transfer learning has been proposed to resolve the issue of effective training; it does not require training a neural network from scratch. Instead, it makes small modifications to an extensively trained network, one trained on large datasets and powerful machines. Such pre-trained neural networks can then be used in a similar or different domain, as sketched below.
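
The following minimal PyTorch sketch loads an ImageNet pre-trained ResNet-50, freezes its backbone, and replaces only the final layer with a two-class (flood/non-flood) head. The weight-loading call follows current torchvision conventions and is an assumption here, as is the choice of ResNet-50.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network extensively trained on ImageNet
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze the pre-trained backbone
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer for the new 2-class task
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the new layer's parameters are updated during retraining
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
dummy_images = torch.randn(4, 3, 224, 224)  # stand-in mini-batch
print(model(dummy_images).shape)            # torch.Size([4, 2])
```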

Importance of Deep Learning (DL) Algorithms in Event Prediction

During disasters and crises, accurate situational awareness and rapid communication are important for properly managing the situation. Conventional algorithms perform their functionality according to predefined steps and are therefore inappropriate for managing crises. On the contrary, deep learning algorithms adapt to the input provided to them and are better suited to contextual awareness, so that accelerated action may be taken. ImageNet is a dataset that contains 15 million labeled images in about 22,000 categories (Jiang et al. 2013). A contest is organized yearly to produce better results using ImageNet, and the winners of this competition have revealed new models, including AlexNet, QuocNet, Inception (GoogLeNet), BN-Inception-v2, and Inception version 3. These pre-trained models are released so they can be executed on machines with low computational and storage resources: one or more layers of these networks are altered according to the domain, and the network is then retrained accordingly. Such a retrained network achieves better efficiency due to the strong training it acquired from the ImageNet dataset. Inception version 3 was introduced in 2016 (Chen and Han 2016), with boosted performance on the ILSVRC 2012 classification benchmark. In comparison with earlier state-of-the-art results (Nguyen et al. 2016), it cut the error rate by 25% while being six times cheaper computationally and using five times fewer parameters.

Combining Knowledge Using Fusion of Meta- and Visual Data

The performance of flood detection can be improved by using both the image uploaded on social media and its metadata. The combination utilizes features of the text as well as the image, which enhances the effectiveness of the results; a minimal fusion sketch follows Table 4. Figure 4 shows an example of the fusion of metadata and visual data. Table 4 summarizes the performance of several ensemble techniques in which floods are predicted using the combination of metadata and visual data.

Flood Detection Using Social Media Big Data Streams, Fig. 4

Data flow diagram showing fusion of meta- and visual data

Flood Detection Using Social Media Big Data Streams, Table 4 Some state-of-the-art ensemble techniques for flood detection using fusion of meta- and visual data
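
The simple late-fusion sketch below averages flood probabilities predicted independently from metadata (text) and from visual content. The probabilities and equal weights are illustrative placeholders; reported systems tune the fusion on validation data.

```python
import numpy as np

# Hypothetical per-image flood probabilities from each modality,
# standing in for the outputs of the classifiers sketched earlier
p_text = np.array([0.9, 0.2, 0.6])    # from the metadata/text classifier
p_visual = np.array([0.7, 0.1, 0.4])  # from the image classifier

# Weighted late fusion; equal weights are an illustrative choice
w_text, w_visual = 0.5, 0.5
p_fused = w_text * p_text + w_visual * p_visual
print((p_fused >= 0.5).astype(int))   # fused flood / non-flood decisions
```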

Conclusion

The idea of harnessing social media for emergency management is quite effective. The fundamental characteristics of social media, such as being instantaneous, cost-effective, transparent, and intentional, relate closely to the objectives of emergency management, which is collaborative, coordinated, integrated, progressive, and professional. Most emergency agencies around the world now use social media alongside conventional platforms like newspapers, television, and community channels to support warning, safety, response, and recovery in emergency situations. In this chapter, we have briefly explored flood detection systems that use social media big data streams. Recently, several datasets have been shared by different countries and agencies for flood-related emergency situations, and there are integrated systems and frameworks for emergency reporting. These systems use several state-of-the-art machine learning approaches to determine the response in critical situations. The integration of images and social media textual data provides the right mixture for effective prediction and monitoring. The size of social media data and its integration, filtration, and categorization for the relevant situation offer many challenges; on the other hand, its effective use in crisis detection, resilience, learning, and capacity building for real-time and online analysis cannot be denied.

Cross-References