Landslide detection in real-time social media image streams

The lack of global data inventories obstructs scientific modeling of, and response to, landslide hazards, which are often deadly and costly. To remedy this limitation, new approaches suggest solutions based on citizen science that require active participation. In contrast, as a non-traditional data source, social media has been increasingly used in many disaster response and management studies in recent years. Inspired by this trend, we propose to capitalize on social media data to mine landslide-related information automatically with the help of artificial intelligence techniques. Specifically, we develop a state-of-the-art computer vision model to detect landslides in social media image streams in real time. To that end, we first create a large landslide image dataset labeled by experts with a data-centric perspective, and then conduct extensive model training experiments. The experimental results indicate that the proposed model can be deployed in an online fashion to support global landslide susceptibility maps and emergency response.


Introduction
Landslides occur all around the world and cause thousands of deaths and billions of dollars in infrastructural damage every year (Kjekstad and Highland 2009). However, landslide events are often under-reported and insufficiently documented because they are complex natural phenomena governed by various intrinsic and external conditioning and triggering factors, such as earthquakes and tropical storms, which are usually more conspicuous and, hence, more widely reported (Lee and Jones 2004). Due to this oversight and the lack of global data inventories to study landslides, any attempt to quantify global landslide hazards and the associated impacts is destined to be an underestimation (Froude and Petley 2018).
In an attempt to tackle the challenge of building a global landslide inventory, NASA launched a website1 in 2018 to allow citizens to report the regional landslides they see in person or online (Juang, Stanley, and Kirschbaum 2019). Following the same idea, researchers further developed other means, such as mobile apps, to collect citizen-provided data (Kocaman and Gokceoglu 2019; Cieslik et al. 2019). These efforts also help address concerns about news media sources' reporting biases (Moeller 2006; Pennington and Harrison 2013). However, this means the bulk of data collection and interpretation still involves time-consuming work by specialists searching the Internet for news and social media reports, or directly engaging in communications with those submitting information and then interpreting the data received (Kocaman and Gokceoglu 2019; Juang, Stanley, and Kirschbaum 2019; Pennington et al. 2015; Taylor et al. 2015).
To alleviate the need for opt-in participation and manual processing, we strive to develop a state-of-the-art AI model that can automatically detect landslides2 in social media image streams in real time. To achieve this goal, we first create a large image dataset comprising more than 11,000 images from various data sources, annotated by domain experts. We then exploit this dataset in a comprehensive experimentation searching for the optimal landslide model configuration. This exploration reveals interesting insights about the model training process. More importantly, the experimental results show that the optimal landslide model achieves a promising performance on a held-out test set. Based on this model, we envision a system that can contribute to the harvesting of global landslide data, and hence, facilitate further landslide research. Furthermore, it can support global landslide susceptibility maps to provide situational awareness and improve emergency response and decision making.

Related Work
The literature on landslide detection and mapping mainly uses four types of data sources: (i) physical sensors, (ii) remote sensing, (iii) volunteers, and (iv) social networks. Sensor-based approaches rely on land characteristics such as rainfall, altitude, soil type, and slope to detect landslides and develop models to predict future events (Merghadi et al. 2020; Ramesh, Kumar, and Rangan 2009). While these approaches can be highly accurate for sub-catchments within the referenced data area, their large-scale deployment is extremely costly.
Earth observation data obtained using high-resolution satellite imagery has been widely used for landslide detection, mapping, and monitoring (Tofani et al. 2013). Remote sensing techniques use either Synthetic Aperture Radar (SAR) or optical imagery to perform landslide detection as an image classification, segmentation, object detection, or change detection task (Mohan et al. 2021; Cheng et al. 2013). While remote sensing through satellites can be useful to monitor landslides globally, its deployment can prove costly and time-consuming. Moreover, satellite data is susceptible to noise such as clouds.
A few studies demonstrate the use of Volunteered Geographic Information (VGI) as an alternative method to detect landslides (Kocaman and Gokceoglu 2019; Can, Kocaman, and Gokceoglu 2019, 2020). These studies assume active participation of volunteers to collect landslide data, where the volunteers opt in to use a mobile app to provide information such as photos, time of occurrence, damage description, and other observations about a landslide event.
In contrast, our work aims to capitalize on massive social media data without any active participation requirement and with better scalability. In addition, we construct a much larger dataset to train deep learning models and perform more extensive experimental evaluations.
The use of social media data for landslide detection has not been explored extensively. To the best of our knowledge, no prior work has explored the use of social media imagery to detect landslides. The most relevant work, reported in (Musaev, Wang, and Pu 2014; Musaev et al. 2017), combines social media text data and physical sensors to detect landslides. The authors used textual messages collected through a set of landslide-related keywords on Twitter, Instagram, and YouTube, which were then combined with sensor data about seismic activity and rainfall to train a machine learning classifier that can identify landslide incidents. In this study, we focus on analyzing social media images, which can provide more detailed information about the impact of a landslide event. To that end, our work can be considered complementary to prior art.

Dataset
To train models that can detect landslides in images, we curated a large image dataset from multiple sources with different characteristics. Some images were obtained from the Web using Google Image search with keywords such as landslide, landslip, earth slip, mudslide, rockslide, and rock fall, whereas other images were collected from Twitter using similar landslide-related hashtags. Additional images were obtained from the British Geological Survey (BGS) archives. The images obtained from social media or the Web are usually noisy and can include duplicates. Therefore, the collected data was manually labeled by three landslide experts who are also co-authors of this study. Since the AI task at hand is "given an image, recognize landslides" (i.e., no other external information or expert knowledge is available to the AI model), the experts were instructed to keep this computer-vision perspective in mind and label only the most evident cases as "landslide" images (i.e., the images where the landslide is the main theme exhibiting substantial visual cues for the computer vision model to learn from). In this context, the BGS images were also included in the labeling process to maintain label consistency across the dataset. On the other hand, since our ultimate goal is to develop a system that will continuously monitor noisy social media streams to detect landslide events in real time, we retained negative (i.e., not-landslide) images that illustrate completely irrelevant cases (e.g., cartoons, advertisements, selfies) as well as difficult scenarios, such as post-disaster images from earthquakes and floods, in addition to other natural scenes without landslides in the final dataset. Despite the inherent difficulty of the task, the experts achieved an overall Fleiss' Kappa score of 0.58 (Fleiss 1971), which indicates an almost substantial inter-annotator agreement. The final dataset contains 11,737 images. Some example images are shown in Figure 1. The distribution of images across data sources
is summarized in Table 1 and their breakdown into data splits is presented in Table 2. As suggested by Table 2, only about 23% of the images are categorized as "landslide". Such an imbalanced class distribution can bias the learning process and lead to poor performance on the minority class (i.e., landslide), which would not be ideal for our application.
There are many approaches to tackle this problem, ranging from generating synthetic data to using specialized algorithms and loss functions. In this study, we explored one of the basic approaches, i.e., data resampling, where we oversampled images from the landslide class (i.e., sampling with replacement) to create a balanced training set.
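The oversampling step described above can be sketched as follows; the class ratio, seed, and function name are illustrative (the paper's exact implementation is not specified here — an equivalent effect can also be achieved with PyTorch's `WeightedRandomSampler`):

```python
import random
from collections import Counter

def oversample_minority(labels, seed=42):
    """Return dataset indices where each minority class is oversampled
    (with replacement) to match the majority class size."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        # draw extra samples with replacement until the class reaches `target`
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    rng.shuffle(balanced)
    return balanced

# toy labels mimicking the ~23% landslide ratio of the dataset
labels = ["landslide"] * 23 + ["not_landslide"] * 77
idxs = oversample_minority(labels)
counts = Counter(labels[i] for i in idxs)
# both classes now appear 77 times in the balanced index list
```

A training `DataLoader` would then iterate over `idxs` instead of the raw dataset order, so each epoch sees a balanced class distribution.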
The choice of CNN architecture plays a significant role in the performance of the resulting model depending on the available data size and problem characteristics. Therefore, we explored a representative sample of well-known CNN architectures in our experiments.

Other training details. We ran all our experiments on Nvidia Tesla P100 GPUs with 16GB memory using the PyTorch library.3 We adjusted the batch size according to each CNN architecture in order to maximize GPU memory utilization. We used a fixed step size of 50 epochs in the learning rate scheduler of the SGD optimizer and a fixed patience of 50 epochs in the 'ReduceLROnPlateau' scheduler of the Adam optimizer, both with a factor of 0.1. All of the models were initialized with weights pretrained on ImageNet (Russakovsky et al. 2015) and trained for a total of 200 epochs. Consequently, we trained a total of 560 CNN models in our quest for the best model configuration.
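As a rough sketch, the fixed-step SGD schedule described above (step size 50, factor 0.1, as in PyTorch's `StepLR`) can be simulated in plain Python; the base learning rate of 10^-2 is only illustrative:

```python
def step_lr(base_lr, epoch, step_size=50, factor=0.1):
    """Learning rate at a given epoch under fixed-step decay
    (mirrors torch.optim.lr_scheduler.StepLR with step_size=50, gamma=0.1)."""
    return base_lr * factor ** (epoch // step_size)

# over a 200-epoch run the rate drops by 10x every 50 epochs:
# epochs 0-49 -> 1e-2, 50-99 -> 1e-3, 100-149 -> 1e-4, 150-199 -> 1e-5
schedule = [step_lr(1e-2, e) for e in range(200)]
```

The Adam runs instead use `ReduceLROnPlateau`, which applies the same 0.1 factor only after the monitored metric stops improving for 50 epochs, so its decay points depend on the training trajectory rather than a fixed step.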

Results
Due to limited space, Table 3 presents only the 10 top-performing model configurations based on their F1-scores on the validation set. Our main observations are as follows:
• Although the top-performing model is trained without a class balancing strategy, the overall trend indicates that, all else being equal, the models trained with class balancing yield better performance than those trained without it (178 vs. 95).
• The ResNet50 architecture tops the rankings among all CNN architectures, achieving the best average ranking as well as the highest mean F1-score according to Table 4. However, the overall differences between architectures do not seem significant, except for InceptionNet, which performs significantly worse than the others.
• The impact of the learning rate on model performance shows opposite trends for different optimizers. As per Table 5, smaller learning rates (e.g., {10^-6, 10^-5, 10^-4}) seem to work better with the Adam optimizer, whereas larger learning rates (e.g., {10^-2, 10^-3}) seem to work better with the SGD optimizer.
• As expected, the value of the weight decay also impacts the overall performance significantly (in particular, for the Adam optimizer). A large weight decay (e.g., 10^-2) hurts the overall performance, which tends to improve as the weight decay takes on smaller values (see Table 6).
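The 560 trained models are consistent with a full factorial sweep over the dimensions discussed above. As a purely illustrative sketch (the exact lists of architectures and weight decays are assumptions, chosen only so that the product matches 560; only ResNet50, InceptionNet, and the learning rates are named in the text), such a grid can be enumerated as:

```python
from itertools import product

# Illustrative grid only: architecture and weight-decay lists are assumed,
# sized so that 7 x 2 x 5 x 4 x 2 = 560 configurations.
architectures = ["ResNet18", "ResNet50", "ResNet101", "InceptionNet",
                 "DenseNet", "VGG16", "MobileNet"]
optimizers = ["SGD", "Adam"]
learning_rates = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2]
weight_decays = [0.0, 1e-4, 1e-3, 1e-2]
balancing = ["none", "oversampling"]

configs = list(product(architectures, optimizers, learning_rates,
                       weight_decays, balancing))
# one training run per tuple -> 560 models
```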
To illustrate the success of the transfer learning approach employed in this study, we created t-SNE (Van der Maaten and Hinton 2008) visualizations of the feature embeddings before and after the training of the best-performing model. As can be seen in Figure 2, the original ResNet50 model pretrained on ImageNet cannot distinguish landslide images from not-landslide images in either the training set (Figure 2a) or the validation set (Figure 2b). However, after finetuning the model on the target landslide dataset, the resulting feature embeddings show an almost perfect separation of the classes in the training set (Figure 2c) and a reasonably good separation in the validation set (Figure 2d). When applied to the held-out test set, the best-performing model achieves an F1-score of 0.701 and an accuracy of 0.870, as opposed to the F1 and accuracy scores of 0.805 and 0.913, respectively, on the validation set (Table 7). Although the difference in accuracy is relatively small, the difference in F1 is considerably large due to significant drops in the precision and recall scores of the model on the test set. This phenomenon can be explained by the more than twofold increase in the false positive (128 vs. 42) and false negative (178 vs. 60) predictions of the model on the test set, as shown in Table 8.
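A minimal sketch of how such a visualization can be produced with scikit-learn, using random vectors as stand-ins for the pooled CNN features (the feature dimensionality, cluster parameters, and perplexity below are illustrative; in the real pipeline the rows would be the 2048-d global-average-pooled ResNet50 activations):

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-ins for penultimate-layer CNN features of the two classes.
rng = np.random.default_rng(0)
landslide_feats = rng.normal(loc=1.0, size=(30, 64))
other_feats = rng.normal(loc=-1.0, size=(30, 64))
feats = np.vstack([landslide_feats, other_feats])

# Project the high-dimensional embeddings to 2-D for visual inspection.
proj = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(feats)
# proj has shape (60, 2); each row can be scatter-plotted, colored by label
```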
To have a better understanding of the inner workings of the model, we investigated class activation maps (Zhou et al. 2016) of the model predictions on the test set (Figure 3). In both false positive and false negative predictions, we observe that the errors occur mainly because the model fails to localize its attention on a particular region of the image, or is tricked by image regions that are reminiscent of landslide scenes.
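For reference, the class activation map computation of Zhou et al. can be sketched in a few lines of NumPy; the toy tensor shapes below are illustrative (for ResNet50 there would be 2048 feature maps of size 7x7):

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Compute a CAM: weight each final conv feature map by the classifier
    weight connecting it to the target class, then sum over channels.

    feature_maps: (C, H, W) activations of the last conv layer
    fc_weights:   (num_classes, C) weights of the final linear layer
    """
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=1)  # (H, W)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()  # normalize to [0, 1] for overlay as a heatmap
    return cam

# toy example: 4 feature maps of size 7x7 and a 2-class linear head
rng = np.random.default_rng(1)
fmaps = rng.random((4, 7, 7))
w = rng.random((2, 4))
heatmap = class_activation_map(fmaps, w, class_idx=0)
```

The resulting heatmap is upsampled to the input resolution and overlaid on the image, which is how the (mis)localized attention discussed above becomes visible.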

Path to Deployment
We envision a system that continuously monitors social media (i.e., Twitter) for general landslide-related content and deploys our landslide classification model to identify and retain the most relevant information. The planned system will follow a design approach similar to the one presented in (Alam, Imran, and Ofli 2017; Alam, Ofli, and Imran 2018) without the human-in-the-loop aspect (for now). Specifically, there will be a Tweet Collector module that will collect live tweets from the Twitter Streaming API4 that match landslide-related keywords and hashtags in multiple languages. This module will be followed by an Image Collector module that will extract image URLs from the tweets (if any) and download the images. Then, the Image Classifier module will run the downloaded images through our landslide model to tag each image as landslide or not-landslide. In parallel, the Geolocation Inference module will use tweet metadata to geolocate the images following the approach presented in (Qazi, Imran, and Ofli 2020). Eventually, all the results will be stored in a database by the Persister module, which will then be used by the Visualizer module to create a dashboard and/or a map representation of the detection results.
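The planned module chain can be sketched as a pipeline of generators; every function, field name, and URL below is a hypothetical stub for illustration, not the actual Twitter API or the deployed system:

```python
# Hypothetical sketch of Tweet Collector -> Image Collector -> Image Classifier.

def tweet_collector(stream):
    """Yield tweets matching landslide-related keywords (stubbed filter)."""
    keywords = {"landslide", "mudslide", "rockslide"}
    for tweet in stream:
        if keywords & set(tweet["text"].lower().split()):
            yield tweet

def image_collector(tweets):
    """Extract image URLs from tweets; actual downloading is omitted here."""
    for tweet in tweets:
        for url in tweet.get("image_urls", []):
            yield {"tweet_id": tweet["id"], "url": url}

def image_classifier(images, model):
    """Tag each image as landslide / not-landslide using the model stub."""
    for img in images:
        yield {**img, "label": model(img["url"])}

# stub classifier and a fake tweet stream, for illustration only
model = lambda url: "landslide" if "slide" in url else "not_landslide"
stream = [
    {"id": 1, "text": "Huge landslide near the highway",
     "image_urls": ["http://example.com/slide.jpg"]},
    {"id": 2, "text": "sunny day", "image_urls": ["http://example.com/a.jpg"]},
]
results = list(image_classifier(image_collector(tweet_collector(stream)), model))
```

In the envisioned system, the Geolocation Inference and Persister modules would consume the same per-image records in parallel before visualization.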
With this plan, we hope to translate this fruitful collaboration between researchers and practitioners into a solid outcome that can benefit the landslide community as well as government agencies and humanitarian organizations.

Conclusion
In this study, we aimed to develop a model that can automatically detect landslides in social media image streams. For this purpose, we created a large image collection from multiple sources with different characteristics to ensure data diversity. Then, the collected images were assessed by three experts to attain high-quality labels with almost substantial inter-annotator agreement. At the heart of this study lay an extensive search for the optimal landslide model configuration across various CNN architectures, network optimizers, learning rates, weight decays, and class balancing strategies. We provided several insights about the impact of each optimization dimension on the overall performance. The best-performing model achieved high performance in terms of accuracy and F1-score, which can be deemed sufficient for the purpose. Furthermore, the presented error analyses pointed to potential improvements for future work. Finally, we described a road map to deploy the proposed landslide model in an online, real-time system.

Figure 1: Example images from the dataset.
Figure 2: Visualization of the feature embeddings before/after model finetuning

Figure 3: Class activation map visualizations of the model predictions on the test set.

Table 1: Distribution of images across data sources.

Table 2: Breakdown of images into data splits.

Table 3: Top-performing 10 configurations based on F1-score on the validation set.

Table 4: Performance comparison of CNN architectures.

Table 5: Effect of the learning rate on overall performance.

Table 6: Effect of the weight decay on overall performance.

Table 7: Performance comparison of the best model on validation and test sets.

Table 8: Confusion matrices for the validation and test sets.