1 Introduction

Natural disasters cause significant damage (e.g., Hurricane Harvey in 2017 cost $125 billion)Footnote 1 and require urgent assistance in times of crisis. In the last decade, social media platforms have played important roles in humanitarian response tasks, as they are widely used to disseminate information and obtain valuable insights. During disaster events, people post content (e.g., text, images, and video) on social media to ask for help (e.g., a report of a person stuck on a rooftop during a flood), offer support, identify urgent needs, or share their feelings. Such information helps humanitarian organizations take immediate action to plan and launch relief operations. Recent studies demonstrated that images shared on social media during a disaster can assist humanitarian organizations in recognizing damage to infrastructure [1], assessing damage severity [2], identifying humanitarian information [3], detecting crisis incidents [4], and detecting disaster events along with other related tasks [5]. However, the amount of research and resources available to develop powerful computer vision-based predictive models remains insufficient compared to the progress made in NLP [6,7,8]. Motivated by these observations, this research aims to enrich the available resources to enable further advancements in computer vision-based disaster management studies.

Recent advances in deep convolutional neural networks (CNNs) and their learning techniques provide efficient solutions for different computer vision applications. While simple applications can be realized with a single-task formulation such as classification [9], semantic segmentation [10], or object detection [11], complex ones such as autonomous vehicles, robotics, and social media image analysis [12, 13] necessitate incorporating multiple tasks, which significantly increases the computational and memory requirements for both training and inference. Multi-task learning (MTL) techniques [13,14,15] have emerged as the standard approach for these complex applications: a model is trained to solve multiple tasks simultaneously, which helps improve performance and reduce inference time and computational complexity. For example, an image posted on social media during a disaster event may simultaneously indicate whether it depicts a flood, whether it shows infrastructure damage, and how severe that damage is. Such information needs to be detected in real time to help humanitarian organizations [12, 16] with various tasks, including (i) disaster type recognition, (ii) informativeness classification, (iii) humanitarian categorization, and (iv) damage severity assessment (see Sect. 3 for more details). Existing works [1,2,3] present separate task-specific models, resulting in higher computational costs (e.g., computational power, training and inference time). Hence, this research aims at reducing this overhead by addressing the different tasks simultaneously with an MTL setup, which can also help reduce the carbon footprint [17].

Labeled public image datasets, such as ImageNet [18] and Microsoft COCO [19], have made significant contributions to the advancement of today’s powerful machine learning models. Likewise, for the MTL setup, several image datasets have already been proposed, which are summarized in Table 1. These datasets include images from different domains such as indoor scenes, driving, faces, handwritten digits, and animal recognition, and they are already contributing to the advancement of MTL research. However, an MTL dataset for critical real-world applications, such as humanitarian response tasks during natural disasters, is yet to become available. This paper proposes a novel MTL dataset for disaster image classification.

To this end, we build upon the previous work of Alam et al. [5], where the images are mostly annotated for individual tasks and only 5558 out of 71,198 images have labels for all four tasks mentioned above. We provide an expansive extension by annotating the images for all tasks, i.e., we added 155,899 new labels for these tasks in addition to the existing ones.Footnote 2 For the disaster type recognition and humanitarian categorization tasks, we also labeled a part of the images with multiple labels following a weak supervision approach, as these tasks are suitable for multilabel annotation (see Sect. 3). Figure 1 shows example images with the labels for all four tasks.

Fig. 1 Examples of images representing all tasks. T1: Disaster types, T2: Informativeness, T3: Humanitarian, T4: Damage severity

Our contributions in this research can be summarized as follows: (i) we provide a social media MTL image dataset for disaster response tasks with various complexities, which can be used as an evaluation benchmark for computer vision research; (ii) we ensured high-quality annotations by requiring that at least two annotators agree on a label; (iii) we provide a benchmark for heterogeneous multi-task learning and baseline studies to facilitate future research; (iv) our experimental results can also be used as a baseline in the single-task learning setting.

The rest of the paper is organized as follows. Section 2 provides an overview of the existing work. Section 3 introduces the tasks and describes the dataset development process. Section 4 explains the experiments and presents the results while Sect. 5 provides a discussion. Finally, we conclude the paper in Sect. 6.

2 Related work

This paper mainly focuses on the development of an MTL dataset for disaster response tasks. Therefore, we first review recent work on MTL and available MTL datasets, and then survey the social media image classification literature and datasets for disaster response.

2.1 Multi-task learning and datasets

Multi-task learning (MTL) aims to improve generalization capability by leveraging information in training data consisting of multiple related tasks [14]. It learns multiple tasks simultaneously and has shown promising results in terms of generalization, computation, memory footprint, performance, and inference time by jointly learning through a shared representation [14, 15]. Since the seminal work by Caruana [14], MTL research has received wide attention over the last several years in NLP, computer vision, and other research areas [15, 20,21,22,23]. MTL brings benefits when the associated tasks share complementary information. However, performance can suffer when tasks have conflicting needs or competing priorities (i.e., one task dominates the others), a phenomenon referred to as negative transfer. This understanding led to the question of what, when, and how to share information among tasks [15, 24]. To address these aspects, numerous architectures and optimization methods have been proposed in the deep learning era. The architectures are categorized into hard and soft parameter sharing. A hard parameter sharing design consists of a shared network followed by task-specific heads [25,26,27]. In soft parameter sharing, each task has its own set of parameters together with a cross-task feature sharing mechanism [28,29,30]. In the MTL literature, a problem can be formulated in two different ways: homogeneous and heterogeneous [24]. While homogeneous MTL assumes that each task corresponds to a single output, heterogeneous MTL assumes that each task corresponds to a unique set of output labels [14, 31]. The latter setting uses a neural network with multiple sets of outputs and losses. In this study, we aim to provide a benchmark on our heterogeneous MTL dataset using the hard parameter sharing approach.
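To make the hard parameter sharing design concrete, the following minimal PyTorch sketch pairs a shared backbone with one classification head per task; the ResNet18 backbone, head layout, and class counts are illustrative assumptions and not the exact architecture evaluated in this paper.

```python
import torch
import torch.nn as nn
from torchvision import models

class HardSharingMTL(nn.Module):
    """Hard parameter sharing: one shared backbone, one head per task."""

    def __init__(self, num_classes_per_task):
        super().__init__()
        backbone = models.resnet18(weights=None)   # any CNN backbone works here
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                # keep only the shared features
        self.backbone = backbone
        # Heterogeneous MTL: each task has its own output label set and head
        self.heads = nn.ModuleDict({
            name: nn.Linear(feat_dim, n) for name, n in num_classes_per_task.items()
        })

    def forward(self, x):
        shared = self.backbone(x)
        return {name: head(shared) for name, head in self.heads.items()}

# Example with four hypothetical tasks and their class counts
model = HardSharingMTL({"disaster_types": 7, "informativeness": 2,
                        "humanitarian": 4, "damage_severity": 3})
logits = model(torch.randn(2, 3, 224, 224))        # dict of per-task logits
```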

Earlier studies such as [32] and [33] mostly exploited the MNIST [34] and USPS [35] datasets for MTL experiments. These datasets were originally designed for single-task classification settings. For example, the widely used MNIST dataset was originally designed for digit classification, and Office-Caltech [36] was designed to categorize images, collected from different domains, into 31 classes. However, such datasets have been used in the homogeneous multi-task learning setting by treating ten target classes as ten binary classification tasks [24, 33, 37]. Numerous other widely used datasets such as MS-COCO [19] and CelebA [38] have also been used for multi-task learning in the homogeneous problem setting.

Several existing datasets consisting of multiple unique output label sets have been studied in the heterogeneous setting. For example, AdienceFaces [39] was designed for gender and age group classification tasks, OmniArt [40] consists of seven tasks, NYU-V2 [41] consists of three tasks, and PASCAL [42, 43] consists of five tasks. Very few datasets were specifically designed for multi-task learning research. The most notable ones are Taskonomy [44] and BDD100K [13]. The Taskonomy dataset consists of four million images of indoor scenes from 600 buildings, and each image was annotated for twenty-six visual tasks. Ground truth for this dataset was obtained programmatically and through knowledge distillation approaches. The BDD100K dataset is a diverse 100K driving video dataset consisting of ten tasks. It was collected from Nexar,Footnote 3 where videos are uploaded by drivers. In Table 1, we list widely used datasets that have been employed for MTL.

Table 1 Upper part of the table presents the datasets used in multi-task learning studies in computer vision research

2.2 Disaster response studies and datasets

During disaster events, social media content has proven to be effective in facilitating the work of different stakeholders, including humanitarian organizations [55]. Alongside this, there has been growing research interest in developing computational methods and systems to better analyze and extract actionable information from social media content [7, 56, 57]. Most such efforts relied on content from platforms such as Twitter and Facebook for humanitarian aid [58, 59]. As accessing Facebook data became difficult, research studies and resource development have focused on Twitter content, which provides instant access to timely multi-modal information (i.e., textual and visual) that is crucial for different stakeholders (e.g., governmental and non-governmental organizations) [58, 59]. Notable resources with textual content include CrisisLex [60], CrisisNLP [61], TREC Incident Streams [62], the disaster tweet corpus [63], the Arabic Tweet Corpus [64], CrisisBench [65], HumAID [66], and CrisisMMD (text and image) [3, 67]. In the past years, several systems have also been developed and deployed during disaster events [58, 68,69,70]. One notable system is AIDR [58]Footnote 4, which has been used during major disaster events to collect and classify tweets and provide visual summaries.

Earlier research efforts in crisis informatics mainly focused on textual content analysis [8]. Lately, however, there has been growing interest in imagery content analysis, as images posted on social media during disasters can play a significant role, as reported in many studies [2, 12, 16, 53, 71, 72]. Recent works include categorizing the severity of damage into discrete levels [2, 16, 53] or quantifying the damage severity as a continuous-valued index [73, 74]. Such models were also used in real-time disaster response scenarios by engaging with emergency responders [70]. Other related work includes adversarial networks for data scarcity issues [75, 76]; disaster image retrieval [77]; image classification in the context of bushfire emergencies [78]; a flood photo screening system [79]; sentiment analysis from disaster images [80]; monitoring natural disasters using satellite images [7, 81]; and flood detection using visual features [82].

Publicly available image datasets include the damage severity assessment dataset (DAD) [2], the multimodal dataset CrisisMMD [3], and the damage identification multimodal dataset (DMD) [1]. The first dataset is only annotated for images, whereas the last two are annotated for both text and images. Other relevant datasets are Disaster Image Retrieval from Social Media (DIRSM) [54] and MediaEval 2018 [52]. The dataset reported in [51] was constructed for detecting damage as an anomaly using pre- and post-disaster images, and it consists of 700,000 building annotations. A similar and relevant work is the Incidents dataset [4], which consists of 446,684 manually labeled images with 43 incident categories. The Crisis Benchmark Dataset reported in [5] is the largest social media disaster image classification dataset, which is a consolidated version of DAD, CrisisMMD, DMD, and additional labeled images.

In this study, we extended the Crisis Benchmark Dataset to adapt it to an MTL setup. To that end, we assigned 155,899 additional labels to the images to ensure that the entire dataset contains aligned labels for all the tasks. Additionally, we annotated some images with multiple labels, when appropriate, for the humanitarian categorization and disaster type recognition tasks.

3 MEDIC dataset

The MEDIC dataset consists of four different disaster-related tasks that are important for humanitarian aid.Footnote 5 These tasks are defined based on prior experience working with humanitarian response organizations such as UN OCHA and on the existing literature [3, 6, 12, 58]. In this section, we first provide the details of each task and its class labels, and then discuss the annotation of the dataset.

3.1 Tasks

Disaster types

During man-made and natural disasters, people post textual and visual content about the current situation, and real-time social media monitoring systems need to detect events when ingesting images from unfiltered social media streams. For the disaster scenario, it is important to automatically recognize different disaster types from the crawled social media images. For instance, an image can depict a wildfire, flood, earthquake, hurricane, or another type of disaster. Different categories (i.e., natural, human-induced, and hybrid) and sub-categories of disaster types have been defined in the literature [83]. This research focuses on major disaster events and includes (i) earthquake, (ii) fire, (iii) flood, (iv) hurricane, (v) landslide, (vi) other disaster, which covers all other types (e.g., plane or train crash), and (vii) not disaster, which includes images that do not show any identifiable disaster.

Informativeness

Social media content is often noisy and contains numerous irrelevant images such as cartoons and advertisements. At the same time, clean images that show infrastructure damaged by flood, fire, or other disaster events are crucial for humanitarian response tasks. Therefore, it is necessary to eliminate irrelevant or redundant content so that crisis responders’ efforts can be directed more effectively. For this purpose, we define the informativeness task as filtering out irrelevant images, where the class labels are (i) informative and (ii) not informative.

Humanitarian

Fine-grained categorization of information significantly helps emergency crisis responders make efficient, actionable decisions. Humanitarian categories vary depending on the type of content (text vs. image). For example, the CrisisBench dataset [65] consists of tweets labeled with 11 categories, whereas the CrisisMMD [3] multimodal dataset consists of eight categories. Such variation exists between text and images because some information can be presented more easily in one modality than in another. For example, missing or found people are easier to report in text than in an image, as also noted in [3]. Taking these factors into account, this research considers the four categories most useful for crisis responders: (i) affected, injured, or dead people, (ii) infrastructure and utility damage, (iii) rescue, volunteering, or donation effort, and (iv) not humanitarian.

Damage severity

Detecting the severity of damage is important for helping the affected community during disaster events. The severity of the damage can be assessed from an image based on the visual appearance of the physical destruction of a built structure (e.g., bridges, roads, buildings, burned houses, and forests). In line with [2], this research defines the following categories for the classification task: (i) severe damage, (ii) mild damage, and (iii) little or none.

3.2 Annotations

3.2.1 Data curation

This research extends the labels of the Crisis Benchmark dataset [5], which was developed by consolidating existing datasets and labeling new data for disaster types. The Crisis Benchmark dataset consists of images collected from Twitter, Google, Bing, Flickr, and Instagram; the majority were collected from Twitter, as shown in Table 2. The Twitter data were mainly collected during major disaster eventsFootnote 6 using different disaster-specific keywords. The data collected from Google, Bing, Flickr, and Instagram are based on specific keywords. The dataset is diverse in terms of (i) the number of events, (ii) time frames spanning over five years, (iii) natural (e.g., earthquake, fire, floods) and man-made disasters (e.g., Paris attack, Syria attacks), and (iv) events that occurred in different parts of the world. The number of images per event depends on several factors, such as the number of tweets collected during the event, the number of images crawled, duplicate filtering, and the random selection of images for annotation. Our motivation for choosing and extending the Crisis Benchmark dataset is that it reduces the overall cost of data collection and annotation while still providing a large dataset for MTL.

Table 2 Data collection source, event name, year of the event, and number of images annotated

3.2.2 Multiclass annotation

For the manual annotation, we used the AppenFootnote 7 crowdsourcing annotation platform. On such platforms, finding qualified workers and managing annotation quality is an important issue. To ensure quality, we used the widely adopted gold standard evaluation approach [84]. We designed the interface with annotation guidelines on Appen for the annotation task (see Fig. 7 in Appendix). We followed the annotation guidelines from previous work [3, 5] and improved them with examples for this task (see the detailed annotation guidelines with examples in Appendix A).

For all tasks, we first annotated the images in a multiclass setting. Then, for the humanitarian and disaster type tasks, we labeled the images with multiple labels, as these tasks are more suitable for a multilabel setting (see Sect. 3.2.4). For the multiclass labeling, our decisions were influenced by several factors. The most important one was our consultation with humanitarian organizations, which suggested limiting the number of classes by merging related ones and keeping only the most important information types. This is due to the information overload that humanitarian responders often face at the onset of a disaster when exposed to information types that are not important to them. For an image that could have multiple labels, we instructed the annotators to select the label that is most important for humanitarian organizations and most prominent in the image.

For the annotation, we designed each HIT to contain five images. For the gold standard evaluation, we manually labeled 100 images, which were randomly inserted into the HITs. We required at least three annotations per image and per task. An agreement score of 66% was used to select the final label, which ensured that at least two annotators agreed on a label. The HIT was extended to more annotators if this criterion was not met.
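As an illustration of the agreement rule described above, the following minimal Python sketch selects a final label only when at least two of three annotators (i.e., at least 66%) agree; the function name and example labels are ours, not part of the annotation platform.

```python
from collections import Counter

def final_label(annotations, threshold=2/3):
    """Return the label chosen by at least `threshold` of annotators,
    or None if no label reaches the threshold (the HIT is then extended)."""
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    return label if votes / len(annotations) >= threshold else None

print(final_label(["flood", "flood", "hurricane"]))  # -> 'flood' (2/3 agreement)
print(final_label(["flood", "fire", "hurricane"]))   # -> None (no majority)
```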

Since the Crisis Benchmark dataset did not have task-specific labels for all images, i.e., different sets of images had labels for only two or three tasks, we first prepared different sets of images with missing labels for annotation. For example, 25,731 images of the Crisis Benchmark dataset did not have labels for the disaster type and humanitarian tasks, and we selected these for the corresponding annotation tasks. In this way, we ran the annotation tasks in different batches.

3.2.3 Crowdsourcing results

To measure the quality of the annotations, we compute the annotation agreement using Fleiss’ kappa [85], Krippendorff’s alpha [86], and the average observed agreement [85]. In Table 3, we present the annotation agreement for all events using the measures mentioned above. The agreement score varies from 46 to \(71\%\) across tasks. Note that, for the kappa measurement, values in the ranges 0.41–0.60, 0.61–0.80, and 0.81–1 correspond to moderate, substantial, and almost perfect agreement, respectively [87]. Based on these measurements, we conclude that our annotations exhibit moderate to substantial agreement. The number of labels and the subjectivity of the annotation tasks are reflected in the agreement scores. Some annotation tasks are highly subjective. For example, in the disaster type task, a hurricane or tropical cyclone often brings heavy rain that causes flooding, so an image showing a fallen tree in flood water can be annotated as either hurricane or flood. Another example is an image showing both building damage and a rescue effort. In such cases, the annotators were instructed to carefully check what is more visible in the image and select the label accordingly. Note that the agreement score for disaster types is comparatively lower than for the other tasks, which is due to the higher subjectivity of that task: annotators needed to choose one label among seven. The average observed agreement scores are comparatively higher because we ensured that at least two annotators agreed on each label.

Table 3 Annotation agreement for different tasks
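As a hedged example of how such agreement scores can be computed, the snippet below uses statsmodels to calculate Fleiss’ kappa from an items-by-annotators matrix of label ids; the toy ratings are purely illustrative and not taken from the dataset.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Toy matrix: one row per image, one column per annotator, integer label ids
ratings = np.array([
    [0, 0, 0],   # all three annotators chose label 0
    [1, 1, 2],   # two chose label 1, one chose label 2
    [2, 2, 2],
    [0, 1, 0],
])
table, _ = aggregate_raters(ratings)     # items x categories count table
print(fleiss_kappa(table, method="fleiss"))
```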

3.2.4 Multilabel annotation

For the multilabel annotation of the disaster type and humanitarian tasks, we followed a weak supervision approach to assign multiple labels due to annotation budget constraints (e.g., time, cost). We assign the set of labels selected by all annotators. Given three annotators \(A_{1}\), \(A_{2}\), and \(A_{3}\), each of whom assigned a label l from \({\mathbb {L}}=\{l_{1}, l_{2},\ldots ,l_{n}\}\) to an image \({\mathbb {I}}\), the final label set for the image \({\mathbb {I}}\) is defined as \({\mathbb {I}}_{\mathbb {L}}={\mathbb {S}}\{A_{1}^{l},A_{2}^{l},A_{3}^{l}\}\). Here, the label with majority agreement (\(\geqslant \) 66%) is the same label as in our multiclass setting, while the remaining labels may have lower agreement. Note that we were able to assign multiple labels to 53,683 images (75.4%) for the disaster type task and 65,038 images (91.3%) for the humanitarian task out of 71,198 images (see Table 5). As the images were labeled in different phases and curated from existing sources, we could not obtain multiple labels for all images.
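Under our reading of the procedure above, the multilabel set is the union of the labels assigned by the annotators, anchored by the majority label; a minimal sketch (reusing the hypothetical final_label helper from Sect. 3.2.2) follows.

```python
def weak_multilabel(annotations, threshold=2/3):
    """Weakly supervised label set: the union of annotator labels,
    kept only when a majority label (>= threshold agreement) exists."""
    majority = final_label(annotations, threshold)   # helper from the earlier sketch
    if majority is None:
        return None                                  # no multilabel set is assigned
    return set(annotations)                          # majority label plus minority labels

print(weak_multilabel(["flood", "flood", "hurricane"]))  # -> {'flood', 'hurricane'}
```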

3.2.5 Resulting dataset

After completing the annotation task, the proposed dataset adds 155,899 labels for the four tasks in addition to the existing 128,893 labels over 71,198 images. In total, this research re-annotated 65,640 images to create the MEDIC dataset. Furthermore, we enriched the MEDIC dataset by separately providing multilabel annotations for the disaster type and humanitarian tasks. The distributions of the multiclass and multilabel annotations are shown in Tables 4 and 5, respectively. To understand how the tasks and labels are associated with each other, we computed confusion matrices between pairs of tasks. We find a good correlation between labels across tasks. For example, between the humanitarian and damage severity tasks, the majority of not-humanitarian images are also labeled as little or none, as shown in Fig. 8d in Appendix A.5. We have similar observations for the other task pairs. As for the multilabel annotation, the majority of images are labeled with a single label: for disaster types, 84.7% of the images have a single label and 15.3% have 2-3 labels; for the humanitarian task, 88.3% have a single label and the remaining 11.7% have multiple labels.

Table 4 Annotated dataset with data splits for different tasks
Table 5 Multilabel annotated dataset with data splits for different tasks

3.3 Comparison with other datasets

A comparative analysis with prior disaster-related datasets shows that the MEDIC dataset is larger in size, covers aligned labels for four tasks, and contains multilabel annotations. In Table 6, we present a comparison of the datasets containing aligned labels for MTL. From the table, it is clear that the prior datasets were not designed for this kind of learning setup and that their class label distributions are highly skewed (see Table 9 in [88] for the Crisis Benchmark Dataset).

Table 6 Multi-task learning datasets for disaster image classification tasks

4 Experiments and results

In Table 4, we present the dataset with task-wise data splits and distributions for the multiclass setting. The splits consist of 69%, 9%, and 22% of the data for the training, development, and test sets, respectively. We first conduct baseline experiments, followed by single-task learning experiments, to compare against and provide a benchmark for the multi-task setting.

To measure the performance of each classifier in each task setting, we use the weighted average precision (P), recall (R), and F1-score (F1), which are widely used in the literature. For the multilabel experiments, we compute the micro-averaged precision (P), recall (R), and F1-score (F1), as well as the Hamming loss, which are commonly used metrics [89, 90].
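For concreteness, the scikit-learn calls below show how the weighted scores for the multiclass tasks and the micro-averaged scores plus Hamming loss for the multilabel task can be computed; the toy labels are illustrative only.

```python
from sklearn.metrics import precision_recall_fscore_support, hamming_loss

# Multiclass task (toy labels): weighted-average P, R, F1
y_true, y_pred = [0, 2, 1, 1, 0], [0, 2, 2, 1, 0]
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")

# Multilabel task (toy indicator matrices): micro-average P, R, F1 and Hamming loss
Y_true = [[1, 0, 1], [0, 1, 0]]
Y_pred = [[1, 0, 0], [0, 1, 1]]
mp, mr, mf1, _ = precision_recall_fscore_support(Y_true, Y_pred, average="micro")
hl = hamming_loss(Y_true, Y_pred)
print(p, r, f1, mp, mr, mf1, hl)
```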

4.1 Baseline

For the baseline experiments, we evaluate (i) a majority class baseline, and (ii) SVM and KNN classifiers trained and tested on fixed features from a pre-trained model. We extracted features from the penultimate layer of the EfficientNet (b1) model [91] trained on ImageNet. The majority class baseline predicts the most frequent label of the training set and has been commonly used in shared tasks [92]. For training the SVM and KNN, we used the standard parameter settings available in scikit-learn [93].
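A minimal sketch of this feature-based baseline is given below, assuming the torchvision EfficientNet (b1) implementation and dummy data in place of the actual images; the exact feature extraction pipeline used in our experiments may differ.

```python
import torch
from torchvision.models import efficientnet_b1, EfficientNet_B1_Weights
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Penultimate-layer feature extractor initialized with ImageNet weights
net = efficientnet_b1(weights=EfficientNet_B1_Weights.IMAGENET1K_V1)
net.classifier = torch.nn.Identity()      # drop the final classification layer
net.eval()

with torch.no_grad():
    feats = net(torch.randn(8, 3, 240, 240)).numpy()   # dummy image batch
labels = [0, 1, 0, 1, 0, 1, 0, 1]                       # dummy labels

svm = SVC().fit(feats, labels)                          # default scikit-learn settings
knn = KNeighborsClassifier().fit(feats, labels)
print(svm.predict(feats[:2]), knn.predict(feats[:2]))
```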

4.2 Single-task learning

We used several pre-trained models for single-task learning and fine-tuned each network with a task-specific classification layer on top. This approach is popular and performs well on various downstream visual recognition tasks [94,95,96,97]. The network architectures used in this study include ResNet18, ResNet50, ResNet101 [9], VGG16 [98], DenseNet [99], SqueezeNet [100], MobileNet [101], and EfficientNet [91]. We chose such diverse architectures to understand their relative performance and inference time. For fine-tuning, we initialize our models with weights pre-trained on ImageNet [102]. Our classification settings comprise a binary setting (the informativeness task) and multiclass settings (the remaining three tasks). We train the models using the Adam optimizer [103] with an initial learning rate of \(10^{-3}\), which is decreased by a factor of 10 when the accuracy on the dev set stops improving for 10 epochs. The models were trained for 150 epochs. We use the model with the best accuracy on the validation set to evaluate performance on the test set.
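The sketch below illustrates this fine-tuning recipe in PyTorch (ImageNet initialization, a task-specific head, Adam at 1e-3, a tenfold learning rate reduction after 10 epochs without dev-accuracy improvement, and selection of the best dev model); it assumes standard PyTorch data loaders and is a simplified illustration rather than our exact training code.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

def fine_tune(train_loader, dev_loader, num_classes, epochs=150, device="cuda"):
    # ImageNet initialization with a task-specific classification head
    model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Reduce the learning rate by 10x when dev accuracy plateaus for 10 epochs
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.1, patience=10)
    best_acc, best_state = -1.0, None
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        # Evaluate on the dev set and keep the best-performing model
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for images, labels in dev_loader:
                preds = model(images.to(device)).argmax(dim=1).cpu()
                correct += (preds == labels).sum().item()
                total += labels.numel()
        acc = correct / total
        scheduler.step(acc)
        if acc > best_acc:
            best_acc = acc
            best_state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
    model.load_state_dict(best_state)
    return model
```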

4.3 Multi-task learning

In the MEDIC dataset, the tasks share similar properties; hence, we designed a simple approach. We use the hard parameter sharing approach to reduce computational complexity. All tasks share the same feature layers in the network, which are followed by task-specific classification layers. For optimizing the loss, we give equal weight to each task. Assuming that the task-specific weight is \(w_i\) and the task-specific loss function is \({\mathcal {L}}_i\), the optimization objective of the MTL is defined as \({\mathcal {L}}_{MTL}=\sum _{i}w_{i}\,{\mathcal {L}}_i\). During optimization (i.e., using stochastic gradient descent to minimize the objective), the network weights in the shared layers \(W_{sh}\) are updated using the following equation:

$$\begin{aligned} W_{sh} \leftarrow W_{sh}-\lambda \sum _{i}w_{i}\frac{\partial {\mathcal {L}}_{i}}{\partial W_{sh}} \end{aligned}$$
(1)

We set \(w_i=1\) for all task-specific weights in our experiments, i.e., equal weight for all tasks. We use softmax activation to obtain a probability distribution for each individual task and use cross-entropy as the loss function. We initialized the weights using the pre-trained models mentioned above, which were trained on ImageNet. Our implementation of multi-task learning supports all the network architectures mentioned in Sect. 4.2; therefore, we ran the MTL experiments using the same pre-trained models and the same hyper-parameter settings. We used NVIDIA Tesla V100-SXM2-16 GB GPU machines with 12 CPU cores and 40 GB of CPU memory for all experiments.
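Continuing the hard-sharing sketch from Sect. 2.1, a single multi-task optimization step with equal task weights and per-task cross-entropy losses might look as follows; the task names and the assumption that every image in a batch carries labels for all four tasks are ours.

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
task_weights = {"disaster_types": 1.0, "informativeness": 1.0,
                "humanitarian": 1.0, "damage_severity": 1.0}   # w_i = 1 for all tasks

def mtl_step(model, optimizer, images, targets):
    """One optimization step of L_MTL = sum_i w_i * L_i over all task heads.
    `targets` maps each task name to its label tensor for the batch."""
    optimizer.zero_grad()
    logits = model(images)                 # dict: task name -> logits (shared backbone)
    loss = sum(task_weights[t] * criterion(logits[t], targets[t]) for t in logits)
    loss.backward()                        # gradients accumulate in the shared layers
    optimizer.step()
    return loss.item()
```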

4.4 Multilabel classification

In Table 5, we report the distribution of the multilabel data splits, which shows that a major part of the dataset is labeled with a single label for both tasks. For multilabel classification, we run experiments in a single-task learning setup using the models mentioned above, with the same training environment as in the other settings discussed in the previous sections. However, instead of softmax, we use sigmoid activation, which is commonly used for the multilabel setup.
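A short sketch of the multilabel head is given below: per-class sigmoid outputs trained with binary cross-entropy, with a 0.5 threshold at prediction time; the logits and multi-hot targets are dummy values for illustration.

```python
import torch
import torch.nn as nn

num_labels = 7                                           # e.g., disaster type labels
logits = torch.randn(4, num_labels)                      # dummy model outputs for 4 images
targets = torch.randint(0, 2, (4, num_labels)).float()   # dummy multi-hot label vectors

loss = nn.BCEWithLogitsLoss()(logits, targets)           # sigmoid + binary cross-entropy
preds = (torch.sigmoid(logits) > 0.5).int()              # thresholded label sets
print(loss.item(), preds)
```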

4.5 Results

4.5.1 Baseline

In Table 7, we provide the baseline results. From the majority class baseline results, it is clear that the imbalanced class distribution alone does not yield strong performance. Between the SVM and KNN, the former performs better on all tasks, with improvements of 0.2 to 3.3%.

Table 7 Baseline classification results

4.5.2 Single- versus multi-task results

In Table 8, we report the results for both the single- and multi-task settings using the models mentioned above. Across the different models, EfficientNet (b1) overall performs better than the others. Comparing only the EfficientNet (b1) results across all tasks, the multi-task setting performs better than the single-task setting, although the difference is minor and might not be significant. However, since the feature layers are shared across the four tasks, the model space requirement and inference time are reduced by a factor of four. The improved inference time is crucial for real-time disaster response systems, as it reduces the operational cost that running individual models would incur.

Table 8 Classification results using single and multi-task settings along with different pre-trained models

4.5.3 Multi-task results using different random seeds

In our experiments, only the weights of the last layer were initialized randomly; hence, this can result in minor variations in performance. We ran experiments using different random seeds in the MTL setting. In Table 9, we report results on selected models for all tasks. We observe that the variation is very minor, and among the different models, DenseNet (121) shows relatively lower variation across tasks.

Table 9 Experiment using different random seeds in the MTL setup

4.5.4 Ablation experiments in multi-task setup

To understand how the tasks correlate and how they affect performance, we also ran experiments with different subsets of the tasks (see Table 10). We observe that the results remain consistent across the different task combinations. Exploring different weighting schemes for the tasks is an important avenue for future research. Regardless, our reported results can serve as a baseline for single- and multi-task disaster image classification.

Table 10 Results (F1) with different combinations of tasks using EfficientNet (b1)

4.5.5 Multilabel classification results

In Table 11, we report the multilabel classification results for the disaster type and humanitarian tasks. Overall, across the different models, SqueezeNet is the worst performing model, which we also observed in the single- and multi-task multiclass classification results. The multilabel results in Table 11 are not directly comparable with the multiclass results reported in Table 8. These results will serve as baselines for future studies.

Table 11 Classification results using single-task multilabel settings with different pre-trained models

4.5.6 Error analysis

Given that class distribution can play a significant role in classifier performance, we explored whether low-prevalence classes have any significant impact. In Table 12, we report task-wise classification results for both the single- and multi-task settings in which the model is trained using the EfficientNet model. Low-prevalence classes tend to have lower performance, but this is not always the case. For example, the Fire class label accounts for only 3.8% of the dataset, yet its performance is the third-best among the class labels. In contrast, the Other disaster class accounts for 5.1% of the dataset, but its F1 of 27.0 is the lowest. Our analysis shows that Other disaster is frequently confused with Not disaster (see Table 14 in Appendix B).

Table 12 Class-wise results for both single and multi-task settings using EfficientNet (b1) model

In Tables 14, 15, 16 and 17 (in Appendix B), we report the classification confusion matrices obtained with the EfficientNet (b1) model for disaster types, informativeness, humanitarian, and damage severity, respectively. From the tables, we observe comparable performance between the different task settings. In some cases, class-level performance increases in the multi-task setting and in some cases it decreases. For example, the number of true positives increases for informative and decreases for not-informative in the multi-task setting. The results in these tables also confirm the results in Table 8.

4.5.7 Computational time analysis

We performed an extensive analysis to understand whether the multi-task learning setup reduces computational time. In Table 13, we provide these findings for all the models used in our experiments. From the results, it is clear that the multi-task learning setup significantly reduces computation time for both training and inference.

Table 13 Training and inference time in single- vs. multi-task settings with a batch size of 32. Time is in day, hour:minute:second format

5 Discussion and future work

The MEDIC dataset provides images from diverse events covering different time frames. The crowdsourced annotation achieved reasonable annotator agreement even though the tasks are subjective. Our experiments show that multi-task learning with neural networks reduces computational complexity significantly while offering comparable performance.

In Fig. 2, we show the loss and accuracy plots for the single- and multi-task settings with the EfficientNet (b1) model. We limit the plots to 40 epochs as all of the models converged by then. We notice similar convergence rates for both the single- and multi-task learning setups. We observe that the multi-task objective acts as a regularizer: the training loss is consistently higher and the training accuracy lower than in the single-task setting, while the performance on the validation set is similar or better. This suggests that the multi-task setup may benefit from models with larger capacity.

Fig. 2 Training and validation loss and accuracy for the EfficientNet (b1) model in single- and multi-task settings

Class distribution is an important issue that affects classifier performance. We investigated class-wise performance and the confusion matrices. Our observations suggest that imbalanced class distribution is not the only factor behind the lower classification performance of certain classes; it also depends on the distinguishing properties of the class. For example, the Fire class label accounts for only 3.8% of the dataset, yet its performance is the third-best among the class labels, whereas the Other disaster class accounts for 5.1% but has the lowest F1 of 27.0.

Future work

Our future work includes exploring other multi-task learning methods and investigating task groups and relationships. For instance, further investigation is needed to explain why training the model with the disaster type, informativeness, and humanitarian tasks reduces performance, as presented in Table 10. Other research avenues include multimodality (e.g., integrating text) and investigating class imbalance issues.

6 Conclusions

We presented MEDIC, a large-scale, manually annotated multi-task learning dataset comprising 71,198 images labeled for four tasks and specifically designed for multi-task learning research and disaster response image classification. The dataset will not only be useful for developing robust models for disaster response tasks but will also enable the evaluation of general multi-task models. We provide classification results using nine different pre-trained models, which can serve as a benchmark for future work. We report that the multi-task model reduces the inference time significantly; hence, such a model can be very useful for real-time classification tasks, especially for analyzing social media image streams.