1 Introduction

Flooding is one of the most common and devastating natural hazards, leading to significant human and economic losses annually (Bentivoglio et al. 2022). As climate change contributes to more frequent and intense precipitation events, flooding severity is expected to increase (Tabari, 2020). Rapid estimation of flooded areas becomes crucial in the face of such a threat. During flooding events, emergency managers need timely and accurate information about inundated areas to coordinate response operations effectively (Manjusree et al. 2012; Li et al. 2018).

Rapid flood mapping provides an immediate understanding of the extent and severity of flooding. It helps authorities and humanitarian organizations allocate and distribute essential resources like food, water, and medical supplies (Cohen et al. 2018). Identifying flooded areas quickly is also important for protecting critical infrastructure such as power plants and water treatment facilities, thereby minimizing disruption and speeding up recovery efforts (Li et al. 2021). Additionally, flood maps are essential for analyzing flood patterns and informing urban planning and mitigation strategies (Meghanadh et al. 2020).

Floodwater depth is an important factor in flood inundation mapping (Fohringer et al. 2015; Li et al. 2018). Information about the depth of floodwater is also essential for assessing the severity of floods and evaluating flood risk mitigation measures. It guides the deployment of rescue efforts, the determination of road closures, and the assessment of accessible areas (Cohen et al. 2018). Furthermore, it supports emergency services in evaluating accessibility, devising suitable intervention plans, calculating water volumes, allocating resources for pumping water, and promptly estimating intervention and reconstruction expenses (Cian et al. 2018). Data on floodwater depths are helpful for both immediate response and post-disaster analyses, including evaluating property damage and assessing flood risks (Nguyen et al. 2016).

Various techniques have been used to estimate floodwater depth. Conventional approaches such as field surveys have been utilized to determine floodwater depth by directly measuring high-water marks in affected areas (Chaudhary et al. 2020). Although this method is precise, it is time-consuming and labor-intensive. Additionally, it is limited to small-scale applications and can be impacted by weather conditions (Elkhrachy, 2022). Conventional methods also rely on information from stream gauges at specific locations to offer real-time flood data, such as water level. However, these approaches have constraints when the floodwater exceeds the gauge’s height and when scattered gauge placements cannot adequately cover flooded areas (Li et al. 2018). Another approach is to use hydrodynamic models to assess floodwater depth, including the Hydrologic Engineering Center’s River Analysis System (HEC-RAS) (Athira et al. 2023; Brunner, 2016), Delft-3D (Haq et al. 2020), and LISFLOOD-FP (Yin et al. 2022). These models are known for their accuracy in simulating complex flood dynamics (Elkhrachy, 2022). Still, their utilization is hindered by the requirement for extensive input datasets that involve detailed topographic, meteorological, and hydrological data. Additionally, these models require significant computational resources and rely on powerful computing systems.

Remote sensing data have been used extensively for flood management in recent decades. Large-scale knowledge about the extent of floods can be obtained through remote sensing data, such as satellite imagery. Studies have combined optical and synthetic aperture radar (SAR) imagery with a digital elevation model (DEM) to determine floodwater depth. For example, Cian et al. (2018) introduced a semi-automatic approach to calculate floodwater depth by utilizing SAR imagery and statistical estimation of a DEM from LIDAR (Light Detection and Ranging). Surampudi and Kumar (2023) utilized SAR data and the Shuttle Radar Topography Mission (SRTM) DEM to generate water depths closely following surface undulations in agricultural lands. These methods appear to be efficient in estimating floodwater depth. However, satellite image acquisition is limited by temporal resolution (Bovenga et al. 2018), and cloud cover can obstruct optical sensors during flood events (Chaudhary et al. 2020), preventing image acquisition over cloud-covered areas. Additionally, DEMs frequently contain vertical inaccuracies, especially over complex terrain such as urban areas, making them unreliable for identifying the important topographical elements that determine how floods behave (Schumann, 2014).

Recent advancements in machine learning have significantly advanced floodwater depth estimation. Numerous studies have employed sophisticated computer vision algorithms to assess water levels remotely. For instance, Pan et al. (2018) employed a Convolutional Neural Network (CNN)-based methodology to monitor the length of a ruler in footage captured by a video camera strategically placed adjacent to a river. Similarly, utilizing a mask region-based convolutional neural network (Mask R-CNN), Park et al. (2021) achieved floodwater depth estimation by detecting submerged vehicles in flood photos. Furthermore, the wealth of flood images on social media platforms has provided a rich source for researchers to employ computer vision algorithms in estimating floodwater depth. Feng et al. (2020) introduced a workflow focusing on retrieving images containing humans from social media to estimate water levels. In a distinct approach, Quan et al. (2020) matched water levels with human poses, categorizing flood severity into “above the knee” and “below the knee”. In another approach, Meng et al. (2019) integrated deep learning with web images to estimate floodwater depth. Song and Tuo (2021) leveraged CNNs to segment stop signs and extract floodwater depth data from images featuring such signs. Additionally, Li et al. (2023) utilized a CNN-based object detection model to automatically estimate water depth from social media images. While machine learning models have showcased their effectiveness in floodwater depth estimation, it is crucial to acknowledge their dependency on substantial annotated training datasets. Creating such datasets is a resource-intensive and time-consuming endeavor, underscoring a notable challenge in the practical implementation of these models.

The recent advent of large multimodal models, notably the Generative Pre-trained Transformer (GPT), marks an exceptional development. These models exhibit a remarkable capability to comprehend human natural language, enabling proficient task execution across diverse domains. GPT-4 Vision (GPT-4 hereafter), a large-scale multimodal model, has demonstrated several impressive abilities in vision-language understanding and generation (OpenAI, 2023). For example, GPT-4 can generate natural language descriptions of images and even perform image processing tasks from descriptions written as text (Osco et al. 2023). These models can also offer intelligent solutions that resemble human reasoning, allowing us to utilize general artificial intelligence to address problems in diverse applications (Wen et al. 2023).

In geographic information science, researchers have explored the potential of GPT-4 with applications to image generation, captioning, and analysis assistance in visuals, to name a few (Osco et al. 2023). A notable effort by Hu et al. (2023) involves the integration of geo-knowledge with GPT models for identifying location descriptions and their respective categories. This fusion results in a geo-knowledge-guided GPT model that accurately extracts location descriptions from disaster-related social media messages. Li and Ning (2023) developed a prototype of autonomous GIS (Geographic Information System) utilizing the GPT-4 API to accept tasks through natural language and autonomously solve spatial problems. Other endeavors include exploring the potential of GPT-4 in map-making (Tao & Xu, 2023) and human mobility modeling (Haydari et al. 2024).

GPT-4 has also demonstrated its potential in the urban science field. For example, Crooks and Chen (2024) examined the capability of GPT-4 in extracting information from street view photographs in urban analytics. Other endeavors include exploring the potential of GPT-4 in urban transportation system management (Zhang et al. 2023), urban planning (Wang et al. 2023), and energy management (Huang et al. 2023).

This research presents an automated, fast, and reliable approach leveraging GPT-4 to estimate floodwater depth from photographs capturing flood events. This study aims to contribute to disaster management, emergency response, and urban planning, potentially enhancing mitigation strategies, ultimately contributing to life-saving efforts, and minimizing economic losses in urban environments.

2 Method

2.1 Overview of the proposed approach

The GPT-4 model, developed by OpenAI, was trained on increasingly large amounts of data and has proven highly effective at extracting valuable information from images, even without a separate training dataset. In this study, we propose an automated framework for estimating floodwater depth by leveraging the advanced capabilities of GPT-4. This framework, FloodDepth-GPT, uses the GPT-4 Python API to estimate floodwater depth. The overall concept of the proposed approach is illustrated in Fig. 1. The approach begins by inputting flooding photos containing objects that can serve as consistent reference indicators, such as vehicles, humans, and street signs. By assessing the known height and relative submersion of these objects, FloodDepth-GPT can estimate water levels from the visible objects within the photos. For instance, if the water reaches the knee level of a person whose height is known, FloodDepth-GPT can “deduce” the depth of water from this comparative analysis. Besides the water depth, FloodDepth-GPT also outlines the rationale behind each estimate, which enhances the transparency, understanding, and explainability of the process.

Fig. 1
figure 1

Workflow of FloodDepth-GPT
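The core step of this workflow, sending a flood photo together with depth-estimation instructions to the model, can be sketched as follows. This is a minimal illustration using the OpenAI chat-completions vision message format; the model identifier and prompt wording are placeholders, not the study's exact configuration.

```python
import base64

def build_flood_depth_request(image_bytes: bytes, instructions: str) -> dict:
    """Encode a flood photo and pair it with depth-estimation instructions
    in the chat-completions vision message format."""
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": "gpt-4-vision-preview",  # placeholder model identifier
        "messages": [
            {"role": "system", "content": instructions},
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Estimate the floodwater depth in this photo "
                             "using visible reference objects, and explain "
                             "your reasoning."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                ],
            },
        ],
    }

# The payload would then be sent with the OpenAI client, e.g.:
# client.chat.completions.create(**build_flood_depth_request(photo, prompt))
```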

2.2 Design of FloodDepth-GPT

The FloodDepth-GPT is a customized GPT with a set of prompts structured to guide the tool specifically toward estimating floodwater depth. These prompts include directions for identifying and measuring reference points in the image, assessing visible waterlines on objects, and applying the known heights of common objects, such as humans, vehicles, and stop signs, present in the image (Appendix 1). The standard heights of the various objects were specified. For example, the average height of a person and of different parts of the body (i.e., knee, waist, shoulder, and full height) were included in the prompt (see Tables 1, 2 and 3 for the heights of the different reference objects). Figure 2 shows samples of the objects used in this study with their corresponding heights.

Table 1 Average height of different parts of a human body (Fryar et al. 2021; Li et al. 2023)
Table 2 Average height of different parts of a vehicle (Neighbor Storage, 2023)
Table 3 Stop sign dimensions (Song & Tuo, 2021; U.S. Department of Transportation, 2023)
Fig. 2
figure 2

Sample of reference objects with heights
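The reference-height lookup described above can be sketched as a simple table mapping each object's landmarks to known heights above the ground. The numeric values below are illustrative placeholders, not the exact figures from Tables 1, 2 and 3.

```python
# Illustrative reference heights in meters (placeholder values only;
# the study's actual figures come from Tables 1, 2 and 3).
REFERENCE_HEIGHTS = {
    "person":    {"knee": 0.50, "waist": 1.00, "shoulder": 1.45, "full": 1.75},
    "sedan":     {"wheel_top": 0.66, "hood": 1.10, "roof": 1.45},
    "stop_sign": {"sign_bottom": 2.13},
}

def depth_from_waterline(obj: str, landmark: str) -> float:
    """Translate an observed waterline (e.g. water at a person's knee) into
    a depth estimate via the landmark's known height above the ground."""
    return REFERENCE_HEIGHTS[obj][landmark]
```

For example, if the model judges the water to reach a person's knee, the depth estimate is simply the knee's height above the ground in the lookup table.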

Another crucial output of FloodDepth-GPT is a detailed explanation of its estimations. This involves clearly communicating the visual cues used in the estimation process and presenting the depth measurements for ease of understanding and global applicability, enhancing the explainability of the AI output. Finally, the model was instructed to avoid speculation and to base its analyses on the objects visible within the image.

2.3 Performance evaluation

The performance of FloodDepth-GPT was examined as follows. We collected 150 flood photos from various online sources. Previous studies have utilized flood photos to estimate the depth of floodwater based on different reference objects, including stop signs, vehicles, and humans (Li et al. 2023). Our experimental dataset incorporates these three components as the main reference objects (Tables 1, 2 and 3), and we ensured that each selected photo contained at least one of them. These photos served as input to FloodDepth-GPT, which produced a floodwater depth estimate for each photo.

To evaluate the performance of the GPT model, this study compared the floodwater depth estimated by FloodDepth-GPT (GPT Estimation) and floodwater depth estimated manually by five individuals (manual estimation). The manual estimation processes were conducted independently, and used the average heights detailed in Tables 1, 2 and 3. The primary objective of this study is to explore the potential of GPT-4 in estimating floodwater depth, effectively positioning it as a potential human equivalent in this task. As a result, this evaluation approach was considered appropriate. Furthermore, the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE) were calculated to quantitatively measure the accuracy of the FloodDepth-GPT estimations compared to the manual estimations. The MAE and RMSE were computed using Eqs. 1 and 2, respectively, where \({m}_{i}\) is the manual-estimated depth, \({gpt}_{i}\) is the FloodDepth-GPT estimation, and \(n\) is the number of images.

$$MAE= \frac{1}{n}\sum _{i=1}^{n}\left|{m}_{i}-{gpt}_{i}\right|$$
(1)
$$RMSE= \sqrt{\frac{\sum _{i=1}^{n}{({m}_{i}-{gpt}_{i})}^{2}}{n}}$$
(2)
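Equations 1 and 2 translate directly into code. The sketch below computes both metrics; the depth values in the usage comment are hypothetical examples, not data from the study.

```python
import math

def mae(manual, gpt):
    """Mean Absolute Error (Eq. 1) between manual and GPT depth estimates."""
    return sum(abs(m - g) for m, g in zip(manual, gpt)) / len(manual)

def rmse(manual, gpt):
    """Root Mean Square Error (Eq. 2) between manual and GPT depth estimates."""
    return math.sqrt(sum((m - g) ** 2 for m, g in zip(manual, gpt)) / len(manual))

# Hypothetical example depths in meters (not the study's dataset):
manual_depths = [0.30, 0.55, 1.10, 0.80]
gpt_depths    = [0.25, 0.60, 1.00, 0.95]
```

Because RMSE squares each difference before averaging, it penalizes large disagreements more heavily than MAE, which is why the two metrics are reported together.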

3 Results

Figure 3 presents the correlation between the floodwater depth estimations of GPT and humans. Results show a strong positive correlation between the GPT estimations and the average estimations from human observers (Pearson’s correlation coefficient r = 0.8879). Additionally, there is a strong correlation between GPT and each individual human estimation (r = 0.8705, 0.8585, 0.8742, 0.8456, and 0.8834 for humans 1, 2, 3, 4, and 5, respectively). Overall, the data points cluster along the regression line, suggesting that the estimations made by GPT are consistent with human estimates derived from visually examining the images. Additionally, the consistency across the human estimations lends credibility to their use as a benchmark for evaluating the accuracy of GPT.

Fig. 3
figure 3

Scatter plots of floodwater depth estimation of GPT and humans (unit: meter)
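For reference, the Pearson correlation coefficient reported above can be computed as follows. This is the standard textbook formula, not the study's analysis code.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples:
    covariance of x and y divided by the product of their standard deviations."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A value near 1, as observed here, means the GPT estimates rise and fall almost linearly with the human estimates.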

Additionally, comparing the GPT-based estimations with the manual estimations, an MAE of 25 cm was recorded. This is consistent with previous studies (MAEs ranging from 6 cm to 32 cm; Chaudhary et al. 2019; Alizadeh Kharazi & Behzadan, 2021; Park et al. 2021; Song & Tuo, 2021; Li et al. 2023) that used deep learning methods to estimate floodwater depth from images obtained from social media platforms (Table 4). As summarized in Table 4, the error obtained using the GPT-based method in our study falls within an acceptable range compared to previous studies utilizing deep learning techniques.

This suggests that the GPT-based method is reliable for estimating floodwater depth with acceptable accuracy. A unique aspect of our approach is that it does not require any model training, making it applicable to most flood scenarios, regardless of the reference feature present in the flood image. This has been demonstrated using three different features (humans, vehicles, or street signs), in contrast to previous methods, which are only trained on specific visual reference features. It should be noted that the methods presented in Table 4 were applied to datasets different from those used in our study.

Table 4 Comparison of the Mean Absolute Error (MAE) of this study with other literature on floodwater depth estimation

Furthermore, an RMSE of 30 cm was recorded when comparing the GPT and human estimations. Because RMSE weights large deviations more heavily, this indicates that GPT-based estimations typically deviate from the human estimations by roughly 30 cm, with occasional larger disagreements.

Samples of FloodDepth-GPT estimations and responses are presented in Fig. 4. These results align with manual estimations. Furthermore, Fig. 5 highlights a detailed sample response from the FloodDepth-GPT, showing its ability not only to provide reliable floodwater depth estimations but also to provide reasonable explanations behind the estimations. This process entails identifying reference objects and the water level in flood photos and then utilizing the known height of these objects to estimate the floodwater depth. FloodDepth-GPT, for example, successfully identified the truck in a flood photo and estimated that the water level of the flood was below the bottom of the truck’s door. Utilizing the identified height of the truck’s bottom level, FloodDepth-GPT accurately estimated the floodwater depth.

Fig. 4
figure 4

Selected floodwater depth estimation results by human and GPT

Fig. 5
figure 5

Sample of FloodDepth-GPT response (Appendix 2 provides more samples)

Figures 6, 7 and 8 present examples where FloodDepth-GPT’s estimations diverged from human assessments. The discrepancies can be attributed to variations in the estimation points, insufficient criteria, and incorrect water level identification. In Fig. 6, where a human served as the reference object, the model assessed the water level to be above the knee (approximately at mid-thigh), while in reality the water level appears to be approximately at knee level. Future research could address this issue through enhanced prompt engineering to improve the perspective from which the model observes the water level. Additionally, discrepancies in estimations arose from the limited criteria used for estimation by both humans and the GPT model. For instance, in Fig. 7, street signs were utilized as reference points without specifying their heights in the estimation criteria, leading to variations in the estimations made by humans and the GPT model. Figure 8 demonstrates significant variation in estimations, likely owing to divergent observation points chosen by the human observers and the GPT model, such as the center of the road versus the roadside. This emphasizes the challenge of precisely estimating floodwater depths given variations in terrain and depth within a single flood scene.

Fig. 6
figure 6

Variation in estimations due to wrong identification of water level

Fig. 7
figure 7

Variation in estimations due to insufficient criteria for estimation

Fig. 8
figure 8

Variation in the observation point within the flood photo

4 Discussion and future research

This study introduces a new approach to estimate floodwater depth by leveraging the ability of GPT-4. This method utilizes structured prompts to analyze flood photos and estimate floodwater depth. In contrast to conventional computer vision and deep learning methods that depend on specific pre-trained objects, FloodDepth-GPT can automatically identify water levels based on reference objects in flood photos. Also, FloodDepth-GPT demonstrates impressive speed and efficiency, enabling floodwater depth estimations in just 10 s per photo, at a cost of approximately $1 to process 150 flood photos. This approach streamlines the estimation process and enhances the rapidity of flood inundation mapping.

Results from this study reveal that the proposed method can utilize a variety of common reference objects in flooding photos. It enhances the method’s versatility and makes this approach applicable in different flood scenarios. To the best of our knowledge, previous research has largely focused on models trained to recognize specific objects in flood photos, such as humans, stop signs, vehicles, etc. For example, Li et al. (2023) developed an object detection model that could only identify humans in flood photos used in their analyses. Analogously, some other studies only utilized photos containing stop signs (Song & Tuo, 2021; Alizadeh & Behzadan, 2023). Although previous methods showed promising results, these techniques can only be applied to photos containing objects on which their models were trained, thereby restricting the utility of such models on a broader scale.

The findings in this study are transformative. The GPT model’s ability to interpret different urban and natural elements in images opens new possibilities for automatic environmental assessments. Such detailed assessments are crucial for urban planning, disaster preparedness, and climate change studies and can be achieved through AI-driven analysis. Moreover, this study highlights FloodDepth-GPT as an example of Explainable AI, which provides the reasoning behind the model’s decision-making. These reasonings illustrate the transparency of the method, which is crucial for fostering trust and understanding in AI-driven environmental assessments.

The results of this study indicate that the proposed method is reliable in its estimations, demonstrating a mean absolute error within a fair range for estimating floodwater depths. However, further examination revealed the presence of outliers in the estimations. Some outliers do not necessarily indicate errors by the GPT model but rather reflect differences in the observation points within the photo (see Figs. 6, 7 and 8); nevertheless, future studies could introduce more detailed criteria for estimation. Moreover, a notable challenge with large multimodal models like GPT-4 is their inability to consistently reproduce results, as the model delivers different estimations when run multiple times. Since this model is in its early stages, subsequent versions are expected to see enhancements that address these reproducibility issues, providing a foundation for more robust future studies. Another limitation of the proposed approach is the variation in heights, dimensions, and designs of reference objects across different regions, which can limit the model’s ability to provide uniform estimations across diverse geographic locations. Future studies could focus on calibrating the model to account for such regional variations.

One potential future research avenue is photo localization, as the floodwater depth is most useful when accurate geolocation of the photo is known, at least at street level. The rapid floodwater depth estimation from on-site photos combined with geolocation opens the possibility of producing flood inundation maps in real time. The most straightforward way of photo localization is to extract location information from the photo if such information is available in the metadata or from a geotagged post associated with the photo (Huang et al. 2019, 2020; Ning et al. 2020). For example, some social media platforms allow users to geotag photo locations using the phone’s built-in Global Positioning System (GPS) sensor or manually select the street name and address. A big challenge, though, is to locate the photo based on photo content only, where the location can be obtained by retrieving similar photos in a large database with localized photos such as street view images (e.g. Zhang et al. 2020). It is particularly challenging to locate flood photos as such photos often have inundated features, resulting in potential mismatching to the existing photos. Thus, innovative methods are required to match the flood and non-flood photos, such as the semantic scene graph (Yoon et al. 2021).

5 Conclusion

Floodwater depth estimation plays a pivotal role in the effective management of floods, facilitating informed disaster response and strategic planning. Utilizing the advanced capabilities of large pre-trained multimodal models (GPT-4 in this study), this paper introduces a novel method for automatically determining floodwater depths from photographs related to flooding. The GPT-4 model was customized to estimate water depths using recognized reference objects in the photo by providing specific instructions related to this task.

The findings of this study indicate that the proposed method holds promise for estimating floodwater depths from photographs. In comparison to prior research in this domain, our study demonstrates a universal pipeline for floodwater depth estimation rather than training individual models on different reference objects, which is not only less computationally demanding but also more efficient and economical. Such information gives decision-makers and at-risk communities rapid situational awareness of the extent and impact of flooding, which is essential for effective disaster management and decision-making. However, the study also acknowledges minor discrepancies between the model’s estimations and those derived by humans. Future research could enhance this methodology through improved prompt engineering and the introduction of additional criteria for more accurate estimations.

As we navigate the future of flood management, AI-driven insights become paramount. This approach, leveraging the power of AI and computer vision, emerges as a means for shaping resilient communities. It equips emergency responders and planners with the tools needed for rapid, data-driven decision-making and fosters innovation in disaster management and urban science.