Automated Floodwater Depth Estimation Using Large Multimodal Model for Rapid Flood Mapping

Information on the depth of floodwater is crucial for rapid mapping of areas affected by floods. However, previous approaches for estimating floodwater depth, including field surveys, remote sensing, and machine learning techniques, can be time-consuming and resource-intensive. This paper presents an automated and fast approach for estimating floodwater depth from on-site flood photos. A pre-trained large multimodal model, GPT-4 Vision, was used specifically for estimating floodwater. The input data were flooding photos that contained referenced objects, such as street signs, cars, people, and buildings. Using the heights of the common objects as references, the model returned the floodwater depth as the output. Results show that the proposed approach can rapidly provide a consistent and reliable estimation of floodwater depth from flood photos. Such rapid estimation is transformative in flood inundation mapping and assessing the severity of the flood in near-real time, which is essential for effective flood response strategies.


Introduction
Flooding is one of the most common and devastating natural hazards, leading to significant human and economic losses annually (Bentivoglio et al., 2022).As climate change contributes to more frequent and intense precipitation events, flooding severity is expected to increase (Tabari, 2020).In the face of such a threat, rapid estimation of flooded areas becomes crucial.During flooding events, emergency managers need timely and accurate information about inundated areas to coordinate response operations effectively (Manjusree et al., 2012;Li et al., 2020) Rapid flood mapping not only provides an immediate understanding of the extent and severity of flooding but also aids authorities and humanitarian organizations in allocating and distributing essential resources like food, water, and medical supplies (Cohen et al., 2018).Identifying flooded areas quickly is also important for protecting critical infrastructure such as power plants and water treatment facilities, thereby minimizing disruption and speeding up recovery efforts (J.Li et al., 2021).Additionally, flood maps are essential in analyzing flood patterns and informing urban planning and flood mitigation strategies (Meghanadh et al., 2020).
Floodwater depth is an important factor in flood inundation mapping (Fohringer, et al., 2015, Z. Li et al., 2018).Information about the depth of floodwater is also crucial for assessing the severity of floods and evaluating flood risk mitigation measures.This information is essential for deploying rescue efforts, determining road closures, and assessing accessible areas (Cohen et al., 2018).Furthermore, it also has a key role in supporting emergency response, assessing accessibility and designing suitable intervention plans, calculating water volumes, allocating resources for water pumping, and rapidly estimating the costs for intervention and reconstruction (Cian et al., 2018).Data on flood depths is not only useful for immediate response but also for post-disaster analysis, including evaluating property damage and assessing flood risks (Nguyen et al., 2016).
Various techniques have been used to estimate flood depth.Conventional approaches such as field surveys have been utilized to determine flood depth by directly measuring high-water marks in affected areas (Chaudhary et al., 2020).Though this method has a very high precision, it is time-consuming and labor-intensive.Additionally, it is limited to small-scale applications and can be impacted by weather conditions (Elkhrachy, 2022).Conventional methods also rely on information from stream gauges at specific locations to offer real-time flood data, such as water level.However, these approaches have constraints when the floodwater exceeds the height of the gauge and when scattered gauge placements are unable to adequately cover flooded areas (Z.Li et al., 2018).Another approach is to use hydrodynamic models to assess floodwater depth, including the Hydrologic Engineering Center's River Analysis System (HEC-RAS) (Athira et al., 2023;Brunner, 2016), Delft-3D (Haq et al., 2020), and LISFLOOD-FP (Yin et al., 2022).These models are known for their accuracy in simulating complex flood dynamics (Elkhrachy, 2022).However, their utilization is hindered by the requirement for extensive input datasets that involve detailed topographic, meteorological, and hydrological data.Additionally, these models require significant computational resources and rely on powerful computing systems.
Remote sensing data has been used extensively for flood management in recent decades.
Large-scale knowledge about the extent of floods can be obtained through remote sensing data, such as satellite imagery.To determine floodwater depth, studies have combined optical and Synthetic Aperture Radar (SAR) with a Digital Elevation Model (DEM).For example, Cian et al (2018) introduced a semi-automatic approach to calculate flood depth, by utilizing SAR imagery and statistical estimation of DEM from LIDAR (Light Detection and Ranging).Surampudi and Kumar (2023) utilized SAR data and Shuttle Radar Topography Mission (SRTM) DEM to generate water depth that closely follows surface undulations in agricultural lands.These methods appear to be efficient in estimating floodwater depth.However, image acquisition is limited by the temporal resolution of satellites (Bovenga et al., 2018).Also, optical sensors can be affected by cloud cover during flood events (Chaudhary et al., 2020), making it impossible to have flood images for areas covered with clouds.Additionally, vertical inaccuracies are frequently produced by DEMs, especially over complicated terrain such as urban areas.As a result, they are unreliable in identifying the important topographical elements which determine how floods behave (Schumann, 2014).
Recent advancements in machine learning have significantly revolutionized flood depth estimation.Numerous studies have employed sophisticated computer vision algorithms to remotely assess water levels.For instance, Pan et al. (2018) employed a Convolutional Neural Network (CNN)-based methodology to monitor the length of a ruler in footage captured by a video camera strategically placed adjacent to a river.Similarly, utilizing a mask region-based convolution neural network (Mask R-CNN), Park et al. (2021) achieved flood depth estimation by detecting submerged vehicles in flood photos.Furthermore, the wealth of flood images on social media platforms has provided a rich source for researchers to employ computer vision algorithms in estimating floodwater depth.Feng et al. (2020) introduced a workflow focusing on retrieving images containing humans from social media to estimate water levels.In a distinct approach, Quan et al. (2020) matched water levels with human poses, categorizing flood severity into "above the knee" and "below the knee".Other innovative approaches involve the integration of deep learning techniques with web images for flood depth estimation (Meng et al., 2019), leveraging CNNs to segment stop signs and extract flood depth data from images featuring such signs (Song and Tuo, 2021), and utilizing an object detection model based on CNN for automatically estimating water depth from images sourced from social media (Li et al., 2023).While machine learning models have showcased their effectiveness in flood depth estimation, it is crucial to acknowledge their dependency on substantial, annotated training datasets.Creating such datasets can be a resourceintensive and time-consuming endeavor, underscoring a notable challenge in the practical implementation of these models.
The recent advent of generative artificial intelligence (AI) models, notably the Generative Pre-trained Transformer (GPT), has emerged as an exceptional development.These models exhibit a remarkable capability to comprehend human natural language, enabling proficient task execution across diverse domains.GPT-4 Vision (GPT-4 hereafter), a large-scale multimodal model, has demonstrated several impressive abilities of vision-language understanding and generation (OpenAI, 2023).For example, GPT-4 can generate natural language descriptions of images and even perform image processing tasks from descriptions written as text.These models can also provide intelligent solutions that are more similar to human thinking, enabling us to use general artificial intelligence to solve problems in various applications (Wen et al., 2023).
In geographic information science, researchers have explored the potential of GPT with applications to image generation, captioning, and analysis assistance in visuals, to name a few (Osco et al., 2023).A notable effort by Hu et al. (2023) involves the integration of geo-knowledge with GPT models for identifying location descriptions and their respective categories.This fusion results in a geo-knowledge-guided GPT model good at accurately extracting location descriptions from disaster-related social media messages.Li and Ning (2023) pioneered a prototype of autonomous GIS (Geographic Information System) utilizing the GPT-4 API, aiming to accept tasks through natural language and autonomously solve spatial problems.Other endeavors include exploring the potential of GPT-4 in map-making (Tao and Xu, 2023) and examining the capability of GPT-4 in extracting information from streetview photographs in unban analytics (Crooks and Chen, 2024).
Given the exceptional capabilities of GPT-4, its anticipated impact extends to various fields, including flood risk management.This research presents an automated, fast, and reliable approach leveraging GPT-4 to estimate floodwater depth from photographs capturing flood events.This study aims to make a substantive contribution to disaster management and emergency response, with the potential to enhance mitigation strategies, ultimately contributing to life-saving efforts and minimizing economic losses.

Overview of the proposed approach
The GPT-4 model, developed by OpenAI, was trained using increasingly large amounts of data and has proven to be highly effective at extracting valuable information from images, even without requiring a separate training dataset.In this study, we propose a novel, fully automated framework for estimating the floodwater depth by leveraging the advanced potential of GPT-4.This framework, FloodDepth-GPT, uses a GPT-4 model Python API to estimate the floodwater depth.The overall concept of the proposed approach is illustrated in Figure 1.The approach begins by inputting flooding photos containing objects that can serve as consistent indicators for reference.Such street objects can include vehicles, humans, and street signs.By assessing the known height and relative submersion of these objects, FloodDepth-GPT can estimate water levels according to the visible objects within the photos.For instance, if the water reaches the knee level of a person whose height is known, FloodDepth-GPT can "deduce" the depth of water based on this comparative analysis.Besides the water depth, the FloodDepth-GPT also outlines the rationale behind it, which enhances the transparency, understanding, and explainability of the process.

Design of FloodDepth-GPT
The FloodDepth-GPT is a customized GPT with a set of prompts structured to guide the tool specifically toward estimating floodwater depth.These prompts include directions related to identifying and measuring reference points in the image, assessing visible waterlines on objects, and identifying known heights of common objects, such as humans, vehicles, and stop signs, present in the image (Appendix 1).The standard heights of the various objects were specified.For example, the average height of a man and the different parts of the body (i.e., knee, waist, shoulder, height, and waist) were included in the prompt (see Tables 1, 2, and 3 for the heights of different reference objects).Figure 2 shows samples of the objects used in this study with their corresponding heights.

Ground clearance
The height of the base of the car from the ground 0.2 0.5 0.7

Ride height
The height of the lower side of the car's body (just above the wheel) from the ground 0.6 0.8 1.0

Windshield's Height
The height at the start of the windshield from the ground

Stop sign
The height of the stop sign itself, from the top edge to the bottom edge. 0.9

Pole height
The height of the bottom of the sign from the ground 2.0

Total height
The total height of the car 2.9

Figure 2. Sample of reference objects with heights
Another crucial output of FloodDepth-GPT were detailed explanations of its estimations.This involves clear communication of the visual cues used in the estimation process and presentation of the depth measurements for ease of understanding and global applicability, enhancing the explainability of the AI output.Finally, the model was instructed to avoid speculation and base its analyses on the available objects within the image.

Performance evaluation
The ability of the FloodDepth-GPT was examined as follows.We collected 150 flood photos from various online sources.Previous studies have utilized flood photos to estimate the depth of floodwater based on different reference objects, including stop signs, vehicles, and humans (J.Li et al., 2023).This experimental dataset also incorporates these three components as main reference objects (Tables 1, 2, and 3).We ensured that each selected photo has at least one of these objects.
These photos served as input into FloodDepth-GPT, and the floodwater depth estimation for each photo was obtained using the model.
To evaluate the performance of the GPT model, this study compared the floodwater depth estimated by FloodDepth-GPT (GPT Estimation) and floodwater depth estimated manually by three individuals (Manual Estimation).The manual estimation processes were conducted independently, and used the average heights detailed in Tables 1-3.Furthermore, the Mean Absolute Error (MAE) was calculated to quantitatively measure the accuracy of the FloodDepth-GPT estimations in comparison to the manual estimations.The MAE was computed using the formula provided in Equation 1, where   is the manual-estimated depth,   is the FloodDepth-GPT estimation, and  is the number of images.

Results
Figure 3 presents the correlation between the flood water depth estimation of GPT and human.
Result shows that there is a strong positive correlation between GPT estimation and the average estimations from human observers (Pearson's correlation coefficient r = 0.8894).Additionally, there is a strong correlation between GPT and each human estimation (r = 0.8705, 0.8585, 0.8742 for human 1, 2, and 3, respectively).Overall, the data points are clustered along the regression line, which suggests that the estimations made by GPT are consistent with the estimations given by humans.Also, the consistency across the human estimations lends credibility to their use as a benchmark for evaluating the accuracy of GPT.assessments.The discrepancies can be attributed to variations in the estimation points, insufficient criteria, and incorrect water level identification.Figure 6, where a human served as the reference object, the model assessed the water level to be above the knee (approximately at the mid-thigh), while in reality, the water level on the human appears to be approximately at knee level.Future research could address this issue through enhanced prompt engineering to improve the perspective through which the model observes the water level.Additionally, discrepancies in estimations occurred from the limited criteria used for estimation by both humans and the GPT model.For instance, in Figure 7, street signs were utilized as reference points without specifying their heights in the estimation criteria, leading to variations in the estimations made by humans and the GPT model.This study introduces an automated approach to estimating floodwater depth by leveraging the ability of GPT-4.This method utilizes structured prompts to analyze flood photos and estimate flood depth.In contrast to conventional computer vision and deep learning methods that depend on specific pre-trained objects, FloodDepth-GPT can automatically identify water levels based on reference objects on flood photos.This approach streamlines the estimation process and enhances the rapidity of flood inundation mapping.
Results from this study reveal that the proposed method can utilize a variety of common reference objects in flooding photos.This enhances the method's versatility and makes this approach applicable in different flood scenarios.To the best of our knowledge, previous research has largely focused on models trained to recognize specific objects within flood photos, such as humans, stop signs, vehicles, etc.For example, J. Li et al., (2023) developed an object detection model that could only identify humans in flood photos used in their analyses.Analogously, some other studies only utilized photos containing stop signs (Alizadeh and Behzadan, 2023;Song and Tuo, 2021a).Although previous methods showed promising results, these techniques can only be applied to photos containing objects on which their models were trained, thereby restricting the utility of such models on a broader scale.
We think the findings in this study are transformative.The GPT model's ability to interpret different urban and natural elements in images opens new possibilities for automatic environmental asessmennts.Such detailed assessments are crucial for urban planning, disaster preparedness, and climate change studies can be achieved through AI-driven analysis.Moreover, this study highlights FloodDepth-GPT as an example of Explainable AI, which provides the reasoning behind the model's decision-making.This illustrates the transparency of the method, which is crucial for fostering trust and understanding in AI-driven environmental assessments.
The results of this study indicate that the proposed method is reliable in its estimations, demonstrating a mean average error within an fair range for estimating floodwater depths.
However, further examinations revealed the presence of outliers in the estimations.While some outliers do not necessarily indicate errors by the GPT model but rather reflect differences in the observation points of estimation within the photo (see Figure 6 -8), further refinements in future studies could include the introduction of more detailed criteria for estimation.Another limitation of the proposed approach is the differences in the heights and dimensions and designs of reference objects across different regions.This can limit the model's ability to provide uniform estimations across diverse geographic locations.Future studies can focus on calibrating the model to account for regional variations in reference objects.
One potential future research avenue is photo localization, as the floodwater depth is only useful when accurate geolocation of the photo is known, at least at street level.If flood photos and their geolocations can be obtained in real-time from various sources, the proposed approach could provide water depth estimates promptly, which opens the possibility of producing flood inundation maps in real-time.The most straightforward way of photo localization is to extract location information from the photo if such information is available in the metadata or from a geotagged post associated with the photo (Ning et al., 2020;Huang et al., 2019Huang et al., , 2020)).For example, some social media platforms allow users to geotag photo location using the phone's built-in Global Positioning System (GPS) or manually select the street name and address.A big challenge, though, is to locate the photo based on photo content only, where the location can be obtained by retrieving similar photos in a large database with localized photos such as street view images (e.g., Zhang et al., 2020).It is particularly challenging to locate flood photos as such photos often have inundated features, resulting in potential mismatching to the existing photos.Thus, innovative methods are required to match the flood and non-flood photos, such as the semantic scene graph (Yoon et al., 2021).

Conclusion
Floodwater depth estimation plays a pivotal role in the effective management of floods, facilitating informed disaster response and strategic planning.Utilizing the advanced capabilities of large pre-trained multimodal models (GPT-4 in this study), this paper introduces a novel method for automatically determining floodwater depths from photographs related to flooding.The GPT-4 model was customized to estimate water depths using recognized reference objects in the photo by providing specific instructions related to this task.
The findings of this study indicate that the proposed method holds considerable promise for estimating floodwater depths from photographs.In comparison to prior research in this domain, our study demonstrates a universal pipeline for floodwater depth estimation rather than training various individual models on different reference objects.This versatility results in a more efficient and cost-effective process.Additionally, the method can significantly enhance the speed of flood mapping.Such information gives decision-makers and the community at risk a rapid situation awareness of the extent and impact of flooding, which is essential for effective disaster management and decision-making.However, the study also acknowledges minor discrepancies between the model's estimations and those derived by humans.Future research could enhance this methodology through enhanced prompt engineering and the introduction of additional criteria for more accurate estimations.
As we navigate the future of flood management, AI-driven insights become paramount.This approach, leveraging the power of AI and computer vision, emerges as a means for shaping resilient communities.It not only equips emergency responders and planners with the tools needed for rapid, data-driven decision-making but also fosters the art of innovation in disaster management.

Figure 3 .
Figure 3. Scatter plots of flood depth estimation of GPT and human (unit: meter).

Figure 4 .
Figure 4. Selected flood depth estimation results by human and GPT

Figures 6 -
Figures 6 -8 present examples where FloodDepth-GPT's estimations diverged from human Figure 8 demonstrates significant variation in estimations, likely owing to divergent observation points by human observers and the GPT model, such as the center of the road or roadside.This emphasizes the challenge of precisely estimating flood depths due to variations in the terrain and the depth within a flood scenario.

Figure 6 .Figure 8 .
Figure 6.Variation in estimations due to wrong identification of water level