An Optical Digital Twin for Underwater Photogrammetry

Most of the Earth's surface lies beneath the deep ocean. To explore this visually adverse environment with cameras, the cameras have to be protected by pressure housings. These housings, in turn, need optical interfaces to the outside world that endure the extreme pressures within the water column. Commonly, a flat window or a half-sphere of glass, called a flat port or a dome port, respectively, is used to implement such an interface. This introduces multi-media interfaces between water, glass and air, which entail refraction effects in the images taken through them. To obtain unbiased 3D measurements and a geometrically faithful reconstruction of the scene, these effects must be handled properly. We therefore propose an optical digital twin of an underwater environment, which has been geometrically verified to resemble a real water lab tank that features the two most common optical interfaces. It can be used to develop, evaluate, train, test and tune refractive algorithms. Alongside this paper, we publish the model for further extension, jointly with code to dynamically generate samples from the dataset. Finally, we also publish a pre-rendered dataset ready for use at https://git.geomar.de/david-nakath/geodt.


Introduction
The largest part of Earth's surface is covered by the deep sea (Eakins and Sharman 2012). Hence, vast amounts of the seafloor and the majority of the water column above it are yet to be thoroughly explored. Cameras have to be protected from salt water and their housings must sustain enormous pressures of approximately 1 bar per 10 m of depth. This especially holds true for the optical windows, the ports, of the housings. Glass domes, so-called dome ports, are mechanically very stable and require thicknesses of up to one centimeter for commonly used dome diameters. The stability of flat ports depends strongly on their size and material, and larger windows quickly require thicknesses of many centimeters. Light rays collected by lenses behind these ports traverse different media and are refracted at the interfaces, which complicates underwater photogrammetry and associated applications of computer vision (Fig. 1).
The ocean can be coarsely separated into euphotic, disphotic and aphotic light zones. The bottom of the first zone is defined at 200 [m], where only 1% of the surface photosynthetically available radiation (PAR) remains. No significant portion of sunlight reaches greater depths, and sunlight is completely extinguished below 1000 [m], which marks the beginning of the aphotic zone (Kirk 1994). Hence, deep ocean photogrammetry needs artificial light sources, which also have to be accommodated in the same kind of housings as cameras. In such a scenario, the light cones are also subject to refraction effects.
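As a rough plausibility check of these depth limits, the following worked example assumes (as a simplification that is not part of the cited zone definitions) that PAR decays exponentially with a single, depth-independent diffuse attenuation coefficient $K_d$. The symbols $E(z)$ and $K_d$ are introduced here for illustration only.

```latex
% Assumption: exponential decay with a constant coefficient K_d.
\[
  E(z) = E(0)\, e^{-K_d z},
  \qquad
  \frac{E(200\,\mathrm{m})}{E(0)} = 0.01
  \;\Rightarrow\;
  K_d = \frac{\ln 100}{200\,\mathrm{m}} \approx 0.023\,\mathrm{m}^{-1}
\]
\[
  \frac{E(1000\,\mathrm{m})}{E(0)} = e^{-1000\,\mathrm{m}\cdot K_d} = 0.01^{5} = 10^{-10}
\]
```

Under this assumption, only about one part in $10^{10}$ of the surface PAR would remain at 1000 [m], which is consistent with treating sunlight as extinct at that depth.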
As ship time is expensive, and working on a ship is very demanding and allows only limited modifications of a system, it is desirable to test, tune and verify sensors and the corresponding algorithms up front. In our experience, it is good practice to follow a development model with increasing levels of complexity:
1. Test the correctness and stability of the core measurement model, observation equations and estimation algorithms by unit tests and numeric simulations.
2. Simulate sensor data (here: images) as realistically as possible to evaluate the algorithm end-to-end, with the same pipeline that will be used on real data.
3. Repeat the above experiments in a controlled, but real, setting (e.g. a test tank), with increasing complexity.
4. Finally, perform experiments in the ocean.
In particular, steps 2 and 3 are important to understand the issues and limitations of a photogrammetric system as the data becomes more realistic. We have, therefore, built a tank that allows underwater cameras to be attached in order to test underwater imaging algorithms. Still, setting up experiments there means substantial effort, and many effects can already be observed in simulated data. For underwater photogrammetry applications, refraction is particularly important, and we observe that a large part of the photogrammetry community does not consider refraction explicitly, potentially due to a lack of supporting software and the high costs and burdens of setting up underwater equipment.
To facilitate the development of refractive algorithms, from calibration to multiview relation estimation, bundle adjustment or dense stereo reconstruction, we therefore provide a geometrically verified virtual test environment that provides easy access to refraction effects both with dome and flat ports, where users can set water properties and of course also add other objects or scenes as needed. Actually, implementing and verifying such a model takes substantial time and might block people from further research in this direction, which is why we want to make our efforts available to others.

Contribution and Outlook
In this paper, we specifically contribute the following: (i) we devise an optical digital twin of an underwater setting (real lab tank), which (ii) has been geometrically verified against a numerical simulation and real imagery. Furthermore, we will publish (iii) a Blender-based dataset generator with a convenient YAML-based interface. Finally, we will (iv) publish a pre-rendered dataset for the dome-port and the flat-port interfaces for the conditions no-water, half-water and full-water.
The remainder of this paper is structured as follows. In the subsequent Sect. 2, we will present related work in the fields of refractive geometry and underwater image simulations. We will then turn to a detailed description of the water tank environment and its geometric optical digital twin (GEODT) in Sect. 3. The geometric verification of the just-introduced digital twin will be presented in Sect. 4. In Sect. 5, the pre-rendered dataset and the interface for dataset generation will be described in detail. Finally, this paper will conclude with Sect. 6.

Refractive Geometry
It is well known that refractions are an integral part of the underwater image-formation model and thus have to be carefully taken into consideration in photogrammetric applications, see e.g., (Shmutter 1967;Moore 1976;Kotowski 1988;Fryer and Fraser 1986) as well as (Harvey and Shortis 1998;Jaffe et al. 2001;Kunz and Singh 2008;Drap 2012).
Underwater imaging systems involving flat ports actually become axial cameras (Treibitz et al. 2008), and refraction at such camera housings, considered two times for thick glass, significantly complicates forward projection (Agrawal et al. 2010) and structure from motion (Jordt 2014;Jordt et al. 2016). Exactly centering a pinhole camera inside a dome port, on the other hand, can avoid refraction of principal rays, but doing so requires some effort (She et al. 2019;Menna et al. 2016). Decentered dome systems also become axial cameras, though with different geometry (She et al. 2022), and suffer from refraction (Menna et al. 2020). For both dome and flat ports, efficient refraction models, approximations and algorithms are still an active area of research (see e.g., (Nocerino et al. 2021;Menna et al. 2017) as well as (Jordt and Koch 2011;Mulsow and Maas 2014;Duda and Gaudig 2016;Hu et al. 2021)).
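The axial-camera property of a flat port can be illustrated with a minimal sketch. It assumes an ideal pinhole looking perpendicularly through an infinitely thin flat port (glass thickness is ignored, which only removes a lateral shift), and the function name `axis_crossing` is hypothetical. For each viewing angle, the in-water ray is extended backwards until it crosses the optical axis; if the system had a single viewpoint, all crossings would coincide.

```python
import math

# Indices of refraction for air and water (the values used for the tank setup).
N_AIR, N_WATER = 1.0, 1.333


def axis_crossing(theta_air_deg: float, port_distance: float = 0.02) -> float:
    """Return z [m] where the back-extended in-water ray crosses the optical axis.

    The pinhole sits at z = 0 and looks along +z; the flat port is the plane
    z = port_distance. The port is treated as infinitely thin (assumption).
    """
    theta_air = math.radians(theta_air_deg)
    # Snell's law for the air-to-water transition.
    theta_water = math.asin(N_AIR / N_WATER * math.sin(theta_air))
    # Radial height at which the ray pierces the port plane.
    r = port_distance * math.tan(theta_air)
    # Walk back along the refracted ray until its radial height is zero.
    return port_distance - r / math.tan(theta_water)


crossings = {angle: axis_crossing(angle) for angle in (5, 20, 40)}
for angle, z in crossings.items():
    print(f"{angle:2d} deg -> virtual viewpoint at z = {z * 1000:.2f} mm")
```

The crossing point moves further behind the pinhole as the viewing angle grows, so the refracted rays meet the axis along a line segment rather than in one point, which is why a single perspective model cannot absorb the refraction exactly.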

Simulated Underwater Datasets
While real underwater datasets are, of course, the most desirable kind of data, it remains costly and difficult to obtain them. In addition, it is extremely challenging and sometimes even impossible to obtain ground truth by annotating the data or even by taking independent measurements. Hence, simulated datasets are a valid option, too, provided they can synthesize images with satisfactory quality and accuracy.
On the simulation side, there exists some prior work, especially on the simulation of Autonomous Underwater Vehicles (AUVs) equipped with cameras. The holistic AUV simulator UW Sim (Prats et al. 2012), which simulates an AUV and its sensor suite, comprises a simple underwater camera. The UUV-simulator (Manhães et al. 2016) rests on Gazebo (Koenig and Howard 2004) to provide a very comprehensive and interactive AUV simulation suite. Both approaches, in turn, rely on the Robot Operating System (ROS) (Quigley et al. 2009) to allow for a tight integration with actual robots. Further underwater camera simulators model shallow sea water (Cozman and Krotkov 1997) with the Fog model (Nayar and Narasimhan 1999) or deep sea environments with the Jaffe-McGlamery model (Jaffe 1990;McGlamery 1975).
However, the above approaches neglect the issue of refraction introduced by water-glass-air interfaces, while the focus mainly rests on the issues of attenuation (Akkaynak et al. 2017) and scattering (Preisendorfer 1964; Mobley et al. 2021). Above the water, Agrafiotis et al. (2021) synthesized images taking the refractive surface of water into account. Underwater, Kahmen et al. (2019) employed a refracted projection for multi-camera systems with flat interfaces, which basically corresponds to our numerical verification approach (She et al. 2022;Jordt-Sedlazeck and Koch 2012;Kunz and Singh 2008). Also for flat ports, Sedlazeck and Koch (2011) proposed a Jaffe-McGlamery-based (Jaffe 1990;McGlamery 1975) image formation model that added refraction effects in a custom manner. While this was a great tool at the time, the rasterization-based technique of the renderer means that volumetric effects are only approximated coarsely in a post-processing step, and the system was handcrafted for a particular flat port.
In rasterized rendering approaches, geometry is transformed into the image space in a feed-forward process. Optical effects occurring in the process have to be described by approximate models. Physically based rendering is an alternative way of synthesizing images (Pharr et al. 2016). In such a raytracing (Whitted 1980) approach, light rays are shot through a scene, and their behavior is defined based on physical models. Multiple rays are shot per image pixel and the computed intensities are subsequently integrated to obtain a color value. Finally, a realistic and physically sound image can be obtained by repeating this process for every pixel. A well-known software bundle to design scenes and perform raytracing on them is Blender (Blender Community 2018). In Zwilgmeyer et al. (2021), it is used to simulate underwater images, but refraction is neglected entirely. We too use Blender as a dataset generator, with a special focus on refraction at the interfaces between media with differing optical densities.

GEODT: A Geometrically Verified Optical Digital Twin of a Scientific Lab Tank
We model an actually existing water lab tank, which is in everyday use, to make it virtually available to the underwater photogrammetry community for testing, development, training, and tuning purposes.

General Setup
To resemble the real water tank as closely as possible (see Fig. 2), we took the following steps. We model the tank as a body of water with a cavity to accommodate the dome port. The cavity is necessary to keep the volumetric water effects out of the dome itself. Finally, we place a calibration target in the tank, which, due to its known properties, can be used for calibration, training, as well as for verification purposes. The target can be, e.g., a checkerboard or a calibration object equipped with a random dot pattern (Li et al. 2013) (see Fig. 3).

Volumetric Raytracing
To obtain an image with the raytracing technique in a volumetric setting, some variant of the volumetric rendering equation (VRE) (Novák et al. 2018;Fong et al. 2017), which is a generalization of the rendering equation (Kajiya 1986), has to be solved. As a closed-form solution is usually intractable for any non-trivial scene configuration, Monte Carlo methods are usually employed to approximate the solution (Novák et al. 2018;Veach 1998). In this paper, we specifically use the path tracer of Blender's 2.83 LTS Cycles engine, which is built on top of OptiX (Parker et al. 2010), to obtain the result.
The path tracer needs geometry information, material definitions, and medium definitions as input for its computations. The geometry is provided by models that we define in Blender itself. We further define all materials as diffuse Bidirectional Scattering Distribution Functions (BSDFs) (see e.g., Pharr et al. 2016), whose reflectance is either defined by a base color or by a texture (e.g., in the case of the calibration targets). In addition, a medium definition is required to compute the beam transmittance and a phase function, which encodes probable scattering directions. In the following subsections, we will thus additionally give a detailed definition of the latter.

Homogeneous Scattering Medium
Throughout this paper, we assume the water volume as well as the glass volume to be exhaustively defined by a homogeneous scattering medium (see e.g., Pharr et al. 2016). The light propagation in such a medium is governed by the three following equations.

Attenuation
The attenuation determines the mean free path a ray can travel in the medium. It is given by the sum of absorption and out-scattering:

\[ \sigma_t = \sigma_a + \sigma_s . \tag{1} \]

These values can be set for the wideband coefficients R, G, B in Blender.

Albedo
The albedo gives the scattering ability of the medium by defining the probability of a scattering event ($\sigma_s$) vs. an absorption event, once a particle is hit in the medium. It is given by

\[ \rho = \frac{\sigma_s}{\sigma_t} . \tag{2} \]

Again, these values can be set for the wideband coefficients R, G, B in Blender.

Scattering
The scattering itself has to be carried out in a certain direction. Commonly, the Henyey-Greenstein phase function is employed to describe a distribution over the unit sphere of directions (Henyey and Greenstein 1941):

\[ p_{HG}(\cos\theta) = \frac{1}{4\pi} \, \frac{1 - g^2}{\left(1 + g^2 + 2g\cos\theta\right)^{3/2}} . \tag{3} \]

It has the mean scattering direction parameter $g$, which can be set in Blender. Its behavior is depicted in Fig. 4.

Fig. 3 Pre-defined calibration targets: left, unwrapped random dot-pattern texture (Li et al. 2013) and corresponding calibration cube; right, assortment of checkerboards. Of course, different calibration targets can be added and used as desired

Fig. 4 The Henyey-Greenstein phase function (panels g = −1, g = 0, g = 1): the g value indicates the mean scattering direction and ranges from g = −1, full backward-scattering, over g = 0, isotropic scattering, to g = 1, full forward-scattering. Top: view onto the water column; bottom: view through the camera of the AUV
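The three medium relations above (attenuation, albedo and the Henyey-Greenstein phase function) can be checked numerically with a minimal sketch. The function names are hypothetical, the albedo is taken as the standard ratio of scattering to attenuation, and the Beer-Lambert transmittance `exp(-sigma_t * d)` is added as the usual beam transmittance of a homogeneous medium; none of this is the Blender implementation itself.

```python
import math


def attenuation(sigma_a: float, sigma_s: float) -> float:
    """Attenuation coefficient: sum of absorption and out-scattering."""
    return sigma_a + sigma_s


def albedo(sigma_a: float, sigma_s: float) -> float:
    """Single-scattering albedo: probability that an interaction is a scattering event."""
    return sigma_s / attenuation(sigma_a, sigma_s)


def transmittance(sigma_t: float, distance: float) -> float:
    """Beer-Lambert beam transmittance over a path in a homogeneous medium."""
    return math.exp(-sigma_t * distance)


def phase_hg(cos_theta: float, g: float) -> float:
    """Henyey-Greenstein phase function, written with the +2g*cos(theta) sign."""
    denom = (1.0 + g * g + 2.0 * g * cos_theta) ** 1.5
    return (1.0 - g * g) / (4.0 * math.pi * denom)


def integrate_over_sphere(g: float, steps: int = 20000) -> float:
    """Midpoint-rule integral of the phase function over all directions."""
    d_mu = 2.0 / steps
    total = 0.0
    for i in range(steps):
        mu = -1.0 + (i + 0.5) * d_mu  # mu = cos(theta)
        total += phase_hg(mu, g) * d_mu
    return 2.0 * math.pi * total  # the azimuth contributes a factor of 2*pi


# Hypothetical per-channel coefficients in 1/m (R, G, B), for illustration only.
sigma_a_rgb = (0.40, 0.10, 0.05)
sigma_s_rgb = (0.05, 0.05, 0.05)
for name, s_a, s_s in zip("RGB", sigma_a_rgb, sigma_s_rgb):
    s_t = attenuation(s_a, s_s)
    print(name, "sigma_t:", s_t, "albedo:", albedo(s_a, s_s),
          "T(2 m):", transmittance(s_t, 2.0))
print("integral of p_HG for g = 0.8:", integrate_over_sphere(0.8))
```

A phase function is a probability density over directions, so its integral over the unit sphere should be one for every admissible g; the last line checks this numerically. With the plus sign in the denominator, the angle is presumably measured between direction vectors that both point away from the scattering location, as in Pharr et al. (2016), so that positive g still corresponds to forward scattering.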

Modelling of Refractions
Refractions occur at the interfaces between participating media with different optical densities and are governed by Snell's law (Glassner 1989). It states that the ratio of the sines of the angles $\theta_2$ and $\theta_1$ of the outgoing and the incoming ray w.r.t. the surface normal of an interface equals the ratio of the speed of light $v_2$ after and $v_1$ before transitioning to the other medium:

\[ \frac{\sin\theta_2}{\sin\theta_1} = \frac{v_2}{v_1} = \frac{n_1}{n_2} , \]

where $n_1$ and $n_2$ denote the indices of refraction of the medium before and after the interface. We will use this reciprocal ratio of the indices of refraction to define the properties of an interface. The actual refraction of a ray within the simulation depends on the incident angle w.r.t. the surface normals of the model we use in the simulation (see Fig. 5); hence, each of the different interface types is defined by its own ratio.
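Snell's law can be applied directly to direction vectors. The following is a minimal sketch of the standard vector form, not the code used inside the renderer; the function name `refract` and the convention that the normal points against the incoming ray are assumptions made for this example. The indices of refraction are the ones used for the tank setup later in the paper.

```python
import math

IOR_AIR, IOR_WATER, IOR_GLASS = 1.0, 1.333, 1.473


def refract(direction, normal, n1, n2):
    """Refract a unit direction at an interface from index n1 into index n2.

    `normal` is the unit surface normal pointing against the incoming ray
    (into the n1 side). Returns None in case of total internal reflection.
    """
    eta = n1 / n2
    cos_i = -sum(d * n for d, n in zip(direction, normal))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)  # squared sine of the outgoing angle
    if sin2_t > 1.0:
        return None  # no transmitted ray exists
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * d + (eta * cos_i - cos_t) * n
                 for d, n in zip(direction, normal))


# A ray travelling along +z hits a planar interface at 30 degrees incidence.
incoming = (math.sin(math.radians(30.0)), 0.0, math.cos(math.radians(30.0)))
normal = (0.0, 0.0, -1.0)

in_glass = refract(incoming, normal, IOR_AIR, IOR_GLASS)
in_water = refract(in_glass, normal, IOR_GLASS, IOR_WATER)
print("sine of angle in glass:", in_glass[0])
print("sine of angle in water:", in_water[0])
```

For parallel planar interfaces, the angle in water depends only on the indices of air and water, since the glass term cancels; the glass thickness only shifts the ray laterally. On a curved dome the normals at the inner and outer surface differ, so this cancellation does not hold in general.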

Holistic Interface Modeling
After definition of all interface types, we modeled three different tank fill-rate configurations for the no-water, half-water, and full-water case, to ensure a proper holistic handling of light rays shot through the scene (see Fig. 6).
To enable versatile evaluation strategies, we provide three different water levels, which can be used as a verification step (full vs. no water: can we undo the water effects in the images?) or for information retrieval (half-water case), e.g., in a calibration approach (She et al. 2019).

Full Water
In the full water case, the water body is modeled as a homogeneous scattering medium with a surface, which does not interact with the light. In addition, it is carved out to accommodate the dome port without simulating water inside the port itself. To complete the water body, its surface is explicitly modeled as an air2water interface (see Fig. 6). The dome port is modeled as an air2glass interface, followed by a glass volume and finally a glass2water interface. The flat port has the same interface/volume structure as the dome port; it just exhibits a different (i.e. planar) geometry (cf. Fig. 6).

Half Water
In the half water case, the water body is modeled in the same fashion as in the full-water case. The only difference is that its height is exactly at the middle of the dome port and the flat port to enable direct comparison experiments. The dome port is now modeled as an air2glass interface followed by a glass volume. To correctly account for the water level, exitant light now passes a split interface, where the upper part is modeled as a glass2air interface, while the lower part is a glass2water interface. Again, the flat port has the same interface structure as the dome port (cf. Fig. 6).

No Water
Finally, in the no water case, we simply omit the water body. The dome port is now modeled as an interface where light enters through an air2glass interface, passes the glass medium and exits through a glass2air interface. Here, the flat port is again modeled in the same fashion as the dome port (see Fig. 6).
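The three fill states can be summarized as a small lookup structure. This is only an illustrative sketch of the interface ordering described above, seen from the camera outwards; the dictionary layout and the helper names `port_stack` and `interface_iors` are hypothetical and do not mirror the actual Blender scene graph.

```python
# Indices of refraction per medium (values used for the tank setup).
IOR = {"air": 1.0, "water": 1.333, "glass": 1.473}

# Outer port surface per fill state:
# (interface above the water line, interface below the water line).
OUTER_SURFACE = {
    "no-water": ("glass2air", "glass2air"),
    "half-water": ("glass2air", "glass2water"),
    "full-water": ("glass2water", "glass2water"),
}

# Interface type of the water-body surface, if a water body is present.
WATER_SURFACE = {
    "no-water": None,
    "half-water": "air2water",
    "full-water": "air2water",
}


def port_stack(fill_state: str, above_water_line: bool) -> list:
    """Interfaces a camera ray crosses when leaving the port (dome or flat)."""
    upper, lower = OUTER_SURFACE[fill_state]
    return ["air2glass", upper if above_water_line else lower]


def interface_iors(name: str) -> tuple:
    """Map a name such as 'glass2water' to (ior_before, ior_after)."""
    before, after = name.split("2")
    return IOR[before], IOR[after]


for state in OUTER_SURFACE:
    for above in (True, False):
        stack = port_stack(state, above)
        print(state, "above" if above else "below",
              [(name, interface_iors(name)) for name in stack])
```

Keeping the fill state and the position relative to the water line as the only two inputs makes the split interface of the half-water case explicit, while the dome and the flat port share the same table and differ only in geometry.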

Approach
We verify the simulated lab tank GEODT against the two adjacent methods corresponding to step 1 and step 3 of our evaluation pipeline stated in the Introduction (Sect. 1), namely a numerical simulation and a real tank experiment (see Fig. 7). As an error measure, we chose the 2-norm of the mean pixel difference $|\Delta_x, \Delta_y|_2$ in image space of the detected corners on a known calibration target.

Setup
For the tank parametrization, we use the numbers stated in Sect. 3.1 to resemble the real tank as closely as possible. In addition, we set the index of refraction of air to $ior_{air} = 1.0$ and that of water to $ior_{water} = 1.333$. Finally, we set $ior_{glass} = 1.473$, as given by the manufacturer. For further preparation of the actual data, we obtain the in-air intrinsics of the Basler machine vision camera using standard chessboard calibration. They are given in terms of an OpenCV pinhole model $\{K, d\}$, where $K$ is the intrinsic matrix and the vector $d$ denotes the corresponding distortion coefficients. Here, we obtain a calibration residual of 0.25 [px]. In addition, we take actual photos in the lab tank and find the offset vector $v_o$ of the camera w.r.t. the dome center as well as the pose of the checkerboard using (She et al. 2022). The reprojection error after this step is 0.52 [px]. After having detected the board pose, we can precisely rebuild the whole scene with the parameters relevant to model the refraction effects. Here, $k_1$ stands for the ideal pinhole camera modeled in Blender, $k_2$ is the real camera, and $d_1 = 0$ as well as $d_2$ are the respective distortion parameters. Finally, $v_o$ denotes the offset vector of the camera w.r.t. the dome in [m] in Blender coordinates. It thus defines the extrinsics when we know the dome position and assume no rotation w.r.t. it.
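The influence of the offset vector on the refraction at the dome can be illustrated with a minimal ray-tracing sketch. It assumes a dome made of two concentric spheres centered at the origin; the radii of 50 and 57 [mm] and the 10 [mm] offset are hypothetical values chosen for illustration, and the helper names are not part of the published code. Only the indices of refraction are taken from the setup above.

```python
import math

IOR_AIR, IOR_GLASS, IOR_WATER = 1.0, 1.473, 1.333


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def refract(direction, normal, n1, n2):
    """Vector form of Snell's law; `normal` points against the incoming ray."""
    eta = n1 / n2
    cos_i = -dot(direction, normal)
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        raise ValueError("total internal reflection")
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * d + (eta * cos_i - cos_t) * n
                 for d, n in zip(direction, normal))


def exit_sphere(origin, direction, radius):
    """Point where a ray starting inside an origin-centered sphere leaves it."""
    b = dot(origin, direction)
    c = dot(origin, origin) - radius * radius
    t = -b + math.sqrt(b * b - c)  # positive root, since the origin is inside
    return tuple(o + t * d for o, d in zip(origin, direction))


def trace_through_dome(offset, direction, r_inner, r_outer):
    """Trace a camera ray from `offset` through the dome into the water."""
    d_air = normalize(direction)
    p_in = exit_sphere(offset, d_air, r_inner)
    n_in = tuple(-x / r_inner for x in p_in)  # inward normal, against the ray
    d_glass = refract(d_air, n_in, IOR_AIR, IOR_GLASS)
    p_out = exit_sphere(p_in, d_glass, r_outer)
    n_out = tuple(-x / r_outer for x in p_out)
    d_water = refract(d_glass, n_out, IOR_GLASS, IOR_WATER)
    return p_out, d_water


def deviation_deg(a, b):
    """Angle between two unit vectors in degrees."""
    return math.degrees(math.acos(max(-1.0, min(1.0, dot(a, b)))))


ray = normalize((0.5, 0.0, 0.866))
for v_o in ((0.0, 0.0, 0.0), (0.0, 0.0, 0.01)):  # centered vs. 10 mm offset
    _, d_water = trace_through_dome(v_o, ray, 0.050, 0.057)
    print("offset", v_o, "-> deviation [deg]:", deviation_deg(ray, d_water))
```

For the centered camera every ray hits both spheres along the surface normal and passes undeflected, whereas the offset camera sees its rays bent at both surfaces. This is the effect that the estimated offset vector has to capture when the scene is rebuilt.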
To obtain our real measurements, we extract the corners from the images taken in the tank and undistort them afterwards. The latter step allows us to investigate the refraction effects in the space of the ideal pinhole camera as simulated by Blender. We then use the available information to rebuild the scene in the GEODT and also subsequently extract the corners from the synthesized images. For the corner extraction step, we assume an error of 0.1 [px]. Finally, we numerically forward project the refracted corners from 3D space to image space (Kunz and Singh 2008).
We use the implementation of (She et al. 2022); however, other implementations such as (Jordt-Sedlazeck and Koch 2012) exist as well.

Results
For verification, we compare all 6 × 7 corners over 12 checkerboard poses and compute the mean as well as the standard deviation $\hat{\sigma}$ of the relative error. As we can see in Table 1, the mean error norm of the GEODT is very low (0.16 [px]) when compared to the numerical simulation. The majority of this error can be explained by the corner detection noise stemming from the GEODT dataset. There is no detection noise to be accounted for in the numerical simulation, as the positions are directly computed. The comparisons with the real data generally yield a higher error for the GEODT, which holds as well for the numerical condition. Again, a large part of this error can be explained by the initial calibration (intrinsics and offset vector), which already yields a reprojection error of 0.52 [px]. This has to be considered in addition to the corner detection noise on the real as well as on the GEODT data. This leaves us only with a residual << 1 [px] in the mean error norms, which can be caused by the numerical and GEODT models. See Fig. 8 for an overview of the distribution of the relative error as well as the real dataset used for verification. As the test images cover a lot of different poses, we can expect the GEODT to simulate reality very closely.
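The error statistics can be computed along the following lines. This is a minimal sketch on synthetic corner positions, not the evaluation script behind Table 1; the function name and the data layout are hypothetical. Because the wording of the error measure admits two readings, both are returned: the mean and standard deviation of the per-corner error norms, and the 2-norm of the mean pixel difference.

```python
import math
import random
import statistics


def corner_error_stats(reference, candidate):
    """Compare two equally ordered lists of (x, y) corner positions in [px]."""
    diffs = [(xc - xr, yc - yr)
             for (xr, yr), (xc, yc) in zip(reference, candidate)]
    norms = [math.hypot(dx, dy) for dx, dy in diffs]
    mean_dx = statistics.fmean(dx for dx, _ in diffs)
    mean_dy = statistics.fmean(dy for _, dy in diffs)
    return {
        "mean_norm": statistics.fmean(norms),         # mean of per-corner norms
        "std_norm": statistics.pstdev(norms),         # population standard deviation
        "norm_of_mean": math.hypot(mean_dx, mean_dy),  # 2-norm of the mean difference
    }


# Synthetic stand-in data: 12 poses with 6 x 7 corners each, perturbed by
# Gaussian noise of 0.1 px (the assumed corner extraction error).
random.seed(0)
reference = [(random.uniform(0, 2000), random.uniform(0, 1500))
             for _ in range(12 * 6 * 7)]
candidate = [(x + random.gauss(0.0, 0.1), y + random.gauss(0.0, 0.1))
             for x, y in reference]
print(corner_error_stats(reference, candidate))
```

The norm of the mean difference reveals a systematic shift between two conditions, whereas the mean of the norms also accumulates zero-mean detection noise, so reporting both helps to separate bias from noise.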

Dataset
Our dataset comprises a set of rendered images as well as the Blender model of the tank which can be used to generate imagery with custom settings. For all rendered data, the corresponding YAML-config files are provided in the supplementary material, to allow for an easy extension.

Pre-rendered Dataset
We rendered a dataset with 4096 samples per pixel (spp), not using any denoising steps to maintain the physical soundness of the images. In Fig. 9, example images are shown in the conditions full, half, no water in each row. Specifically, Fig. 9a-c show example images for the dome port using the A3 calibration board and the board poses and the camera offset vector $v_o$ extracted from the real tank data. Figure 9d-f use the same poses mirrored to the other side of the tank and shown through the flat port, whose camera is centered and has a distance of 2 [cm] to it. In Fig. 9g-i, an example pose from a set of 20 random poses using the calibration cube (see Fig. 3, left) is synthesized using the dome port. Finally, an example pose from the same set is shown in Fig. 9j-l through the flat port. In both latter conditions, we use the same camera settings as in the former ones.

Fig. 9 Examples from the pre-rendered datasets using dome and flat port as well as the conditions full-water (left), half-water (middle), and no-water (right). The two upper rows show the A3 calibration target, while the two lower rows show the calibration cube with random dot pattern

Dataset Sampling
To generate more data, we provide the Blender file with code for the automated generation of more images, e.g., if different poses, IORs or camera settings are desired. Since the main challenge is to model the optical ports and interfaces, it should be easy to add more objects, textures or even whole scenes to the environment as needed. If desired, a seawater index of refraction can also be computed based on temperature, pressure, salinity, density, and wavelength (Millard and Seaver 1990).

YAML Interface for Blender
For easy dataset generation, we implemented an interface, where the main parameters for dataset generation can be conveniently defined from the outside in a simple YAML file. Please see supplementary material (as indicated in the abstract) for examples.

Conclusion
In this paper, we introduced an optical digital twin of an underwater photogrammetry setting by modeling a real water lab tank and underwater cameras with the most common interfaces. Its main purpose is to facilitate further development of refractive photogrammetric algorithms and virtual verification experiments. We provide dome and flat ports with realistic (but still adjustable) optical properties as well as a convenient way to set the water properties in the virtual environment. We have shown by comparison to real camera images and to numerical forward projection of 3D coordinates that the refraction effects are properly simulated. Taking into consideration the errors caused by camera and dome offset calibration as well as corner detection, we can safely assume a geometrical modeling error << 0.1 [px]. This customizable basic toolbox is easy to use for training, testing and verification in other multi-media refraction scenarios or environments.

Limitations and Future Work
As of now, the model does not account for diffraction (Radziszewski et al. 2009) or depth-of-field effects. In the future, a radiometric calibration, which is also influenced by refraction effects, would be desirable to further enable the development, tuning, and testing of color-restoration algorithms such as (Akkaynak and Treibitz 2019;Nakath et al. 2021).