1 Introduction

Depth completion [1] aims to predict dense depth maps from sparse ones and the corresponding color images. It is an essential task in computer vision and has been widely used in various applications, such as augmented reality [2, 3], 3D scene reconstruction [4, 5], and self-driving [6, 7]. In the past few years, plenty of image guided methods [7–10] have been proposed for depth completion under daytime conditions, e.g., the well-known KITTI benchmark [11]. However, very few approaches have focused on the more challenging nighttime scenarios. Nighttime depth-aware self-driving is especially important but difficult. As shown in Fig. 1, existing state-of-the-art image guided depth completion methods [4, 7, 9] perform well under daytime conditions but struggle in challenging nighttime scenarios. This is because the sparse depths from light detection and ranging (LiDAR) are illumination-invariant while color images are highly affected by visibility and illumination variations. Therefore, we identify the key challenge of nighttime depth completion as guidance from color images, which suffer greatly from poor visibility and complex illumination.

Figure 1

Visual comparison of different image guided depth completion approaches in daytime and nighttime scenarios

Poor visibility

A possible solution is to leverage existing low-light image enhancement techniques [12–14] to improve the visibility of nighttime color images. Since there are no paired clear images available as supervisory signals, self-supervised methods [12, 13, 15] for nighttime depth perception are preferred. However, they cannot generate very reasonable illumination maps, resulting in enhanced color images that are too untrustworthy for safe self-driving. For example, Fig. 2 shows that the state-of-the-art model [13] suffers from severe color cast. To address this issue, we propose recurrent inter-convolution differencing (RICD), which explicitly and gradually estimates global illumination to improve poor visibility by using continuous differencing between two convolutions with different kernels. Treating the small-kernel-convolution feature as the center of the large-kernel-convolution feature is a new perspective. Moreover, convolution subtraction [16] is useful for modeling uncertainty [17] where the target pixels are difficult to predict accurately. In nighttime images, we can easily observe many areas with underexposure, overexposure, and terminator (the junction area between light and dark) effects due to varying illumination, resulting in more frequent and higher uncertainty than in daytime. Based on these priors, we transform the uncertainty in nighttime scenarios into relative light intensity by applying continuous convolution differencing. Such differencing features that capture explicit light intensity information are essential for predicting valid illumination. Consequently, RICD contributes to robust visibility enhancement of nighttime color images with more naturalistic visual effects, as shown in Fig. 2.

Figure 2

Visual comparison of self-calibrated illumination (SCI) [13] and our recurrent inter-convolution differencing (RICD) enhancement. NIQE: natural image quality evaluator, a completely blind no-reference metric

Complex illumination

Even after applying RICD enhancement, the distribution of relative light intensity in nighttime color images is still much more complex than that in daytime conditions. For instance, there are many terminator areas with varying illumination, which are difficult for standard convolutions to handle. Fortunately, inspired by the local binary pattern operator [18], which is robust to illumination variations, a series of central convolution differencing algorithms [19–22] have been devised to address these challenging scenarios. Nevertheless, their differencing centers are typically fixed, leading to restricted applicability, especially for self-driving where safety is paramount. For example, if the center contains noise or lies on a terminator area, these algorithms would introduce additional negative reference information, thus resulting in unsatisfactory illumination robustness. To address this issue, we propose illumination affinitive intra-convolution differencing (IAICD), which learns reasonable differencing centers within a single convolution. On the one hand, IAICD can reduce the latent impact of noise and predict an adaptive differencing center based on the surrounding neighbors. On the other hand, the estimated illumination map in the RICD module is used to adaptively measure the contribution of each neighbor. As a result, IAICD can cope with complex illumination in challenging nighttime scenarios.

Finally, considering that nighttime depth estimation [23–25] is a highly relevant task, we further evaluate our model on it. In short, our contributions are as follows:

1) To the best of our knowledge, this is the first work to extend the traditional depth completion task to the challenging conditions of nighttime environments, thereby enhancing the safety of self-driving applications.

2) We identify the key challenge of nighttime depth completion as the guidance from color images, where the visibility is low and illumination is complex. To tackle these issues, RICD and IAICD with learnable differencing centers are proposed.

3) We build two benchmark datasets for the nighttime depth completion task. Extensive experiments show that our method achieves state-of-the-art results.

2 Related work

2.1 Monocular depth perception at night

Monocular depth perception mainly consists of depth estimation [26] and completion [11]. To date, various depth estimation methods have been developed in both supervised [27, 28] and self-supervised [26, 29] manners for daytime scenarios. Recently, some depth estimation approaches [23–25, 30, 31] have focused on nighttime conditions. Specifically, ADDS [24] proposes a domain-separated network to address large day-night domain shift and illumination variations. RNW [23] introduces prior regularization and consistent image enhancement for stable training and brightness consistency, respectively. Furthermore, to handle the challenges in underexposed and overexposed regions, STEPS [25] presents a new method that jointly learns a nighttime image enhancer and a depth estimator with an uncertain pixel masking strategy and a bridge-shaped curve. For depth completion, the majority of related works are applied in daytime scenarios, employing either supervised [4, 7, 9, 10, 32–37] or self-supervised [6, 38] methods. For example, RigNet [9] and RigNet++ [39] explore a repetitive design in the image guided network branch to acquire clear guidance and structure. CFormer [7] couples convolution and vision transformer to leverage their local and global contexts. LRRU [35] presents a large-to-small dynamical kernel scope to capture long-to-short dependencies. However, there are hardly any depth completion approaches that focus on the much more challenging nighttime environment, which is a vital component of self-driving. Therefore, we attempt to develop a basic framework for the nighttime depth completion task to complement self-driving applications.

2.2 Differencing convolution

Vanilla convolution is commonly used to extract basic visual features in deep learning networks, but it is not very effective when processing scenes with varying illumination. Inspired by the local binary pattern [18] that is robust to illumination variations, CDC [19] first introduces central differencing convolution to aggregate both intensity and gradient information. After that, many modified operators have been presented for various vision tasks [20–22, 40]. For example, C-CDC [41] extends CDC into dual-cross central differencing convolution via horizontal, vertical, and diagonal decomposition for face anti-spoofing. Furthermore, PDC [21] proposes pixel differencing convolution to enhance gradient information for edge detection. Besides, SDN [22] introduces semantic similarity to build semantic differencing convolution for semantic segmentation. However, their fixed differencing centers are usually not robust or reasonable enough if they contain noise or lie on terminator areas. Different from these methods, our goal is to design learnable differencing centers that are affinitive to the neighbor and illumination distributions for safe self-driving at night.

2.3 Low-light image enhancement

Low-light images captured in dark environments often suffer from severe noise, low brightness, low contrast, and color deviation [14, 15]. Thus, plenty of supervised [42, 43] and self-supervised [12, 13, 15] methods have been presented to restore the details. For example, the well-known histogram equalization (HE) [44] is a classic algorithm that strengthens the global contrast. However, the accuracy of HE-based approaches degrades as the background noise contrast increases. As an alternative, Retinex-based methods [42, 45] perform better, under the assumption that a low-light image can be decomposed into illumination and reflectance. However, this task usually lacks paired ground truth annotations, since there may be multiple low-light and high-light images for the same scene, making it difficult to determine which reference image is the best. Moreover, strictly aligned image pairs are also difficult to obtain [46]. Thus, self-supervised approaches are preferred for real-world applications. Recently, SCI [13] proposes a self-supervised illumination estimation framework with extremely lightweight parameters. Based on these previous studies, we present a recurrent differencing strategy between paired convolutions to predict more reasonable illumination.

3 Method

Our work is specifically designed for nighttime depth completion, as well as depth estimation. We introduce the recurrent inter-convolution differencing and the illumination affinitive intra-convolution differencing in Sect. 3.2 and Sect. 3.3, respectively. Their detailed designs are depicted in Fig. 3. Then we elaborate on our framework (Fig. 4) in Sect. 3.4. Finally, the loss is defined in Sect. 3.5.

Figure 3

Comparison of vanilla convolution, central differencing convolution (CDC) [19], our RICD, and our illumination affinitive intra-convolution differencing (IAICD). \(\boldsymbol {\omega}_{6}, \boldsymbol {\omega}_{7}, \ldots , \boldsymbol {\omega}_{18}\) denote a \(3\times 3\) convolution kernel

Figure 4

The proposed learnable differencing center network (LDCNet). The low-light RGB x is first fed into the RICD to predict the credible image \(\boldsymbol {x}'\) and reasonable illumination m. Then, the IAICD is used to alleviate the negative impact of varying illumination. d/o: sparse/dense depth, \(\Phi _{\mathrm{c}}\)/\(\Phi _{\mathrm{d}}\): subnetwork

3.1 Prior knowledge

Self-calibrated illumination (SCI)

Based on the Retinex theory [47], the relation between the low-light image x and enhanced image \(\boldsymbol {x}'\) is formulated as:

$$ \boldsymbol {x}'=\boldsymbol {x} \oslash \boldsymbol {m}, $$
(1)

where \(\boldsymbol {m}\in (0, 1]\) is the estimated illumination map, and ⊘ denotes pixel-wise division. On this basis, a self-calibrated module and an illumination estimator are designed in the lightweight self-supervised SCI [13]. Given the low-light input x, this estimator predicts the illumination map m via several \(3\times 3\) convolutions. The enhancement loss includes fidelity and smoothness terms, defined as:

$$ \begin{aligned} &{\mathcal{L}_{\mathrm{f}}}= \frac{1}{n}\sum_{i=1}^{n}{{ ( \boldsymbol {m}_{i}-\boldsymbol {x}_{i} )}^{2}}, \\ &{\mathcal{L}_{\mathrm{s}}}=\frac{1}{n}\sum _{i=1}^{n}{\sum_{j\in \mathcal{N} ( i )}{{{ \mathcal{G}}_{i,j}} \vert {{\boldsymbol {m}}_{i}}-{{ \boldsymbol {m}}_{j}} \vert }}, \end{aligned} $$
(2)

where \(\mathcal{G}_{i,j}\) is the weight of a Gaussian kernel, and \(\mathcal{N} ( i )\) is a window centered at i with \(5\times 5\) adjacent pixels. \(\mathcal{L}_{\mathrm{f}}\) measures the similarity of m and x while \(\mathcal{L}_{\mathrm{s}}\) regularizes the consistency of m itself. n denotes the number of valid pixels.
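For concreteness, the two enhancement terms in Eq. (2) can be sketched in PyTorch as below. This is a minimal sketch under our own assumptions: the Gaussian weights \(\mathcal{G}_{i,j}\) are modeled as a fixed \(5\times 5\) spatial kernel, and tensors follow the usual (B, C, H, W) convention; the official SCI implementation may differ.

```python
import torch
import torch.nn.functional as F

def fidelity_loss(m, x):
    """L_f in Eq. (2): mean squared difference between illumination m and input x."""
    return F.mse_loss(m, x)

def smoothness_loss(m, sigma=1.0):
    """L_s in Eq. (2): Gaussian-weighted local consistency of the illumination map m.

    m is a (B, C, H, W) illumination map; N(i) is the 5x5 neighborhood around pixel i.
    The Gaussian weights are a fixed spatial kernel here (an assumption).
    """
    b, c, h, w = m.shape
    coords = torch.arange(5, dtype=m.dtype, device=m.device) - 2
    g1d = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g2d = torch.outer(g1d, g1d)
    g2d = g2d / g2d.sum()                                      # normalized 5x5 Gaussian kernel
    patches = F.unfold(m, kernel_size=5, padding=2)            # (B, C*25, H*W) neighborhoods
    patches = patches.view(b, c, 25, h * w)
    center = m.view(b, c, 1, h * w)
    diff = (patches - center).abs()                            # |m_i - m_j| for each neighbor j
    return (g2d.view(1, 1, 25, 1) * diff).sum(dim=2).mean()
```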

Vanilla convolution

We denote the frequently used 2D spatial convolution as vanilla convolution. For simplicity, we describe the convolution operator in 2D while ignoring the channel dimension. Given the input x, the new output feature y produced by vanilla convolution is represented as:

$$ \boldsymbol {y}_{p_{0}}=\sum_{p_{n}\in \mathcal{R}}{ \boldsymbol {\omega}_{p_{n}} \cdot \boldsymbol {x}_{p_{0} + p_{n}}}, $$
(3)

where \(\mathcal{R}\) is the local receptive field region sampled from x, \(p_{0}\) denotes the current location, \(p_{n}\) enumerates the locations in \(\mathcal{R}\), and \(\boldsymbol {\omega}_{p_{n}}\) is the corresponding convolution weight. Figure 3(a) shows a \(3\times 3\) kernel case.

Central differencing convolution

Based on the vanilla convolution, CDCN [19] designs central differencing convolution (CDC), where every pixel of x in \(\mathcal{R}\) subtracts its center pixel \(\boldsymbol {x}_{p_{0}}\):

$$ \boldsymbol {y}_{p_{0}}=\sum_{p_{n}\in \mathcal{R}}{ \boldsymbol {\omega}_{p_{n}} \cdot ( \boldsymbol {x}_{p_{0} + p_{n}} - \boldsymbol {x}_{p_{0}} )}. $$
(4)

Figure 3(b) illustrates the process of the above equation. Furthermore, combining Eqs. (3) and (4) yields a trade-off between the contributions of vanilla convolution and CDC, which is defined as:

$$ \begin{aligned} \boldsymbol {y}_{p_{0}}&=\theta \cdot \sum_{p_{n}\in \mathcal{R}}{\boldsymbol {\omega}_{p_{n}} \cdot ( \boldsymbol {x}_{p_{0} + p_{n}} - \boldsymbol {x}_{p_{0}} )} \\ &\quad{} + ( 1 - \theta ) \cdot \sum_{p_{n}\in \mathcal{R}}{ \boldsymbol {\omega}_{p_{n}} \cdot \boldsymbol {x}_{p_{0} + p_{n}}} \\ &= \sum_{p_{n}\in \mathcal{R}}{\boldsymbol {\omega}_{p_{n}} \cdot \boldsymbol {x}_{p_{0} + p_{n}}} - \theta \cdot \boldsymbol {x}_{p_{0}} \cdot \sum_{p_{n}\in \mathcal{R}}{\boldsymbol {\omega}_{p_{n}}}, \end{aligned} $$
(5)

where the first term is the vanilla convolution, the second is the central differencing term, and θ is a coefficient that trades off their contributions.
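A common way to implement Eq. (5) is to reuse a vanilla convolution and subtract the center term computed with the spatially summed kernel, so no explicit per-pixel differencing is needed. The sketch below follows this reformulation; the module name and the default θ are illustrative choices on our part.

```python
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv2d(nn.Module):
    """Central differencing convolution, Eq. (5): conv(x) - theta * x_{p0} * sum(w)."""

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.theta = theta

    def forward(self, x):
        out_vanilla = self.conv(x)
        if self.theta == 0:
            return out_vanilla                          # degenerates to vanilla convolution
        # Sum the kernel weights over the spatial dimensions and apply them as a 1x1 conv,
        # which is exactly the central differencing term x_{p0} * sum(w) per output channel.
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        out_center = F.conv2d(x, kernel_sum)
        return out_vanilla - self.theta * out_center
```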

3.2 Recurrent inter-convolution differencing

Existing low-light image enhancement methods cannot restore very reasonable outputs in the more challenging nighttime self-driving scenarios. For example, in Fig. 2, SCI [13] suffers from severe color cast. To address this issue, we propose RICD, shown in Fig. 3(c). First, convolution subtraction [16] between two vanilla convolutions with different kernels is employed in RICD to highlight the uncertainty of differently lit areas. Then, the uncertainty is converted into illumination via recurrent convolution differencing. Suppose that \(\mathcal{R}\) is the larger local receptive field region and \(\mathcal{\bar{R}}\) is the smaller one, where \(\bar{p}_{n}\) enumerates the locations in \(\mathcal{\bar{R}}\). \(\mathcal{R}\) and \(\mathcal{\bar{R}}\) share the same current location \(p_{0}\). As a result, one step of RICD is defined as:

$$ \boldsymbol {y}_{p_{0}}=\sum_{p_{n}\in \mathcal{R}}{ \boldsymbol {\omega}_{p_{n}} \cdot \boldsymbol {x}_{p_{0} + p_{n}}} - \sum _{{\bar{p}_{n}}\in \mathcal{\bar{R}}}{\boldsymbol {\omega}_{ \bar{p}_{n}} \cdot \boldsymbol {x}_{p_{0} + \bar{p}_{n}}}. $$
(6)

One novel aspect of RICD is that it converts the uncertainty distribution into an illumination estimate. Besides, it introduces a new perspective that regards the feature of the smaller-kernel convolution as the center of the feature of the larger-kernel convolution. The differencing center is thus dynamically learned from its local environment. These characteristics contribute to valid illumination prediction. Consequently, according to Eq. (1), RICD can restore robustly enhanced images.
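A minimal PyTorch sketch of RICD is given below, assuming a \(5\times 5\)/\(3\times 3\) kernel pair (RICD-ii) and three recurrent steps as selected in Sect. 4.6. The input/output projections, the activation, and the way the illumination map is squashed into (0, 1] are our assumptions; only the large-minus-small convolution differencing of Eq. (6) and the division of Eq. (1) come from the text.

```python
import torch
import torch.nn as nn

class RICDStep(nn.Module):
    """One RICD step, Eq. (6): large-kernel conv minus small-kernel conv at the same center."""

    def __init__(self, channels, k_large=5, k_small=3):
        super().__init__()
        self.conv_large = nn.Conv2d(channels, channels, k_large, padding=k_large // 2)
        self.conv_small = nn.Conv2d(channels, channels, k_small, padding=k_small // 2)

    def forward(self, x):
        return self.conv_large(x) - self.conv_small(x)

class RICD(nn.Module):
    """Recurrent inter-convolution differencing (a sketch; head/tail convs are assumptions)."""

    def __init__(self, channels=16, steps=3):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.steps = nn.ModuleList([RICDStep(channels) for _ in range(steps)])
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x):
        f = self.head(x)
        for step in self.steps:                           # recurrent differencing
            f = torch.relu(step(f))
        m = torch.sigmoid(self.tail(f)).clamp(min=1e-3)   # illumination map m in (0, 1]
        x_enh = x / m                                     # Eq. (1): x' = x ./ m
        return x_enh, m
```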

3.3 Illumination affinitive intra-convolution differencing

Although RICD enhances the visibility of nighttime images, the relative light intensity caused by varying illumination is still much more complex than that in daytime images. To handle this problem, we present IAICD, shown in Fig. 3(d). Different from CDC [19], whose center is fixed, IAICD first aggregates its differencing center adaptively from all neighboring pixels. After yielding the differencing matrix between the neighbors and the center, IAICD reweights the matrix via \(\mathcal{M}\), a channel-wise (c) normalization of the illumination map m, i.e., \({{\mathcal{M}}^{c}}={{{\boldsymbol {m}}^{c}}}/{\sum_{v=1}^{c}{ \vert {{\boldsymbol {m}}^{v}} \vert }}\). The output is then given by

$$ \boldsymbol {y}_{p_{0}}=\sum_{p_{n}\in \mathcal{R}}{ \boldsymbol {\omega}_{p_{n}} \cdot \biggl( \boldsymbol {x}_{p_{0} + p_{n}} - \sum_{p_{n}\in \mathcal{R}}{\mathcal{M}_{p_{n}} \cdot \boldsymbol {x}_{p_{n}}} \biggr)}. $$
(7)

Compared with CDC, the differencing center predicted by IAICD is robust. On the one hand, if the center \(\boldsymbol {x}_{p_{0}}\) contains noise, CDC would introduce abnormal differencing information, whereas IAICD can ignore \(\boldsymbol {x}_{p_{0}}\) or reduce its negative effect by assigning it a very small weight. On the other hand, if the center \(\boldsymbol {x}_{p_{0}}\) lies in a terminator area, the fixed \(\boldsymbol {x}_{p_{0}}\) is no longer appropriate as the differencing center, because its light intensity differs significantly from that of its neighbors. As an alternative, we integrate the corresponding illumination map to adjust the weight of each neighboring pixel.

When the illumination map is an all-ones matrix and the weights of \(p_{n}\) (\(n\neq 0\)) are zero, IAICD degenerates into CDC. Hence, IAICD in Eq. (7) is a generalized version of CDC in Eq. (4).
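The following sketch illustrates one way to realize Eq. (7): the learnable center is an illumination-weighted aggregation of the local neighborhood, and the differencing is then folded into a vanilla convolution plus a summed-kernel term, analogous to Eq. (5). We assume the illumination map has already been resized and projected to the same shape as the feature map; the extra per-neighborhood normalization is also our assumption.

```python
import torch.nn as nn
import torch.nn.functional as F

class IAICD(nn.Module):
    """Illumination affinitive intra-convolution differencing, Eq. (7) (a sketch)."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.k = kernel_size
        self.conv = nn.Conv2d(channels, channels, kernel_size, padding=kernel_size // 2)

    def forward(self, x, m):
        # x: (B, C, H, W) features; m: illumination map, assumed to match x in shape.
        b, c, h, w = x.shape
        k2 = self.k * self.k
        M = m / (m.abs().sum(dim=1, keepdim=True) + 1e-6)            # channel-wise normalization -> M
        x_nb = F.unfold(x, self.k, padding=self.k // 2).view(b, c, k2, h, w)
        M_nb = F.unfold(M, self.k, padding=self.k // 2).view(b, c, k2, h, w)
        M_nb = M_nb / (M_nb.abs().sum(dim=2, keepdim=True) + 1e-6)   # normalize over neighbors (assumption)
        center = (M_nb * x_nb).sum(dim=2)                            # learnable differencing center
        out_vanilla = self.conv(x)
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        out_center = F.conv2d(center, kernel_sum)                    # center term, as in Eq. (5)
        return out_vanilla - out_center
```

In this sketch, when the weights assign one to the center pixel and zero to the other neighbors, `center` reduces to \(\boldsymbol{x}_{p_{0}}\) and the module reduces to CDC, matching the degeneration discussed above.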

3.4 Learnable differencing center network

Architecture

Our learnable differencing center network (LDCNet) is illustrated in Fig. 4. Overall, LDCNet consists of an image guidance branch and a depth prediction branch. In the image guidance branch, the low-light image x is first fed into the RICD, generating the enhanced image \(\boldsymbol {x}'\) and the illumination map m. Next, a simple UNet-like subnetwork \(\Phi _{\mathrm{c}}\), composed of five layers with resolutions of 1/1, 1/2, 1/4, 1/8, and 1/16, is used to encode \(\boldsymbol {x}'\). The features of each layer, together with m, are then fed into the IAICD. In the depth prediction branch, the sparse depth d is encoded by a similar subnetwork \(\Phi _{\mathrm{d}}\). Meanwhile, the output of IAICD is leveraged at each resolution to guide the dense depth prediction in \(\Phi _{\mathrm{d}}\), yielding the final depth output o.
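A structural sketch of this data flow is shown below. The internals of \(\Phi _{\mathrm{c}}\) and \(\Phi _{\mathrm{d}}\) and the exact guidance injection are not specified here, so the callables passed to the constructor are placeholders and the bilinear resizing of m is our assumption.

```python
import torch.nn as nn
import torch.nn.functional as F

class LDCNet(nn.Module):
    """High-level data flow of Fig. 4 (x: low-light RGB, d: sparse depth, o: dense depth)."""

    def __init__(self, ricd, iaicd_blocks, phi_c, phi_d):
        super().__init__()
        self.ricd = ricd              # RICD: visibility enhancement + illumination estimation
        self.iaicd = iaicd_blocks     # one IAICD block per encoder resolution (ModuleList)
        self.phi_c = phi_c            # image guidance subnetwork Phi_c (UNet-like)
        self.phi_d = phi_d            # depth prediction subnetwork Phi_d (UNet-like)

    def forward(self, x, d):
        x_enh, m = self.ricd(x)                  # enhanced image x' and illumination map m
        color_feats = self.phi_c(x_enh)          # multi-scale color features (1/1 ... 1/16)
        guidance = []
        for blk, f in zip(self.iaicd, color_feats):
            m_s = F.interpolate(m, size=f.shape[-2:], mode='bilinear', align_corners=False)
            guidance.append(blk(f, m_s))         # IAICD guidance at each resolution
        o = self.phi_d(d, guidance)              # guided dense depth prediction
        return o, x_enh, m
```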

3.5 Loss function

Following previous depth completion methods [9, 35, 36, 48], we employ the \(\mathcal{L}_{\mathrm{2}}\) loss to supervise the output o with the ground truth depth D:

$$ {\mathcal{L}_{\mathrm{2}}}=\frac{1}{n}\sum _{i=1}^{n}{{ ( \boldsymbol {D}_{i}- \boldsymbol {o}_{i} )}^{2}}. $$
(8)

Finally, we jointly train the low-light image enhancement subnetwork and depth prediction subnetwork by combining Eqs. (2) and (8), obtaining the total loss function:

$$ \mathcal{L}_{\mathrm{total}}=\mathcal{L}_{\mathrm{2}} + \alpha \mathcal{L}_{\mathrm{f}} + \beta \mathcal{L}_{\mathrm{s}}, $$
(9)

where α and β are hyper-parameters, which are set to 0.15 and 0.30 as the defaults, respectively.
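Combining Eq. (8) and Eq. (9), and reusing the fidelity_loss and smoothness_loss sketches from Sect. 3.1, the training objective can be written as follows. Masking the \(\mathcal{L}_{\mathrm{2}}\) term to valid (non-zero) ground truth pixels is our assumption, following common depth completion practice.

```python
def total_loss(o, D, m, x, alpha=0.15, beta=0.30):
    """Eq. (9): L2 depth term plus the enhancement fidelity and smoothness terms."""
    valid = D > 0                                  # supervise only annotated depth pixels
    l2 = ((D[valid] - o[valid]) ** 2).mean()       # Eq. (8)
    lf = fidelity_loss(m, x)                       # Eq. (2), fidelity term
    ls = smoothness_loss(m)                        # Eq. (2), smoothness term
    return l2 + alpha * lf + beta * ls
```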

4 Experiments

In this section, we first introduce the two generated datasets in Sect. 4.1, metrics in Sect. 4.2, and implementation details in Sect. 4.3. Then, we present the quantitative and qualitative comparison with state-of-the-art methods in Sect. 4.4 and Sect. 4.5. Finally, extensive ablation studies in Sect. 4.6 are conducted to validate the effectiveness of each module.

4.1 Datasets

RobotCar-Night-DC

Oxford RobotCar [49] is a large-scale dataset that captures various weather and traffic conditions along a route in central Oxford. We create RobotCar-Night-DC from the 2014-12-16-18-44-24 sequences by using the left color images of the front stereo-camera. To generate sparse and ground truth depth, we employ the official toolbox to process the data from the front laser and sensors. Following the KITTI benchmark [11], we use the current frame for sparse depth generation and multiple frames for ground truth depth creation. The densities of the valid pixels of sparse depth and ground truth depth are approximately 4% and 16%, respectively. We crop and resize these data to \(576\times 320\) to remove the car-hood and enable efficient training. As a result, the RobotCar-Night-DC dataset contains \(10{,}290\) RGB-depth (RGB-D) pairs for training and 411 for testing.

CARLA-Night-DC

CARLA-EPE [25] is a synthetic dataset for the nighttime depth estimation task, generated with the CARLA simulator [50] and the EPE network [51]. The ground truth depth in CARLA-EPE is almost fully dense, which is unrealistic for LiDAR-based self-driving systems where the depth density is around 7% [9]. Hence, based on this synthetic dataset, we create CARLA-Night-DC for the proposed nighttime depth completion task by transferring the sparse LiDAR pattern of KITTI [11] to CARLA-EPE. CARLA-Night-DC is composed of 7532 RGB-D pairs in total, of which 7000 are for training and 532 for testing.
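One simple way to transfer a KITTI-style LiDAR pattern onto the nearly dense CARLA-EPE depth is to mask it with the valid-pixel mask of a real KITTI sparse frame, as sketched below. The function name and this exact masking procedure are illustrative assumptions and may differ from how CARLA-Night-DC was actually built.

```python
import numpy as np

def sparsify_with_kitti_pattern(dense_depth, kitti_sparse_depth):
    """Keep the dense depth only where the KITTI sparse frame has valid (non-zero) returns.

    dense_depth:        (H, W) float array from CARLA-EPE.
    kitti_sparse_depth: (H, W) float array from a real KITTI frame, aligned to (H, W).
    """
    mask = kitti_sparse_depth > 0                # KITTI LiDAR valid-pixel pattern
    return np.where(mask, dense_depth, 0.0).astype(np.float32)
```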

4.2 Metrics

Given ground truth D and output o, we use the following metrics: mean absolute error (MAE), root mean square error (RMSE), inverse MAE (iMAE), inverse RMSE (iRMSE), square relative error (Sq Rel), absolute relative error (Abs Rel), and root mean square logarithmic error (RMSE log). Table 1 summarizes their mathematical expressions.

Table 1 Metric definition
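For reference, these metrics follow their standard definitions and can be computed over the valid ground truth pixels as sketched below; unit conversions (e.g., reporting iMAE/iRMSE in 1/km as on KITTI) are omitted and left as an assumption.

```python
import torch

def depth_metrics(o, D, eps=1e-6):
    """Standard depth metrics over valid ground-truth pixels (KITTI-style conventions)."""
    valid = D > 0
    o, D = o[valid].clamp(min=eps), D[valid]
    abs_diff = (o - D).abs()
    return {
        "MAE":      abs_diff.mean(),
        "RMSE":     ((o - D) ** 2).mean().sqrt(),
        "iMAE":     (1.0 / o - 1.0 / D).abs().mean(),
        "iRMSE":    ((1.0 / o - 1.0 / D) ** 2).mean().sqrt(),
        "Abs Rel":  (abs_diff / D).mean(),
        "Sq Rel":   ((o - D) ** 2 / D).mean(),
        "RMSE log": ((o.log() - D.log()) ** 2).mean().sqrt(),
    }
```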

4.3 Implementation details

LDCNet is implemented with the PyTorch library on a single RTX 3090 GPU. We train it for 20 epochs using the Adam optimizer, with momentum \(\beta _{1}=0.900\), \(\beta _{2}=0.999\), and a weight decay of \(1.0 \times {10}^{-6}\). The initial learning rate is \(1.0 \times {10}^{-3}\), which is halved every 5 epochs. We use synchronized cross-GPU batch normalization [52] with a batch size of 12. The evaluation metrics are consistent with those of RNW [23] and KITTI [11]. The RMSE is measured in meters.
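These settings translate directly into a PyTorch training loop, sketched below; `model` and `train_loader` are placeholders for an LDCNet instance (Sect. 3.4) and a dataloader yielding (RGB, sparse depth, ground truth) batches of size 12, and the scheduler choice (StepLR with gamma 0.5 every 5 epochs) is inferred from the description above.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1.0e-3,
                             betas=(0.900, 0.999), weight_decay=1.0e-6)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)  # halve every 5 epochs

for epoch in range(20):
    for x, d, D in train_loader:           # x: low-light RGB, d: sparse depth, D: ground truth
        o, x_enh, m = model(x, d)
        loss = total_loss(o, D, m, x)      # Eq. (9)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```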

4.4 Nighttime depth estimation

We compare LDCNet with nighttime state-of-the-art methods, including MD2 [53], DeFeatNet [31], ADFA [30], ADDS [24], RNW [23], WSGD [54], and STEPS [25]. Based on STEPS, we embed our RICD and IAICD into its image enhancement branch and depth estimation branch, respectively. From Table 2 we can observe that LDCNet achieves the lowest errors and the highest accuracy in nearly all cases. On RobotCar, LDCNet is superior to the second best STEPS in all aspects. Furthermore, LDCNet surpasses the well-known MD2 by large margins. For example, the RMSE of MD2 is reduced from 12.771 m to 6.725 m, an improvement of almost 47%, while the accuracy \(\delta _{1}\) increases by 22.9%. On the CARLA-EPE dataset, LDCNet also outperforms the other three approaches. In addition, comparing these methods across RobotCar and CARLA-EPE, we observe that they perform worse on CARLA-EPE. This can be attributed to the darker color images and the larger depth ranges of CARLA-EPE. Finally, the visual results in Fig. 5 show that LDCNet can predict more accurate depth with more complete and sharper edges, further verifying the superiority and effectiveness of LDCNet.

Figure 5

Visual comparison of nighttime depth estimation on the RobotCar-Night-DC dataset. The methods include MD2 [53], RNW [23], STEPS [25], and our LDCNet

Table 2 Results of nighttime depth estimation on the RobotCar [49] and CARLA-EPE [25] benchmarks. LDCNet: the learnable differencing center network. ↑/↓ indicates that higher/lower scores are better. Bold indicates the best result and underline indicates the second best result

4.5 Nighttime depth completion

For a fair comparison, we retrain existing state-of-the-art daytime depth completion approaches in nighttime scenarios, including pNCNN [55], FusionNet [56], NCNN [32], S2D [3], NLSPN [4], GuideNet [48], RigNet [9], and CFormer [7]. The quantitative results are reported in Table 3. Overall, we discover that LDCNet achieves the best performance on the two nighttime depth perception benchmarks. Specifically, on the RobotCar-Night-DC dataset, the results of LDCNet are superior or competitive. For instance, LDCNet reduces the MAE by 28.7% compared with the third best RigNet. Compared with CFormer, which requires 5 days of training on a single 3090 GPU, LDCNet still achieves slightly better results with a 20-hour training cost. On the CARLA-Night-DC dataset, the challenging darker environment and greater distances result in poor performance of these methods. For example, the RMSE is at least 6 m greater than that on RobotCar-Night-DC. Additionally, we notice that NCNN, pNCNN, NLSPN, FusionNet, and CFormer, all of which estimate confidence maps to reweight depth, suffer from large RMSE and MAE values. We attribute this to the very low-light color images, which make it rather difficult to predict credible confidence distributions, resulting in unstable depth refinement. Finally, from Fig. 6 we discover that LDCNet succeeds in recovering object depth more accurately, such as the cars, bus shelters, and buildings in the foreground, and the trees, light poles, and billboards in the background.

Figure 6

Visual comparison of nighttime depth completion on the CARLA-Night-DC dataset. The methods include GuideNet [48], CFormer [7], and our LDCNet

Table 3 Results of nighttime depth completion on RobotCar-Night-DC and CARLA-Night-DC. Note that all methods in this table are retrained from scratch

4.6 Ablation study

For efficient ablation on RobotCar-Night-DC, we halve the size of the two subnetworks in LDCNet by setting the stride of the first-layer convolution to 2. Results are presented in Table 4, Table 5, Fig. 7, and Fig. 8.

Figure 7

Ablation on RICD and IAICD. ‘N-agg’: neighboring aggregation; ‘I-wei’: illumination affinitive weighting

Figure 8

Intermediate feature comparison of vanilla convolution and our method. \(k_{1}\): the kernel size of the convolution

Table 4 Ablation on components of LDCNet. RICD: recurrent inter-convolution differencing; IAICD: illumination affinitive intra-convolution differencing
Table 5 Ablation on diverse-kernel RICD. \(k_{1}\) and \(k_{2}\) denote the kernel sizes of the two convolutions

LDCNet

As listed in Table 4, the baseline LDCNet-i first removes the RICD and IAICD modules. Then, as an alternative to IAICD, LDCNet-i incorporates the guidance module proposed in GuideNet [48]. When implementing our RICD design (LDCNet-ii), we discover that the two evaluation metrics are consistently improved, i.e., the RMSE is reduced by 104 mm and the MAE is reduced by 129 mm. Similarly, the individual IAICD (LDCNet-iii) contributes to a larger performance improvement, reducing the RMSE and MAE by 117 mm and 148 mm, respectively. Finally, to combine the best of both worlds, LDCNet-iv embeds the RICD and IAICD simultaneously into the baseline. As a result, LDCNet-iv performs much better than LDCNet-i, significantly exceeding it by 137 mm in RMSE and 181 mm in MAE.

RICD

The basic unit of RICD is the differencing between two convolutions with different kernels. Consequently, we ablate the kernel sizes in Table 5. Based on LDCNet-i, RICD-i, RICD-ii, and RICD-iii conduct \((k+2)\times (k+2)\) and \(k\times k\) convolution differencing. As the kernel size increases, the two evaluation metrics decrease gradually. For example, the MAE of \(k_{2}=5\) is 109 mm lower than that of \(k_{2}=1\). This is due to the learnable differencing center design, which regards the small-kernel-convolution feature as the center of the large-kernel-convolution feature. Such differencing convolutions with larger local receptive fields can predict reliable illumination distributions by aggregating the surrounding light information. Furthermore, RICD-iv increases the kernel size gap from 2 to 4. On the one hand, the \(1\times 1\) convolution of RICD-i is clearly not suitable as the differencing center because it cannot leverage ambient information; thus, RICD-iv performs better than RICD-i despite the larger gap. On the other hand, with the larger size gap, the larger-kernel convolution introduces redundant light references over long distances, while the smaller-kernel convolution can only map the light in local regions. Therefore, RICD-iv performs worse than RICD-ii and RICD-iii, which have smaller size gaps. In addition, based on RICD-ii, Fig. 7(a) shows the ablation of RICD with different numbers of recurrent steps. We observe that RICD performs better as the number of steps increases. As depicted in Fig. 8, RICD can strengthen the representation of relative light intensity, contributing to more precise illumination. Finally, we select RICD-ii and step-3 as the defaults.

IAICD

Different from central differencing convolution (CDC) [19] with its fixed center, IAICD first aggregates all neighboring pixels and then employs the illumination affinitive weights to produce its learnable center. Figure 7(b) shows that both of these strategies contribute to consistent improvements over vanilla convolution and CDC. Furthermore, to evaluate the robustness of IAICD, we introduce Gaussian noise into the raw color images. IAICD still performs better than CDC and achieves performance very close to that obtained with the raw color images. All of this evidence demonstrates the effectiveness and robustness of IAICD.
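The noise robustness test simply perturbs the raw color images with additive Gaussian noise before they enter the network, e.g., as sketched below; the noise level sigma is illustrative since it is not specified here.

```python
import torch

def add_gaussian_noise(x, sigma=0.05):
    """Additive Gaussian noise for the robustness test; sigma is an illustrative choice."""
    noise = torch.randn_like(x) * sigma
    return (x + noise).clamp(0.0, 1.0)     # keep the perturbed image in the valid [0, 1] range
```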

4.7 Generalization

Here we further evaluate the generalization capabilities of our LDCNet for both daytime depth completion [11] and low-light image enhancement [42] tasks.

Table 6 reports the comparison results on the KITTI depth completion dataset [11], which is collected during the daytime. We can observe that the performance of current state-of-the-art methods [7, 9, 10, 48] is very similar. For example, the RMSE, which is the ranking metric, is around 730 mm for all of them. Although our LDCNet is specifically designed for nighttime scenarios, it still achieves competitive performance on the daytime benchmark.

Table 6 Results on the KITTI depth completion benchmark

Based on self-supervised SCI [13], which is trained for 600 epochs, we replace its illumination estimation module with our RICD block. According to Table 7, RICD consistently improves the baseline in both the no-reference NIQE [60] and DE [59] metrics and the full-reference PSNR and SSIM metrics. Furthermore, Fig. 9 again demonstrates the superiority of our method, i.e., higher quality with a lower training cost.

Figure 9

Visual comparison on difficult test split. 100/600: 100/600 training epochs; GT: ground truth

Table 7 Comparison on the difficult test split of SCI [13]. NIQE: natural image quality evaluator; DE [59]: discrete entropy, a completely blind no-reference metric; PSNR: peak signal-to-noise ratio; SSIM: structural similarity

5 Conclusion

In this paper, we extended the conventional depth completion task to nighttime environments to complement safe self-driving. We identified the key challenge as the guidance from color images with low visibility and complex illumination. As a result, we proposed RICD and IAICD to improve the poor visibility and reduce the negative influence of the varying illumination, respectively. RICD predicts explicit global illumination to enhance visibility, where treating the small-kernel-convolution feature as the center of the large-kernel-convolution feature is a new perspective. IAICD alleviates the impact of local relative light intensity variations, in which the differencing center is learned dynamically from the neighboring pixels and the illumination map of RICD, making the center robust and illumination affinitive. Finally, extensive experiments on depth perception datasets have verified the effectiveness of LDCNet.

Limitation

We believe our LDCNet is a general approach that could benefit network-based models in various other vision tasks. However, the current version is only evaluated on four tasks, i.e., nighttime depth estimation, nighttime depth completion, daytime depth completion, and low-light image enhancement. In the near future, we will extend it to more tasks, e.g., nighttime semantic segmentation, nighttime flow estimation, etc. Additionally, LDCNet has latent value for important domains such as self-driving and 3D scene reconstruction in low-light environments.