## Abstract

The extraction of depth information associated with dynamic scenes is an intriguing topic because of its prospective role in many applications, including free-viewpoint and 3D video systems. Time-of-flight (ToF) range cameras can acquire depth maps at video rate, but their resolution is limited, especially compared with standard color cameras. This paper presents a super-resolution method for depth maps that exploits side information from a standard color camera: the method uses a segmented version of the high-resolution color image to identify the main objects in the scene, and a novel surface prediction scheme to interpolate the depth samples provided by the ToF camera. Effective solutions are provided for critical issues such as the joint calibration of the two devices and the unreliability of the acquired data. Experimental results on both synthetic and real-world scenes show that the proposed method yields a more accurate interpolation than standard interpolation approaches and state-of-the-art joint depth and color interpolation schemes.

## Notes

In the following description, we call *samples* the input depth values obtained by reprojecting the ToF data, and *pixels* the output pixels of the high-resolution depth map.

Threshold values of 0.1 and 0.4 refer to a depth value range between 0 and 1.

The errors reported in this section are measured in pixels on the high-resolution image of the color cameras.

The acquired data for this setup is available online at the address http://lttm.dei.unipd.it/downloads/superres/.

In both cases, we just warped the images using a 3D mesh built from the depth data; no ad hoc post processing algorithms were used.

## Appendix: Bilinear interpolation on nonregular grids

In the proposed approach, after the calibration step, the available samples are not regularly distributed over a lattice. This appendix shows how the well-known bilinear interpolation scheme can be extended to nonregular grids.

Referring to Fig. 22, the depth of the red point \(\mathbf{p}(x, y)\) is estimated from the depths \(D_{i} = D(\mathbf{p}_{i})\), \(i = 1, \ldots, 4\), of the four blue samples \(\mathbf{p}_{i}(x_{i}, y_{i})\). The procedure works in two steps: first, we estimate the depth of the two yellow points \(\mathbf{p}_{a}(x_{a}, y_{a}) = \mathbf{p}_{a}(x, y_{a})\) and \(\mathbf{p}_{b}(x_{b}, y_{b}) = \mathbf{p}_{b}(x, y_{b})\); then the depth of \(\mathbf{p}\) is computed by interpolating those of \(\mathbf{p}_{a}\) and \(\mathbf{p}_{b}\). Let \(\Delta x_{i} = |\mathbf{p}_{i} - \mathbf{p}|_{x} = |x_{i} - x|\) and \(\Delta y_{i} = |\mathbf{p}_{i} - \mathbf{p}|_{y} = |y_{i} - y|\), \(i = 1, \ldots, 4\), denote the absolute differences between the \(x\) and \(y\) coordinates of the available low-resolution samples (in blue) and those of the point being estimated (in red), i.e., the absolute values of the \(x\) and \(y\) components of the vectors connecting the samples \(\mathbf{p}_{i}\) with \(\mathbf{p}\). First, the depth \(D_{a} \triangleq D(\mathbf{p}_{a})\) of point \(\mathbf{p}_{a}(x, y_{a})\) is estimated by linearly interpolating the depths of \(\mathbf{p}_{1}\) and \(\mathbf{p}_{2}\):

\[ \hat{D}_{a} = C_{1} D_{1} + C_{2} D_{2} \]

where \(C_{1} = \Delta x_{2}/(\Delta x_{1} + \Delta x_{2})\) and \(C_{2} = \Delta x_{1}/(\Delta x_{1} + \Delta x_{2})\). The same procedure is applied to estimate the depth \(D_{b} \triangleq D(\mathbf{p}_{b})\) of \(\mathbf{p}_{b}(x, y_{b})\) from \(\mathbf{p}_{3}\) and \(\mathbf{p}_{4}\):

\[ \hat{D}_{b} = C_{3} D_{3} + C_{4} D_{4} \]

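where \(C_{3} = \Delta x_{4}/(\Delta x_{3} + \Delta x_{4})\) and \(C_{4} = \Delta x_{3}/(\Delta x_{3} + \Delta x_{4})\). The vertical coordinates \(\Delta y_{a} = y_{a} - y\) and \(\Delta y_{b} = y - y_{b}\) of \(\mathbf{p}_{a}\) and \(\mathbf{p}_{b}\) with respect to \(\mathbf{p}\) can be computed as follows:

\[ \Delta y_{a} = C_{1} y_{1} + C_{2} y_{2} - y, \qquad \Delta y_{b} = y - (C_{3} y_{3} + C_{4} y_{4}) \]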

In the second step, the depths \(\hat{D}_{a}\) and \(\hat{D}_{b}\) of \(\mathbf{p}_{a}\) and \(\mathbf{p}_{b}\) are linearly interpolated to get the depth of \(\mathbf{p}\):

\[ \hat{D}(\mathbf{p}) = C_{a} \hat{D}_{a} + C_{b} \hat{D}_{b} = \gamma_{1} D_{1} + \gamma_{2} D_{2} + \gamma_{3} D_{3} + \gamma_{4} D_{4} \]

where \(C_{a} = \Delta y_{b}/(\Delta y_{a} + \Delta y_{b})\), \(C_{b} = \Delta y_{a}/(\Delta y_{a} + \Delta y_{b})\), \(\gamma_{1} = C_{a} C_{1}\), \(\gamma_{2} = C_{a} C_{2}\), \(\gamma_{3} = C_{b} C_{3}\) and \(\gamma_{4} = C_{b} C_{4}\). The last expression is obtained by replacing \(\hat{D}_{a}\) and \(\hat{D}_{b}\) in \(C_{a} \hat{D}_{a} + C_{b} \hat{D}_{b}\) with their first-step expressions \(\hat{D}_{a} = C_{1} D_{1} + C_{2} D_{2}\) and \(\hat{D}_{b} = C_{3} D_{3} + C_{4} D_{4}\). Note that the final result is a weighted average of the four samples, where the weights depend on the positions of the samples as in standard bilinear interpolation. This approach is applied directly to the low-resolution samples when the segmented region contains all four samples; otherwise, the missing samples are first estimated by the methods of Section 3.2, and then the interpolation above is applied.

## About this article

### Cite this article

Garro, V., Dal Mutto, C., Zanuttigh, P. *et al.* Edge-preserving interpolation of depth data exploiting color information. *Ann. Telecommun.* **68**, 597–613 (2013). https://doi.org/10.1007/s12243-013-0389-0
