Abstract
Shape-from-focus (SFF) refers to the challenging inverse problem of recovering the scene depth from a given set of focused images using a static camera. Standard approaches model the interactions between neighboring pixels to get a regularized solution. Nevertheless, isotropic regularization is known to introduce undesired artifacts and to remove thin structures early. These structures have a small size in at least one dimension and are more numerous when superpixel preprocessing is used. This paper improves SFF regularization by estimating the presence of such structures and constructing anisotropic neighborhoods that follow image edges, and it proposes a flexible formulation over pixels or superpixels. A thorough study comparing different strategies for constructing these neighborhoods, in terms of accuracy and running time for the targeted application, is provided. Notably, experiments performed on a reference dataset show the overall superiority of the approach, e.g. a decrease of the RMSE value by about 20%, and its robustness to the generated superpixels.
Data availability
The datasets generated during and/or analysed during the current study are available at https://vision.middlebury.edu/stereo/data/.
References
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., & Süsstrunk, S. (2012). SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2274–2282. https://doi.org/10.1109/TPAMI.2012.120
Ali, U., & Mahmood, M. (2021). Robust focus volume regularization in shape from focus. IEEE Transactions on Image Processing, 30, 7215–7227. https://doi.org/10.1109/TIP.2021.3100268
Ali, U., Pruks, V., & Mahmood, M. T. (2019). Image focus volume regularization for shape from focus through 3D weighted least squares. Information Sciences, 489, 155–166. https://doi.org/10.1016/j.ins.2019.03.056
Arbeláez, P., Maire, M., Fowlkes, C., & Malik, J. (2011). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 898–916. https://doi.org/10.1109/TPAMI.2010.161
Boykov, Y., & Jolly, M.-P. (2001). Interactive graph cuts for optimal boundary & region segmentation of objects in N–D images. In Proceedings of the international conference on computer vision (vol. 1, pp. 105–112). https://doi.org/10.1109/ICCV.2001.937505
Boykov, Y., & Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9), 1124–1137. https://doi.org/10.1109/TPAMI.2004.60
Cui, B., Xie, X., Ma, X., Ren, G., & Ma, Y. (2018). Superpixel-based extended random walker for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 56(6), 1–11. https://doi.org/10.1109/TGRS.2018.2796069
Favaro, P. (2010). Recovering thin structures via nonlocal-means regularization with application to depth from defocus. In Proceedings of the international conference on computer vision and pattern recognition (pp. 1133–1140). https://doi.org/10.1109/CVPR.2010.5540089
Fulkerson, B., Vedaldi, A., & Soatto, S. (2009). Class segmentation and object localization with superpixel neighborhoods. In Proceedings of the international conference on computer vision (pp. 670–677). https://doi.org/10.1109/ICCV.2009.5459175
Gaganov, V., & Ignatenko, A. (2009). Robust shape from focus via Markov random fields. In Conference “GraphiCon’2009” (pp. 74–80).
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6), 721–741. https://doi.org/10.1109/TPAMI.1984.4767596
Giraud, R., Ta, V.-T., Bugeau, A., Coupe, P., & Papadakis, N. (2017). Super-PatchMatch: An algorithm for robust correspondences using superpixel patches. IEEE Transactions on Image Processing, 26(8), 4068–4078. https://doi.org/10.1109/TIP.2017.2708504
Gould, S., Fulton, R., & Koller, D. (2009). Decomposing a scene into geometric and semantically consistent regions. In Proceedings of the international conference on computer vision (pp. 1–8). https://doi.org/10.1109/ICCV.2009.5459211
Kumar G. P., & Sahay, R. R. (2017). Accurate structure recovery via weighted nuclear norm: A low rank approach to shape-from-focus. In 2017 IEEE international conference on computer vision workshops (ICCVW) (pp. 563–574). IEEE. https://doi.org/10.1109/ICCVW.2017.73
Lai, K.-N., & Leou, J.-J. (2021). Superpixel-based multi-focus image fusion. Advances in Computer Vision and Computational Biology, 66, 221–233. https://doi.org/10.1007/978-3-030-71051-4_17
Liu, Y.-J., Yu, C.-C., Yu, M.-J., & He, Y. (2016). Manifold SLIC: A fast method to compute content-sensitive superpixels. In Proceedings of international conference on computer vision and pattern recognition (pp. 651–659). https://doi.org/10.1109/CVPR.2016.77
Machairas, V., Faessel, M., Cárdenas-Peña, S., Chabardes, T., Walter, T., & Decencière, E. (2015). Waterpixels. IEEE Transactions on Image Processing, 24(11), 3707–3716.
Medioni, G., Mordohai, P., & Nicolescu, M. (2005). The tensor voting framework. Handbook of geometric computing (pp. 535–568). Springer. https://doi.org/10.1007/3-540-28247-5_16
Medioni, G., Tang, C.-K., & Lee, M.-S. (2000). Tensor Voting: Theory and applications. In Proceedings of the RFIA.
Merveille, O., Naegel, B., Talbot, H., & Passat, N. (2019). nD variational restoration of curvilinear structures with prior-based directional regularization. IEEE Transactions on Image Processing, 28(8), 3848–3859.
Merveille, O., Talbot, H., Najman, L., & Passat, N. (2018). Curvilinear structure analysis by ranking the orientation responses of path operators. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(2), 304–317. https://doi.org/10.1109/TPAMI.2017.2672972
Moeller, M., Benning, M., Schonlieb, C., & Cremers, D. (2015). Variational depth from focus reconstruction. IEEE Transactions on Image Processing, 24(12), 5369–5378. https://doi.org/10.1109/TIP.2015.2479469
Nayar, S., & Nakagawa, Y. (1994). Shape from focus. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(8), 824–831. https://doi.org/10.1109/34.308479
Pei, S.-C., Chang, W.-W., & Shen, C.-T. (2014). Saliency detection using superpixel belief propagation. In Proceedings of the international conference on image processing (pp. 1135–1139). https://doi.org/10.1109/ICIP.2014.7025226
Pertuz, S., Puig, D., & Garcia, M. (2013). Analysis of focus measure operators for shape-from-focus. Pattern Recognition, 46(5), 1415–1432. https://doi.org/10.1016/j.patcog.2012.11.011
Ribal, C., Lermé, N., & Le Hégarat-Mascle, S. (2018). Efficient graph cut optimization for shape from focus. Journal of Visual Communication and Image Representation, 55, 529–539. https://doi.org/10.1016/j.jvcir.2018.06.029
Ribal, C., Lermé, N., & Le Hégarat-Mascle, S. (2020). Thin structures segmentation using anisotropic neighborhoods. Information processing and management of uncertainty in knowledge-based systems (vol. 1237, pp. 601–612). https://doi.org/10.1007/978-3-030-50146-4_44
Scharstein, D., & Pal, C. (2007). Learning conditional random fields for stereo. In Proceedings of the international conference on computer vision and pattern recognition (pp. 1–8). https://doi.org/10.1109/CVPR.2007.383191
Stawiaski, J., & Decencière, E. (2011). Region merging via graph-cuts. Image Analysis & Stereology, 27(1), 39.
Stutz, D., Hermans, A., & Leibe, B. (2018). Superpixels: An evaluation of the state-of-the-art. Computer Vision and Image Understanding, 166, 1–27. https://doi.org/10.1016/j.cviu.2017.03.007
Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., & Rother, C. (2008). A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(6), 1068–1080. https://doi.org/10.1109/TPAMI.2007.70844
Tang, D., Fu, H., & Cao, X. (2012). Topology preserved regular superpixel. In 2012 IEEE international conference on multimedia and expo (pp. 765–768). IEEE. Retrieved 2018-05-03, from http://ieeexplore.ieee.org/document/6298495/, https://doi.org/10.1109/ICME.2012.184
Ulen, J., Strandmark, P., & Kahl, F. (2015). Shortest paths with higher-order regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(12), 2588–2600.
Wang, Z., & Sheikh, H. R. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 14. https://doi.org/10.1109/TIP.2003.819861
Yao, J., Boben, M., Fidler, S., & Urtasun, R. (2015). Real-time coarse-to-fine topologically preserving segmentation. In Proceedings of international conference on computer vision and pattern recognition (pp. 2947–2955). https://doi.org/10.1109/CVPR.2015.7298913
Yu, Y., Guan, H., & Ji, Z. (2015). Rotation-invariant object detection in high-resolution satellite imagery using superpixel-based deep Hough forests. IEEE Geoscience and Remote Sensing Letters, 12(11), 2183–2187. https://doi.org/10.1109/LGRS.2015.2432135
Zou, Q., Cao, Y., Li, Q., Mao, Q., & Wang, S. (2012). Cracktree: Automatic crack detection from pavement images. Pattern Recognition Letters, 33(3), 227–238. https://doi.org/10.1016/j.patrec.2011.11.004
Funding
The authors did not receive support from any organization for the submitted work.
Author information
Contributions
All authors equally contributed to the writing and the reviewing of this paper. All authors approved the current version of this paper.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Appendix A: 3D Tensor Voting
Let \({\mathbb {R}}^{3\times 3}\) with an origin coordinate O in \({\mathbb {R}}^3\) be the considered vector space, endowed with a voting function \(VF:{\mathbb {R}}^{3\times 3}\times {\mathbb {R}}^{3} \mapsto {\mathbb {R}}^{3\times 3}\). A tensor can be represented by a matrix \({\mathbb {T}}\in {\mathbb {R}}^{3\times 3}\). The voting operation VF builds a new tensor \({\mathbb {T}}'\) at the cast location \(P\in {\mathbb {R}}^3\) and adds it to the tensor already at this location, since tensors have good summation properties. The tensor \({\mathbb {T}}'\) is obtained by rotating and scaling the source tensor \({\mathbb {T}}\), and these operations are all derived from the stick kernel. Indeed, tensors can be decomposed in a basis of tensors in which the stick tensor is the simplest element; the stick kernel then refers to the voting operation of this stick tensor.
In tensor voting, a tensor is a second order symmetric tensor that can be represented by a positive semidefinite diagonalizable matrix \({\mathbb {T}}\in {\mathbb {R}}^{3\times 3}\), whose eigenvectors are orthogonal. In addition to its coordinates, a tensor can be characterized either by six scalar values, corresponding to the coefficients of the symmetric matrix, or by three eigenvalues and a rotation. This rotation defines the transformation that aligns the orthonormal basis \(({\textbf{e}}_0,{\textbf{e}}_1,{\textbf{e}}_2)\) with \(({\hat{\textbf{e}}}_0,{\hat{\textbf{e}}}_1,{\hat{\textbf{e}}}_2)\in {\mathbb {R}}^{3\times 3}\), the set of eigenvectors sorted by decreasing eigenvalue. The decomposition of the matrix into a set of diagonal matrices is a key point introduced by Medioni et al. (2000). By definition, the tensor is a diagonal matrix in the system \(({\hat{\textbf{e}}}_0,{\hat{\textbf{e}}}_1,{\hat{\textbf{e}}}_2)\), so that, with eigenvalues \(\lambda _0\ge \lambda _1\ge \lambda _2\ge 0\) and \({\mathbb {T}}_{stick} = {\hat{\textbf{e}}}_0 {\hat{\textbf{e}}}_0^T\):
$${\mathbb {T}} = \lambda _0{\hat{\textbf{e}}}_0{\hat{\textbf{e}}}_0^T + \lambda _1{\hat{\textbf{e}}}_1{\hat{\textbf{e}}}_1^T + \lambda _2{\hat{\textbf{e}}}_2{\hat{\textbf{e}}}_2^T = (\lambda _0-\lambda _1)\,{\mathbb {T}}_{stick} + (\lambda _1-\lambda _2)\,{\mathbb {T}}_{plate} + \lambda _2\,{\mathbb {T}}_{ball}, \qquad \text {(A1)}$$
where \({\mathbb {T}}_{stick}\), \({\mathbb {T}}_{plate}\) and \({\mathbb {T}}_{ball}\) are respectively the stick tensor, the plate tensor and the ball tensor, named according to their representations as ellipsoids (see figure in Medioni et al., 2005). Each of them represents a different type of structure: the stick component encodes the saliency of surfaces that are normal to \({\hat{\textbf{e}}}_0\), the plate component encodes curves with tangent direction \({\hat{\textbf{e}}}_2\), and the ball component encodes points, e.g. corresponding to thin structure junctions.
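The stick, plate and ball decomposition can be illustrated numerically. The following is a minimal sketch, assuming that \({\mathbb {T}}_{stick} = {\hat{\textbf{e}}}_0 {\hat{\textbf{e}}}_0^T\) and that the plate and ball tensors are the sums of outer products given later in this appendix; the function name `decompose_tensor` and the example values are hypothetical.

```python
import numpy as np

def decompose_tensor(T):
    """Split a symmetric PSD 3x3 tensor into stick, plate and ball parts."""
    # eigh returns eigenvalues in ascending order; reverse them so that
    # lambda_0 >= lambda_1 >= lambda_2, as in the text.
    eigvals, eigvecs = np.linalg.eigh(T)
    lam = eigvals[::-1]
    vecs = eigvecs[:, ::-1]  # columns are e_0, e_1, e_2
    e0, e1, e2 = vecs[:, 0], vecs[:, 1], vecs[:, 2]

    T_stick = np.outer(e0, e0)
    T_plate = T_stick + np.outer(e1, e1)
    T_ball = T_plate + np.outer(e2, e2)  # equals the identity matrix

    saliencies = (lam[0] - lam[1], lam[1] - lam[2], lam[2])
    return saliencies, (T_stick, T_plate, T_ball)

# Example tensor (hypothetical values) with a dominant normal along z.
A = np.diag([0.2, 0.5, 3.0])
(s_stick, s_plate, s_ball), (Ts, Tp, Tb) = decompose_tensor(A)

# Recombining the three weighted components gives back the input tensor.
T_rebuilt = s_stick * Ts + s_plate * Tp + s_ball * Tb
```

Here the stick saliency \(\lambda _0-\lambda _1\) dominates, which is consistent with a surface-like structure whose normal is the z axis.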
The stick kernel that allows for the vote cast by a stick tensor, \({\mathbb {T}}_{stick}\in {\mathbb {R}}^{3\times 3}\), involves a multiplication of \({\mathbb {T}}_{stick}\) by a decay function DF, and a rotation by a vector \(\varvec{\Omega }\). Specifically, DF is as follows:
where \(\sigma _T\) is the scale parameter, v is a constant that controls the decay with curvature, \(r\in {\mathbb {R}}_{>0}\) is the length of the circle arc between O and P on the osculating circle joining O and P with normal \({\hat{\textbf{e}}}_0\) at point O, and \(\phi \in ]-\pi ,\pi ]\) is the angle between the tangent to the same osculating circle in O and \(\vec {OP}\). The decay function allows for a smooth voting kernel whose support can be bounded to a sphere of radius \(3\sigma _T\). Along with the term \(v\phi ^2\) used for increasing the decay with curvature, Medioni et al. (2000) also propose to restrict votes to the area where \(\phi <\frac{\pi }{4}\) and to consider that the term \(DF(r,\phi ,\sigma _T)\) is null otherwise.
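To make the roles of \(r\), \(\phi \), \(\sigma _T\) and v concrete, here is a minimal sketch of a decay function. The exponential form \(\exp (-(r^2+v\phi ^2)/\sigma _T^2)\) is an assumption modeled on the usual tensor voting decay and is not taken from this appendix; the symmetric cut on \(|\phi |\) and the function names are also assumptions. The arc length formula, in contrast, follows from the geometry: the chord OP subtends a central angle \(2\phi \) on a circle of radius \(|OP|/(2\sin \phi )\).

```python
import numpy as np

def arc_length(chord, phi):
    """Arc length r between O and P on the osculating circle.

    chord is |OP| and phi is the angle between the tangent at O and OP,
    so that r = chord * phi / sin(phi).
    """
    if np.isclose(phi, 0.0):
        return chord  # P lies on the tangent: arc and chord coincide
    return chord * phi / np.sin(phi)

def decay(r, phi, sigma_t, v, max_angle=np.pi / 4):
    """Assumed decay exp(-(r^2 + v*phi^2) / sigma_t^2), null beyond max_angle."""
    if abs(phi) >= max_angle:
        return 0.0
    return float(np.exp(-(r**2 + v * phi**2) / sigma_t**2))
```

With this form, a larger v penalizes curved continuations more strongly, while \(\sigma _T\) sets the spatial reach of the vote.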
The rotation \({\textbf{R}}(\varvec{\Omega })\in {\mathbb {R}}^{3\times 3}\) is defined by the rotation vector \(\varvec{\Omega }\in {\mathbb {R}}^3\), and transforms the vector \({\hat{\textbf{e}}}_0\) into the vector \({\hat{\textbf{e}}}'_0\), where \({\hat{\textbf{e}}}'_0\) and \({\hat{\textbf{e}}}_0\) are symmetrical with respect to the perpendicular bisector of the segment OP. This allows for computing the cast tensor \({\mathbb {T}}'_{stick}\in {\mathbb {R}}^{3\times 3}\) as follows:
where \(\cdot ^{T}\) is the transposition operation.
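The construction of the cast stick tensor can be sketched as follows. This sketch swaps the rotation \({\textbf{R}}(\varvec{\Omega })\) for a mirror reflection across the plane that perpendicularly bisects OP; for a stick tensor both produce the same outer product \({\hat{\textbf{e}}}'_0{\hat{\textbf{e}}}'^{T}_0\), since the sign of the normal cancels. The function name `cast_stick_vote` is hypothetical, and the decay value `df` is passed in as a plain number rather than computed.

```python
import numpy as np

def cast_stick_vote(e0, O, P, df):
    """Stick tensor cast from O (unit normal e0) to P, scaled by decay df."""
    # Unit direction of the chord OP; it is the normal of the bisecting plane.
    u = (P - O) / np.linalg.norm(P - O)
    # Mirror e0 across that plane to obtain the normal at the cast location.
    e0_cast = e0 - 2.0 * np.dot(e0, u) * u
    return df * np.outer(e0_cast, e0_cast)

# Example (hypothetical values): voter at the origin with normal along z,
# receiver at angle phi above the tangent plane.
phi = np.pi / 6
O = np.zeros(3)
P = np.array([np.cos(phi), 0.0, np.sin(phi)])
vote = cast_stick_vote(np.array([0.0, 0.0, 1.0]), O, P, df=0.5)
```

In this example the cast normal is \((-\sin 2\phi , 0, \cos 2\phi )\), i.e. the normal of the osculating circle at P.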
The plate tensor can be written \({{\mathbb {T}}_{plate} = {\hat{\textbf{e}}}_0 {\hat{\textbf{e}}}_0^T + {\hat{\textbf{e}}}_1 {\hat{\textbf{e}}}_1^T}\), while the ball tensor is written \({{\mathbb {T}}_{ball} = {\hat{\textbf{e}}}_0 {\hat{\textbf{e}}}_0^T + {\hat{\textbf{e}}}_1 {\hat{\textbf{e}}}_1^T + {\hat{\textbf{e}}}_2 {\hat{\textbf{e}}}_2^T}\). The plate and ball kernels are derived from the stick kernel by integration of stick tensors. Approximating these integrals as sums of tensors,
where \(\Delta _\rho = \frac{\pi }{I}\) and \(\Delta _\psi =\frac{\pi }{J}\), and \(I,J\in {\mathbb {N}}\) are arbitrary constants. Note that these kernels are usually precomputed for computational efficiency.
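The idea of obtaining a plate from integrated sticks can be checked at the level of the tensors themselves. The sketch below is an illustration under assumptions, not the kernel computation of the paper: it sums stick tensors whose normals sweep the plane spanned by \({\hat{\textbf{e}}}_0\) and \({\hat{\textbf{e}}}_1\) with an angular step \(\pi /I\), and rescales by \(2/\pi \) (a normalization chosen here) so that the result matches \({\mathbb {T}}_{plate}\). The name `plate_from_sticks` and the parameter `n_steps`, which plays the role of I, are hypothetical.

```python
import numpy as np

def plate_from_sticks(e0, e1, n_steps=16):
    """Approximate the plate tensor as a sum of rotated stick tensors."""
    d_rho = np.pi / n_steps  # angular step, pi / I in the text's notation
    total = np.zeros((3, 3))
    for i in range(n_steps):
        rho = i * d_rho
        # Stick normal rotated by rho inside the plane spanned by e0 and e1.
        n = np.cos(rho) * e0 + np.sin(rho) * e1
        total += np.outer(n, n) * d_rho
    # Assumed normalization so that the sum equals e0 e0^T + e1 e1^T.
    return (2.0 / np.pi) * total

T_plate = plate_from_sticks(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
```

The ball case would be analogous, with a second angle sampled with step \(\pi /J\).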
Then, any tensor \({\mathbb {T}}(s)\) at location \(s\in {\mathbb {R}}^3\) can be decomposed from Eq. (A1) in a basis \(({\hat{\textbf{e}}}_0,{\hat{\textbf{e}}}_1,{\hat{\textbf{e}}}_2 )\) as \({\mathbb {T}}(s) = (\lambda _0-\lambda _1){\hat{\textbf{e}}}_0{\hat{\textbf{e}}}_0^T + (\lambda _1-\lambda _2)({\hat{\textbf{e}}}_0{\hat{\textbf{e}}}_0^T + {\hat{\textbf{e}}}_1{\hat{\textbf{e}}}_1^T) + \lambda _2({\hat{\textbf{e}}}_0{\hat{\textbf{e}}}_0^T + {\hat{\textbf{e}}}_1{\hat{\textbf{e}}}_1^T + {\hat{\textbf{e}}}_2{\hat{\textbf{e}}}_2^T)\), and the vote cast at location \(t\in {\mathbb {R}}^3\) is written:
Having introduced the voting operation for one tensor, let us specify the global voting process.
From \({\mathcal {S}}_0,{\mathcal {S}}_1\subset {\mathcal {S}}\) the sets of voters and the cast locations respectively, \(\forall s\in {\mathcal {S}}\),
where \({\mathbb {T}}'(s)\) is the tensor at location s after vote and \({\mathbb {T}}(s)\) before.
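The global process can be sketched as an accumulation loop. This is a minimal sketch assuming that each cast location receives the sum of the votes from all voters on top of its own tensor, that a site does not vote for itself, and that sites outside the set of cast locations are left unchanged; the function `global_vote`, its dictionary-based layout and the pluggable `vote_fn` are hypothetical.

```python
import numpy as np

def global_vote(tensors, voters, cast_locations, vote_fn):
    """Accumulate votes: tensors maps a site (tuple) to its 3x3 tensor."""
    # Start from copies so that all votes are computed from pre-vote tensors.
    updated = {s: T.copy() for s, T in tensors.items()}
    for s in cast_locations:
        for t in voters:
            if t == s:
                continue  # assumption: a site does not vote for itself
            updated[s] += vote_fn(tensors[t], np.array(t), np.array(s))
    return updated

# Toy example (hypothetical): three sites on a line, one voter, and a
# placeholder vote function that simply halves the source tensor.
sites = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
tensors = {s: np.eye(3) for s in sites}
result = global_vote(
    tensors,
    voters=[sites[0]],
    cast_locations=sites[1:],
    vote_fn=lambda T, src, dst: 0.5 * T,
)
```

In practice `vote_fn` would combine the stick, plate and ball votes of the source tensor rather than the placeholder used here.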
About this article
Cite this article
Ribal, C., Le Hégarat-Mascle, S. & Lermé, N. Thin structures retrieval using anisotropic neighborhoods of superpixels: application to shape-from-focus. Multidim Syst Sign Process 34, 179–204 (2023). https://doi.org/10.1007/s11045-022-00854-8
DOI: https://doi.org/10.1007/s11045-022-00854-8