Abstract
Traditional disparity refinement methods cannot produce highly accurate disparity estimates, especially for pixels around depth boundaries and within low-textured regions. To tackle this problem, two novel stereo refinement strategies are proposed: (1) merging super-pixels into stable regions to maintain the continuity and accuracy of areas that share the same disparity; (2) optimizing the cooperative relations between adjacent regions. Together they yield high-quality, high-density disparity maps. Quantitative evaluation on the Middlebury benchmark shows that our algorithm significantly refines the results obtained by local and non-local methods.
1 Introduction
Stereo matching has been one of the key problems in computer vision for years, and many recent publications [1,2,3,4] address it. Segment-based methods [7,8,9] have attracted increasing attention because of their good performance.
Most segment-based stereo matching algorithms follow the four-step pipeline [5]: (1) matching cost computation; (2) cost aggregation; (3) disparity computation/optimization; (4) disparity refinement. Traditional disparity refinement methods, including left-right consistency checking [10], hole filling [11], and median filtering [12, 13], cannot provide highly accurate disparity estimates. Yoon et al. [14] adopted an adaptive supporting-weight approach for correspondence search to refine local aggregation results. Yang [15] first proposed the non-local aggregation method and refined the non-local results with a minimum spanning tree (MST). Building on Yang’s method, Mei et al. [16] proposed a segment-tree (ST) structure for non-local cost aggregation and enhanced the disparity values with a depth-color segmentation method extended from a classic graph-based segmentation method [17]. Region-based methods [18, 19], introduced to further improve disparity estimation, obtain better results, especially in low-textured regions.
In this paper, we propose a stereo refinement algorithm based on merging super-pixels (MSP). Our algorithm consists of the following seven steps: (1) estimating the initial disparity values with a local or non-local method and locating the super-pixels in the stereo images with a depth-color segmentation method; (2) estimating robust information for each super-pixel by voting; (3) searching for the supporting neighbors of each super-pixel; (4) merging super-pixels into regions based on the correlation of adjacent super-pixels; (5) updating the information of each region and identifying unreliable regions; (6) correcting each unreliable region with its supporting region; (7) assigning a disparity value to each pixel by considering the disparity of the correlative region.
This paper makes two main contributions: (1) we merge super-pixels into stable regions, so that the disparity of each pixel can be estimated under a smoothness constraint of the correlative region, which maintains the continuity of areas with the same disparity; (2) we optimize the cooperative relations between adjacent regions to reduce unreliable disparity values and obtain high-quality depth boundaries.
2 Obtaining Raw Cost Aggregation and Initial Disparities
2.1 Obtaining Cost and Disparity in Pixel Domain
First, we employ a local or non-local algorithm to obtain the raw cost aggregation and initial disparity values. These algorithms typically rely on a simple winner-take-all (WTA) strategy to select a disparity among multiple candidates, so the resulting disparity estimates are not accurate enough. The later steps of our algorithm improve this accuracy.
2.2 Over-Segment Based on Color-Depth
Segment-based algorithms usually assume that disparity values vary smoothly within each segment and that depth discontinuities occur only on segment boundaries. In practice this assumption is not always met, so over-segmentation based on color and depth is preferred. In this paper, we use efficient graph-based image segmentation [16, 17]. Figure 1 shows the disparity map of the Teddy stereo pair and the segmentation result of the left image produced by the method in [16]. In this paper, we call each over-segmented segment a super-pixel.
2.3 Cross-Checking Test
First, a local or non-local cost aggregation method is run with the left and the right image as the reference image in turn, which yields two corresponding disparity maps. To eliminate outliers in the disparity map and obtain a robust disparity estimate for each segment, the cross-checking test is then applied. It identifies the occlusions and matching errors in the disparity map, all of which are called unreliable pixels in this paper. After cross-checking, the cost volume is refined according to [15]. Let \( D \) denote the disparity map; following [15], a new cost value is computed for each pixel \( p \) at each disparity level \( d \).
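The cross-checking test and the cost refinement can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the tolerance `tol`, and the list-of-lists image layout are assumptions, and the refined cost is assumed to take the form \( |d - D(p)| \) for reliable pixels and 0 otherwise, in the spirit of [15].

```python
def cross_check(disp_left, disp_right, tol=1):
    """Return a mask that is True where the left disparity fails the left-right check."""
    height, width = len(disp_left), len(disp_left[0])
    unreliable = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d = disp_left[y][x]
            xr = x - d  # column of the matching pixel in the right image
            # A pixel is unreliable if its match falls outside the image
            # or the two disparity maps disagree by more than tol.
            if xr < 0 or abs(disp_right[y][xr] - d) > tol:
                unreliable[y][x] = True
    return unreliable


def refined_cost(disp, is_unreliable, d):
    """Assumed refined cost: |d - D(p)| for reliable pixels, 0 for unreliable ones."""
    return 0 if is_unreliable else abs(d - disp)


# One image row; tol=0 demands exact agreement between the two maps.
mask = cross_check([[1, 1, 2, 2]], [[1, 2, 2, 0]], tol=0)
```

With this form, unreliable pixels contribute no cost at any disparity level, so their disparity is decided entirely by the reliable pixels around them during re-aggregation.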
3 Robust Super-Pixels Merging
Super-pixels are sensitive to unreliable pixels, and they are correlated with one another rather than independent. If each super-pixel is handled in isolation, the disparity values around the boundary between adjacent regions that share the same disparity may become discontinuous. In this paper, an effective approach that merges super-pixels into stable regions is proposed to resolve this problem.
3.1 Voting the Information of Super-Pixel
Before merging, the information of each super-pixel is obtained by voting. This information comprises the RGB values, the disparity, and a flag indicating whether the super-pixel is unreliable. The voting process is as follows:
First, the RGB values of a super-pixel are estimated from the RGB values of all pixels within it. Each channel is determined by voting in a one-dimensional histogram, where the x-coordinate is the value of that channel and the y-coordinate is the number of pixels with that value. After the histogram is sorted and smoothed with a Gaussian filter, the value of each channel is taken at the maximum of the corresponding histogram.
Second, the disparity of each super-pixel is estimated in the same way, after the unreliable pixels have been discarded.
Third, if the number of unreliable pixels in a super-pixel exceeds a given fraction of the number of all pixels within it, we regard the super-pixel as unreliable and set its flag to true. Let \( W_{occ} \) denote this fraction.
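The voting step can be sketched as below. This is a minimal sketch under stated assumptions: the function names, the Gaussian width `sigma`, and the integer-valued inputs are hypothetical, and the histogram sorting mentioned above is omitted because the bins are already indexed by value.

```python
import math


def vote_value(values, n_bins, sigma=1.0):
    """Return the bin with the highest count after Gaussian smoothing."""
    hist = [0] * n_bins
    for v in values:
        hist[v] += 1
    radius = max(1, int(3 * sigma))
    kernel = [math.exp(-(k * k) / (2.0 * sigma * sigma))
              for k in range(-radius, radius + 1)]
    smoothed = []
    for b in range(n_bins):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = b + k - radius
            if 0 <= j < n_bins:  # ignore taps that fall outside the histogram
                acc += w * hist[j]
        smoothed.append(acc)
    return max(range(n_bins), key=smoothed.__getitem__)


def vote_superpixel(disparities, unreliable, n_levels, w_occ=0.4):
    """Vote a disparity from the reliable pixels and flag the super-pixel if too many are unreliable."""
    reliable = [d for d, bad in zip(disparities, unreliable) if not bad]
    is_unreliable = sum(unreliable) > w_occ * len(unreliable)
    disparity = vote_value(reliable, n_levels) if reliable else None
    return disparity, is_unreliable
```

The same `vote_value` routine could be applied to each of the three color channels with `n_bins=256`. The default `w_occ=0.4` matches the value used in the experiments of this paper.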
3.2 Supporting Neighbors Selection
To avoid merely piecewise-smooth results, the super-pixels should be merged into stable regions by considering the supporting neighbors of each super-pixel. Let \( W_{i} \left( {S_{p} } \right) \) denote the weight of the correlation between super-pixel \( S_{p} \) and its neighboring super-pixel \( S_{i} \). The weight takes into account the differences in disparity and color between \( S_{p} \) and \( S_{i} \), together with the ratio \( \alpha \) of their common border length to the perimeter, which can be written as:
\[ \alpha = \frac{N_{i}}{N_{all}} \]
where \( N_{i} \) denotes the length of the boundary between super-pixels \( S_{p} \) and \( S_{i} \), and \( N_{all} \) denotes the perimeter of super-pixel \( S_{p} \). Thus, \( W_{i} \left( {S_{p} } \right) \) can be written as:
where \( S_{i} \) ranges over all neighbors of super-pixel \( S_{p} \). \( \sigma_{s} \) and \( \sigma_{c} \) are two parameters that adapt to the disparity range and the color range and normalize \( I_{r} \) and \( D_{r} \) to the range [0, 1]. \( D_{r} \) denotes the disparity of a super-pixel and \( I_{r} \) denotes its RGB values.
It is worth noting that the proposed approach depends only on the contextual information of the adjacent super-pixels, so no ambiguity or hand-tuned factor is involved.
The supporting neighbors are selected by minimizing \( W_{i} \left( {S_{p} } \right) \) over \( i = 1, 2, \ldots, n \). Because several neighbors may attain the same minimum, the supporting neighbors of super-pixel \( S_{p} \) consist of all neighboring super-pixels that minimize \( W_{i} \left( {S_{p} } \right) \).
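The selection rule can be illustrated with a short sketch. The weights are taken as given inputs here; the function names and the tie tolerance `tol` are assumptions made for illustration only.

```python
def border_ratio(shared_border_len, perimeter):
    """Ratio alpha of the common border length N_i to the perimeter N_all of S_p."""
    return shared_border_len / perimeter


def supporting_neighbors(weights, tol=1e-9):
    """Return every neighbor whose weight W_i(S_p) attains the minimum (ties included).

    weights maps a neighbor index i to its weight W_i(S_p).
    """
    if not weights:
        return []
    w_min = min(weights.values())
    return sorted(i for i, w in weights.items() if w - w_min <= tol)


# Neighbors 5 and 7 tie for the minimum weight, so both are supporting neighbors.
neighbors = supporting_neighbors({2: 0.3, 5: 0.1, 7: 0.1})
```

Keeping all tied minimizers, rather than an arbitrary one, means the result does not depend on the order in which neighbors are visited.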
3.3 Merging Super-Pixels to Stable Region
This step obtains stable regions by merging super-pixels. It distinguishes the following three cases:
(a) If two neighboring super-pixels are both reliable and their disparities are equal, the two super-pixels are merged;
(b) If the two super-pixels are both unreliable, or one is unreliable and the other is not, and one is a supporting neighbor of the other, the two super-pixels are merged;
(c) In all remaining cases the super-pixels are not merged.
A super-pixel that is not merged with any other super-pixel is regarded as a stable region by itself. We merge the super-pixels by using a forest structure. (The forest construction algorithm, which treats each super-pixel as a pixel, is similar to the ST construction algorithm in [16].)
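The three cases can be sketched with a disjoint-set forest. This is an illustrative simplification rather than the ST-style construction of [16]: the function names and data layout are hypothetical, and case (b) is read here as "at least one of the two super-pixels is unreliable, and one is a supporting neighbor of the other".

```python
def find(parent, i):
    """Return the root of i, compressing the path along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def merge_superpixels(disparity, unreliable, support, edges):
    """Merge adjacent super-pixels and return a region label for each one.

    disparity[i]  : voted disparity of super-pixel i
    unreliable[i] : True if super-pixel i is unreliable
    support[i]    : set of supporting neighbors of super-pixel i
    edges         : iterable of (a, b) pairs of adjacent super-pixels
    """
    parent = list(range(len(disparity)))
    for a, b in edges:
        both_reliable = not unreliable[a] and not unreliable[b]
        supports = b in support[a] or a in support[b]
        case_a = both_reliable and disparity[a] == disparity[b]
        case_b = not both_reliable and supports
        if case_a or case_b:
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb:
                parent[rb] = ra
    # Case (c): every other pair is left unmerged.
    return [find(parent, i) for i in range(len(disparity))]


labels = merge_superpixels(
    disparity=[5, 5, 9, 9],
    unreliable=[False, False, True, False],
    support=[{1}, {0}, {3}, {2}],
    edges=[(0, 1), (1, 2), (2, 3)],
)
```

In this example super-pixels 0 and 1 merge under case (a), super-pixels 2 and 3 merge under case (b), and the pair (1, 2) stays separate, so two stable regions remain.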
Figure 2 shows the merged super-pixels of the left image and the disparity map after the first iteration. The experimental results indicate that the new segments are stable and that our method performs well in disparity estimation.
4 The Principle of Unreliable Region Optimization
Unreliable pixels strongly affect disparity estimation. In this section, we propose a new method that handles unreliable pixels by optimizing the unreliable regions. Before optimization, the information and the supporting neighbors of each region must be updated as described in Sect. 3.
The principles of unreliable region optimization are as follows:
(a) Among the supporting neighbors of an unreliable region, any neighbor that is a reliable region, or an unreliable region that has already been optimized, is regarded as a supporting region;
(b) If an unreliable region \( S_{u} \) has no supporting region, we select as its supporting region the neighbor that minimizes \( W_{i} \left( {S_{u} } \right) \) among all neighbors of \( S_{u} \);
(c) If an unreliable region has more than one supporting region, the supporting region with the minimum disparity is selected as the final supporting region;
(d) The disparity of the final supporting region is assigned to the corresponding unreliable region, which is then labeled as optimized.
These four steps are applied to all unreliable regions until each of them has been labeled as optimized.
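Rules (a) to (d) can be sketched as a single pass over the regions. This is a minimal sketch under assumptions: the function name, the processing order (by region index), and the data layout are hypothetical choices, not details given in the text.

```python
def optimize_unreliable(disparity, unreliable, support, weights):
    """Assign each unreliable region the disparity of its final supporting region.

    disparity[i]  : current disparity of region i
    unreliable[i] : True if region i is unreliable
    support[i]    : list of supporting neighbors of region i
    weights[i]    : dict mapping each neighbor j of region i to W_j(S_i)
    """
    disparity = list(disparity)  # work on a copy
    optimized = [False] * len(disparity)
    for u in range(len(disparity)):
        if not unreliable[u]:
            continue
        # (a) supporting neighbors that are reliable or already optimized
        candidates = [s for s in support[u] if not unreliable[s] or optimized[s]]
        # (b) otherwise fall back to the neighbor with the minimum weight
        if not candidates:
            if not weights[u]:
                continue  # isolated region: nothing to copy from
            candidates = [min(weights[u], key=weights[u].get)]
        # (c) choose the supporting region with the minimum disparity
        final = min(candidates, key=lambda s: disparity[s])
        # (d) copy its disparity and mark the region as optimized
        disparity[u] = disparity[final]
        optimized[u] = True
    return disparity


result = optimize_unreliable(
    disparity=[4, 20, 9, 15],
    unreliable=[False, True, False, True],
    support=[[], [0, 2], [], [1]],
    weights=[{}, {0: 0.1, 2: 0.1}, {}, {1: 0.2}],
)
```

Region 1 takes disparity 4 from region 0, the supporting region with the smaller disparity, and region 3 then inherits it from region 1, which has already been optimized.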
5 Depth Hypotheses Generation
In this section, we obtain the accurate disparity map in two steps. First, we adopt a smoothness constraint to reduce the effect of spurious disparity estimates. Second, iterative refinement is employed to enhance the accuracy of the disparity map.
5.1 The Constraint on Smoothness of Region
To reduce the effect of spurious disparity estimates, we consider the smoothness of each stable region. Depth discontinuities usually occur around the boundaries of regions. The method therefore assigns a disparity value to each pixel by selecting, from a range around the disparity of the correlative stable region, the disparity that minimizes the aggregated cost. The optimal disparity value of pixel \( p \) within super-pixel \( S_{p} \) can be written as:
where \( \Delta d \) is a variable that determines the range around the stable region disparity. If it is too small, the correct cost value may be excluded; if it is too large, the effects of spurious cost values may not be reduced. We therefore compute \( \Delta d \) adaptively, which can be written as:
where \( R \) denotes the disparity range of image and \( \gamma \) is a constant which is set to six in all of our experiments. According to Eq. (4), the disparity value of pixel \( p \) is \( d \) which minimizes \( D_{{d_{i} }}^{A} \left( p \right) \).
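The constrained selection can be sketched as follows. Two points are assumptions made only for this illustration: the adaptive range is taken to be \( \Delta d = R / \gamma \) rounded to an integer, and the search window is taken to be symmetric around the region disparity. The function name and the per-pixel cost list are likewise hypothetical.

```python
def refine_pixel_disparity(costs, region_disparity, disparity_range, gamma=6):
    """Pick the lowest-cost disparity within delta_d of the stable region disparity.

    costs[d] is the aggregated cost of this pixel at disparity level d.
    """
    delta_d = max(1, round(disparity_range / gamma))  # assumed form of the adaptive range
    lo = max(0, region_disparity - delta_d)
    hi = min(len(costs) - 1, region_disparity + delta_d)
    return min(range(lo, hi + 1), key=costs.__getitem__)


# The global minimum (level 10) is a spurious match far from the region disparity 4.
costs = [9, 8, 7, 2, 6, 5, 4, 3, 9, 9, 0, 9, 9]
constrained = refine_pixel_disparity(costs, region_disparity=4, disparity_range=12)
unconstrained = min(range(len(costs)), key=costs.__getitem__)
```

Here plain WTA would pick level 10, whereas the constrained search stays within levels 2 to 6 and returns level 3.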
5.2 Enhancement with Iteration
After estimating the disparity values, we use iterative refinement to enhance the disparity estimation. In the first iteration, the disparity value with the best cost is selected for each pixel, and a robust representative disparity value is then voted for each stable region. In the next iteration, the disparity values are refined by repeating steps 2 to 7 on the disparity map of the previous iteration. New stable regions are determined and their information is updated. The best disparity value of each pixel is selected only among the representative disparity values of the correlative stable regions. The final disparity values are assigned after two iterations.
Figure 3 shows the segmentation result of the left image after the second iteration, which is better than the result of the first iteration (Fig. 2). In addition, to verify the robustness of the proposed algorithm, Fig. 4 shows the merged results for the remaining stereo image pairs in the Middlebury data sets [6].
6 Experimental Results
The local algorithm [14] and the non-local algorithm [16] have proved to be top performers on the Middlebury benchmark [6], but the results of this paper demonstrate that the disparity maps estimated by these algorithms can be improved quantitatively by the proposed algorithm (MSP).
All experiments in this paper strictly follow the local stereo matching pipeline [5], as described below:
(a) Cost computation: The same cost as in the local method [14] and the non-local method [16] is adopted in all our experiments. It is a blend of the truncated color difference and the truncated gradient difference.
(b) Cost aggregation: Two cost aggregation methods are evaluated on various stereo data sets: local aggregation with adaptive supporting-weight (AW) [14] and non-local aggregation with the enhanced segment-tree (ST) [16].
(c) Disparity optimization: The winner-take-all (WTA) operation is adopted in all experiments. It simply chooses, for each pixel, the disparity with the minimal aggregated cost.
(d) Disparity refinement: The merged super-pixel (MSP) refinement algorithm is applied to the result of (c) to enhance the performance. Two parameters need to be set in this method: the parameter \( k \) is set to 0.03 and \( W_{occ} \) is set to 0.4. The final disparity map is obtained after only two iterations of the proposed algorithm.
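Steps (a) and (c) of this pipeline can be illustrated with a short sketch. The blending weight `alpha` and the truncation thresholds `tau_color` and `tau_grad` are placeholder values chosen for illustration, not the settings used in the experiments, and the function names are hypothetical.

```python
def matching_cost(color_l, color_r, grad_l, grad_r,
                  alpha=0.9, tau_color=7.0, tau_grad=2.0):
    """Blend a truncated color difference with a truncated gradient difference."""
    # Mean absolute difference over the color channels
    color_diff = sum(abs(a - b) for a, b in zip(color_l, color_r)) / len(color_l)
    grad_diff = abs(grad_l - grad_r)
    return (1 - alpha) * min(color_diff, tau_color) + alpha * min(grad_diff, tau_grad)


def winner_take_all(aggregated_costs):
    """Return the disparity level with the minimal aggregated cost."""
    return min(range(len(aggregated_costs)), key=aggregated_costs.__getitem__)


identical = matching_cost((10, 10, 10), (10, 10, 10), 1.0, 1.0)
saturated = matching_cost((0, 0, 0), (255, 255, 255), 0.0, 10.0)
best = winner_take_all([3.0, 1.0, 2.0])
```

Truncation caps the contribution of any single pixel, so large mismatches at occlusions or specular highlights cannot dominate the aggregated cost.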
The disparity maps of all four stereo pairs in the Middlebury data sets computed by the local method [14] are presented in Fig. 5(a). The disparity maps obtained by applying the proposed algorithm to the results in Fig. 5(a), with different numbers of iterations, are presented in Fig. 5(b)–(c). The results in Fig. 5(b)–(c) are more accurate than the result in Fig. 5(a), which shows that the proposed method (MSP) can enhance the performance of local methods. Similarly, the proposed method is effective in improving the performance of non-local methods.
Visual comparisons in Fig. 5 show that the proposed refinement method performs better within low-textured regions. For instance, the region near the hand of the teddy bear (the third row of Fig. 5) is estimated inaccurately by the cost computation method (the first step of the stereo matching pipeline). Neither the local nor the non-local cost aggregation method can correct these errors, but the proposed method obtains accurate disparity values by optimizing the unreliable region with its supporting region. Moreover, the method is more accurate around depth boundaries, such as the boundaries of the newspaper in the Venus data set (the second row of Fig. 5). Errors around depth boundaries are mostly due to noise and cause inconsistency; the method corrects them by merging super-pixels into stable regions and assigning the disparity value of each pixel under the smoothness constraint of the stable region. More details are presented in Figs. 6 and 7. According to the comparisons of the disparity estimates within the zoomed-in regions in Figs. 6 and 7, MSP-2 performs consistently better than the local and non-local methods, with more accurate estimates both in low-textured regions (Fig. 6) and around depth boundaries (Fig. 7).
The running time of the algorithm depends on the number of iterations. On a PC with a PM 2.5G CPU, the total time for processing the Tsukuba stereo pair is about 2 s with two iterations, of which image segmentation takes about 1 s. The comparisons between the proposed refinement method and the local method [14] or the non-local method [16] are shown in Table 1. Applying the proposed method reduces the average error of the local method by 0.38% (from 6.67% to 6.29%) and improves its average rank by 18.3 (from 79.5 to 61.2). For the non-local method [16], the average error is reduced by 0.61% (from 5.35% to 4.74%) and the average rank improves by 13.4 (from 37.7 to 24.3). The quantitative evaluation thus shows a clear improvement when the proposed refinement method is applied to the results of local and non-local stereo matching methods.
7 Conclusion
This paper proposed a novel refinement algorithm for stereo matching that produces a high-quality, high-density disparity map of a scene from its initial disparity estimate. Its novelty lies in two aspects: first, a method for merging super-pixels into stable regions; second, the handling of unreliable pixels by optimizing the unreliable regions.
The advantage of this algorithm is that it restrains and corrects errors both in low-textured regions and around depth boundaries, which yields a high-quality, high-density disparity map.
In the near future, we will focus on testing the algorithm with more challenging stereo data sets and various local or non-local cost aggregation methods.
References
Zbontar, J., LeCun, Y.: Computing the stereo matching cost with a convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1592–1599 (2015)
Sinha, S.N., Scharstein, D., Szeliski, R.: Efficient high-resolution stereo matching using local plane sweeps. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1582–1589 (2014)
Zhang, K., Fang, Y., Min, D., et al.: Cross-scale cost aggregation for stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1590–1597 (2014)
Shi, C., Wang, G., Yin, X., et al.: High-accuracy stereo matching based on adaptive ground control points. IEEE Trans. Image Process. 24(4), 1412–1423 (2015)
Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms
Scharstein, D., Szeliski, R.: Middlebury stereo evaluation
Mei, X., Sun, X., Dong, W., et al.: Segment-tree based cost aggregation for stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 313–320 (2013)
Muninder, V., Soumik, U., Krishna, G.: Robust segment-based stereo using cost aggregation. In: Proceedings of International Conference on British Machine Vision Conference (2014)
Wang, H.W., Chang, M.W., Lin, H.S., et al.: Segmentation based stereo matching using color grouping. In: ACM SIGGRAPH 2014 Posters, p. 73. ACM (2014)
Cochran, S.D., Medioni, G.: 3-D surface description from binocular stereo. TPAMI 14, 981–994 (1992)
Birchfield, S., Tomasi, C.: A pixel dissimilarity measure that is insensitive to image sampling. TPAMI 20, 401–406 (1998)
Mühlmann, K., Maier, D., Hesser, J., Männer, R.: Calculating dense disparity maps from color stereo images, an efficient implementation. IJCV 47, 79–88 (2002)
Rhemann, C., Hosni, A., Bleyer, M., Rother, C., Gelautz, M.: Fast cost-volume filtering for visual correspondence and beyond. In: CVPR (2011)
Yoon, K.J., Kweon, I.S.: Adaptive supporting-weight approach for correspondence search. PAMI 28(4), 650–656 (2006)
Yang, Q.: A non-local cost aggregation method for stereo matching. In: CVPR, pp. 1402–1409 (2012)
Mei, X., Sun, X., Dong, W., et al.: Segment-tree based cost aggregation for stereo matching. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 313–320. IEEE (2013)
Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph based image segmentation. IJCV 59(2), 167–181 (2004)
Klaus, A., Sormann, M., Karner, K.: Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure. In: 18th International Conference on Pattern Recognition, ICPR 2006, vol. 3, pp. 15–18. IEEE (2006)
Wang, Z.F., Zheng, Z.G.: A region based stereo matching algorithm using cooperative optimization. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8. IEEE (2008)
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Heng, J., Xu, Z., Zheng, Y., Liu, Y. (2017). Disparity Refinement Using Merged Super-Pixels for Stereo Matching. In: Zhao, Y., Kong, X., Taubman, D. (eds) Image and Graphics. ICIG 2017. Lecture Notes in Computer Science(), vol 10666. Springer, Cham. https://doi.org/10.1007/978-3-319-71607-7_26
DOI: https://doi.org/10.1007/978-3-319-71607-7_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71606-0
Online ISBN: 978-3-319-71607-7
eBook Packages: Computer Science, Computer Science (R0)