Abstract
Stereo vision systems with additional flash/no-flash cues have been shown to be robust to depth discontinuities. The ratio of a flash and no-flash image pair naturally provides additional scene depth information and can thus serve as a strong cue for preserving depth discontinuities. However, the existing solution simply uses the ratio as guidance for matching cost aggregation and is therefore still vulnerable to occlusions. The inevitable misalignment of flash and no-flash images caused by camera and/or scene motion also remains unsolved. This paper investigates these two problems. An occlusion detection approach is derived based on foreground/background extraction. The matching cost computed in occluded regions, which is useless and harmful, is discarded so that reliable information from non-occluded regions can be easily propagated in. Foreground, occlusion, and depth estimation are modeled in a unified framework based on Expectation-Maximization. The proposed solution is evaluated on both indoor and outdoor data sets and shows clear improvement over state-of-the-art methods.
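As a rough illustration of two ingredients named in the abstract, the sketch below computes a per-pixel flash/no-flash ratio image and discards matching costs at occluded pixels. It is a minimal sketch under stated assumptions, not the method of the paper: the function names `flash_ratio` and `discard_occluded_costs`, the stabilizing constant `eps`, the `(H, W, D)` cost-volume layout, and the choice to "discard" a cost by setting it to zero are all hypothetical, and the occlusion mask is assumed to be given rather than estimated.

```python
import numpy as np


def flash_ratio(flash, no_flash, eps=1e-6):
    """Per-pixel ratio of flash to no-flash intensity.

    Both inputs are arrays of the same shape. `eps` is an assumed small
    constant that avoids division by zero in dark no-flash pixels.
    """
    flash = np.asarray(flash, dtype=np.float64)
    no_flash = np.asarray(no_flash, dtype=np.float64)
    return flash / (no_flash + eps)


def discard_occluded_costs(cost_volume, occluded):
    """Return a copy of the cost volume with occluded pixels zeroed.

    cost_volume: array of shape (H, W, D), one cost per disparity.
    occluded:    boolean array of shape (H, W), True where occluded.
    Zeroing is an assumed way to discard the cost so that a later
    aggregation step is driven by non-occluded neighbours only.
    """
    cost = np.array(cost_volume, dtype=np.float64, copy=True)
    cost[occluded] = 0.0  # boolean mask selects all D costs of a pixel
    return cost


if __name__ == "__main__":
    ratio = flash_ratio([[2.0, 4.0]], [[1.0, 2.0]])
    costs = discard_occluded_costs(np.ones((1, 2, 3)), np.array([[True, False]]))
    print(ratio.shape, costs[0, 0], costs[0, 1])
```

In this sketch a larger ratio indicates a pixel that gained more light from the flash, which is the sense in which the ratio carries depth information; how the ratio, occlusion mask, and depth are actually estimated jointly is left to the paper's Expectation-Maximization framework.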
Notes
Most current commercial active sensors are not reliable in outdoor environments, and thus only indoor environments were tested.
Additional information
Communicated by Long Quan.
About this article
Cite this article
Xu, J., Yang, Q. & Feng, Z. Occlusion-Aware Stereo Matching. Int J Comput Vis 120, 256–271 (2016). https://doi.org/10.1007/s11263-016-0910-9
DOI: https://doi.org/10.1007/s11263-016-0910-9