The Visual Computer, Volume 35, Issue 10, pp 1427–1446

Disparity estimation in stereo video sequence with adaptive spatiotemporally consistent constraints

  • Liang Tian
  • Jing Liu (corresponding author)
  • Haibin Ling
  • Wei Guo
Original Article


Numerous stereo matching algorithms have been proposed to estimate disparity for a single pair of stereo images. However, applying even the best of them to each frame of a video independently, i.e., without considering the temporal consistency between consecutive frames, may produce undesirable artifacts. Here, we propose a systematic method, based on adaptive spatiotemporally consistent constraints, that generates spatiotemporally consistent disparity maps for stereo video sequences. First, a reliable temporal neighborhood is used to enforce the “self-similarity” assumption and to prevent errors caused by false optical flow matching from propagating between consecutive frames. Furthermore, we formulate the adaptive temporal predicted disparity map as prior knowledge of the current frame. It is used as a soft constraint to enhance the temporal consistency of disparities, increase the robustness to luminance variation, and restrict the range of potential disparities for each pixel. Additionally, to further strengthen the smooth variation of disparities, the adaptive temporal segment confidence is incorporated as a soft constraint to reduce ambiguities caused by under- and over-segmentation, and to retain the disparity discontinuities that align with 3D object boundaries while distinguishing them from geometrically smooth regions with strong color gradients. Experimental evaluations demonstrate that our method significantly improves spatiotemporal consistency, both quantitatively and qualitatively, compared with other state-of-the-art methods on the synthetic DCB and real-world KITTI datasets.
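The idea of a temporally predicted disparity used as a soft constraint can be illustrated with a minimal sketch. It is not the formulation of the paper, whose equations are not reproduced here; it assumes that disparity stays constant along the optical flow, uses a forward-backward flow check as a simple stand-in for a reliability test, and applies a truncated linear penalty. The names `predict_disparity`, `temporal_cost`, `fb_thresh`, `weight`, and `trunc` are hypothetical.

```python
import numpy as np


def predict_disparity(prev_disp, flow_fw, flow_bw, fb_thresh=1.0):
    """Warp the disparity of frame t-1 into frame t (illustrative assumption).

    prev_disp: (H, W) disparity map of frame t-1.
    flow_fw:   (H, W, 2) optical flow from t-1 to t, stored as (dx, dy).
    flow_bw:   (H, W, 2) optical flow from t to t-1, stored as (dx, dy).
    Returns the predicted disparity for frame t and a boolean reliability mask.
    """
    h, w = prev_disp.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Follow the backward flow from each pixel of frame t to frame t-1
    # (nearest-neighbour lookup keeps the sketch short).
    src_x = np.rint(xs + flow_bw[..., 0]).astype(int)
    src_y = np.rint(ys + flow_bw[..., 1]).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    src_x = np.clip(src_x, 0, w - 1)
    src_y = np.clip(src_y, 0, h - 1)
    pred = prev_disp[src_y, src_x]
    # Forward-backward check: the two flows should cancel for a correct match.
    residual = flow_fw[src_y, src_x] + flow_bw
    reliable = inside & (np.linalg.norm(residual, axis=-1) <= fb_thresh)
    return pred, reliable


def temporal_cost(candidates, pred, reliable, weight=0.5, trunc=2.0):
    """Truncated linear penalty for deviating from the predicted disparity.

    candidates: (D,) candidate disparities. Returns an (H, W, D) cost volume
    that is zero wherever the prediction is marked unreliable.
    """
    diff = np.abs(candidates[None, None, :] - pred[..., None])
    cost = weight * np.minimum(diff, trunc)
    return np.where(reliable[..., None], cost, 0.0)
```

In such a scheme, the returned volume would be added to the per-frame matching cost before optimization, so that unreliable flow matches contribute no temporal penalty and cannot propagate errors into the current frame.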


Keywords: Spatiotemporally consistent · Adaptive temporal segment confidence · Adaptive temporal predicted disparity · Reliable temporal neighborhood



This study was funded by the National Natural Science Foundation of China (Grant No. 61802109), the Natural Science Foundation of Hebei Province (Grant No. F2017205066), and the Science Foundation of Hebei Normal University (Grant Nos. L2017B06 and L2018K02).

Compliance with ethical standards

Conflict of interest

All authors declare that they have no conflict of interest.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Key Laboratory of Augmented Reality, College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang, China
  2. Department of Computer and Information Sciences, Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, USA
