Multimedia Tools and Applications, Volume 77, Issue 15, pp 19869–19894

3D hierarchical optimization for multi-view depth map coding

  • Marc Maceira Duch
  • David Varas
  • Josep Ramon Morros Rubió
  • Javier Ruiz-Hidalgo
  • Ferran Marques


Depth data has become widely used with the spread of high-resolution 3D sensors. In multi-view sequences, depth information supplements the color data of each view. This article proposes a joint encoding of multiple depth maps with a single representation. The color and depth images of each view are segmented independently and combined in a rate-distortion optimal fashion. The resulting partitions are projected to a reference view, where a coherent hierarchy for the multiple views is built. A rate-distortion optimization then selects nodes of this hierarchy to obtain the final segmentation. The consistent segmentation is used to robustly encode the depth maps of multiple views, obtaining results competitive with HEVC coding standards.
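The selection of hierarchy nodes by rate-distortion optimization can be illustrated with a generic Lagrangian pruning of a region tree. The following is a minimal sketch under stated assumptions, not the method of the article: the `Node` class, the `prune` function, the per-node `distortion` and `rate` values, and the toy hierarchy are all hypothetical, and the cost of coding the partition contours is ignored. Each node is kept as a single region unless splitting it into its children gives a strictly lower cost J = D + λR.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One region of a hypothetical segmentation hierarchy."""
    name: str
    distortion: float  # assumed error when the region is coded as one unit
    rate: float        # assumed bits needed to code the region as one unit
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, lam: float) -> Tuple[float, List[str]]:
    """Return (cost, region names) minimising D + lam * R in this subtree."""
    own_cost = node.distortion + lam * node.rate
    if not node.children:
        return own_cost, [node.name]

    # Best achievable cost when the node is split into its children.
    split_cost, split_regions = 0.0, []
    for child in node.children:
        cost, regions = prune(child, lam)
        split_cost += cost
        split_regions += regions

    # Keep the node whole unless splitting is strictly cheaper.
    if own_cost <= split_cost:
        return own_cost, [node.name]
    return split_cost, split_regions


# Toy hierarchy with made-up numbers, for illustration only.
root = Node("root", distortion=100.0, rate=10.0, children=[
    Node("a", distortion=10.0, rate=8.0),
    Node("b", distortion=30.0, rate=8.0, children=[
        Node("b1", distortion=5.0, rate=6.0),
        Node("b2", distortion=5.0, rate=6.0),
    ]),
])

for lam in (0.5, 5.0, 20.0):
    cost, regions = prune(root, lam)
    print(f"lambda={lam}: cost={cost}, regions={regions}")
```

In this toy example a small λ favours low distortion and keeps the finest regions, while a large λ penalises rate and collapses the partition to the root. Sweeping λ traces out operating points of the rate-distortion trade-off.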


Scene segmentation, Depth map segmentation, 3D representation, Rate-distortion optimization



This work has been developed in the framework of projects TEC2013-43935-R and TEC2016-75976-R, financed by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  • Marc Maceira Duch (1)
  • David Varas (1)
  • Josep Ramon Morros Rubió (1)
  • Javier Ruiz-Hidalgo (1)
  • Ferran Marques (1)

  1. Universitat Politècnica de Catalunya, Barcelona, Spain
