Multimedia Tools and Applications, Volume 76, Issue 12, pp 13761–13784

Depth map compression via 3D region-based representation

  • Marc Maceira Duch
  • Josep-Ramon Morros
  • Javier Ruiz-Hidalgo


In 3D video, view synthesis is used to create new virtual views between encoded camera views. Errors in the coding of the depth maps introduce geometric inconsistencies in the synthesized views. This paper presents a new 3D plane representation of the scene that improves the performance of current standard video codecs in the view synthesis domain. Two image segmentation algorithms are proposed for generating a color segmentation and a depth segmentation. Using both partitions, depth maps are segmented into regions free of sharp discontinuities, without having to explicitly signal all depth edges. The resulting regions are represented using a planar model in the 3D world scene. This 3D representation allows efficient encoding while preserving the 3D characteristics of the scene. The 3D planes also open up the possibility of coding multiview images with a single representation.
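The abstract does not specify how the planar model is fitted, so the following is only a minimal sketch of the general idea of representing a depth region by a plane in 3D. It assumes a pinhole camera with hypothetical intrinsics `FX`, `FY`, `CX`, `CY` and an ordinary least-squares fit of Z = aX + bY + c. All function names are illustrative and are not taken from the paper.

```python
def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with depth z to a 3D point."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def solve3(m, r):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, r)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate region: points do not determine a plane")
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, 3):
            f = a[row][col] / a[col][col]
            for k in range(col, 4):
                a[row][k] -= f * a[col][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = a[i][3] - sum(a[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = s / a[i][i]
    return x


def fit_plane(points):
    """Least-squares fit of Z = a*X + b*Y + c via the normal equations."""
    sxx = sxy = syy = sx = sy = sxz = syz = sz = 0.0
    for x, y, z in points:
        sxx += x * x
        sxy += x * y
        syy += y * y
        sx += x
        sy += y
        sxz += x * z
        syz += y * z
        sz += z
    n = float(len(points))
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    return tuple(solve3(m, [sxz, syz, sz]))


def plane_depth(u, v, plane, fx, fy, cx, cy):
    """Depth at pixel (u, v) implied by the plane Z = a*X + b*Y + c."""
    a, b, c = plane
    xn = (u - cx) / fx  # normalized ray direction, X = xn * Z
    yn = (v - cy) / fy  # normalized ray direction, Y = yn * Z
    return c / (1.0 - a * xn - b * yn)


# Synthetic 10x10 region lying exactly on a known plane (assumed values).
FX = FY = 100.0
CX = CY = 4.5
true_plane = (0.2, -0.1, 5.0)
region = [(u, v) for v in range(10) for u in range(10)]
points = [
    backproject(u, v, plane_depth(u, v, true_plane, FX, FY, CX, CY), FX, FY, CX, CY)
    for u, v in region
]
est = fit_plane(points)  # three parameters stand in for 100 depth samples
```

In this sketch a region's depth is summarized by three plane parameters, and `plane_depth` regenerates the per-pixel depth from them. The parameterization Z = aX + bY + c cannot describe planes parallel to the optical axis; a normal-and-offset form would be needed for that case.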


Depth map coding · 3D representation · Image segmentation · Data compression



This work has been developed in the framework of the project BIGGRAPH-TEC2013-43935-R, financed by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Universitat Politècnica de Catalunya, Barcelona, Spain
