DIBR-Based Conversion from Monoscopic to Stereoscopic and Multi-View Video

  • Liang Zhang
  • Carlos Vázquez
  • Grégory Huchet
  • Wa James Tam


This chapter provides a tutorial on 2D-to-3D video conversion methods that exploit depth-image-based rendering (DIBR) techniques. It is intended not only for university students who are new to this area of research, but also for researchers and engineers who want to deepen their knowledge of video conversion techniques. The basic principles of, and the various methods for, converting 2D video to stereoscopic 3D are reviewed, including depth extraction strategies and DIBR-based view synthesis approaches. Conversion artifacts and the evaluation of conversion quality are discussed, and the advantages and disadvantages of the different methods are elaborated. Finally, practical implementations for the conversion from monoscopic to stereoscopic and multi-view video are outlined.
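The core DIBR idea the chapter covers can be illustrated with a minimal sketch: each pixel of the original view is shifted horizontally in proportion to its depth value to synthesize a virtual view, and the disoccluded gaps ("holes") are then filled. The sketch below assumes the common 2D-plus-depth convention in which 255 denotes the nearest depth; the function name, the linear depth-to-disparity mapping, the painter's-algorithm warp order, and the naive left-to-right hole filling are illustrative simplifications, not the specific methods evaluated in the chapter.

```python
import numpy as np

def dibr_shift(image, depth, max_disparity=16):
    """Synthesize a virtual (right-eye) view from an image and its
    per-pixel depth map by horizontal pixel shifting (simplified DIBR).

    Assumes depth is uint8 with 255 = nearest; disparity is a simple
    linear mapping: max_disparity * depth / 255.
    """
    h, w = depth.shape
    virtual = np.zeros_like(image)
    filled = np.zeros((h, w), dtype=bool)
    # Warp pixels from farthest to nearest so that nearer pixels
    # overwrite farther ones (painter's-algorithm occlusion handling).
    order = np.argsort(depth, axis=None)  # small depth (far) first
    for idx in order:
        y, x = divmod(int(idx), w)
        d = int(round(max_disparity * depth[y, x] / 255.0))
        xt = x - d  # shift left for a right-eye view
        if 0 <= xt < w:
            virtual[y, xt] = image[y, x]
            filled[y, xt] = True
    # Naive hole filling: propagate the nearest filled pixel from the
    # left into each disoccluded gap (a stand-in for the inpainting
    # and interpolation methods discussed in the chapter).
    for y in range(h):
        last = None
        for x in range(w):
            if filled[y, x]:
                last = virtual[y, x]
            elif last is not None:
                virtual[y, x] = last
    return virtual

# A tiny 1x4 example: the third pixel is nearest (depth 255), so it
# shifts left by one and leaves a hole that is filled from the left.
img = np.array([[10, 20, 30, 40]], dtype=np.uint8)
dep = np.array([[0, 0, 255, 0]], dtype=np.uint8)
print(dibr_shift(img, dep, max_disparity=1).tolist())  # [[10, 30, 30, 40]]
```

Real systems replace each simplification here: disparity is derived from a camera and display model rather than a linear scale, depth maps are smoothed before warping to reduce disocclusions, and holes are filled by depth-aware inpainting rather than left-neighbor propagation.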


3D-TV · 2D-to-3D video conversion · Conversion artifact · Depth cue · Depth estimation · Depth-of-field · Depth map preprocessing · Depth-image-based rendering (DIBR) · Disocclusion · Focus · Hole filling · Human visual system · Hybrid approach · Linear perspective · Motion parallax · Pictorial depth cue · Stereoscopic 3D (S3D) · Surrogate depth map · View synthesis



We would like to express our sincere thanks to Mr. Robert Klepko for constructive suggestions during the preparation of this manuscript. Thanks are also due to NHK for providing the “Balloons,” “Tulips,” and “Redleaf” sequences.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Liang Zhang¹
  • Carlos Vázquez¹
  • Grégory Huchet¹
  • Wa James Tam¹

  1. Communications Research Centre Canada, Ottawa, Canada
