MatryODShka: Real-time 6DoF Video View Synthesis Using Multi-sphere Images

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12346)


Abstract

We introduce a method to convert stereo 360\(^\circ \) (omnidirectional stereo) imagery into a layered, multi-sphere image representation for six degree-of-freedom (6DoF) rendering. Stereo 360\(^\circ \) imagery can be captured from multi-camera systems for virtual reality (VR), but lacks motion parallax and correct-in-all-directions disparity cues. Together, these shortcomings can quickly lead to VR sickness when viewing content. One solution is to generate a format suitable for 6DoF rendering, for example by estimating depth; however, this raises the question of how to handle disoccluded regions in dynamic scenes. Our approach is to simultaneously learn depth and disocclusions via a multi-sphere image representation, which can be rendered with correct 6DoF disparity and motion parallax in VR. This significantly improves comfort for the viewer, and can be inferred and rendered in real time on modern GPU hardware. Together, these properties move towards making VR video a more comfortable immersive medium.
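At its core, 6DoF rendering from a multi-sphere image amounts to intersecting each view ray with a set of concentric RGBA sphere layers and compositing the samples back to front with the "over" operator. The sketch below is illustrative only, not the paper's implementation: it assumes one RGBA sample per layer instead of full texture lookups, and the function names (`ray_sphere_t`, `composite_msi`) are ours.

```python
import numpy as np

def ray_sphere_t(origin, direction, radius):
    """Far intersection of a ray with a sphere centered at the origin.
    Returns the larger t where |origin + t*direction| == radius,
    or None if the ray misses. `direction` is assumed unit-length."""
    b = 2.0 * np.dot(origin, direction)
    c = np.dot(origin, origin) - radius ** 2
    disc = b * b - 4.0 * c
    if disc < 0:
        return None
    return (-b + np.sqrt(disc)) / 2.0

def composite_msi(origin, direction, spheres):
    """Back-to-front 'over' compositing of RGBA sphere layers along one ray.
    `spheres` maps radius -> (rgb, alpha), the layer's sample at the ray's
    hit point. A viewer offset from the sphere center (head motion) changes
    where each layer is hit, which is what produces motion parallax."""
    color = np.zeros(3)
    # Outermost (farthest) layer first, so nearer layers occlude farther ones.
    for radius in sorted(spheres, reverse=True):
        if ray_sphere_t(origin, direction, radius) is None:
            continue  # ray misses this layer (cannot happen from inside it)
        rgb, alpha = spheres[radius]
        color = alpha * np.asarray(rgb, float) + (1.0 - alpha) * color
    return color
```

For example, with a fully opaque red layer at radius 1 in front of a green layer at radius 2, a ray from a slightly offset eye position returns red, since the near layer occludes the far one.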



Acknowledgments

We thank Ana Serrano for help with the RGB-D comparisons, Eliot Laidlaw for improving the Unity renderer, and Frédéric Devernay and Brian Berard. This work was supported by an Amazon Research Award, an NVIDIA GPU donation, a Brown OVPR Seed Award, RCUK grant CAMERA (EP/M023281/1), and an EPSRC-UKRI Innovation Fellowship (EP/S001050/1).

Supplementary material

Supplementary material 2 (PDF, 7.8 MB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Brown University, Providence, USA
  2. Carnegie Mellon University, Pittsburgh, USA
  3. University of Bath, Bath, UK
