
MatryODShka: Real-time 6DoF Video View Synthesis Using Multi-sphere Images

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12346)

Abstract

We introduce a method to convert stereo 360° (omnidirectional stereo) imagery into a layered, multi-sphere image representation for six-degree-of-freedom (6DoF) rendering. Stereo 360° imagery can be captured from multi-camera systems for virtual reality (VR), but lacks motion parallax and correct-in-all-directions disparity cues. Together, these shortcomings can quickly lead to VR sickness when viewing content. One solution is to generate a format suitable for 6DoF rendering, for example by estimating depth; however, this raises the question of how to handle disoccluded regions in dynamic scenes. Our approach is to simultaneously learn depth and disocclusions via a multi-sphere image representation, which can be rendered with correct 6DoF disparity and motion parallax in VR. This significantly improves viewer comfort, and the representation can be inferred and rendered in real time on modern GPU hardware. Together, these properties move VR video towards being a more comfortable immersive medium.
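To make the multi-sphere image (MSI) idea concrete, the sketch below illustrates how such a representation can be rendered for a viewer displaced from the capture centre: each viewing ray is intersected with a set of concentric RGBA sphere layers, and the samples are composited back to front with the standard over operator. This is only a minimal illustration under stated assumptions (equirectangular RGBA layers, nearest-pixel lookup, a particular axis convention); the function names and conventions are ours, not the paper's implementation, which runs on the GPU.

```python
# Minimal, illustrative sketch of 6DoF rendering from a multi-sphere image (MSI).
# Assumptions (not from the paper): `layers` are equirectangular RGBA arrays of
# shape (H, W, 4) with non-premultiplied alpha, ordered far-to-near and paired
# with sphere radii `radii`; `eye` is the viewer offset from the sphere centre
# (inside the smallest sphere); `direction` is a unit-length viewing ray.
import numpy as np

def sphere_hit(origin, direction, radius):
    """Distance along a unit-length ray to the far intersection with a centred sphere."""
    b = 2.0 * np.dot(origin, direction)
    c = np.dot(origin, origin) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    return (-b + np.sqrt(disc)) / 2.0  # far root; the viewer sits inside the sphere

def sample_layer(layer, point):
    """Nearest-pixel equirectangular lookup of an RGBA layer at a 3D point (y-up convention)."""
    h, w, _ = layer.shape
    x, y, z = point / np.linalg.norm(point)
    u = (np.arctan2(x, -z) / (2.0 * np.pi) + 0.5) * (w - 1)   # longitude -> column
    v = (np.arccos(np.clip(y, -1.0, 1.0)) / np.pi) * (h - 1)  # latitude  -> row
    return layer[int(round(v)), int(round(u))]

def render_ray(layers, radii, eye, direction):
    """Composite MSI layers back-to-front along one viewing ray with the 'over' operator."""
    rgb = np.zeros(3)
    for layer, radius in zip(layers, radii):  # far-to-near order
        t = sphere_hit(eye, direction, radius)
        if t is None:
            continue
        r, g, b, a = sample_layer(layer, eye + t * direction)
        rgb = np.array([r, g, b]) * a + rgb * (1.0 - a)
    return rgb
```

Because the sphere radii are fixed, moving the eye only changes where each ray pierces each layer, which is what produces motion parallax and disocclusion handling without explicit per-pixel depth at render time.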

Acknowledgment

We thank Ana Serrano for help with RGB-D comparisons and Eliot Laidlaw for improving the Unity renderer. We thank Frédéric Devernay and Brian Berard, and acknowledge an Amazon Research Award and an NVIDIA GPU donation. This work was supported by a Brown OVPR Seed Award, RCUK grant CAMERA (EP/M023281/1), and an EPSRC-UKRI Innovation Fellowship (EP/S001050/1).

Supplementary material

Supplementary material 2 (PDF, 7.8 MB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Brown University, Providence, USA
  2. Carnegie Mellon University, Pittsburgh, USA
  3. University of Bath, Bath, UK