Synthesizing Light Field Video from Monocular Video

Govindarajan, Shrisudhan; Shedligeri, Prasan; Sarah; Mitra, Kaushik

doi:10.1007/978-3-031-20071-7_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13667)

Included in the following conference series:

European Conference on Computer Vision

Abstract

The hardware challenges associated with light-field (LF) imaging have made it difficult for consumers to access its benefits, such as post-capture focus and aperture control. Learning-based techniques that solve the ill-posed problem of LF reconstruction from sparse (1, 2 or 4) views have significantly reduced the need for complex hardware. LF video reconstruction from sparse views poses a special challenge, as acquiring ground truth for training these models is hard. Hence, we propose a self-supervised learning-based algorithm for LF video reconstruction from monocular videos. We use self-supervised geometric, photometric and temporal consistency constraints inspired by a recent learning-based technique for LF video reconstruction from stereo video. Additionally, we propose three key techniques that are relevant to our monocular video input. First, we propose an explicit disocclusion handling technique that encourages the network to use information from temporally adjacent input frames to inpaint disoccluded regions in an LF frame. This is crucial for a self-supervised technique, as a single input frame does not contain any information about the disoccluded regions. Second, we propose an adaptive low-rank representation that provides a significant boost in performance by tailoring the representation to each input scene. Finally, we propose a novel refinement block that exploits the available LF image data through supervised learning to further refine the reconstruction quality. Our qualitative and quantitative analysis demonstrates the significance of each of the proposed building blocks, as well as superior results compared to previous state-of-the-art monocular LF reconstruction techniques. We further validate our algorithm by reconstructing LF videos from monocular videos acquired using a commercial GoPro camera. An open-source implementation is also available (https://github.com/ShrisudhanG/Synthesizing-Light-Field-Video-from-Monocular-Video).
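To make the post-capture focus control mentioned in the abstract concrete, the following is a minimal sketch of classic shift-and-add refocusing over the sub-aperture views of a light field. It is not the reconstruction method of this paper. The function names `refocus` and `point_light_field`, the nested-list data layout, the integer disparities and the wrap-around borders are all assumptions made for illustration.

```python
def refocus(light_field, slope):
    """Shift-and-add refocusing (illustrative sketch, not the paper's code).

    light_field[u][v] is a 2D image (list of rows) seen from angular
    position (u, v). slope is the integer disparity, in pixels per view
    step, of the depth plane to bring into focus.
    """
    n_u, n_v = len(light_field), len(light_field[0])
    height, width = len(light_field[0][0]), len(light_field[0][0][0])
    cu, cv = n_u // 2, n_v // 2  # the central view is the reference
    out = [[0.0] * width for _ in range(height)]
    for u in range(n_u):
        for v in range(n_v):
            dy, dx = slope * (u - cu), slope * (v - cv)
            view = light_field[u][v]
            for y in range(height):
                for x in range(width):
                    # Assumption: wrap-around borders keep the sketch short.
                    out[y][x] += view[(y + dy) % height][(x + dx) % width]
    n_views = n_u * n_v
    return [[value / n_views for value in row] for row in out]


def point_light_field(n_views, size, center, disparity):
    """Hypothetical test scene: one bright point at a single depth."""
    c = n_views // 2
    light_field = []
    for u in range(n_views):
        row = []
        for v in range(n_views):
            image = [[0.0] * size for _ in range(size)]
            # The point moves by `disparity` pixels per view step.
            y = (center[0] + disparity * (u - c)) % size
            x = (center[1] + disparity * (v - c)) % size
            image[y][x] = 1.0
            row.append(image)
        light_field.append(row)
    return light_field


lf = point_light_field(n_views=3, size=9, center=(4, 4), disparity=2)
in_focus = refocus(lf, slope=2)      # all views align on the point
out_of_focus = refocus(lf, slope=0)  # the point is spread over 9 pixels
```

Refocusing at the point's own disparity aligns all nine views, so the point stays sharp. Any other slope spreads its energy across neighbouring pixels. Averaging only a subset of views around the centre would, under the same assumptions, emulate a smaller aperture.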

Acknowledgements

This work was supported in part by Qualcomm Innovation Fellowship (QIF) India 2021.

Author information

Authors and Affiliations

Indian Institute of Technology Madras, Chennai, India
Shrisudhan Govindarajan, Prasan Shedligeri, Sarah & Kaushik Mitra

Corresponding author

Correspondence to Shrisudhan Govindarajan.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 2 (mp4 16394 KB)

Supplementary material 1 (pdf 905 KB)

About this paper

Cite this paper

Govindarajan, S., Shedligeri, P., Sarah, Mitra, K. (2022). Synthesizing Light Field Video from Monocular Video. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13667. Springer, Cham. https://doi.org/10.1007/978-3-031-20071-7_10

DOI: https://doi.org/10.1007/978-3-031-20071-7_10
Published: 13 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20070-0
Online ISBN: 978-3-031-20071-7
eBook Packages: Computer Science, Computer Science (R0)
