Abstract
Synthesizing three-dimensional objects from single or multiple two-dimensional views has long been a challenging task. To address it, several techniques based on Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and Recurrent Neural Networks (RNNs) have been proposed. Since their advent in 2014, Generative Adversarial Networks (GANs) have attracted a tremendous amount of research. Among their many applications, image synthesis has shown great potential: two deep neural networks, a generator and a discriminator, are trained in competition and are able to produce reasonably realistic images. The formulation of 3D-GANs, which can generate three-dimensional objects from multiple two-dimensional views with impressive accuracy, has emerged as a promising solution to this problem. This paper provides a comprehensive analysis of deep learning methods for generating three-dimensional objects, reviews the various models and frameworks for three-dimensional object generation, and discusses evaluation metrics and future research directions, including the use of GANs as an alternative to simultaneous localization and mapping (SLAM) and their potential to transform the fields of education and medicine.
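The adversarial setup the abstract refers to can be made concrete with a small sketch. The following is a minimal, illustrative PyTorch example, not the architecture used in this paper: a generator maps a latent vector to a 32x32x32 voxel occupancy grid, a 3D-convolutional discriminator learns to tell real grids from generated ones, and the generator is trained to fool it. The framework choice, layer sizes, latent dimension, and voxel resolution are all assumptions made for illustration.

```python
# Minimal 3D-GAN sketch (illustrative only; sizes and resolution are assumptions).
import torch
import torch.nn as nn

class VoxelGenerator(nn.Module):
    def __init__(self, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, 32 * 32 * 32), nn.Sigmoid(),  # voxel occupancy in [0, 1]
        )
    def forward(self, z):
        return self.net(z).view(-1, 1, 32, 32, 32)

class VoxelDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),   # 32^3 -> 16^3
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),  # 16^3 -> 8^3
            nn.Flatten(), nn.Linear(32 * 8 * 8 * 8, 1),                    # real/fake logit
        )
    def forward(self, v):
        return self.net(v)

# One adversarial update: the discriminator separates real voxel grids from
# generated ones; the generator is updated to make its outputs look real.
G, D = VoxelGenerator(), VoxelDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = (torch.rand(8, 1, 32, 32, 32) > 0.5).float()  # stand-in for real 3D shapes
z = torch.randn(8, 64)
fake = G(z)

d_loss = bce(D(real), torch.ones(8, 1)) + bce(D(fake.detach()), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

g_loss = bce(D(fake), torch.ones(8, 1))  # generator wants D to label fakes as real
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In practice, the random `real` tensor would be replaced by voxelized 3D models from a shape dataset, and the two updates would be repeated over many batches.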
Kunal Khadilkar, Indrajeet Kane, Sumit Chavan and Rahul Barhate contributed equally to this work.
Cite this paper
Dhondse, A., Kulkarni, S., Khadilkar, K., Kane, I., Chavan, S., Barhate, R. (2020). Generative Adversarial Networks as an Advancement in 2D to 3D Reconstruction Techniques. In: Sharma, N., Chakrabarti, A., Balas, V. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1016. Springer, Singapore. https://doi.org/10.1007/978-981-13-9364-8_25