
Multi-view Pixel2Mesh++: 3D reconstruction via Pixel2Mesh with more images

  • Original article
  • The Visual Computer

Abstract

To meet the growing demand for high-quality 3D models, we propose an end-to-end deep learning architecture that generates 3D mesh models from multiple RGB images, in contrast to previous methods that produce voxel or point-cloud models. Unlike the single-image Pixel2Mesh network, we introduce a ConvLSTM layer to fuse perceptual features, allowing multiple images to be processed simultaneously. To constrain the smoothness of the generated 3D shapes, we design a graph pooling layer that coarsens the mesh structure, and we define a new loss function, the Smooth loss. Working together with the graph unpooling layer of Pixel2Mesh (P2M), the graph pooling layer guarantees the mesh topology of the final 3D shapes. The Smooth loss ensures both the visual quality and the structural accuracy of the generated shapes. Experiments on the ShapeNet dataset show that, compared with previous deep learning networks, our method generates higher-precision 3D shapes and achieves the best F-score and Chamfer distance (CD). Moreover, because features from multiple images are fused, the resulting reconstructions are more reliable.
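The abstract's Smooth loss penalizes non-smooth surfaces. The paper's exact formulation is not given here, but a common way to express mesh smoothness is a Laplacian penalty: each vertex is pulled toward the centroid of its one-ring neighbors. The sketch below is an illustrative, hypothetical version of such a term (the function name `smooth_loss` and the neighbor-list representation are assumptions, not the authors' code):

```python
import numpy as np

def smooth_loss(vertices, neighbors):
    """Illustrative Laplacian smoothness term.

    vertices:  (N, 3) array of vertex positions.
    neighbors: list of length N; neighbors[i] holds the indices of
               vertex i's one-ring neighbors in the mesh graph.

    Returns the mean squared distance between each vertex and the
    centroid of its neighbors; a perfectly "smooth" vertex (one that
    sits at its neighbors' centroid) contributes zero.
    """
    loss = 0.0
    for i, nbrs in enumerate(neighbors):
        centroid = vertices[nbrs].mean(axis=0)
        loss += float(np.sum((vertices[i] - centroid) ** 2))
    return loss / len(vertices)

# Three collinear vertices: the middle vertex sits exactly at the
# centroid of its two neighbors, so only the endpoints contribute.
verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [2.0, 0.0, 0.0]])
nbrs = [[1], [0, 2], [1]]
print(smooth_loss(verts, nbrs))  # 2/3: endpoints each contribute 1
```

During training, such a term would be added to the reconstruction losses (e.g., Chamfer distance) with a weighting coefficient, trading geometric fidelity against surface regularity.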



Acknowledgements

This study is partially supported by the National Natural Science Foundation of China (62176016), the National Key R&D Program of China (Nos. 2018YFB2101100 and 2019YFB2101600), the Guizhou Province Science and Technology Project "Research and Demonstration of Sci. & Tech Big Data Mining Technology Based on Knowledge Graph" (supported by Qiankehe [2021] General 382), the Training Program of the Major Research Plan of the National Natural Science Foundation of China (Grant No. 92046015), and the Beijing Natural Science Foundation Program and Scientific Research Key Program of Beijing Municipal Commission of Education (Grant No. KZ202010025047).

Author information

Corresponding author

Correspondence to Chao Tong.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 5527 KB)

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, R., Yin, X., Yang, Y. et al. Multi-view Pixel2Mesh++: 3D reconstruction via Pixel2Mesh with more images. Vis Comput 39, 5153–5166 (2023). https://doi.org/10.1007/s00371-022-02651-7

