Abstract
To meet the increasing demand for high-quality 3D models, we propose an end-to-end deep learning architecture that generates 3D mesh models from multiple RGB images, in contrast to previous methods that produce voxel or point-cloud models. Unlike the single-image Pixel2Mesh network, we introduce a ConvLSTM layer to fuse perceptual features, enabling multiple images to be processed simultaneously. To constrain the smoothness of the generated shapes, we design a graph pooling layer that coarsens the mesh structure and define a new loss function, the smooth loss. Working together with the graph unpooling layer of Pixel2Mesh (P2M), the graph pooling layer preserves the mesh topology of the final 3D shapes, and the smooth loss improves both the visual quality and the structural accuracy of the generated shapes. Experiments on the ShapeNet dataset show that, compared with previous deep learning networks, our method generates higher-precision 3D shapes and achieves the best F-score and Chamfer distance (CD). In addition, the fusion of features from multiple images makes our experimental results more convincing.
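The abstract does not give the exact formula for the smooth loss, but a common mesh smoothness regularizer of this kind penalizes each vertex's squared offset from the centroid of its one-ring neighbors (a uniform Laplacian term). The sketch below is a minimal NumPy illustration of that idea, not the paper's definition; the function name, the toy square mesh, and the adjacency dictionary are all hypothetical.

```python
import numpy as np

def laplacian_smooth_loss(vertices, neighbors):
    """Mean squared offset of each vertex from the centroid of its
    one-ring neighbors -- penalizes spiky, non-smooth meshes."""
    total = 0.0
    for i, nbrs in neighbors.items():
        centroid = vertices[nbrs].mean(axis=0)
        total += float(np.sum((vertices[i] - centroid) ** 2))
    return total / len(neighbors)

# Toy mesh: a unit square in the z = 0 plane, each corner linked
# to its two adjacent corners.
verts = np.array([[0., 0., 0.],
                  [1., 0., 0.],
                  [1., 1., 0.],
                  [0., 1., 0.]])
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

flat_loss = laplacian_smooth_loss(verts, adj)

# Pull one vertex out of the plane: the loss rises, so minimizing this
# term during deformation pushes the mesh back toward a smooth surface.
bumpy = verts.copy()
bumpy[0, 2] = 0.5
bumpy_loss = laplacian_smooth_loss(bumpy, adj)
```

In a network such as the one described here, a term like this would be added to the reconstruction loss (F-score/CD-oriented terms) with a small weight, trading geometric fidelity against surface smoothness.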
References
Ioannidou, A., Chatzilari, E., Nikolopoulos, S., Kompatsiaris, I.: Deep learning advances in computer vision with 3d data: a survey. ACM Comput. Surv. 50(2), 1–38 (2017).
Yuan, Z.H., Lu, T., Zhou, H.-Y., Chen, B., Li, J.-N.: Incremental 3d reconstruction using Bayesian learning. In: Proceedings of the 25th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems: Advanced Research in Applied Artificial Intelligence, IEA/AIE’12, pp. 754–763. Springer, Heidelberg (2012)
Penner, E.: Soft 3d reconstruction for view synthesis. ACM Trans. Graph. 36(6), 1–11 (2017).
Lee, Y.T., Fang, F.: 3d reconstruction of polyhedral objects from single parallel projections using cubic corner. Comput. Aided Des. 43(8), 1025–1034 (2011).
Trucco, E.: Session details: 3d reconstruction. In: Proceedings of the 1st International Workshop on 3D Video Processing, 3DVP’10, New York, NY, USA. Association for Computing Machinery (2010)
Choy, C.B., Xu, D., Gwak, J., Chen, K., Savarese, S.: 3d-r2n2: a unified approach for single and multi-view 3d object reconstruction. In: European conference on computer vision, pp. 628–644. Springer (2016)
Fan, H., Su, H., Guibas, L.J.: A point set generation network for 3d object reconstruction from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 605–613 (2017)
Wang, N., Zhang, Y., Li, Z., Fu, Y., Liu, W., Jiang, Y.G.: Pixel2mesh: generating 3d mesh models from single rgb images. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 52–67 (2018)
Shi, X., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., Woo, W.C.: Convolutional lstm network: a machine learning approach for precipitation nowcasting. In: Advances in Neural Information Processing Systems, pp. 802–810 (2015)
Wen, C., Zhang, Y., Li, Z., Fu, Y.: Pixel2mesh++: multi-view 3d mesh generation via deformation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1042–1051 (2019)
Nozawa, N., Shum, H.P., Ho, E.S., Morishima, S.: 3d car shape reconstruction from a single sketch image. In: Motion, Interaction and Games, MIG’19, New York, NY, USA. Association for Computing Machinery (2019)
Haag, M., Nagel, H.-H.: Combination of edge element and optical flow estimates for 3d-model-based vehicle tracking in traffic image sequences. Int. J. Comput. Vis. 35(3), 295–319 (1999).
Nozawa, N., Shum, H.P.H., Feng, Q., Ho, E.S.L., Morishima, S.: 3d car shape reconstruction from a contour sketch using gan and lazy learning. Vis. Comput. 38(4), 1317–1330 (2022).
Loh, A.M., Hartley, R.I., et al.: Shape from non-homogeneous, non-stationary, anisotropic, perspective texture, vol. 5. In: BMVC, pp. 69–78. Citeseer (2005)
Aloimonos, J.: Shape from texture. Biol. Cybern. 58(5), 345–360 (1988).
Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017).
Lyu, K., Li, Y., Zhang, Z.: Attention-aware multi-task convolutional neural networks. IEEE Trans. Image Process. 29, 1867–1878 (2020).
Ni, Z., Yang, W., Wang, S., Ma, L., Kwong, S.: Towards unsupervised deep image enhancement with generative adversarial network. IEEE Trans. Image Process. 29, 9140–9151 (2020).
Yang, T.T., Tong, C.: Real-time detection network for tiny traffic sign using multi-scale attention module. Sci. China Technol. Sci. 65(2), 396–406 (2022).
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997).
Chen, R., Han, S., Xu, J., Su, H.: Point-based multi-view stereo network. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1538–1547 (2019)
Delanoy, J., Aubry, M., Isola, P., Efros, A.A., Bousseau, A.: 3d sketching using multi-view deep volumetric prediction. In: Proceedings ACM Computer Graphics and Interactive Techniques, vol. 1, no. 1 (2018)
Xie, H., Yao, H., Sun, X., Zhou, S., Tong, X.: Weighted voxel: a novel voxel representation for 3d reconstruction. In: Proceedings of the 10th International Conference on Internet Multimedia Computing and Service, ICIMCS’18, New York, NY, USA. Association for Computing Machinery (2018)
Huang, T., Liu, Y.: 3d point cloud geometry compression on deep learning. In: Proceedings of the 27th ACM International Conference on Multimedia, MM’19, pp. 890–898, New York, NY, USA. Association for Computing Machinery (2019)
Bronstein, M.M., Bruna, J., LeCun, Y., Szlam, A., Vandergheynst, P.: Geometric deep learning: going beyond Euclidean data. IEEE Signal Process. Mag. 34(4), 18–42 (2017).
Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in Neural Information Processing Systems, pp. 3844–3852 (2016)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
Ting, Z., Feng, D.D., Zheng, T.: 3d reconstruction of single picture. In: Proceedings of the Pan-Sydney Area Workshop on Visual Information Processing, VIP’05, pp. 83–86. Australian Computer Society, Inc., AUS (2004)
Xiang, N., Wang, L., Jiang, T., Li, Y., Yang, X., Zhang, J.: Single-image mesh reconstruction and pose estimation via generative normal map. In: Proceedings of the 32nd International Conference on Computer Animation and Social Agents, CASA’19, pp. 79–84, New York, NY, USA. Association for Computing Machinery (2019)
Gao, Y., Yao, Y., Jiang, Y.: Multi-target 3d reconstruction from rgb-d data. In: Proceedings of the 2nd International Conference on Computer Science and Software Engineering, CSSE 2019, pp. 184–191, New York, NY, USA. Association for Computing Machinery (2019)
Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: Learning 3d reconstruction in function space. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4460–4470 (2019)
Nikoohemat, S., Diakite, A.A., Zlatanova, S., Vosselman, G.: Indoor 3d reconstruction from point clouds for optimal routing in complex buildings to support disaster management. Autom. Construct. 113, 103109 (2020).
Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: Deepsdf: learning continuous signed distance functions for shape representation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 165–174 (2019)
Peng, S., Niemeyer, M., Mescheder, L., Pollefeys, M., Geiger, A.: Convolutional occupancy networks. arXiv preprint arXiv:2003.04618 (2020)
Wang, W., Ceylan, D., Mech, R., Neumann, U.: 3dn: 3d deformation network. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1038–1046 (2019)
Xu, Q., Wang, W., Ceylan, D., Mech, R., Neumann, U.: Disn: deep implicit surface network for high-quality single-view 3d reconstruction. In: Advances in Neural Information Processing Systems, pp. 492–502 (2019)
Sinha, A., Unmesh, A., Huang, Q., Ramani, K.: Surfnet: generating 3d shape surfaces using deep residual networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 791–800 (2017)
Chang, A.X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., Xiao, J., Fisher, Y.: ShapeNet: an information-rich 3d model repository. Technical Report arXiv:1512.03012 [cs.GR], Stanford University—Princeton University—Toyota Technological Institute at Chicago (2015)
Acknowledgements
This study is partially supported by the National Natural Science Foundation of China (62176016), the National Key R&D Program of China (Nos. 2018YFB2101100 and 2019YFB2101600), the Guizhou Province Science and Technology Project: Research and Demonstration of Sci. & Tech Big Data Mining Technology Based on Knowledge Graph (supported by Qiankehe [2021] General 382), the Training Program of the Major Research Plan of the National Natural Science Foundation of China (Grant No. 92046015), and the Beijing Natural Science Foundation Program and Scientific Research Key Program of Beijing Municipal Commission of Education (Grant No. KZ202010025047).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Chen, R., Yin, X., Yang, Y. et al. Multi-view Pixel2Mesh++: 3D reconstruction via Pixel2Mesh with more images. Vis Comput 39, 5153–5166 (2023). https://doi.org/10.1007/s00371-022-02651-7