Abstract
Humans can skillfully classify real-world objects by shape or function, and mentally build a visual concept for each object class as well as visual knowledge of the surrounding real world (Pan, 2019). Pan (2021) pointed out that constructing computational representations of these visual concepts and visual knowledge is a key step toward developing next-generation artificial intelligence. Learning the three-dimensional (3D) shape space of all objects under the same visual concept is, in turn, a key step toward the computational representation of visual concepts. This paper identifies the key technical challenges in 3D shape space learning, reviews research progress in this field around these challenges, and concludes with a discussion of research trends and future directions in 3D shape space learning.
References
Bai S, Bai X, Zhou ZC, et al., 2016. GIFT: a real-time and scalable 3D shape search engine. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.5023–5032. https://doi.org/10.1109/CVPR.2016.543
Cao C, Weng YL, Zhou S, et al., 2014. FaceWarehouse: a 3D facial expression database for visual computing. IEEE Trans Visual Comput Graph, 20(3):413–425. https://doi.org/10.1109/TVCG.2013.249
Chan ER, Monteiro M, Kellnhofer P, et al., 2021. pi-GAN: periodic implicit generative adversarial networks for 3D-aware image synthesis. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.5799–5809. https://doi.org/10.1109/CVPR46437.2021.00574
Chen ZQ, Zhang H, 2019. Learning implicit fields for generative shape modeling. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.5932–5941. https://doi.org/10.1109/CVPR.2019.00609
Deng Y, Yang JL, Tong X, 2021. Deformed implicit field: modeling 3D shapes with learned dense correspondence. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.10286–10296. https://doi.org/10.1109/CVPR46437.2021.01015
Deng Y, Yang J, Xiang J, et al., 2022. GRAM: generative radiance manifolds for 3D-aware image generation. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.10673–10683.
Egger B, Smith WA, Tewari A, 2020. 3D morphable face models: past, present, and future. ACM Trans Graph, 39(5):157. https://doi.org/10.1145/3395208
Gadelha M, Maji S, Wang R, 2017. 3D shape induction from 2D views of multiple objects. Proc Int Conf on 3D Vision, p.402–411. https://doi.org/10.1109/3DV.2017.00053
Groueix T, Fisher M, Kim VG, et al., 2018. A Papier-Mâché approach to learning 3D surface generation. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.216–224. https://doi.org/10.1109/CVPR.2018.00030
Hughes JF, van Dam A, McGuire M, et al., 2013. Computer Graphics: Principles and Practice (3rd Ed.). Addison-Wesley, Upper Saddle River, USA.
Jiang C, Huang J, Tagliasacchi A, et al., 2020. ShapeFlow: learnable deformation flows among 3D shapes. Advances in Neural Information Processing Systems 33, p.9745–9757.
Jin YW, Jiang DQ, Cai M, 2020. 3D reconstruction using deep learning: a survey. Commun Inform Syst, 20(4): 389–413. https://doi.org/10.4310/CIS.2020.v20.n4.a1
Li X, Dong Y, Peers P, et al., 2019. Synthesizing 3D shapes from silhouette image collections using multi-projection generative adversarial networks. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.5530–5539. https://doi.org/10.1109/CVPR.2019.00568
Liu F, Liu XM, 2020. Learning implicit functions for topology-varying dense 3D shape correspondence. Proc 34th Int Conf on Neural Information Processing Systems, p.4823–4834.
Loper M, Mahmood N, Romero J, et al., 2015. SMPL: a skinned multi-person linear model. ACM Trans Graph, 34(6):248. https://doi.org/10.1145/2816795.2818013
Lun ZL, Gadelha M, Kalogerakis E, et al., 2017. 3D shape reconstruction from sketches via multi-view convolutional networks. Proc Int Conf on 3D Vision, p.67–77. http://arxiv.org/abs/1707.06375
Masci J, Boscaini D, Bronstein MM, et al., 2015. Geodesic convolutional neural networks on Riemannian manifolds. Proc IEEE Int Conf on Computer Vision Workshop, p.832–840. https://doi.org/10.1109/ICCVW.2015.112
Měch R, Prusinkiewicz P, 1996. Visual models of plants interacting with their environment. Proc 23rd Annual Conf on Computer Graphics and Interactive Techniques, p.397–410. https://doi.org/10.1145/237170.237279
Mescheder L, Oechsle M, Niemeyer M, et al., 2019. Occupancy networks: learning 3D reconstruction in function space. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.4455–4465. https://doi.org/10.1109/CVPR.2019.00459
Mo KC, Zhu SL, Chang AX, et al., 2019. PartNet: a large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.909–918. https://doi.org/10.1109/CVPR.2019.00100
Müller P, Wonka P, Haegler S, et al., 2006. Procedural modeling of buildings. ACM SIGGRAPH Papers, p.614–623. https://doi.org/10.1145/1141911.1141931
Niu CJ, Li J, Xu K, 2018. Im2Struct: recovering 3D shape structure from a single RGB image. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.4521–4529. https://doi.org/10.1109/CVPR.2018.00475
Pan YH, 2019. On visual knowledge. Front Inform Technol Electron Eng, 20(8):1021–1025. https://doi.org/10.1631/FITEE.1910001
Pan YH, 2021a. Miniaturized five fundamental issues about visual knowledge. Front Inform Technol Electron Eng, 22(5):615–618. https://doi.org/10.1631/FITEE.2040000
Pan YH, 2021b. On visual understanding. Front Inform Technol Electron Eng, early access. https://doi.org/10.1631/FITEE.2130000
Park JJ, Florence P, Straub J, et al., 2019. DeepSDF: learning continuous signed distance functions for shape representation. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.165–174. https://doi.org/10.1109/CVPR.2019.00025
Paschalidou D, Katharopoulos A, Geiger A, et al., 2021. Neural parts: learning expressive 3D shape abstractions with invertible neural networks. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.3204–3215. https://doi.org/10.1109/CVPR46437.2021.00322
Qi CR, Su H, Mo KC, et al., 2017. PointNet: deep learning on point sets for 3D classification and segmentation. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.77–85. https://doi.org/10.1109/CVPR.2017.16
Riegler G, Ulusoy AO, Geiger A, 2017. OctNet: learning deep 3D representations at high resolutions. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.6620–6629. https://doi.org/10.1109/CVPR.2017.701
Sinha A, Bai J, Ramani K, 2016. Deep learning 3D shape surfaces using geometry images. Proc 14th European Conf on Computer Vision, p.223–240. https://doi.org/10.1007/978-3-319-46466-4_14
Su H, Maji S, Kalogerakis E, et al., 2015. Multi-view convolutional neural networks for 3D shape recognition. Proc IEEE Int Conf on Computer Vision, p.945–953. https://doi.org/10.1109/ICCV.2015.114
Sun CY, Zou QF, Tong X, et al., 2019. Learning adaptive hierarchical cuboid abstractions of 3D shape collections. ACM Trans Graph, 38(6):241. https://doi.org/10.1145/3355089.3356529
Tulsiani S, Su H, Guibas LJ, et al., 2017. Learning shape abstractions by assembling volumetric primitives. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.1466–1474. https://doi.org/10.1109/CVPR.2017.160
Wang NY, Zhang YD, Li ZW, et al., 2018. Pixel2Mesh: generating 3D mesh models from single RGB images. Proc 15th European Conf on Computer Vision, p.55–71. https://doi.org/10.1007/978-3-030-01252-6_4
Wang PS, Liu Y, Guo YX, et al., 2017. O-CNN: octree-based convolutional neural networks for 3D shape analysis. ACM Trans Graph, 36(4):72. https://doi.org/10.1145/3072959.3073608
Wang PS, Liu Y, Tong X, 2022. Dual octree graph networks for learning adaptive volumetric shape representations. ACM Trans Graph, 41(4):103. https://doi.org/10.1145/3528223.3530087
Wen C, Zhang YD, Li ZW, et al., 2019. Pixel2Mesh++: multi-view 3D mesh generation via deformation. Proc IEEE/CVF Int Conf on Computer Vision, p.1042–1051. https://doi.org/10.1109/ICCV.2019.00113
Wu ZR, Song SR, Khosla A, et al., 2015. 3D ShapeNets: a deep representation for volumetric shapes. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.1912–1920. https://doi.org/10.1109/CVPR.2015.7298801
Xiao YP, Lai YK, Zhang FL, et al., 2020. A survey on deep geometry learning: from a representation perspective. Comput Visual Med, 6(2):113–133. https://doi.org/10.1007/s41095-020-0174-8
Yang J, Mo KC, Lai YK, et al., 2023. DSG-Net: learning disentangled structure and geometry for 3D shape generation. ACM Trans Graph, 42(1):1. https://doi.org/10.1145/3526212
Yang KZ, Chen XJ, 2021. Unsupervised learning for cuboid shape abstraction via joint segmentation from point clouds. ACM Trans Graph, 40(4):152. https://doi.org/10.1145/3450626.3459873
Yu FG, Liu K, Zhang Y, et al., 2019. PartNet: a recursive part decomposition network for fine-grained and hierarchical shape segmentation. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.9483–9492. https://doi.org/10.1109/CVPR.2019.00972
Yu LQ, Li XZ, Fu CW, et al., 2018. PU-Net: point cloud upsampling network. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.2790–2799. https://doi.org/10.1109/CVPR.2018.00295
Zheng XY, Liu Y, Wang PS, et al., 2022. SDF-StyleGAN: implicit SDF-based StyleGAN for 3D shape generation. https://arxiv.org/abs/2206.12055
Zheng ZR, Yu T, Dai QH, et al., 2021. Deep implicit templates for 3D shape representation. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.1429–1439. https://doi.org/10.1109/CVPR46437.2021.00148
Zuffi S, Kanazawa A, Jacobs DW, et al., 2017. 3D Menagerie: modeling the 3D shape and pose of animals. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.5524–5532. https://doi.org/10.1109/CVPR.2017.586
Acknowledgements
This paper is based on the author's presentations at the first and second workshops on visual knowledge and visual intelligence. The author would like to thank all workshop attendees for the insightful discussions, and Prof. Yunhe PAN and Dr. Heung-Yeung SHUM for their invaluable comments on the topics presented in this paper. Finally, the author thanks all collaborators on the research presented in Li et al. (2019), Sun et al. (2019), and Deng et al. (2021, 2022).
Ethics declarations
Xin TONG declares that he has no conflict of interest.
Cite this article
Tong, X. Three-dimensional shape space learning for visual concept construction: challenges and research progress. Front Inform Technol Electron Eng 23, 1290–1297 (2022). https://doi.org/10.1631/FITEE.2200318