Abstract
We consider the field of three-dimensional (3D) machine vision, in particular 3D recognition. We identify the main problems of 3D vision and review methods for acquiring and representing 3D data, as well as applications of 3D vision. Deep learning methods for 3D recognition are surveyed, and the main current trends in the field are identified. To date, many neural network architectures; convolutional layers; sampling, pooling, and aggregation operations; and methods for representing and processing 3D input data have been proposed. The field is under active development, with the greatest variety of methods proposed for point clouds.
Change history
04 August 2022
An Erratum to this paper has been published: https://doi.org/10.1134/S0005117922070104
Funding
This work was supported by the Russian Foundation for Basic Research, project no. 20-37-90039.
Additional information
Translated by V. Potapchouck
Cite this article
Orlova, S.R., Lopata, A.V. 3D Recognition: State of the Art and Trends. Autom Remote Control 83, 503–519 (2022). https://doi.org/10.1134/S0005117922040014
DOI: https://doi.org/10.1134/S0005117922040014