Multi-View CNN Feature Aggregation with ELM Auto-Encoder for 3D Shape Recognition
Fast and accurate recognition of 3D shapes is a fundamental task for robotic systems performing intelligent tracking and automatic control. View-based 3D shape recognition has attracted increasing attention because human perception of 3D objects relies mainly on multiple 2D observations from different viewpoints. However, most existing multi-view cognitive computation methods perform straightforward pairwise comparisons among the projected images followed by a weak aggregation mechanism, which incurs heavy computation cost and low recognition accuracy. To address these problems, a novel network structure named MCEA is proposed, combining multi-view convolutional neural networks (M-CNNs), an extreme learning machine auto-encoder (ELM-AE), and an ELM classifier for comprehensive feature learning, effective feature aggregation, and efficient classification of 3D shapes. This framework couples the representational power of a deep CNN architecture with the robust ELM-AE feature representation and the fast ELM classifier for 3D model recognition. Compared with existing set-to-set image comparison methods, the proposed shape-to-shape matching strategy converts each highly informative 3D model into a single compact feature descriptor via cognitive computation. As a result, the proposed method runs much faster and strikes a good balance between classification accuracy and computational efficiency. Experimental results on the benchmark Princeton ModelNet, ShapeNet Core 55, and PSB datasets show that the proposed framework achieves higher classification and retrieval accuracy in much shorter time than state-of-the-art methods.
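The abstract's pipeline (per-view CNN features pooled into one descriptor per shape, then compressed by an ELM auto-encoder) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names `view_pool` and `elm_autoencoder_embed`, the use of max pooling across views, and all dimensions are illustrative assumptions; only the closed-form ELM-AE solution is standard.

```python
import numpy as np

def view_pool(view_features):
    """Aggregate per-view CNN descriptors into one shape descriptor.

    view_features: (n_views, d) array, one CNN feature vector per rendered view.
    Element-wise max pooling across views is assumed here for illustration.
    """
    return view_features.max(axis=0)

def elm_autoencoder_embed(X, n_hidden=256, C=1e3, seed=0):
    """Sketch of an ELM auto-encoder (ELM-AE) feature mapping.

    X: (n_samples, n_features) matrix of pooled shape descriptors.
    Returns a compact (n_samples, n_hidden) embedding.
    Assumes n_features >= n_hidden so the random weights can be orthogonalized.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Random input weights, orthogonalized as in the ELM-AE formulation.
    W, _ = np.linalg.qr(rng.standard_normal((n_features, n_hidden)))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer activations
    # Closed-form output weights: beta = (H^T H + I/C)^-1 H^T X,
    # i.e. a ridge-regularized least-squares reconstruction of the input.
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ X)
    # ELM-AE representation: project the data through beta^T.
    return X @ beta.T
```

Because the output weights have a closed-form solution, no iterative training is needed, which is the source of the speed advantage claimed for the ELM components.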
KeywordsELM auto-encoder Convolutional neural networks 3D shape recognition Multi-view feature aggregation
This study was supported in part by the Science and Technology Development Fund of Macao S.A.R. (FDCT) under grant FDCT/121/2016/A3 and MoST-FDCT Joint Grant 015/2015/AMJ, and in part by the University of Macau under grant MYRG2016-00160-FST.
Compliance with Ethical Standards
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.