Abstract
Following the success of Convolutional Neural Networks on object recognition and image classification using 2D images; in this work the framework has been extended to process 3D data. However, many current systems require huge amount of computation cost for dealing with large amount of data. In this work, we introduce an efficient 3D volumetric representation for training and testing CNNs and we also build several datasets based on the volumetric representation of 3D digits, different rotations along the x, y and z axis are also taken into account. Unlike the normal volumetric representation, our datasets are much less memory usage. Finally, we introduce a model based on the combination of CNN models, the structure of the model is based on the classical LeNet. The accuracy result achieved is beyond the state of art and it can classify a 3D digit in around 9 ms.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Lowe, D.G.: Object recognition from local scale-invariant features. In: The Proceedings of the Seventh IEEE International Conference on Computer Vision 1999, vol. 2, pp. 1150–1157. IEEE (1999)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2005 CVPR 2005, vol. 1, pp. 886–893. IEEE (2005)
Burges, C.J.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Disc. 2(2), 121–167 (1998)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
Wu, R., Yan, S., Shan, Y., Dang, Q., Sun, G.: Deep image: scaling up image recognition, vol. 22, p. 388 (2015). arXiv preprint arXiv:1501.02876
Ji, S., Xu, W., Yang, M., Yu, K.: 3d convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013)
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725–1732 (2014)
Maturana, D., Scherer, S.: Voxnet: a 3d convolutional neural network for real-time object recognition. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 922–928. IEEE (2015)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Song, S., Xiao, J.: Deep sliding shapes for amodal 3d object detection in rgb-d images (2015). arXiv preprint arXiv:1511.02300
Lenz, I., Lee, H., Saxena, A.: Deep learning for detecting robotic grasps. Int. J. Robot. Res. 34(4–5), 705–724 (2015)
Socher, R., Huval, B., Bath, B., Manning, C.D., Ng, A.Y.: Convolutional-recursive deep learning for 3d object classification. In: Advances in Neural Information Processing Systems, pp. 665–673 (2012)
Alexandre, L.A.: 3d object recognition using convolutional neural networks with transfer learning between input channels. In: Menegatti, E., Michael, N., Berns, K., Yamaguchi, H. (eds.) Intelligent Autonomous Systems 13, pp. 889–898. Springer, Switzerland (2016)
Höft, N., Schulz, H., Behnke, S.: Fast semantic segmentation of RGB-D scenes with GPU-accelerated deep neural networks. In: Lutz, C., Thielscher, M. (eds.) KI 2014. LNCS, vol. 8736, pp. 80–85. Springer, Heidelberg (2014)
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the ACM International Conference on Multimedia, pp. 675–678. ACM (2014)
Hansen, L.K., Salamon, P.: Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 10, 993–1001 (1990)
Acknowledgment
This work has been partially supported by Project Eyes of Things (EoT) Grant n. 643924 from the European Union’s Horizon 2020 Research and Innovation Program.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Xu, X., Corrigan, D., Dehghani, A., Caulfield, S., Moloney, D. (2016). 3D Object Recognition Based on Volumetric Representation Using Convolutional Neural Networks. In: Perales, F., Kittler, J. (eds) Articulated Motion and Deformable Objects. AMDO 2016. Lecture Notes in Computer Science(), vol 9756. Springer, Cham. https://doi.org/10.1007/978-3-319-41778-3_15
Download citation
DOI: https://doi.org/10.1007/978-3-319-41778-3_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41777-6
Online ISBN: 978-3-319-41778-3
eBook Packages: Computer ScienceComputer Science (R0)