View-point Invariant 3D Classification for Mobile Robots Using a Convolutional Neural Network

  • Jiyoun Moon
  • Hanjun Kim
  • Beomhee Lee
Regular Papers: Robot and Applications

Abstract

3D object classification is an important component of semantic scene understanding for mobile robots. However, many current systems do not account for practical issues such as how an object's appearance changes with the robot's viewing position. A novel 3D object representation is introduced that combines a cylindrical occupancy grid with a 3D convolutional neural network containing a row-wise max pooling layer. Because this representation is rotationally invariant, robots can classify 3D objects correctly regardless of the position from which object modeling begins. Experimental results on a publicly available benchmark dataset show significantly improved performance compared with conventional algorithms.
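
To make the representation concrete, the sketch below (Python/NumPy) voxelizes a point cloud into a cylindrical occupancy grid and applies a max pool over the angular axis. All resolutions, parameter names, and the centering convention are illustrative assumptions; the paper's actual grid dimensions and network architecture are not given in this abstract. The key idea is that rotating an object about the cylinder axis only cyclically shifts occupancy along the angular (sector) axis, so pooling over that axis yields a view-point invariant descriptor.

```python
import numpy as np

def cylindrical_occupancy_grid(points, n_rings=16, n_sectors=32, n_heights=16,
                               max_radius=1.0, max_height=1.0):
    """Voxelize an (N, 3) point cloud into a binary cylindrical occupancy grid.

    Assumes points are centered on the object's vertical axis with z >= 0.
    Grid resolutions are illustrative, not the paper's actual values.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x ** 2 + y ** 2)              # radial distance from the axis
    theta = np.arctan2(y, x) + np.pi          # azimuth angle in [0, 2*pi]
    ri = np.clip((r / max_radius * n_rings).astype(int), 0, n_rings - 1)
    ti = np.clip((theta / (2 * np.pi) * n_sectors).astype(int), 0, n_sectors - 1)
    hi = np.clip((z / max_height * n_heights).astype(int), 0, n_heights - 1)
    grid = np.zeros((n_rings, n_sectors, n_heights), dtype=np.float32)
    grid[ri, ti, hi] = 1.0                    # mark occupied cells
    return grid

def row_wise_max_pool(feature_map):
    """Max-pool over the angular (sector) axis.

    A rotation of the object about the cylinder axis cyclically permutes
    values along this axis, so the max over it is rotation invariant.
    """
    return feature_map.max(axis=1)

# Usage: the same synthetic cloud, rotated and unrotated, pools to
# (nearly) the same descriptor, differing only by discretization error.
pts = np.random.rand(1000, 3) - [0.5, 0.5, 0.0]    # center x and y on the axis
angle = np.pi / 3
rot = np.array([[np.cos(angle), -np.sin(angle), 0],
                [np.sin(angle),  np.cos(angle), 0],
                [0,              0,             1]])
a = row_wise_max_pool(cylindrical_occupancy_grid(pts))
b = row_wise_max_pool(cylindrical_occupancy_grid(pts @ rot.T))
print(np.abs(a - b).mean())   # small: the descriptors nearly match
```

In the actual network, the pooling would presumably be applied to convolutional feature maps rather than to the raw grid; the snippet only illustrates why collapsing the angular dimension removes the dependence on the robot's starting viewpoint.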

Keywords

3D object classification, cylindrical CNN, mobile robots, view-point invariant

Copyright information

© Institute of Control, Robotics and Systems and The Korean Institute of Electrical Engineers and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Automation and Systems Research Institute, Department of Electrical Engineering, Seoul National University, Seoul, Korea