2D/3D Object Recognition and Categorization Approaches for Robotic Grasping

  • Nabila Zrira
  • Mohamed Hannat
  • El Houssine Bouyakhf
  • Haris Ahmad Khan
Part of the Studies in Computational Intelligence book series (SCI, volume 730)


Object categorization and manipulation are critical tasks for a robot operating in a household environment. In this chapter, we propose new methods for visual recognition and categorization. We describe a 2D object database and 3D point clouds with 2D/3D local descriptors, which we quantize with the k-means clustering algorithm to obtain a bag of words (BOW). Moreover, we develop a new global descriptor called VFH-Color, which combines the original Viewpoint Feature Histogram (VFH) descriptor with a color quantization histogram, thus adding appearance information that improves the recognition rate. The acquired 2D and 3D features are used to train a Deep Belief Network (DBN) classifier. Our object recognition and categorization experiments yield average recognition rates between 91% and 99%, which makes the approach well suited to robot-assisted tasks.
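The bag-of-words step mentioned in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names `kmeans` and `bow_histogram`, the vocabulary size, the descriptor dimension and the random stand-in descriptors are all assumptions made for illustration. The sketch clusters local descriptors with a plain Lloyd's k-means and then represents one object as a normalized histogram of its nearest visual words.

```python
import numpy as np


def kmeans(descriptors, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centres."""
    rng = np.random.default_rng(seed)
    # Initialise the centres from k distinct training descriptors.
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centres = descriptors[idx].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every descriptor to every centre.
        d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres


def bow_histogram(descriptors, centres):
    """Assign each descriptor to its nearest centre (visual word) and
    return the normalized word-count histogram of length k."""
    d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centres)).astype(float)
    return hist / hist.sum()


# Hypothetical data: random vectors stand in for real 2D/3D local descriptors.
rng = np.random.default_rng(42)
training_descriptors = rng.normal(size=(500, 16))  # pooled over all objects
vocabulary = kmeans(training_descriptors, k=10)    # assumed vocabulary size

object_descriptors = rng.normal(size=(40, 16))     # descriptors of one object
feature_vector = bow_histogram(object_descriptors, vocabulary)
```

The resulting fixed-length `feature_vector` is the kind of input a classifier can be trained on, regardless of how many local descriptors each object produced.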



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Nabila Zrira (1)
  • Mohamed Hannat (1)
  • El Houssine Bouyakhf (1)
  • Haris Ahmad Khan (2)
  1. LIMIARF Laboratory, Faculty of Sciences Rabat, Mohammed V University Rabat, Rabat, Morocco
  2. NTNU, Norwegian University of Science and Technology, Gjøvik, Norway
