From Vision to Grasping: Adapting Visual Networks

  • Rebecca Allday
  • Simon Hadfield
  • Richard Bowden
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10454)


Grasping is one of the oldest problems in robotics and is still considered challenging, especially when the objects are unknown and their 3D shape is not available. We focus on exploiting recent advances in computer vision recognition systems. Object classification problems tend to have much larger training datasets and far fewer practical constraints on model size and training speed. In this paper we investigate how to adapt Convolutional Neural Networks (CNNs), traditionally used for image classification, for planar robotic grasping. We consider the differences between the two problems and how a network can be adjusted to account for them. Positional information is far more important in robotic grasping than in generic image classification, where max pooling layers are used to improve translation invariance. By using a more appropriate network structure we obtain improved accuracy while also improving run times and reducing memory consumption, with model size reduced by up to 69%.
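The trade-off between translation invariance and positional information can be illustrated with a toy example. The sketch below is not the network from the paper; it is a minimal pure-Python 2x2 max pooling over a 4x4 grid, and the helper names `max_pool_2x2` and `single_activation` are hypothetical, introduced only for this illustration. It shows that two inputs whose single active pixel differs by one position within the same pooling window become indistinguishable after pooling, which is helpful for classification but discards the fine position a grasp prediction would need.

```python
def max_pool_2x2(image):
    """Non-overlapping 2x2 max pooling over a list of lists with even side lengths."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def single_activation(row, col, size=4):
    """Build a size x size grid of zeros with a single 1.0 at (row, col)."""
    image = [[0.0] * size for _ in range(size)]
    image[row][col] = 1.0
    return image


# Two activations inside the same 2x2 pooling window, one in a different window.
pooled_a = max_pool_2x2(single_activation(0, 0))
pooled_b = max_pool_2x2(single_activation(1, 1))
pooled_c = max_pool_2x2(single_activation(0, 2))

print(pooled_a == pooled_b)  # True: the one-pixel shift is lost after pooling
print(pooled_a == pooled_c)  # False: a shift across windows is still visible
```

Each pooling layer in a deep network compounds this loss of resolution, which is one way to read the abstract's point that architectures tuned for classification may need adjusting for grasping.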


Keywords: Robotic grasping, Machine learning, CNNs, SqueezeNet, AlexNet



This work was supported by the Marion Redfearn Trust, EPSRC and Tesco Labs, with particular thanks to Paul Wilkinson for his support.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Rebecca Allday (1), email author
  • Simon Hadfield (1)
  • Richard Bowden (1)
  1. Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, UK
