Cognitive Computation, Volume 10, Issue 5, pp 718–736

End-to-End ConvNet for Tactile Recognition Using Residual Orthogonal Tiling and Pyramid Convolution Ensemble

  • Lele Cao
  • Fuchun Sun
  • Xiaolong Liu
  • Wenbing Huang
  • Ramamohanarao Kotagiri
  • Hongbo Li


Abstract

Tactile recognition enables robots to identify target objects or environments from tactile sensory readings. Recent advances in deep learning and biological tactile sensing inspire us to propose ROTConvPCE-mv, an end-to-end architecture that performs tactile recognition using residual orthogonal tiling and a pyramid convolution ensemble. Our approach takes stacks of raw frames and tactile flow as dual input, and incorporates the strength of multi-layer OTConvs (orthogonal tiling convolutions) organized in a residual learning paradigm. We empirically demonstrate that OTConvs have adjustable invariance to different input transformations such as translation, rotation, and scaling. To effectively capture multi-scale global context, a pyramid convolution structure is attached to the concatenated output of the two residual OTConv pathways. Extensive experimental evaluations show that ROTConvPCE-mv outperforms several state-of-the-art methods by a large margin in recognition accuracy, robustness, and fault tolerance. Practical suggestions and hints are summarized throughout this paper to facilitate effective recognition using tactile sensory data.
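The architectural details of the pyramid convolution ensemble are not given in this excerpt, but the underlying multi-scale context idea (cf. spatial pyramid pooling [51]) can be sketched in a few lines of NumPy. The function `pyramid_pool`, its `levels` parameter, and the average-pooling choice below are illustrative assumptions, not the paper's exact mechanism:

```python
import numpy as np

def pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Pool an H x W x C feature map over an n x n grid for each
    pyramid level and concatenate the per-cell channel means into
    one fixed-length multi-scale descriptor."""
    h, w, c = feature_map.shape
    descriptor = []
    for n in levels:
        # cell boundaries for an n x n grid (cells may differ by a pixel)
        ys = np.linspace(0, h, n + 1, dtype=int)
        xs = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[ys[i]:ys[i + 1], xs[j]:xs[j + 1], :]
                descriptor.append(cell.mean(axis=(0, 1)))
    # output length = C * sum(n * n for n in levels)
    return np.concatenate(descriptor)

fmap = np.random.rand(8, 8, 16)          # e.g. one OTConv output map
vec = pyramid_pool(fmap)
print(vec.shape)                         # (336,) = 16 * (1 + 4 + 16)
```

Because the descriptor length is fixed by `levels` and the channel count alone, such a pooled vector can feed a classifier regardless of the spatial size of the concatenated pathway outputs, which is the practical appeal of pyramid-style context aggregation.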


Keywords

Recognition · Tactile sensors · Feature extraction · Residual learning · Convolutional neural networks · Tactile flow



Acknowledgments

We thank Weihao Cheng for suggesting momentum prediction [62] while we iterated on this work, Jingwei Yang and Rui Ma for explaining the JKSC and BoS-LDSs source code, and Xiaohui Hu and Haolin Yang for their help in collecting the HCs10 dataset.

Funding Information

This work was supported by grants from the China National Natural Science Foundation (Nos. 61327809 and 61210013). Lele Cao is also supported by the State Scholarship Fund under file number 201406210275.

Compliance with Ethical Standards

Conflict of interest

The authors declare that they have no conflict of interest.


References

  1. Sun F, Liu C, Huang W, Zhang J. Object classification and grasp planning using visual and tactile sensing. IEEE Trans Syst Man Cybern: Syst 2016;46(7):969–979.
  2. Kappassov Z, Corrales JA, Perdereau V. Tactile sensing in dexterous robot hands. Robot Auton Syst 2015;74:195–220.
  3. Xu D, Loeb GE, Fishel JA. Tactile identification of objects using Bayesian exploration. Proceedings of ICRA; 2013. p. 3056–3061.
  4. Xiao W, Sun F, Liu H, He C. Dexterous robotic hand grasp learning using piecewise linear dynamic systems model. Proceedings of ICCSIP; 2014. p. 845–855.
  5. Ma R, Liu H, Sun F, Yang Q, Gao M. Linear dynamic system method for tactile object classification. Sci China Inf Sci 2014;57(12):1–11.
  6. Madry M, Bo L, Kragic D, Fox D. ST-HMP: unsupervised spatio-temporal feature learning for tactile data. Proceedings of ICRA; 2014. p. 2262–2269.
  7. Spiers AJ, Liarokapis MV, Calli B, Dollar AM. Single-grasp object classification and feature extraction with simple robot hands and tactile sensors. IEEE Trans Haptics 2016;9(2):207–220.
  8. Liu H, Greco J, Song X, Bimbo J, Seneviratne L, Althoefer K. Tactile image based contact shape recognition using neural network. Proceedings of MFI; 2012. p. 138–143.
  9. Hoelscher J, Peters J, Hermans T. Evaluation of tactile feature extraction for interactive object recognition. Proceedings of Humanoids; 2015. p. 310–317.
  10. Matsubara T, Shibata K. Active tactile exploration with uncertainty and travel cost for fast shape estimation of unknown objects. Robot Auton Syst 2017;91:314–326.
  11. Bekiroglu Y, Laaksonen J, Jorgensen JA, Kyrki V, Kragic D. Assessing grasp stability based on learning and haptic data. IEEE Trans Robot 2011;27(3):616–629.
  12. Dang H, Allen PK. Stable grasping under pose uncertainty using tactile feedback. Auton Robot 2014;36(4):309–330.
  13. Kwiatkowski J, Cockburn D, Duchaine V. Grasp stability assessment through the fusion of proprioception and tactile signals using convolutional neural networks. Proceedings of IROS; 2017. p. 286–292.
  14. Yang H, Liu X, Cao L, Sun F. A new slip-detection method based on pairwise high frequency components of capacitive sensor signals. Proceedings of ICIST; 2015. p. 56–61.
  15. Heyneman B, Cutkosky MR. Slip classification for dynamic tactile array sensors. Int J Robot Res 2016;35(4):404–421.
  16. Gorges N, Navarro SE, Goger D, Worn H. Haptic object recognition using passive joints and haptic key features. Proceedings of ICRA; 2010. p. 2349–2355.
  17. Luo S, Mou W, Althoefer K, Liu H. Novel tactile-SIFT descriptor for object shape recognition. IEEE Sensors J 2015;15(9):5001–5009.
  18. Corradi T, Hall P, Iravani P. Bayesian tactile object recognition: learning and recognising objects using a new inexpensive tactile sensor. Proceedings of ICRA; 2015. p. 3909–3914.
  19. Bekiroglu Y, Kragic D, Kyrki V. Learning grasp stability based on tactile data and HMMs. Proceedings of RO-MAN; 2010. p. 132–137.
  20. Soh H, Su Y, Demiris Y. Online spatio-temporal Gaussian process experts with application to tactile classification. Proceedings of IROS; 2012. p. 4489–4496.
  21. Gogulski J, Boldt R, Savolainen P, Guzmán-López J, Carlson S, Pertovaara A. A segregated neural pathway for prefrontal top-down control of tactile discrimination. Cereb Cortex 2015;25(1):161–166.
  22. Drimus A, Kootstra G, Bilberg A, Kragic D. Design of a flexible tactile sensor for classification of rigid and deformable objects. Robot Auton Syst 2014;62(1):3–15.
  23. Liu H, Guo D, Sun F. Object recognition using tactile measurements: kernel sparse coding methods. IEEE Trans Instrum Meas 2016;65(3):656–665.
  24. Chebotar Y, Hausman K, Su Z, Sukhatme GS, Schaal S. Self-supervised regrasping using spatio-temporal tactile features and reinforcement learning. Proceedings of IROS; 2016. p. 1960–1966.
  25. Wu H, Jiang D, Gao H. Tactile motion recognition with convolutional neural networks. Proceedings of IROS; 2017. p. 1572–1577.
  26. Huang W, Sun F, Cao L, Zhao D, Liu H, Harandi M. Sparse coding and dictionary learning with linear dynamical systems. Proceedings of CVPR; 2016. p. 3938–3947.
  27. Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L. Large-scale video classification with convolutional neural networks. Proceedings of CVPR; 2014. p. 1725–1732.
  28. Tu Z, Zheng A, Yang E, Luo B, Hussain A. A biologically inspired vision-based approach for detecting multiple moving objects in complex outdoor scenes. Cognitive Comput 2015;7(5):539–551.
  29. Tu Z, Abel A, Zhang L, Luo B, Hussain A. A new spatio-temporal saliency-based video object segmentation. Cognitive Comput 2016;8(4):629–647.
  30. Tünnermann J, Mertsching B. Region-based artificial visual attention in space and time. Cognitive Comput 2014;6(1):125–143.
  31. Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos. Proceedings of NIPS; 2014. p. 568–576.
  32. Guo D, Sun F, Fang B, Yang C, Xi N. Robotic grasping using visual and tactile sensing. Inf Sci 2017;417:274–286.
  33. Cao L, Kotagiri R, Sun F, Li H, Huang W, Aye ZMM. Efficient spatio-temporal tactile object recognition with randomized tiling convolutional networks in a hierarchical fusion strategy. Proceedings of the 30th AAAI; 2016. p. 3337–3345.
  34. Gallace A, Spence C. The cognitive and neural correlates of "tactile consciousness": a multisensory perspective. Conscious Cogn 2008;17(1):370–407.
  35. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. Proceedings of ECCV; 2014. p. 818–833.
  36. Ngiam J, Chen Z, Chia D, Koh PW, Le QV, Ng AY. Tiled convolutional neural networks. Proceedings of NIPS; 2010. p. 1279–1287.
  37. Lee H, Grosse R, Ranganath R, Ng AY. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. Proceedings of ICML; 2009. p. 609–616.
  38. Gong Y, Wang L, Guo R, Lazebnik S. Multi-scale orderless pooling of deep convolutional activation features. Proceedings of ECCV; 2014. p. 392–407.
  39. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell 2018;40(4):834–848.
  40. Saxe A, Koh PW, Chen Z, Bhand M, Suresh B, Ng AY. On random weights and unsupervised feature learning. Proceedings of ICML; 2011. p. 1089–1096.
  41. Jarrett K, Kavukcuoglu K, Ranzato M, LeCun Y. What is the best multi-stage architecture for object recognition? Proceedings of ICCV; 2009. p. 2146–2153.
  42. Pinto N, Doukhan D, DiCarlo JJ, Cox DD. A high-throughput screening approach to discover good forms of biologically inspired visual representation. PLoS Comput Biol 2009;5(11):e1000579.
  43. Huang GB, Bai Z, Kasun LLC, Vong CM. Local receptive fields based extreme learning machine. IEEE Comput Intell Mag 2015;10(2):18–29.
  44. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Proceedings of CVPR; 2016. p. 770–778.
  45. Bicchi A, Scilingo EP, Ricciardi E, Pietrini P. Tactile flow explains haptic counterparts of common visual illusions. Brain Res Bull 2008;75(6):737–741.
  46. Sun D, Roth S, Black MJ. Secrets of optical flow estimation and their principles. Proceedings of CVPR; 2010. p. 2432–2439.
  47. Horn BKP, Schunck BG. Determining optical flow. Artif Intell 1981;17:185–203.
  48. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. Proceedings of ICML; 2015. p. 448–456.
  49. Spratling MW. A hierarchical predictive coding model of object recognition in natural images. Cognitive Comput 2017;9(2):151–167.
  50. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. Proceedings of CVPR; 2015. p. 1–9.
  51. He K, Zhang X, Ren S, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. Proceedings of ECCV; 2014. p. 346–361.
  52. Zhao H, Shi J, Qi X, Wang X, Jia J. Pyramid scene parsing network. Proceedings of CVPR; 2017. p. 2881–2890.
  53. Liu X, Deng Z. Segmentation of drivable road using deep fully convolutional residual network with pyramid pooling. Cognitive Comput 2018:1–10.
  54. Hu X, Zhang X, Liu M, Chen Y, Li P, Pei W, Zhang C, Chen H. A flexible capacitive tactile sensor array with micro structure for robotic application. Sci China Inf Sci 2014;57(12):1–6.
  55. Zhang J, Cui J, Lu Y, Zhang X, Hu X. A flexible capacitive tactile sensor for manipulator. Proceedings of ICCSIP; 2016. p. 303–309.
  56. Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. Proceedings of the 27th ICML; 2010. p. 807–814.
  57. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T. Caffe: convolutional architecture for fast feature embedding. Proceedings of the 22nd ACM International Conference on Multimedia; 2014. p. 675–678.
  58. Scardapane S, Wang D. Randomness in neural networks: an overview. Wiley Interdiscip Rev: Data Mining Knowl Discov 2017;7(2):e1200.
  59. Bo L, Ren X, Fox D. Hierarchical matching pursuit for image classification. Proceedings of NIPS; 2011. p. 2115–2123.
  60. Saisan P, Doretto G, Wu YN, Soatto S. Dynamic texture recognition. Proceedings of CVPR; 2001. p. 58–63.
  61. Johnson BW. Fault-tolerant microprocessor-based systems. IEEE Micro 1984;4(6):6–21.
  62. Cao L, Sun F, Liu X, Huang W, Cheng W, Kotagiri R. Fix-budget and recurrent data mining for online haptic perception. Proceedings of ICONIP; 2017. p. 581–591.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Technology, Tsinghua University, Beijing, China
  2. Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
  3. King Digital Entertainment plc, Activision Blizzard Inc. (ATVI), Stockholm, Sweden
