
A framework for the fusion of visual and tactile modalities for improving robot perception


  • Research Paper
  • Published in Science China Information Sciences

Abstract

Robots should ideally perceive objects using human-like multi-modal sensing such as vision, touch, smell, and hearing. However, the feature representations differ across modal sensors, and so do the feature extraction methods. Some modal features, such as vision, present a spatial property and are static, while others, such as tactile feedback, present a temporal pattern and are dynamic. These data are therefore difficult to fuse at the feature level for robot perception. In this study, we propose a framework for the fusion of visual and tactile modal features that comprises feature extraction, feature vector normalization and generation based on a bag-of-systems (BoS) representation, coding by robust multi-modal joint sparse representation (RM-JSR), and classification, thereby enabling robot perception to solve the problem of fusing diverse modal data at the feature level. Finally, comparative experiments demonstrate the performance of this framework.
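The coding-and-classification step described above can be illustrated with a simplified joint sparse representation. The sketch below is not the paper's RM-JSR algorithm (which additionally models gross corruption); it is a minimal proximal-gradient solver for the shared-support ℓ2,1-regularized model that joint sparse representation methods build on, where visual and tactile codes are forced to activate the same dictionary atoms. All names (`joint_sparse_code`, `residual_classify`, `lam`) are illustrative, not from the paper.

```python
import numpy as np

def joint_sparse_code(dicts, samples, lam=0.01, n_iter=300):
    """Proximal-gradient (ISTA) solver for
        min_X  sum_m 0.5 * ||y_m - D_m X[:, m]||^2  +  lam * ||X||_{2,1},
    where column m of X is the code for modality m; the l2,1 norm couples
    the rows so all modalities share the same set of active atoms."""
    n_mod = len(dicts)
    n_atoms = dicts[0].shape[1]
    X = np.zeros((n_atoms, n_mod))
    # step size from the largest Lipschitz constant among the modalities
    step = 1.0 / max(np.linalg.norm(D, 2) ** 2 for D in dicts)
    for _ in range(n_iter):
        # gradient of the quadratic data-fit term, one column per modality
        grad = np.column_stack(
            [dicts[m].T @ (dicts[m] @ X[:, m] - samples[m]) for m in range(n_mod)]
        )
        X = X - step * grad
        # row-wise soft thresholding: the proximal operator of lam * ||.||_{2,1}
        row_norms = np.linalg.norm(X, axis=1, keepdims=True)
        X *= np.maximum(0.0, 1.0 - step * lam / np.maximum(row_norms, 1e-12))
    return X

def residual_classify(dicts, samples, labels, lam=0.01):
    """Assign the class whose dictionary atoms reconstruct all modalities best."""
    X = joint_sparse_code(dicts, samples, lam)
    labels = np.asarray(labels)
    best_cls, best_res = None, np.inf
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        res = sum(
            np.linalg.norm(samples[m] - dicts[m][:, idx] @ X[idx, m]) ** 2
            for m in range(len(dicts))
        )
        if res < best_res:
            best_cls, best_res = int(c), res
    return best_cls
```

Classification follows the usual sparse-representation recipe: the test sample is assigned to the class whose training atoms yield the smallest joint reconstruction residual over both modalities.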

Highlights

We propose a visual-tactile information fusion framework and a robust multi-modal joint sparse representation coding method, solving the feature-level fusion problem caused by the differing feature-space dimensions of visual (static) and tactile (dynamic) cross-modal information in robot perception. Specifically, the framework comprises: visual and tactile feature extraction, normalization of feature vectors of differing dimensions with a bag-of-words algorithm, robust multi-modal joint sparse representation coding, and classification through a visual-tactile fusion algorithm.
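The normalization step above can be sketched with a plain bag-of-words histogram. Note the paper's bag-of-systems uses linear dynamical systems as codewords for the dynamic tactile data; as a simplifying assumption, the sketch below uses Euclidean cluster centers instead, which still shows how variable-length feature sets from either modality become fixed-length, comparable vectors (`bow_histogram` and its arguments are illustrative names):

```python
import numpy as np

def bow_histogram(features, codebook):
    """Map a variable-length set of local features (n x d) to a fixed-length,
    L1-normalized histogram over a codebook (k x d).

    Each feature is hard-assigned to its nearest codeword by Euclidean
    distance; counts are then normalized so feature sets of any length
    compare on an equal footing."""
    # squared distance from every feature to every codeword, shape (n, k)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    assign = d2.argmin(axis=1)  # nearest codeword index per feature
    hist = np.bincount(assign, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)
```

Applied once per modality (with a codebook learned from that modality's training features), this yields same-structured histograms that can be fed jointly to the sparse-representation coder.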



Author information

Correspondence to Fuchun Sun.

About this article

Cite this article

Zhang, W., Sun, F., Wu, H. et al. A framework for the fusion of visual and tactile modalities for improving robot perception. Sci. China Inf. Sci. 60, 012201 (2017). https://doi.org/10.1007/s11432-016-0158-2
