A framework for the fusion of visual and tactile modalities for improving robot perception

Zhang, Wenchang; Sun, Fuchun; Wu, Hang; Yang, Haolin

doi:10.1007/s11432-016-0158-2

A framework for visual-tactile modality fusion to improve robot perception

Research Paper
Published: 22 November 2016

Volume 60, article number 012201 (2017)
Science China Information Sciences

Wenchang Zhang¹,², Fuchun Sun¹, Hang Wu² & Haolin Yang¹

Abstract

Robots should ideally perceive objects using human-like multi-modal sensing such as vision, tactile feedback, smell, and hearing. However, the feature representations differ from one sensing modality to another, and so do the feature extraction methods. Some modal features, such as visual features, which capture spatial properties, are static, whereas others, such as tactile features, which capture temporal patterns, are dynamic. It is therefore difficult to fuse these data at the feature level for robot perception. In this study, we propose a framework for the fusion of visual and tactile modal features. The framework comprises feature extraction, feature vector normalization and generation based on bag-of-systems (BoS), coding by robust multi-modal joint sparse representation (RM-JSR), and classification, thereby addressing the problem of fusing diverse modal data at the feature level in robot perception. Finally, comparative experiments are carried out to demonstrate the performance of this framework.
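The normalization step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' BoS procedure: it substitutes ordinary k-means on vector descriptors for the BoS codebook, so it only shows how feature sets of different sizes can be mapped to fixed-length vectors. The function names `build_codebook` and `encode`, the codebook size, and the random data are all assumptions made for illustration.

```python
import numpy as np

def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Plain k-means (Lloyd's algorithm) over stacked local descriptors.
    Stand-in for a BoS codebook; not the method used in the paper."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[picks].copy()
    for _ in range(n_iter):
        # Squared distance from every descriptor to every center.
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def encode(sample, centers):
    """Map an (n_i, d) descriptor set to a length-k, L1-normalized histogram."""
    d2 = ((sample[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(centers)).astype(float)
    return counts / counts.sum()

rng = np.random.default_rng(1)
# Hypothetical data: three samples with different numbers of 8-D descriptors.
samples = [rng.normal(size=(n, 8)) for n in (30, 45, 12)]
centers = build_codebook(np.vstack(samples), k=5)
codes = np.array([encode(s, centers) for s in samples])
print(codes.shape)  # (3, 5)
```

Whatever the number of descriptors per sample, every sample ends up as a vector of length `k`, which is the property needed before features from different modalities can be coded jointly.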

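Highlights

We propose a visual-tactile information fusion framework and a robust multi-modal joint sparse representation coding method. Together they address the difficulty of feature-level fusion that arises because the visual (static) and tactile (dynamic) cross-modal features in robot perception differ in feature-space dimensionality. Specifically, the framework includes visual and tactile feature extraction, normalization of feature vectors of differing dimensions using a "bag-of-words" algorithm, robust multi-modal joint sparse representation coding, and classification by the visual-tactile fusion algorithm.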
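The coding and classification steps can be sketched in the same spirit. The example below is a generic joint sparse representation classifier, not the RM-JSR formulation of the paper: each modality has its own dictionary of training atoms, the coefficient rows are shared across modalities through an l2,1 penalty, and the predicted class is the one whose atoms give the smallest total reconstruction residual. The solver (proximal gradient), the penalty weight `lam`, the function names, and the synthetic data are all assumptions.

```python
import numpy as np

def joint_sparse_code(dicts, ys, lam=0.1, n_iter=300):
    """Proximal gradient for
        min_X  sum_m 0.5 * ||y_m - D_m x_m||^2 + lam * sum_i ||X[i, :]||_2,
    where column m of X codes modality m over a shared set of training atoms."""
    n_atoms = dicts[0].shape[1]
    X = np.zeros((n_atoms, len(dicts)))
    # Step size from the largest Lipschitz constant of the smooth terms.
    step = 1.0 / max(np.linalg.norm(D, 2) ** 2 for D in dicts)
    for _ in range(n_iter):
        G = np.column_stack([D.T @ (D @ X[:, m] - y)
                             for m, (D, y) in enumerate(zip(dicts, ys))])
        Z = X - step * G
        # Row-wise soft thresholding: proximal operator of the l2,1 norm.
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        X = Z * np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
    return X

def classify(dicts, ys, atom_labels, lam=0.1):
    """Pick the class whose atoms best reconstruct all modalities together."""
    X = joint_sparse_code(dicts, ys, lam)
    classes = np.unique(atom_labels)
    residuals = []
    for c in classes:
        mask = atom_labels == c
        residuals.append(sum(np.sum((y - D[:, mask] @ X[mask, m]) ** 2)
                             for m, (D, y) in enumerate(zip(dicts, ys))))
    return classes[int(np.argmin(residuals))]

rng = np.random.default_rng(0)
atom_labels = np.repeat([0, 1], 4)   # 4 training atoms per class
dicts = []
for dim in (6, 4):                   # hypothetical visual / tactile code lengths
    protos = np.eye(dim)[:, :2]      # one prototype direction per class
    D = protos[:, atom_labels] + 0.1 * rng.normal(size=(dim, 8))
    dicts.append(D / np.linalg.norm(D, axis=0))
# A test sample drawn near the class-1 prototype in both modalities.
ys = [np.eye(D.shape[0])[:, 1] + 0.05 * rng.normal(size=D.shape[0]) for D in dicts]
pred = classify(dicts, ys, atom_labels)
print(pred)
```

The two dictionaries have different row counts (6 and 4), yet they share one set of coefficient rows, one per training atom. That is how a joint sparse model can couple modalities whose feature vectors differ in length.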

Author information

Authors and Affiliations

The State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing, 100084, China
Wenchang Zhang, Fuchun Sun & Haolin Yang
Institution of Medical Equipment, Tianjin, 300161, China
Wenchang Zhang & Hang Wu

Corresponding author

Correspondence to Fuchun Sun.

About this article

Cite this article

Zhang, W., Sun, F., Wu, H. et al. A framework for the fusion of visual and tactile modalities for improving robot perception. Sci. China Inf. Sci. 60, 012201 (2017). https://doi.org/10.1007/s11432-016-0158-2

Received: 08 March 2016
Accepted: 30 June 2016
Published: 22 November 2016
DOI: https://doi.org/10.1007/s11432-016-0158-2
