Multimodal visual image processing of mobile robot in unstructured environment based on semi-supervised multimodal deep network

Li, Yajia

doi:10.1007/s12652-020-02037-4

Original Research
Published: 07 May 2020

Volume 11, pages 6349–6359, (2020)
Journal of Ambient Intelligence and Humanized Computing

Yajia Li¹

Abstract

With the continuing development of computer technology, machine vision, and image processing algorithms, research on mobile robots equipped with vision systems has steadily deepened. This paper studies visual image processing for mobile robots in outdoor unstructured environments. We propose an approach that integrates heterogeneous features through a semi-supervised multimodal deep network (SMMDN). Each modality has its own multi-layer sub-network with a separate structure, which transforms that modality's features into a representation shared by all modalities. A network layer common to all modalities sits above these sub-networks and establishes connections between them, so that the heterogeneous modalities are converted into the same representation and fused features are extracted across the data modalities. Simulation results show that SMMDN improves the ability of mobile robots to perceive and recognize complex outdoor environments.
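
The layout described in the abstract (one sub-network per modality, with a shared layer on top) can be illustrated with a minimal forward-pass sketch. This is not the paper's implementation: the modality names ("color", "texture"), the layer sizes, the tanh activation, fusion by concatenation, and the random weights are all assumptions made for illustration, and the semi-supervised training procedure is not shown because the abstract does not specify it.

```python
import math
import random


def dense(weights, bias, x):
    """Apply one fully connected layer with tanh activation."""
    return [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def init_layer(n_in, n_out, rng):
    """Create small random weights and zero biases (illustrative only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_subnet(sizes, rng):
    """Stack layers for one modality; sizes = [input_dim, hidden..., common_dim]."""
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def run_subnet(layers, x):
    """Pass one modality's features through its own sub-network."""
    for weights, bias in layers:
        x = dense(weights, bias, x)
    return x


def fuse(subnets, shared_layer, inputs):
    """Map each modality to the common space, then apply the shared layer."""
    common = []
    for name, layers in subnets.items():
        # Assumption: modality outputs are fused by simple concatenation.
        common.extend(run_subnet(layers, inputs[name]))
    weights, bias = shared_layer
    return dense(weights, bias, common)


rng = random.Random(0)
COMMON_DIM = 4  # assumed size of each modality's common representation
FUSED_DIM = 3   # assumed size of the fused feature vector

# Hypothetical modalities with different raw feature dimensions.
subnets = {
    "color": build_subnet([6, 5, COMMON_DIM], rng),
    "texture": build_subnet([8, 5, COMMON_DIM], rng),
}
shared_layer = init_layer(COMMON_DIM * len(subnets), FUSED_DIM, rng)

inputs = {
    "color": [rng.random() for _ in range(6)],
    "texture": [rng.random() for _ in range(8)],
}
fused = fuse(subnets, shared_layer, inputs)
print(len(fused))  # one fused vector of length FUSED_DIM
```

In this sketch the two inputs have different dimensions (6 and 8), yet both sub-networks end in a layer of the same width, so the shared layer sees a uniform representation regardless of the source modality.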

Author information

Authors and Affiliations

Department of Computer Information, Hohhot Vocational College, Hohhot, 010051, Inner Mongolia, China
Yajia Li

Corresponding author

Correspondence to Yajia Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, Y. Multimodal visual image processing of mobile robot in unstructured environment based on semi-supervised multimodal deep network. J Ambient Intell Human Comput 11, 6349–6359 (2020). https://doi.org/10.1007/s12652-020-02037-4

Received: 15 October 2019
Accepted: 24 April 2020
Published: 07 May 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s12652-020-02037-4
