Abstract
With the continuing development of computer technology, machine vision, and image processing algorithms, research on mobile robots equipped with vision systems has steadily deepened. This paper studies visual image processing for mobile robots in outdoor unstructured environments. We propose a new approach that integrates heterogeneous features through a semi-supervised multimodal deep network (SMMDN). Each modality has its own multi-layer sub-network with a separate structure, which transforms the features of that modality into a representation shared by all modalities. A network layer common to all modalities sits above these sub-networks and establishes connections between them, so that the heterogeneous modalities are converted into the same modality and fused features are extracted from the multiple data modalities. Simulation results show that SMMDN improves the perception and recognition ability of mobile robots in complex outdoor environments.
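The layout described in the abstract (one sub-network per modality, with a shared layer on top) can be illustrated with a minimal forward-pass sketch. This is not the author's implementation: the modality names, layer sizes, tanh activation, random initialisation, and the use of concatenation as the fusion step are all assumptions made for illustration, and the semi-supervised training procedure is omitted entirely.

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer with a tanh activation (activation is assumed)."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for an n_in -> n_out layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_subnet(sizes, rng):
    """A stack of layers; sizes=[6, 5, 4] maps 6 inputs to 4 outputs."""
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def run_subnet(layers, features):
    """Pass a feature vector through every layer of one sub-network."""
    for weights, biases in layers:
        features = dense(features, weights, biases)
    return features


rng = random.Random(0)
COMMON_DIM = 4  # hypothetical size of the common representation

# Hypothetical modalities with different raw feature dimensions.
subnets = {
    "color": build_subnet([6, 5, COMMON_DIM], rng),
    "texture": build_subnet([9, 6, COMMON_DIM], rng),
}
# Shared layer placed above all modality-specific sub-networks.
shared = init_layer(COMMON_DIM * len(subnets), 3, rng)


def fuse(sample):
    """Map each modality to the common space, concatenate, apply the shared layer."""
    common = []
    for name in sorted(subnets):
        common.extend(run_subnet(subnets[name], sample[name]))
    return dense(common, *shared)


sample = {
    "color": [rng.random() for _ in range(6)],
    "texture": [rng.random() for _ in range(9)],
}
fused = fuse(sample)
print(len(fused))
```

Each sub-network ends at the same width (`COMMON_DIM`), which is one simple way to bring inputs of different dimensionality into a comparable form before the shared layer combines them. The paper itself may use a different fusion mechanism.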
About this article
Cite this article
Li, Y. Multimodal visual image processing of mobile robot in unstructured environment based on semi-supervised multimodal deep network. J Ambient Intell Human Comput 11, 6349–6359 (2020). https://doi.org/10.1007/s12652-020-02037-4
DOI: https://doi.org/10.1007/s12652-020-02037-4