Multimodal visual image processing of mobile robot in unstructured environment based on semi-supervised multimodal deep network

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

With the continuous development of computer technology, machine vision, and image processing algorithms, research on mobile robots with vision systems continues to deepen. This paper studies visual image processing for mobile robots in outdoor unstructured environments. We propose an approach that integrates heterogeneous features through a semi-supervised multimodal deep network (SMMDN). Each modality has its own multi-layer sub-network with a separate structure, which transforms that modality's features into a common representation. A network layer shared by all modalities sits above these sub-networks and establishes a connection between them, so that the heterogeneous modalities are mapped into the same representation and fused features are extracted from the multimodal data. Simulation results show that SMMDN improves the perception and recognition ability of mobile robots in complex outdoor environments.
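The layout described in the abstract (one sub-network per modality, with a shared layer above them) can be illustrated with a minimal NumPy sketch. It is not the author's implementation: the modality names (`color`, `texture`, `depth`), all layer sizes, the tanh activation, and the use of concatenation before the shared layer are assumptions made only for illustration. It covers the forward pass alone and omits the semi-supervised training that the abstract mentions.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Return a list of (W, b) pairs for a fully connected network."""
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """Apply each layer followed by a tanh nonlinearity."""
    for W, b in params:
        x = np.tanh(x @ W + b)
    return x

def fuse(subnets, shared, inputs):
    """Encode each modality with its own sub-network, then pass the
    concatenated codes through the layer shared by all modalities."""
    codes = [mlp_forward(subnets[name], inputs[name]) for name in sorted(inputs)]
    return mlp_forward(shared, np.concatenate(codes, axis=1))

rng = np.random.default_rng(0)

# Hypothetical modalities and feature sizes, chosen only for illustration.
input_dims = {"color": 48, "texture": 32, "depth": 16}
code_dim, fused_dim = 8, 12

# One separate sub-network per modality; all map to the same code size.
subnets = {name: init_mlp(rng, [d, 24, code_dim]) for name, d in input_dims.items()}
# A single layer shared by all modalities, applied to the joined codes.
shared = init_mlp(rng, [code_dim * len(input_dims), fused_dim])

# A batch of 5 random samples per modality stands in for real features.
batch = {name: rng.normal(size=(5, d)) for name, d in input_dims.items()}
fused = fuse(subnets, shared, batch)
print(fused.shape)  # (5, 12)
```

Because every sub-network ends at the same `code_dim`, modalities with different input sizes arrive at the shared layer in a comparable form, which is the property the abstract attributes to the modality-specific sub-networks.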

Author information

Corresponding author

Correspondence to Yajia Li.

About this article

Cite this article

Li, Y. Multimodal visual image processing of mobile robot in unstructured environment based on semi-supervised multimodal deep network. J Ambient Intell Human Comput 11, 6349–6359 (2020). https://doi.org/10.1007/s12652-020-02037-4

  • DOI: https://doi.org/10.1007/s12652-020-02037-4
