Abstract
Vision-based deep learning perception plays a paramount role in robotics, enabling solutions to many challenging scenarios, such as acrobatic maneuvers of autonomous unmanned aerial vehicles (UAVs) and robot-assisted high-precision surgery. Control-oriented end-to-end perception approaches, which directly output control variables for the robot, commonly exploit the robot's state estimate as an auxiliary input. In mediated approaches, i.e., when intermediate outputs are estimated and fed to a lower-level controller, the robot's state is instead commonly used as an input only for egocentric tasks, which estimate physical properties of the robot itself. In this work, we propose, to the best of our knowledge for the first time, to apply a similar approach to non-egocentric mediated tasks, where the estimated outputs refer to an external subject. We show that our general methodology improves the regression performance of deep convolutional neural networks (CNNs) on a broad class of non-egocentric 3D pose estimation problems, with minimal computational cost. Across three highly different use cases, spanning from grasping with a robotic arm to following a human subject with a pocket-sized UAV, our stateful models consistently improve the R² regression metric, by up to +0.51, compared to their stateless baselines. Finally, we validate the in-field performance of a closed-loop autonomous cm-scale UAV on the human pose estimation task. Our results show a significant reduction, 24% on average, in the mean absolute error of our stateful CNN compared to a state-of-the-art stateless counterpart.
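The paper's exact network architectures are not reproduced in this excerpt; as a rough illustration of the vision-state fusion idea the abstract describes, the minimal PyTorch sketch below concatenates the robot's state estimate with CNN image features before a pose-regression head. All names and dimensions here (StatefulPoseNet, a 6-D state vector, a 4-D relative-pose output) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StatefulPoseNet(nn.Module):
    """Illustrative vision-state fusion: the robot's state estimate is
    concatenated with CNN image features before the regression head."""

    def __init__(self, state_dim=6, feat_dim=128, out_dim=4):
        super().__init__()
        # Small convolutional backbone (a placeholder, not the paper's CNN).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        # Regression head consumes image features plus the state vector.
        self.head = nn.Sequential(
            nn.Linear(feat_dim + state_dim, 64), nn.ReLU(),
            nn.Linear(64, out_dim),  # e.g., relative x, y, z, yaw
        )

    def forward(self, image, state):
        feats = self.backbone(image)
        fused = torch.cat([feats, state], dim=1)  # vision-state fusion
        return self.head(fused)

# Example: a batch of camera frames plus 6-D state estimates
# (e.g., attitude and velocity from the flight controller).
net = StatefulPoseNet()
pose = net(torch.randn(8, 3, 96, 160), torch.randn(8, 6))
print(pose.shape)  # torch.Size([8, 4])
```

Because the state enters only at the fusion layer, the added computational cost is a handful of extra weights in the first fully connected layer, consistent with the abstract's claim of minimal overhead.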
Funding
Open access funding provided by Università della Svizzera italiana. This work was partially supported by the Secure Systems Research Center (SSRC) of the UAE Technology Innovation Institute (TII) and the Swiss National Science Foundation (SNSF) through the NCCR Robotics.
Author information
Contributions
All authors contributed to the study conception and design. E.C. wrote the main manuscript text and contributed the implementation and experiments for the drone-to-human use case. S.B. contributed the drone-to-drone use case. M.N. contributed the robot arm-to-object use case. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval
This is an observational study. No ethical approval is required for this article.
Consent to participate
Informed consent was obtained from all individual participants included in the study.
Consent to publish
The authors affirm that human research participants provided informed consent for publication of the image in Fig. 1.
Competing interests
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (MP4, 51,222 KB)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Cereda, E., Bonato, S., Nava, M. et al. Vision-state Fusion: Improving Deep Neural Networks for Autonomous Robotics. J Intell Robot Syst 110, 58 (2024). https://doi.org/10.1007/s10846-024-02091-6