GPU and ROS the Use of General Parallel Processing Architecture for Robot Perception

  • Nicolas Dalmedico
  • Marco Antônio Simões Teixeira
  • Higor Santos Barbosa
  • André Schneider de Oliveira
  • Lucia Valeria Ramos de Arruda
  • Flavio Neves Jr
Part of the Studies in Computational Intelligence book series (SCI, volume 778)


This chapter presents a complete tutorial on how to get started with parallel processing in ROS. It begins with a guide to installing the complete version of ROS on the Nvidia development boards Tegra K1, Tegra X1 and Tegra X2. The tutorial covers updating the development boards with the latest OS and configuring CUDA, ROS and OpenCV4Tegra so that they are ready to run the sample packages included in this chapter. The chapter then describes how to install CUDA on a computer running the Ubuntu operating system. After that, it covers the integration between ROS and CUDA, with many examples of how to create packages that perform parallel processing on several of the most commonly used ROS message types. The code and examples presented in this chapter are available on GitHub and can be found in the repository at
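As a minimal sketch of the ROS and CUDA integration pattern described above (not code taken from the chapter or its repository), the CUDA example below converts a laser scan from polar to Cartesian coordinates with one GPU thread per beam. It assumes the `ranges`, `angle_min` and `angle_increment` fields of a `sensor_msgs/LaserScan` message have already been read inside a ROS callback; the function names `polarToCartesian` and `scanToCartesian` are hypothetical, and the code is compiled with `nvcc` without any ROS dependency.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per laser beam.
// Beam i points at angle angle_min + i * angle_increment (LaserScan convention).
__global__ void polarToCartesian(const float* ranges, int n,
                                 float angle_min, float angle_increment,
                                 float* xs, float* ys) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;  // the last block may contain spare threads
    float theta = angle_min + i * angle_increment;
    xs[i] = ranges[i] * cosf(theta);
    ys[i] = ranges[i] * sinf(theta);
}

// Hypothetical host wrapper: copies the ranges to the GPU, launches the
// kernel and copies the x/y results back. Returns false if a CUDA call fails.
bool scanToCartesian(const std::vector<float>& ranges,
                     float angle_min, float angle_increment,
                     std::vector<float>& xs, std::vector<float>& ys) {
    const int n = static_cast<int>(ranges.size());
    xs.assign(n, 0.0f);
    ys.assign(n, 0.0f);
    if (n == 0) return true;
    const size_t bytes = ranges.size() * sizeof(float);

    float *d_r = nullptr, *d_x = nullptr, *d_y = nullptr;
    // Short-circuit evaluation stops at the first failing call.
    bool ok = cudaMalloc(&d_r, bytes) == cudaSuccess &&
              cudaMalloc(&d_x, bytes) == cudaSuccess &&
              cudaMalloc(&d_y, bytes) == cudaSuccess &&
              cudaMemcpy(d_r, ranges.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;  // round up
        polarToCartesian<<<blocks, threads>>>(d_r, n, angle_min,
                                              angle_increment, d_x, d_y);
        // cudaMemcpy on the default stream waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(xs.data(), d_x, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(ys.data(), d_y, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_r);  // cudaFree(nullptr) is a no-op
    cudaFree(d_x);
    cudaFree(d_y);
    return ok;
}
```

In a roscpp node, where a `float32[]` field maps to `std::vector<float>`, a scan callback could call `scanToCartesian(msg->ranges, msg->angle_min, msg->angle_increment, xs, ys)`. For short scans the host-to-device copies may cost more than the kernel saves, so this pattern is likely to pay off mainly on larger inputs such as images or point clouds.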


Keywords: Parallel processing, CUDA, ROS, GPU



The projects of this chapter were partially funded by the National Council for Scientific and Technological Development of Brazil (CNPq), by the Coordination for the Improvement of Higher Education Personnel (CAPES) and by the National Agency of Petroleum, Natural Gas and Biofuels (ANP) together with the Financier of Studies and Projects (FINEP) and the Brazilian Ministry of Science and Technology (MCT) through the ANP Human Resources Program for the Petroleum and Gas Sector - PRH-ANP/MCT PRH10-UTFPR. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tegra X1 and Tegra K1 development boards used for this chapter.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Nicolas Dalmedico (1)
  • Marco Antônio Simões Teixeira (1)
  • Higor Santos Barbosa (1)
  • André Schneider de Oliveira (1)
  • Lucia Valeria Ramos de Arruda (1)
  • Flavio Neves Jr (1)
  1. Federal University of Technology - Parana, Curitiba, Brazil