
Robot-to-robot relative pose estimation using humans as markers


In this paper, we propose a method to determine the 3D relative pose of pairs of communicating robots by using human pose-based key-points as correspondences. We adopt a ‘leader-follower’ framework in which the leader robot first visually detects and triangulates the key-points using OpenPose, a state-of-the-art pose detector. The follower robots then match the corresponding 2D projections on their respective calibrated cameras and recover their relative poses by solving the perspective-n-point (PnP) problem. We also design an efficient person re-identification technique for associating the mutually visible humans in the scene, and we present an iterative optimization algorithm that refines the associated key-points based on their local structural properties in the image space. We demonstrate that these refinement steps are essential for establishing accurate key-point correspondences across viewpoints. Furthermore, we evaluate the performance of the proposed relative pose estimation system through several experiments conducted in terrestrial and underwater environments. Finally, we discuss the relevant operational challenges of this approach and analyze its feasibility for multi-robot cooperative systems in human-dominated social settings and in feature-deprived environments such as underwater.
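The core geometric step above, recovering a follower robot's pose from the 2D projections of triangulated human key-points, can be illustrated with a minimal Direct Linear Transform (DLT) solver for the calibrated PnP problem. The sketch below is a numpy-only illustration on synthetic data, not the authors' implementation; the function name `dlt_pnp` and all numeric values (intrinsics, ground-truth pose, key-point layout) are assumptions chosen for the demo.

```python
import numpy as np

def dlt_pnp(points_3d, points_2d, K):
    """Recover camera rotation R and translation t from n >= 6 2D-3D
    correspondences with a Direct Linear Transform (DLT)."""
    # Normalize pixel coordinates with the calibrated intrinsics K
    uv1 = np.hstack([points_2d, np.ones((len(points_2d), 1))])
    xn = (np.linalg.inv(K) @ uv1.T).T
    # Each correspondence gives two linear equations in the 12 entries
    # of the projection matrix P = [R | t]
    A = []
    for (X, Y, Z), (x, y, _) in zip(points_3d, xn):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z, -x])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z, -y])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    P = Vt[-1].reshape(3, 4)            # null vector, defined up to scale
    if np.linalg.det(P[:, :3]) < 0:     # resolve the sign ambiguity
        P = -P
    U, S, Vh = np.linalg.svd(P[:, :3])
    R = U @ Vh                          # nearest rotation matrix
    t = P[:, 3] / S.mean()              # undo the arbitrary DLT scale
    return R, t

# Synthetic demo: "human key-points" observed by a follower camera whose
# ground-truth pose (R_gt, t_gt) we then recover from the projections.
rng = np.random.default_rng(0)
K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
pts3d = rng.uniform([-1., -1., 4.], [1., 1., 6.], size=(10, 3))
ang = 0.2  # small yaw plus a lateral offset
R_gt = np.array([[np.cos(ang), 0., np.sin(ang)],
                 [0., 1., 0.],
                 [-np.sin(ang), 0., np.cos(ang)]])
t_gt = np.array([0.5, -0.1, 0.3])
cam = (R_gt @ pts3d.T).T + t_gt
pts2d = (K @ (cam / cam[:, 2:3]).T).T[:, :2]
R, t = dlt_pnp(pts3d, pts2d, K)
```

In practice, wrapping such a solver in a robust estimator like RANSAC would reject outliers arising from mis-associated key-points across viewpoints.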






Acknowledgements

We would like to thank Hyun Soo Park (Assistant Professor, University of Minnesota) for his valuable insights which immensely enriched this paper. We gratefully acknowledge the support of the MnDrive initiative and thank NVIDIA Corporation for donating two Titan-class GPUs for this research. In addition, we are grateful to the Bellairs Research Institute of Barbados for providing us with the facilities for field experiments; we also acknowledge our colleagues at the IRVLab and the participants of the 2019 Marine Robotics Sea Trials for their assistance in collecting data and conducting the experiments.

Author information



Corresponding author

Correspondence to Md Jahidul Islam.



About this article


Cite this article

Islam, M.J., Mo, J. & Sattar, J. Robot-to-robot relative pose estimation using humans as markers. Auton Robot 45, 579–593 (2021).



Keywords

  • Underwater human–robot cooperation
  • Marine robotics
  • Underwater visual perception