Learning-Based Point Cloud Registration for 6D Object Pose Estimation in the Real World

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

In this work, we tackle the task of estimating the 6D pose of an object from point cloud data. While recent learning-based approaches to this task have shown great success on synthetic datasets, we have observed them to fail on real-world data. We analyze the causes of these failures and trace them back to two factors: the difference between the feature distributions of the source and target point clouds, and the sensitivity of the widely used SVD-based loss function to the range of rotation between the two point clouds. We address the first challenge by introducing a new normalization strategy, Match Normalization, and the second by using a loss function based on the negative log likelihood of point correspondences. Our two contributions are general and can be applied to many existing learning-based 3D object registration frameworks, which we illustrate by implementing them in two such frameworks, DCP and IDAM. Our experiments on the real-scene TUD-L [26], LINEMOD [23] and Occluded-LINEMOD [7] datasets demonstrate the benefits of our strategies: for the first time, they allow learning-based 3D object registration methods to achieve meaningful results on real-world data. We therefore expect them to be key to the future development of point cloud registration methods. Our source code can be found at https://github.com/Dangzheng/MatchNorm.
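
To make the two loss styles named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the SVD-based loss builds on the standard Kabsch/Procrustes solve and that the correspondence loss is a row-wise softmax cross-entropy over a source-to-target score matrix. The function names `kabsch` and `correspondence_nll` are hypothetical, and the sketch leaves out Match Normalization and any handling of unmatched points.

```python
import numpy as np


def kabsch(src, tgt):
    """Least-squares rigid transform (R, t) with R @ src[i] + t ~= tgt[i].

    src, tgt: (N, 3) arrays of corresponding points.
    This is the textbook SVD (Kabsch) solve that SVD-based losses rely on.
    """
    src_mean = src.mean(axis=0)
    tgt_mean = tgt.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (src - src_mean).T @ (tgt - tgt_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_mean - R @ src_mean
    return R, t


def correspondence_nll(scores, gt_match):
    """Mean negative log likelihood of the ground-truth matches.

    scores:   (N, M) similarity logits, source point i vs. target point j.
    gt_match: (N,) integer index of the true target match of each source point.
    Assumption: every source point has exactly one match (no outlier bin).
    """
    # Numerically stable log-softmax over each row.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    rows = np.arange(len(gt_match))
    return -log_prob[rows, gt_match].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(100, 3))
    angle = 2.0  # a large rotation about the z-axis, in radians
    R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                       [np.sin(angle), np.cos(angle), 0.0],
                       [0.0, 0.0, 1.0]])
    tgt = src @ R_true.T + np.array([0.1, -0.2, 0.3])
    R_est, t_est = kabsch(src, tgt)
    print("rotation error:", np.abs(R_est - R_true).max())
    print("NLL of uninformative scores:",
          correspondence_nll(np.zeros((100, 100)), np.arange(100)))
```

In this sketch the correspondence loss supervises the score matrix directly and never passes through the SVD, whereas a pose loss computed from `kabsch` output has to be differentiated through the decomposition.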

References

  1. Agamennoni, G., Fontana, S., Siegwart, R.Y., Sorrenti, D.G.: Point clouds registration with probabilistic data association. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4092–4098. IEEE (2016)

  2. Aiger, D., Mitra, N.J., Cohen-Or, D.: 4-points congruent sets for robust pairwise surface registration. In: ACM SIGGRAPH 2008 papers, pp. 1–10 (2008)

  3. Aoki, Y., Goforth, H., Srivatsan, R.A., Lucey, S.: Pointnetlk: robust & efficient point cloud registration using pointnet. In: Conference on Computer Vision and Pattern Recognition, Long Beach, California, pp. 7163–7172 (2019)

  4. Atzmon, M., Maron, H., Lipman, Y.: Point convolutional neural networks by extension operators. ACM Trans. Graph. (TOG) (2018)

  5. Besl, P., Mckay, N.: A method for registration of 3D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 239–256 (1992)

  6. Bouaziz, S., Tagliasacchi, A., Pauly, M.: Sparse iterative closest point. In: Computer Graphics Forum, vol. 32, pp. 113–123. Wiley Online Library, Hoboken (2013)

  7. Brachmann, E., Krull, A., Michel, F., Gumhold, S., Shotton, J., Rother, C.: Learning 6D object pose estimation Using 3D object coordinates. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8690, pp. 536–551. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10605-2_35

  8. Brachmann, E., Michel, F., Krull, A., Yang, M.Y., Gumhold, S., et al.: Uncertainty-driven 6D pose estimation of objects and scenes from a single RGB image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3364–3372 (2016)

  9. Bronstein, A.M., Bronstein, M.M.: Regularized partial matching of rigid shapes. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5303, pp. 143–154. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88688-4_11

  10. Bronstein, A.M., Bronstein, M.M., Bruckstein, A.M., Kimmel, R.: Partial similarity of objects, or how to compare a centaur to a horse. Int. J. Comput. Vision 84(2), 163–183 (2009)

  11. Cao, A.Q., Puy, G., Boulch, A., Marlet, R.: Pcam: product of cross-attention matrices for rigid registration of point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13229–13238 (2021)

  12. Choy, C., Dong, W., Koltun, V.: Deep global registration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2020)

  13. Choy, C., Park, J., Koltun, V.: Fully convolutional geometric features. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8958–8966 (2019)

  14. Dang, Z., Moo Yi, K., Hu, Y., Wang, F., Fua, P., Salzmann, M.: Eigendecomposition-free training of deep networks with zero eigenvalue-based losses. In: European Conference on Computer Vision, Munich, Germany, pp. 768–783 (2018)

  15. Deng, H., Birdal, T., Ilic, S.: Ppf-foldnet: unsupervised learning of rotation invariant 3D local descriptors. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 602–618 (2018)

  16. Deng, H., Birdal, T., Ilic, S.: Ppfnet: global context aware local features for robust 3D point matching. In: Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, pp. 195–205 (2018)

  17. Drost, B., Ulrich, M., Bergmann, P., Hartinger, P., Steger, C.: Introducing mvtec itodd-a dataset for 3D object recognition in industry. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, pp. 2200–2208 (2017)

  18. Drost, B., Ulrich, M., Navab, N., Ilic, S.: Model globally, match locally: efficient and robust 3D object recognition. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 998–1005 (2010)

  19. Fitzgibbon, A.W.: Robust registration of 2D and 3D point sets. Image Vision Comput. 21(13–14), 1145–1153 (2003)

  20. Gelfand, N., Mitra, N.J., Guibas, L.J., Pottmann, H.: Robust global registration. In: Symposium on geometry processing, Vienna, Austria, p. 5 (2005)

  21. Gower, J.C.: Generalized procrustes analysis. Psychometrika 40(1), 33–51 (1975)

  22. Hähnel, D., Burgard, W.: Probabilistic matching for 3D scan registration. In: Proceedings of the VDI-Conference Robotik, vol. 2002. Citeseer (2002)

  23. Hinterstoisser, S., et al.: Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012. LNCS, vol. 7724, pp. 548–562. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37331-2_42

  24. Hinzmann, T., et al.: Collaborative 3D reconstruction using heterogeneous UAVs: system and experiments. In: Kulić, D., Nakamura, Y., Khatib, O., Venture, G. (eds.) ISER 2016. SPAR, vol. 1, pp. 43–56. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-50115-4_5

  25. Hodaň, T., Matas, J., Obdržálek, Š.: On evaluation of 6D object pose estimation. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9915, pp. 606–619. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49409-8_52

  26. Hodan, T., et al.: Bop: benchmark for 6D object pose estimation. In: European Conference on Computer Vision, Munich, Germany, pp. 19–34 (2018)

  27. Huang, S., Gojcic, Z., Usvyatsov, M., Wieser, A., Schindler, K.: Predator: registration of 3D point clouds with low overlap. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4267–4276 (2021)

  28. Ionescu, C., Vantzos, O., Sminchisescu, C.: Matrix backpropagation for deep networks with structured layers. In: Conference on Computer Vision and Pattern Recognition, Boston, MA, USA (2015)

  29. Izatt, G., Dai, H., Tedrake, R.: Globally optimal object pose estimation in point clouds with mixed-integer programming. In: Amato, N.M., Hager, G., Thomas, S., Torres-Torriti, M. (eds.) Robotics Research. SPAR, vol. 10, pp. 695–710. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-28619-4_49

  30. Johnson, A.E., Hebert, M.: Using spin images for efficient object recognition in cluttered 3d scenes. TPAMI 21(5), 433–449 (1999)

  31. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations, San Diego, CA, USA (2015)

  32. Labbé, Y., Carpentier, J., Aubry, M., Sivic, J.: CosyPose: consistent multi-view multi-object 6D pose estimation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12362, pp. 574–591. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58520-4_34

  33. Le, H.M., Do, T.T., Hoang, T., Cheung, N.M.: Sdrsac: semidefinite-based randomized approach for robust point cloud registration without correspondences. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 124–133 (2019)

  34. Li, J., Zhang, C., Xu, Z., Zhou, H., Zhang, C.: Iterative distance-aware similarity matrix convolution with mutual-supervised point elimination for efficient point cloud registration. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12369, pp. 378–394. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58586-0_23

  35. Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: Pointcnn: convolution on x-transformed points. In: Advances in Neural Information Processing Systems, Montréal, Quebec, Canada, pp. 820–830 (2018)

  36. Li, Y., Wang, G., Ji, X., Xiang, Y., Fox, D.: DeepIM: deep iterative matching for 6D pose estimation. In: European Conference on Computer Vision, Munich, Germany, pp. 683–698 (2018)

  37. Litany, O., Bronstein, A.M., Bronstein, M.M.: Putting the pieces together: regularized multi-part shape matching. In: Fusiello, A., Murino, V., Cucchiara, R. (eds.) ECCV 2012. LNCS, vol. 7583, pp. 1–11. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33863-2_1

  38. Lucas, B.D., Kanade, T., et al.: An iterative image registration technique with an application to stereo vision. In: International Joint Conference on Artificial Intelligence. Vancouver, British Columb (1981)

  39. Maron, H., Dym, N., Kezurer, I., Kovalsky, S., Lipman, Y.: Point registration via efficient convex relaxation. ACM Trans. Graph. (TOG) 35(4), 1–12 (2016)

  40. Mellado, N., Aiger, D., Mitra, N.J.: Super 4 pcs fast global pointcloud registration via smart indexing. In: Computer Graphics Forum, vol. 33, pp. 205–215. Wiley Online Library (2014)

  41. Mohamad, M., Ahmed, M.T., Rappaport, D., Greenspan, M.: Super generalized 4pcs for 3D registration. In: 2015 International Conference on 3D Vision, pp. 598–606. IEEE (2015)

  42. Park, K., Patten, T., Vincze, M.: Pix2pose: pixel-wise coordinate regression of objects for 6D pose estimation. In: International Conference on Computer Vision, Seoul, Korea, pp. 7668–7677 (2019)

  43. Paszke, A., et al.: Automatic differentiation in pytorch. In: International Conference on Learning Representations, Toulon, France (2017)

  44. Peng, S., Liu, Y., Huang, Q., Zhou, X., Bao, H.: PVNet: pixel-wise voting network for 6DoF pose estimation. In: Conference on Computer Vision and Pattern Recognition, Long Beach, California, pp. 4561–4570 (2019)

  45. Pomerleau, F., Colas, F., Siegwart, R., et al.: A review of point cloud registration algorithms for mobile robotics. Found. Trends® Rob. 4(1), 1–104 (2015)

  46. Qi, C., Su, H., Mo, K., Guibas, L.: Pointnet: deep learning on point sets for 3D classification and segmentation. In: Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii (2017)

  47. Qi, C., Yi, L., Su, H., Guibas, L.: Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, Long Beach, California, United States (2017)

  48. Rad, M., Lepetit, V.: Bb8: a scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth. In: International Conference on Computer Vision, Venice, Italy, pp. 3828–3836 (2017)

  49. Raposo, C., Barreto, J.P.: Using 2 point+ normal sets for fast registration of point clouds with small overlap. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 5652–5658. IEEE (2017)

  50. Rosen, D.M., Carlone, L., Bandeira, A.S., Leonard, J.J.: Se-sync: a certifiably correct algorithm for synchronization over the special euclidean group. Int. J. Rob. Res. 38(2–3), 95–125 (2019)

  51. Rusinkiewicz, S., Levoy, M.: Efficient variants of the icp algorithm. In: Proceedings Third International Conference on 3-D Digital Imaging and Modeling, pp. 145–152. IEEE, Quebec City (2001)

  52. Rusu, R.B., Blodow, N., Beetz, M.: Fast point feature histograms (fpfh) for 3D registration. In: International Conference on Robotics and Automation, pp. 3212–3217. IEEE, Kobe (2009)

  53. Rusu, R.B., Blodow, N., Marton, Z.C., Beetz, M.: Aligning point cloud views using persistent feature histograms. In: International Conference on Intelligent Robots and Systems, pp. 3384–3391. IEEE, Nice (2008)

  54. Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: Superglue: learning feature matching with graph neural networks. In: Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach (2019)

  55. Segal, A., Haehnel, D., Thrun, S.: Generalized-icp. In: Robotics: Science and Systems, Cambridge (2009)

  56. Su, H., et al.: Splatnet: sparse lattice networks for point cloud processing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2530–2539 (2018)

  57. Sundermeyer, M., Marton, Z.C., Durner, M., Brucker, M., Triebel, R.: Implicit 3D orientation learning for 6D object detection from RGB images. In: European Conference on Computer Vision, Munich, Germany, pp. 699–715 (2018)

  58. Tremblay, J., To, T., Sundaralingam, B., Xiang, Y., Fox, D., Birchfield, S.: Deep object pose estimation for semantic robotic grasping of household objects. arXiv preprint arXiv:1809.10790 (2018)

  59. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, Long Beach, California, United States, pp. 5998–6008 (2017)

  60. Vidal, J., Lin, C.Y., Lladó, X., Martí, R.: A method for 6D pose estimation of free-form rigid objects using point pair features on range data. Sensors 18(8), 2678 (2018)

  61. Wang, C., et al.: Densefusion: 6D object pose estimation by iterative dense fusion. In: Conference on Computer Vision and Pattern Recognition, Long Beach, California, pp. 3343–3352 (2019)

  62. Wang, H., Sridhar, S., Huang, J., Valentin, J., Song, S., Guibas, L.J.: Normalized object coordinate space for category-level 6D object pose and size estimation. In: International Conference on Computer Vision, Seoul, Korea, pp. 2642–2651 (2019)

  63. Wang, W., Dang, Z., Hu, Y., Fua, P., Salzmann, M.: Backpropagation-friendly eigendecomposition. In: Advances in Neural Information Processing Systems, Vancouver, British Columbia, Canada, pp. 3156–3164 (2019)

  64. Wang, Y., Sun, Y., Liu, Z., Sarma, S., Bronstein, M., Solomon, J.: Dynamic graph cnn for learning on point clouds. ACM Trans. Graph. (TOG) (2019)

  65. Wang, Y., Solomon, J.M.: Deep closest point: learning representations for point cloud registration. In: International Conference on Computer Vision, Seoul, Korea, pp. 3523–3532 (2019)

  66. Wang, Y., Solomon, J.M.: Prnet: Self-supervised learning for partial-to-partial registration. In: Advances in Neural Information Processing Systems, Vancouver, British Columbia, Canada, pp. 8812–8824 (2019)

  67. Xiang, Y., Schmidt, T., Narayanan, V., Fox, D.: Posecnn: a convolutional neural network for 6D object pose estimation in cluttered scenes. In: Robotics: Science and Systems Conference, Pittsburgh, PA, USA (2018)

  68. Yang, H., Carlone, L.: A polynomial-time solution for robust registration with extreme outlier rates. In: Robotics: Science and Systems Conference, Freiburg im Breisgau, Germany (2019)

  69. Yang, H., Shi, J., Carlone, L.: Teaser: fast and certifiable point cloud registration. arXiv Preprint (2020)

  70. Yang, J., Li, H., Campbell, D., Jia, Y.: Go-icp: a globally optimal solution to 3D icp point-set registration. TPAMI 38(11), 2241–2254 (2015)

  71. Yew, Z.J., Lee, G.H.: Rpm-net: robust point matching using learned features. In: Conference on Computer Vision and Pattern Recognition. Online (2020)

  72. Yuan, W., Eckart, B., Kim, K., Jampani, V., Fox, D., Kautz, J.: DeepGMR: learning latent gaussian mixture models for registration. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12350, pp. 733–750. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58558-7_43

  73. Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R.R., Smola, A.J.: Deep sets. In: Advances in Neural Information Processing Systems, Long Beach, California, United States, pp. 3391–3401 (2017)

  74. Zakharov, S., Shugurov, I., Ilic, S.: DPOD: 6D pose object detector and refiner. In: International Conference on Computer Vision, Seoul, Korea (2019)

  75. Zhou, Q.-Y., Park, J., Koltun, V.: Fast global registration. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 766–782. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_47

  76. Zhou, Q.Y., Park, J., Koltun, V.: Open3D: a modern library for 3D data processing. arXiv Preprint (2018)

Acknowledgements

Zheng Dang would like to thank H. Chen for the highly valuable discussions and for her encouragement. This work was funded in part by the Swiss Innovation Agency (Innosuisse).

Author information

Corresponding author

Correspondence to Zheng Dang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Dang, Z., Wang, L., Guo, Y., Salzmann, M. (2022). Learning-Based Point Cloud Registration for 6D Object Pose Estimation in the Real World. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_2

  • DOI: https://doi.org/10.1007/978-3-031-19769-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19768-0

  • Online ISBN: 978-3-031-19769-7

  • eBook Packages: Computer Science, Computer Science (R0)
