Part-based tracking for object pose estimation

  • Research
  • Journal of Real-Time Image Processing

Abstract

Object pose estimation is crucial in human–computer interaction systems. Traditional point-based detection approaches rely on the robustness of feature points, tracking methods exploit the similarity between frames to improve speed, and recent neural-network-based studies concentrate on solving specific invariance problems. In contrast, PTPE (Part-based Tracking for Pose Estimation), proposed in this paper, focuses on balancing speed and accuracy under different conditions. In this method, point matching is replaced by the matching of parts inside an object, which makes the features more reliable. A fast interframe tracking method is then combined with learning models and structural information to enhance robustness. During tracking, each part is handled with a strategy chosen according to its matching quality as evaluated by the learning models, which exploits locality and avoids the time cost of undifferentiated full-frame detection or learning. In addition, constraints between parts are applied to optimize part detection. Experiments show that PTPE performs well in both accuracy and speed, especially in complex environments, when compared with classical algorithms that focus only on detection, interframe tracking, self-supervised models, or graph matching.
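The per-part strategy selection and the use of constraints between parts can be illustrated with a minimal Python sketch. It is not the PTPE implementation: the abstract gives no thresholds, strategy names, or constraint model, so `TRACK_THRESHOLD`, `REDETECT_THRESHOLD`, the three strategy labels, and the fixed 2D offsets between parts are all hypothetical. Pure translation offsets also ignore rotation and scale, which a real pose estimator would have to handle.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Point = Tuple[float, float]

# Hypothetical thresholds on a [0, 1] matching score; not values from the paper.
TRACK_THRESHOLD = 0.7
REDETECT_THRESHOLD = 0.4


@dataclass
class Part:
    name: str
    position: Point  # part centre carried over by interframe tracking
    score: float     # matching quality, assumed to come from a learned model


def choose_strategy(score: float) -> str:
    """Map a part's matching score to a per-part action (labels are assumed)."""
    if score >= TRACK_THRESHOLD:
        return "track"              # interframe tracking result is trusted
    if score >= REDETECT_THRESHOLD:
        return "local_redetect"     # search only a window around the part
    return "infer_from_neighbours"  # fall back to constraints between parts


def infer_position(lost: str, parts: Dict[str, Part],
                   offsets: Dict[Tuple[str, str], Point]) -> Point:
    """Estimate a lost part as the mean of (reliable part + stored offset)."""
    estimates = []
    for other in parts.values():
        key = (other.name, lost)
        reliable = other.score >= TRACK_THRESHOLD
        if other.name != lost and reliable and key in offsets:
            dx, dy = offsets[key]
            estimates.append((other.position[0] + dx, other.position[1] + dy))
    if not estimates:
        raise ValueError(f"no reliable neighbour for part {lost!r}")
    xs, ys = zip(*estimates)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


if __name__ == "__main__":
    parts = {
        "a": Part("a", (10.0, 10.0), 0.9),
        "b": Part("b", (30.0, 10.0), 0.8),
        "c": Part("c", (0.0, 0.0), 0.1),  # badly matched part
    }
    # Offsets from a reliable part to part "c", assumed known from a model.
    offsets = {("a", "c"): (10.0, 20.0), ("b", "c"): (-10.0, 20.0)}
    for name, part in parts.items():
        print(name, choose_strategy(part.score))
    print(infer_position("c", parts, offsets))
```

In this sketch only the poorly matched part triggers the more expensive fallback, while well-matched parts keep the cheap tracking result, which is the kind of differentiated treatment the abstract describes.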

Data availability

The corresponding author will supply the relevant data in response to reasonable requests.

Acknowledgements

This work is supported by the Scientific Research Funds of Huaqiao University, China (605-50Y21011); the Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University; and the Provincial Key Laboratory of Computer Vision and Machine Learning of the Educational Department of Fujian Province (201902).

Author information

Contributions

SY: manuscript text, figures, tables. JY: software, validation. QL: software, validation. All authors reviewed the manuscript.

Corresponding author

Correspondence to Shuang Ye.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ye, S., Ye, J. & Lei, Q. Part-based tracking for object pose estimation. J Real-Time Image Proc 20, 99 (2023). https://doi.org/10.1007/s11554-023-01351-2

  • DOI: https://doi.org/10.1007/s11554-023-01351-2