YOLOPoint: Joint Keypoint and Object Detection

Backhaus, Anton; Luettel, Thorsten; Wuensche, Hans-Joachim

doi:10.1007/978-3-031-45382-3_10

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14124))

Included in the following conference series:

International Conference on Advanced Concepts for Intelligent Vision Systems

Abstract

Intelligent vehicles of the future must be capable of understanding and navigating safely through their surroundings. Camera-based vehicle systems can use keypoints as well as objects as low- and high-level landmarks for GNSS-independent SLAM and visual odometry. To this end, we propose YOLOPoint, a convolutional neural network model that simultaneously detects keypoints and objects in an image. It combines YOLOv5 and SuperPoint into a single forward-pass network that is both real-time capable and accurate. By using a shared backbone and a lightweight network structure, YOLOPoint is able to perform competitively on both the HPatches and KITTI benchmarks.
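The shared-backbone idea from the abstract can be illustrated with a toy sketch: features are computed once per image, and two separate heads consume them. Everything below is hypothetical and chosen only for illustration (the names `backbone`, `keypoint_score_head`, `objectness_head`, and `JointDetector`, the 8x8 cell size, and the hand-written scoring rules); it is not the YOLOPoint architecture, which uses learned convolutional layers.

```python
from typing import Dict, List

Grid = List[List[float]]


def backbone(image: Grid, cell: int = 8) -> Grid:
    """Toy stand-in for a shared CNN backbone: mean intensity per cell x cell patch."""
    height, width = len(image), len(image[0])
    features: Grid = []
    for r in range(0, height - height % cell, cell):
        row = []
        for c in range(0, width - width % cell, cell):
            patch = [image[r + i][c + j] for i in range(cell) for j in range(cell)]
            row.append(sum(patch) / len(patch))
        features.append(row)
    return features


def keypoint_score_head(features: Grid) -> Grid:
    """Toy keypoint head: score each cell by its deviation from the global mean."""
    count = len(features) * len(features[0])
    mean = sum(sum(row) for row in features) / count
    return [[abs(v - mean) for v in row] for row in features]


def objectness_head(features: Grid, threshold: float = 0.5) -> List[List[bool]]:
    """Toy object head: flag cells whose feature value exceeds a threshold."""
    return [[v > threshold for v in row] for row in features]


class JointDetector:
    """Runs the backbone once and feeds the same features to both heads."""

    def __init__(self) -> None:
        self.backbone_calls = 0

    def forward(self, image: Grid) -> Dict[str, list]:
        self.backbone_calls += 1
        features = backbone(image)  # computed once, shared by both heads
        return {
            "keypoints": keypoint_score_head(features),
            "objects": objectness_head(features),
        }


# A 16x16 image whose top-left 8x8 block is bright and the rest dark.
image = [[1.0 if r < 8 and c < 8 else 0.0 for c in range(16)] for r in range(16)]
detector = JointDetector()
outputs = detector.forward(image)
```

Both outputs come from one call to `backbone`, which is where the saving over running two independent networks would come from in this simplified picture.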

This research is funded by dtec.bw – Digitalization and Technology Research Center of the Bundeswehr. dtec.bw is funded by the European Union - NextGenerationEU.

Author information

Authors and Affiliations

Institute of Autonomous Systems Technology, University of the Bundeswehr Munich, Neubiberg, Germany
Anton Backhaus, Thorsten Luettel & Hans-Joachim Wuensche

Corresponding author

Correspondence to Anton Backhaus.

Editor information

Editors and Affiliations

DGA TA, Toulouse, France
Jaques Blanc-Talon
University of Auckland, Auckland, New Zealand
Patrice Delmas
Ghent University, Ghent, Belgium
Wilfried Philips
University of Antwerp, Wilrijk, Belgium
Paul Scheunders

About this paper

Cite this paper

Backhaus, A., Luettel, T., Wuensche, HJ. (2023). YOLOPoint: Joint Keypoint and Object Detection. In: Blanc-Talon, J., Delmas, P., Philips, W., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2023. Lecture Notes in Computer Science, vol 14124. Springer, Cham. https://doi.org/10.1007/978-3-031-45382-3_10

DOI: https://doi.org/10.1007/978-3-031-45382-3_10
Published: 14 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45381-6
Online ISBN: 978-3-031-45382-3
eBook Packages: Computer Science, Computer Science (R0)
