YOLOPoint: Joint Keypoint and Object Detection

  • Conference paper
Advanced Concepts for Intelligent Vision Systems (ACIVS 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14124)

Abstract

Intelligent vehicles of the future must be capable of understanding and navigating safely through their surroundings. Camera-based vehicle systems can use keypoints as well as objects as low- and high-level landmarks for GNSS-independent SLAM and visual odometry. To this end, we propose YOLOPoint, a convolutional neural network model that simultaneously detects keypoints and objects in an image. It combines YOLOv5 and SuperPoint into a single forward-pass network that is both real-time capable and accurate. By using a shared backbone and a lightweight network structure, YOLOPoint is able to perform competitively on both the HPatches and KITTI benchmarks.
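
The shared-backbone idea in the abstract can be illustrated with a minimal PyTorch sketch. It is not the YOLOPoint architecture: the class name TinyJointNet, the layer sizes, the single-channel input, and the single-scale, anchor-free object head are all assumptions made for illustration. Only the 65-channel detector output (64 positions in an 8x8 cell plus one "no keypoint" bin) and the 256-dimensional descriptor follow the SuperPoint convention. The point is that the feature map is computed once and reused by all three heads in one forward pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyJointNet(nn.Module):
    """Toy network: one shared backbone, three task heads (hypothetical sizes)."""

    def __init__(self, desc_dim: int = 256, num_classes: int = 3):
        super().__init__()
        # Shared backbone: three stride-2 convolutions give 1/8-resolution features.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # SuperPoint-style detector head: 64 cell positions + 1 "no keypoint" bin.
        self.keypoint_head = nn.Conv2d(128, 65, 1)
        # Descriptor head: one desc_dim vector per 8x8 cell.
        self.descriptor_head = nn.Conv2d(128, desc_dim, 1)
        # Simplified object head (assumption): 4 box values + 1 objectness + classes.
        self.object_head = nn.Conv2d(128, 5 + num_classes, 1)

    def forward(self, image: torch.Tensor):
        feats = self.backbone(image)  # computed once, reused by every head
        keypoint_logits = self.keypoint_head(feats)
        descriptors = F.normalize(self.descriptor_head(feats), dim=1)  # unit length
        object_preds = self.object_head(feats)
        return keypoint_logits, descriptors, object_preds


# Usage: a 64x96 grayscale image yields 8x12 grids for all three outputs.
model = TinyJointNet().eval()
with torch.no_grad():
    kp, desc, obj = model(torch.rand(1, 1, 64, 96))
print(tuple(kp.shape), tuple(desc.shape), tuple(obj.shape))
# prints (1, 65, 8, 12) (1, 256, 8, 12) (1, 8, 8, 12)
```

Because the heads are single 1x1 convolutions, almost all of the computation in this sketch sits in the backbone, which is why sharing it keeps the cost of joint detection close to that of a single task.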

This research is funded by dtec.bw – Digitalization and Technology Research Center of the Bundeswehr. dtec.bw is funded by the European Union - NextGenerationEU.

Author information

Corresponding author

Correspondence to Anton Backhaus.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Backhaus, A., Luettel, T., Wuensche, HJ. (2023). YOLOPoint: Joint Keypoint and Object Detection. In: Blanc-Talon, J., Delmas, P., Philips, W., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2023. Lecture Notes in Computer Science, vol 14124. Springer, Cham. https://doi.org/10.1007/978-3-031-45382-3_10

  • DOI: https://doi.org/10.1007/978-3-031-45382-3_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45381-6

  • Online ISBN: 978-3-031-45382-3

  • eBook Packages: Computer Science, Computer Science (R0)
