Abstract
Intelligent vehicles of the future must be capable of understanding and navigating safely through their surroundings. Camera-based vehicle systems can use keypoints and objects as low-level and high-level landmarks, respectively, for GNSS-independent SLAM and visual odometry. To this end, we propose YOLOPoint, a convolutional neural network model that simultaneously detects keypoints and objects in an image. It combines YOLOv5 and SuperPoint into a single forward-pass network that is both real-time capable and accurate. With a shared backbone and a lightweight network structure, YOLOPoint performs competitively on both the HPatches and KITTI benchmarks.
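The shared-backbone idea can be illustrated with a small toy sketch. It is not the YOLOPoint architecture: the class `SharedBackboneModel`, the function `toy_backbone`, and the three placeholder heads are hypothetical names introduced here, and plain Python lists stand in for image tensors and convolutional feature maps. The sketch only shows the structural point made in the abstract, namely that features are extracted once per image and then reused by every task head.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = List[float]


@dataclass
class SharedBackboneModel:
    """Toy multi-task model (hypothetical): one backbone, several task heads."""
    backbone: Callable[[List[float]], Features]
    heads: Dict[str, Callable[[Features], object]]
    backbone_calls: int = 0  # counts how often features were extracted

    def forward(self, image: List[float]) -> Dict[str, object]:
        # The backbone runs exactly once per image ...
        features = self.backbone(image)
        self.backbone_calls += 1
        # ... and every head reuses the same features.
        return {name: head(features) for name, head in self.heads.items()}


def toy_backbone(image: List[float]) -> Features:
    # Stand-in for a convolutional feature extractor: pairwise averages.
    return [(a + b) / 2 for a, b in zip(image[::2], image[1::2])]


# Placeholder heads (assumptions for illustration, not the paper's heads).
model = SharedBackboneModel(
    backbone=toy_backbone,
    heads={
        "keypoints": lambda f: [i for i, v in enumerate(f) if v > 0.5],
        "descriptors": lambda f: list(f),
        "objects": lambda f: {"score": max(f)},
    },
)

outputs = model.forward([0.25, 0.25, 1.0, 0.5, 0.0, 0.25])
print(outputs)
print("backbone calls:", model.backbone_calls)
```

In a two-network setup, the feature extraction step would run once per task. Sharing it is what makes a joint keypoint and object detector cheaper than running the two detectors separately.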
This research is funded by dtec.bw – Digitalization and Technology Research Center of the Bundeswehr. dtec.bw is funded by the European Union - NextGenerationEU.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Backhaus, A., Luettel, T., Wuensche, HJ. (2023). YOLOPoint: Joint Keypoint and Object Detection. In: Blanc-Talon, J., Delmas, P., Philips, W., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2023. Lecture Notes in Computer Science, vol 14124. Springer, Cham. https://doi.org/10.1007/978-3-031-45382-3_10
DOI: https://doi.org/10.1007/978-3-031-45382-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45381-6
Online ISBN: 978-3-031-45382-3
eBook Packages: Computer Science, Computer Science (R0)