Abstract
Multi-scale feature fusion is widely used in handcrafted descriptors but has not been fully explored in deep learning-based descriptor extraction. Simply concatenating descriptors from different scales has not significantly improved performance on computer vision tasks. In this paper, we propose a novel convolutional neural network based on center-surround adaptive multi-scale feature fusion, which enables the network to focus on different center-surround scales and thereby improves performance. We also introduce a novel regularization technique that uses second-order similarity to constrain the learning of local descriptors, based on the symmetric property of the similarity matrix. The proposed method outperforms single-scale and simple-concatenation descriptors on two datasets and achieves state-of-the-art results on the Brown dataset. Furthermore, it demonstrates excellent generalization ability on the HPatches dataset. Our code is available on GitHub: https://github.com/Leung-GD/AFSRNet/tree/main.
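The abstract does not give the formulation of the symmetric regularization, so the following is only an illustrative sketch and not the authors' loss. It assumes that the similarity matrix is the cosine-similarity matrix between a batch of anchor descriptors and their matching positive descriptors, and that the regularizer penalizes the difference between this matrix and its transpose. The function names `cross_similarity` and `symmetry_penalty` are hypothetical and do not come from the released code.

```python
import math

def l2_normalize(v):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_similarity(anchors, positives):
    """Return S where S[i][j] is the cosine similarity of anchor i and positive j."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    return [[sum(x * y for x, y in zip(ai, pj)) for pj in p] for ai in a]

def symmetry_penalty(sim):
    """Mean squared difference between S and its transpose.

    Assumed form of a symmetry-based regularizer: it is zero exactly
    when S[i][j] == S[j][i] for every pair (i, j).
    """
    n = len(sim)
    total = sum((sim[i][j] - sim[j][i]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)

# Toy batch of two 2-D descriptors (made-up values for illustration only).
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [1.0, 1.0]]
penalty = symmetry_penalty(cross_similarity(anchors, positives))
```

Under these assumptions the penalty vanishes when each positive descriptor coincides with its anchor, and grows as the anchor-to-positive similarities become asymmetric across pairs.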
Data Availability
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2021A1515011867).
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest.
Informed consent
Informed consent was obtained from all individual participants included in the study.
About this article
Cite this article
Li, D., Liang, H. & Lam, KM. AFSRNet: learning local descriptors with adaptive multi-scale feature fusion and symmetric regularization. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05418-w
DOI: https://doi.org/10.1007/s10489-024-05418-w