
A Triplet-loss Dilated Residual Network for High-Resolution Representation Learning in Image Retrieval

Published in the Journal of Signal Processing Systems

Abstract

Content-based image retrieval is the process of retrieving a subset of images from an extensive image gallery based on visual content, such as color, shape, spatial relations, and texture. In some applications, such as localization, image retrieval is employed as the initial step; in such cases, the accuracy of the top-retrieved images significantly affects overall system accuracy. This paper introduces a simple yet efficient image retrieval system with fewer trainable parameters that offers acceptable accuracy in the top-retrieved images. The proposed method benefits from a dilated residual convolutional neural network trained with a triplet loss. Experimental evaluations show that this model can extract richer information (i.e., high-resolution representations) by enlarging the receptive field, thus improving image retrieval accuracy without increasing the depth or complexity of the model. To enhance the robustness of the extracted representations, we obtain candidate regions of interest from each feature map and apply Generalized-Mean (GeM) pooling to these regions. Because the choice of triplets in a triplet-based network affects model training, we employ an online triplet mining method. We test the performance of the proposed method under various configurations on two challenging image-retrieval datasets, namely Revisited Paris6k (RPar) and UKBench. The experimental results show a mean precision at rank 10 of 94.54 and 80.23 in the RPar medium and hard modes, respectively, and a recall at rank 4 of 3.86 on the UKBench dataset.
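The full text is paywalled, so the exact architecture is not reproduced here, but the core idea the abstract describes, enlarging the receptive field with dilated convolutions instead of extra depth or downsampling, is straightforward to sketch. Below is a minimal, hypothetical PyTorch illustration of a dilated residual block; the channel count, dilation rate, and layer arrangement are assumptions, not the authors' exact design.

```python
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """Residual block whose 3x3 convolutions are dilated, enlarging the
    receptive field without extra depth or downsampling. A hypothetical
    sketch of the idea, not the paper's exact architecture."""

    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        # padding = dilation keeps the spatial size of a 3x3 kernel unchanged,
        # which is what preserves the high-resolution feature maps.
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=dilation,
                               dilation=dilation, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=dilation,
                               dilation=dilation, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut, resolution preserved
```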
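Generalized-Mean (GeM) pooling, which the method applies to candidate regions of each feature map, interpolates between average pooling (p = 1) and max pooling (p → ∞) through a learnable exponent. A minimal PyTorch sketch of global GeM pooling follows; the initialization p = 3 and the clamp epsilon are common defaults rather than values confirmed by the paper, and the per-region variant would pool each region of interest separately instead of the whole map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    """Generalized-Mean pooling with a learnable exponent p.

    p = 1 reduces to average pooling; p -> infinity approaches max pooling.
    p = 3 and eps are common defaults, not necessarily the paper's settings.
    """

    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.ones(1) * p)  # learned with the network
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W) feature maps from the backbone
        x = x.clamp(min=self.eps).pow(self.p)
        x = F.avg_pool2d(x, kernel_size=(x.size(-2), x.size(-1)))  # global mean
        return x.pow(1.0 / self.p).flatten(1)  # (batch, channels) descriptor
```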
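Online triplet mining builds triplets inside each mini-batch rather than fixing them offline. The abstract confirms online mining but not the exact strategy, so the sketch below uses the common batch-hard variant (hardest positive and hardest negative per anchor); the margin value is an assumption.

```python
import torch

def batch_hard_triplet_loss(embeddings: torch.Tensor,
                            labels: torch.Tensor,
                            margin: float = 0.1) -> torch.Tensor:
    """Batch-hard online triplet mining (one common variant, assumed here).

    For each anchor, take the farthest same-label sample (hardest positive)
    and the closest different-label sample (hardest negative) within the
    batch, then apply the triplet hinge. Assumes every label occurs at
    least twice in the batch.
    """
    # Pairwise Euclidean distances between all embeddings in the batch.
    dists = torch.cdist(embeddings, embeddings, p=2)  # (B, B)

    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # (B, B) bool
    pos_mask = same & ~torch.eye(len(labels), dtype=torch.bool,
                                 device=labels.device)
    neg_mask = ~same

    # Hardest positive: maximum distance among same-label pairs.
    hardest_pos = (dists * pos_mask).max(dim=1).values
    # Hardest negative: minimum distance among different-label pairs
    # (positives and the diagonal are masked out with a large constant).
    hardest_neg = (dists + 1e9 * (~neg_mask)).min(dim=1).values

    return torch.relu(hardest_pos - hardest_neg + margin).mean()
```

A sampler that guarantees several images per landmark in each batch (e.g., P classes with K images each) is the usual companion to a loss like this, since every anchor needs at least one in-batch positive.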




Notes

  1. Scale-Invariant Feature Transform.

  2. Bag of Words.

  3. Vector of Locally Aggregated Descriptors.

  4. https://www.flickr.com/


Author information


Corresponding author

Correspondence to Hamidreza Pourreza.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yousefzadeh, S., Pourreza, H. & Mahyar, H. A Triplet-loss Dilated Residual Network for High-Resolution Representation Learning in Image Retrieval. J Sign Process Syst 95, 529–541 (2023). https://doi.org/10.1007/s11265-023-01865-9


