A fast weighted multi-view Bayesian learning scheme with deep learning for text-based image retrieval from unlabeled galleries

Multimedia Tools and Applications

Abstract

In this paper, we propose a new, computationally fast method for text-based image retrieval from unlabeled galleries, in which retrieval is formulated as a multi-class learning problem. Whereas most existing methods treat all images representing the same concept as equally important during learning, we propose a weighted multi-view likelihood term to handle the intra-class variations within the training set of each concept. First, we cluster each training set to detect the concept's visual appearances (views). Because the number of clusters may vary significantly from one set to another, imposing a single value of this hyper-parameter on all sets could degrade the learning outcome. We therefore determine it automatically for each set using the Davies-Bouldin index. Images are represented by deep features, which are normalized with the vanilla L2 rule to deal with bursty visual features. The proposed multi-view term is constructed by combining the multivariate normal probability density functions associated with the resulting clusters. This term is then incorporated into a naïve Bayes classifier alongside the prior probability of the concept, and each component is weighted using the Expectation-Maximization (EM) algorithm. Given a textual query, the relevant images are those with the highest posterior probabilities, as computed by our Bayesian learning scheme. Experimental results on public datasets demonstrate the effectiveness and speed of the proposed method compared with several other methods.
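The pipeline described in the abstract can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses NumPy only; it uses plain k-means as the clustering step (the abstract does not name the clustering algorithm); it searches k over an assumed range 2..k_max with k_max = 6; it adds a hypothetical ridge term `reg` to each covariance so the Gaussians stay invertible; and it sets each view's weight to its share of the training images instead of the EM-estimated weights the paper describes. The textual query is assumed to match a concept name exactly. All function names (`fit_concept`, `choose_k`, `posteriors`, `retrieve`, ...) and the toy data are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Vanilla L2 normalisation: rescale each feature vector (row) to unit length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def kmeans(x, k, iters=100, seed=0):
    """Assumed clustering step: Lloyd k-means with farthest-point initialisation."""
    rng = np.random.default_rng(seed)
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        labels = ((x[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, new  # new = mean of each cluster under the returned labels

def davies_bouldin(x, labels, centroids):
    """Davies-Bouldin index: mean over clusters of the worst (S_i + S_j) / d(c_i, c_j)."""
    k = len(centroids)
    scatter = np.array([np.linalg.norm(x[labels == j] - centroids[j], axis=1).mean()
                        if np.any(labels == j) else 0.0 for j in range(k)])
    sep = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    ratios = (scatter[:, None] + scatter[None, :]) / np.where(sep > 0, sep, np.inf)
    np.fill_diagonal(ratios, -np.inf)  # exclude i == j from the max
    return ratios.max(axis=1).mean()

def choose_k(x, k_max):
    """Number of views = the k in 2..k_max (assumed bound) with the lowest DB index."""
    if len(x) < 3:
        return 1, np.zeros(len(x), dtype=int)
    best = None
    for k in range(2, min(k_max, len(x) - 1) + 1):
        labels, cents = kmeans(x, k)
        score = davies_bouldin(x, labels, cents)
        if best is None or score < best[0]:
            best = (score, k, labels)
    return best[1], best[2]

def fit_concept(feats, k_max=6, reg=1e-3):
    """One Gaussian per view; weights = view proportions (a stand-in for the paper's EM)."""
    x = l2_normalize(feats)
    k, labels = choose_k(x, k_max)
    views = []
    for j in range(k):
        pts = x[labels == j]
        if len(pts) == 0:
            continue
        diff = pts - pts.mean(axis=0)
        cov = diff.T @ diff / len(pts) + reg * np.eye(x.shape[1])  # ridge keeps cov invertible
        views.append((len(pts) / len(x), pts.mean(axis=0), cov))
    return k, views

def gaussian_logpdf(x, mean, cov):
    _, logdet = np.linalg.slogdet(cov)
    diff = x - mean
    maha = (diff * np.linalg.solve(cov, diff.T).T).sum(axis=1)  # squared Mahalanobis distance
    return -0.5 * (x.shape[1] * np.log(2 * np.pi) + logdet + maha)

def multi_view_loglik(x, views):
    """log sum_v w_v N(x | mu_v, Sigma_v), computed with the log-sum-exp trick."""
    terms = np.column_stack([np.log(w) + gaussian_logpdf(x, m, c) for w, m, c in views])
    top = terms.max(axis=1, keepdims=True)
    return (top + np.log(np.exp(terms - top).sum(axis=1, keepdims=True))).ravel()

def posteriors(gallery, models, priors):
    """P(concept | image): rows are gallery images, columns are concepts."""
    log_joint = np.column_stack([multi_view_loglik(l2_normalize(gallery), v) for v in models])
    log_joint = log_joint + np.log(priors)
    post = np.exp(log_joint - log_joint.max(axis=1, keepdims=True))
    return post / post.sum(axis=1, keepdims=True)

def retrieve(query, concepts, post, top=5):
    """Rank unlabeled gallery images by the posterior of the queried concept."""
    return np.argsort(-post[:, concepts.index(query)])[:top]

# Toy demo on synthetic data (not from the paper): each concept has two tight "views".
rng = np.random.default_rng(1)
eye = np.eye(5)
def sample(axis, n):
    return eye[axis] + 0.05 * rng.standard_normal((n, 5))
train = {"cat": np.vstack([sample(0, 20), sample(1, 20)]),
         "car": np.vstack([sample(2, 20), sample(3, 20)])}
concepts = list(train)
fitted = [fit_concept(train[c]) for c in concepts]
priors = np.array([len(train[c]) for c in concepts], dtype=float)
priors /= priors.sum()
gallery = np.vstack([sample(0, 5), sample(3, 5)])  # rows 0-4 resemble "cat", 5-9 "car"
post = posteriors(gallery, [views for _, views in fitted], priors)
print([k for k, _ in fitted], retrieve("cat", concepts, post))
```

Choosing k per concept by the lowest Davies-Bouldin index is what lets concepts with different numbers of visual appearances end up with mixtures of different sizes, rather than sharing one global k.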

Figures: Fig. 1-8; Algorithm 1

Data availability

Data sharing is not applicable to this article, as no datasets were generated or analysed during the current study.

Author information

Corresponding author

Correspondence to Aiadi Oussama.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Oussama, A., Khaldi, B. & Kherfi, M.L. A fast weighted multi-view Bayesian learning scheme with deep learning for text-based image retrieval from unlabeled galleries. Multimed Tools Appl 82, 10795–10812 (2023). https://doi.org/10.1007/s11042-022-13788-x

  • DOI: https://doi.org/10.1007/s11042-022-13788-x
