Abstract
In this paper, we propose a new computationally fast method for text-based image retrieval from unlabeled galleries, in which retrieval is formulated as a multi-class learning problem. While most existing methods assign equal importance to all images representing the same concept during learning, we propose a weighted multi-view likelihood term to handle the intra-class variation within the training set of each concept. First, we cluster each training set to detect the concept’s visual appearances (views). Because the number of clusters may vary significantly from one set to another, imposing a single value of this hyper-parameter on all sets could degrade the learning outcome. We therefore propose to determine it automatically and precisely for each set using the Davies-Bouldin index. Images are represented using deep features, which are normalized with the vanilla L2 rule to deal with bursty visual features. The proposed multi-view term is constructed by combining the multivariate normal probability density functions of the resulting clusters. This term is then incorporated into a naïve Bayes classifier alongside the prior probability of the concept, where each component is weighted using the Expectation-Maximization (EM) algorithm. Given a textual query, the relevant images are those with the highest posterior probability, which is calculated using our Bayesian learning scheme. Experimental results on public datasets demonstrate the effectiveness and speed of the proposed method compared to several other methods.
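The view-detection step described in the abstract (L2-normalize the features, cluster each concept's training set, and pick the number of clusters with the Davies-Bouldin index) can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: k-means with a deterministic farthest-first initialization stands in for the clustering algorithm, and the function names (`l2_normalize`, `kmeans`, `davies_bouldin`, `select_num_views`) and the candidate range of cluster counts are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a feature vector to unit Euclidean length (plain L2 rule)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def kmeans(points, k, iters=20):
    """Lloyd's k-means (an assumed stand-in for the paper's clustering step)."""
    # Deterministic farthest-first initialization: start from the first point,
    # then repeatedly add the point farthest from all chosen centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster happens to be empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [c for c in clusters if c]

def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition with distinct centroids; lower is better."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance of each cluster's points to its centroid.
    scatter = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, centers)]
    k = len(clusters)
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k

def select_num_views(points, candidate_ks=range(2, 6)):
    """Return the candidate k whose k-means partition has the lowest index."""
    scores = {}
    for k in candidate_ks:
        clusters = kmeans(points, k)
        if len(clusters) >= 2:  # the index is undefined for a single cluster
            scores[k] = davies_bouldin(clusters)
    return min(scores, key=scores.get)
```

In this sketch, each concept's training set would be passed through `l2_normalize` and then `select_num_views` separately, so that every concept gets its own number of views instead of one value shared across all sets.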
Data availability
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Oussama, A., Khaldi, B. & Kherfi, M.L. A fast weighted multi-view Bayesian learning scheme with deep learning for text-based image retrieval from unlabeled galleries. Multimed Tools Appl 82, 10795–10812 (2023). https://doi.org/10.1007/s11042-022-13788-x
DOI: https://doi.org/10.1007/s11042-022-13788-x