Abstract
The main challenge in content-based image retrieval is the semantic gap: the difference between the descriptions that algorithms compute for an image and the semantic concepts that humans perceive in it. To narrow this gap, many retrieval methods emphasize the important regions of an image. These approaches, however, still lose part of the image's semantic features. This article therefore introduces an image retrieval method that fuses deep features extracted from a neutrosophic representation of the image, segmented with the help of the image depth map. In the first step, the original image is transferred to the neutrosophic domain, which decomposes it into three components: true, false, and indeterminate. The true and false images represent image brightness differently, while the indeterminate image represents the boundary between them and therefore also the edges in the image. Because the convolutional layers of deep neural networks are sensitive to changes in image brightness when extracting feature maps, the features extracted from the true and false images differ and can be treated as complementary. In the second step, the image depth map is estimated using a vision transformer and then binarized using a predefined threshold. Applying the binarized depth map to the neutrosophic domain separates objects into near and far regions. Effective features are extracted from each region using a pre-trained deep neural network, VGG-16, and the important features of each group of images are selected using the Boruta-Shap algorithm. Finally, to reduce redundancy and unify the extracted features, feature fusion is performed in two stages, yielding the final feature vector for each image.
Experimental results confirm that extracting semantic and content features from different regions of an image with the proposed method improves retrieval results and narrows the semantic gap.
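The first two steps of the pipeline can be illustrated with a minimal sketch. The abstract does not give the neutrosophic formulas, the threshold value, or the depth convention, so everything specific here is an assumption: the true (T) and indeterminate (I) maps follow a common local-mean formulation from the neutrosophic image literature, F is taken as 1 - T, the threshold defaults to 0.5, and larger depth values are assumed to mean nearer pixels. The function names (`neutrosophic_decompose`, `split_by_depth`) are hypothetical, and the depth map is passed in as an array rather than estimated by a vision transformer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _rescale(x):
    """Min-max scale an array to [0, 1]; a constant array maps to zeros."""
    x = np.asarray(x, dtype=np.float64)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def local_mean(gray, w=3):
    """Mean over a w x w window (w odd), replicating edge pixels at the border."""
    pad = w // 2
    padded = np.pad(np.asarray(gray, dtype=np.float64), pad, mode="edge")
    return sliding_window_view(padded, (w, w)).mean(axis=(-2, -1))


def neutrosophic_decompose(gray, w=3):
    """Return (T, I, F) maps in [0, 1] for a 2-D grayscale image.

    Assumed formulation (not taken from the article):
      T = rescaled local mean brightness
      I = rescaled |pixel - local mean|, which is large near edges
      F = 1 - T
    """
    gray = np.asarray(gray, dtype=np.float64)
    mean = local_mean(gray, w)
    T = _rescale(mean)
    I = _rescale(np.abs(gray - mean))
    F = 1.0 - T
    return T, I, F


def split_by_depth(component, depth, threshold=0.5):
    """Split one neutrosophic component into near and far regions.

    The depth map is rescaled to [0, 1] and binarized; values at or above
    the threshold are treated as near (assumed convention).
    """
    near = _rescale(depth) >= threshold
    return component * near, component * ~near
```

Each of the resulting region images (for example, near and far versions of T and F) would then be passed to a pre-trained VGG-16 for feature extraction, followed by Boruta-Shap selection and two-stage fusion; those stages are not sketched here because the abstract does not specify their details.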
Author information
Contributions
FT contributed to conceptualization, methodology, visualization, software, validation, writing—original draft, read and approved the final manuscript. KR contributed to conceptualization, writing—review & editing, validation, project administration, supervision, read and approved the final manuscript. ZB contributed to analysis, writing—review & editing, and read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval and informed consent
Ethics approval not applicable.
About this article
Cite this article
Taheri, F., Rahbar, K. & Beheshtifard, Z. Content-based image retrieval through fusion of deep features extracted from segmented neutrosophic using depth map. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03335-0
DOI: https://doi.org/10.1007/s00371-024-03335-0