
Enhancing heterogeneous similarity estimation via neighborhood reversibility


With the popularity of social networks, people can easily generate rich content spanning multiple modalities. Estimating the similarity of multi-modal content both effectively and simply is therefore increasingly important for providing better search services over rich media. This work aims to enhance similarity estimation so as to improve the accuracy of multi-modal data search. To this end, a novel multi-modal feature extraction approach is proposed that verifies the neighborhood reversibility of information objects across different modalities, in order to build reliable similarity estimates among multimedia documents. By verifying neighborhood reversibility in both single- and multi-modal instances, the reliability of the multi-modal subspace can be remarkably improved. In addition, a new adaptive strategy that fully exploits the distance distribution of the returned search results is proposed to handle the neighbor selection problem. To further address the out-of-sample problem, a new prediction scheme is proposed to predict multi-modal features for newly arriving instances; in essence, it constructs an over-complete set of bases. Extensive experiments demonstrate that introducing neighborhood reversibility verification can significantly improve the search accuracy for multi-modal documents.
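The basic notion of neighborhood reversibility can be illustrated with reciprocal k-nearest neighbors: an item j is kept as a neighbor of item i only if i also appears among the k nearest neighbors of j. The sketch below is a minimal, generic illustration under stated assumptions (Euclidean distances on a single feature matrix, a fixed k, and the hypothetical helper names `pairwise_distances`, `knn_indices` and `reversible_neighbors`). It is not the authors' implementation, and it leaves out their multi-modal subspace, adaptive neighbor selection and out-of-sample prediction.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X (n x d)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Clamp tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(d2, 0.0))


def knn_indices(D, k):
    """Indices of the k nearest neighbors of each item, excluding the item itself."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)  # an item is never its own neighbor
    # A stable sort makes tie-breaking deterministic (lower index first).
    return np.argsort(D, axis=1, kind="stable")[:, :k]


def reversible_neighbors(D, k):
    """Keep j in N_k(i) only if i is also in N_k(j) (hypothetical helper).

    Assumption: reversibility is checked as a plain reciprocal k-NN test
    on one distance matrix D; the paper's exact criterion may differ.
    """
    nn = knn_indices(D, k)
    nn_sets = [set(row.tolist()) for row in nn]
    return [
        [int(j) for j in nn[i] if i in nn_sets[j]]
        for i in range(D.shape[0])
    ]
```

For example, with the one-dimensional points 0, 1, 2 and 10 and k = 2, the isolated point at 10 still has two nearest neighbors (2 and 1), but neither of them lists it in return, so it receives no reversible neighbors, while the three clustered points keep each other. Under the same assumptions, one would run such a check per modality (single-modal) and again in a shared feature space (multi-modal).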


Figs. 1–5




This work was supported in part by the National Natural Science Foundation of China (Nos. 61572065 and 61532005), the Joint Fund of the Ministry of Education of China and China Mobile (No. MCM20160102), and the Fundamental Research Funds for the Central Universities (Nos. 2015JBM028 and 2015JBZ002).

Author information

Corresponding author

Correspondence to Shikui Wei.


About this article


Cite this article

Wei, S., Zhao, Y., Yang, T. et al. Enhancing heterogeneous similarity estimation via neighborhood reversibility. Multimed Tools Appl 77, 1437–1452 (2018).

Keywords

  • Neighbourhood reversibility verifying
  • Multi-modal retrieval
  • Adaptive strategy