Multimedia Tools and Applications, Volume 77, Issue 1, pp 1437–1452

Enhancing heterogeneous similarity estimation via neighborhood reversibility

  • Shikui Wei
  • Yao Zhao
  • Tao Yang
  • Zhili Zhou
  • Shiming Ge


With the popularity of social networks, people can easily generate rich content spanning multiple modalities. Estimating the similarity of multi-modal content effectively and simply is therefore becoming increasingly important for rich-media search services. This work aims to enhance similarity estimation so as to improve the accuracy of multi-modal data search. To this end, we propose a novel multi-modal feature extraction approach that verifies the neighborhood reversibility of information objects across modalities, in order to build reliable similarity estimates among multimedia documents. Verifying neighborhood reversibility in both single- and multi-modal instances can markedly improve the reliability of the multi-modal subspace. In addition, we propose a new adaptive strategy that exploits the distance distribution of the returned search results to address the neighbor selection problem. To further address the out-of-sample problem, we propose a prediction scheme that predicts multi-modal features for newly arriving instances, essentially by constructing an over-complete set of bases. Extensive experiments demonstrate that introducing neighborhood reversibility verification significantly improves the search accuracy for multi-modal documents.
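Neighborhood reversibility can be read as a reciprocal-neighbor test: item j is kept as a neighbor of item i only if i is, in turn, among the k nearest neighbors of j. The sketch below illustrates that test in a single feature space. It is a minimal sketch under several assumptions:

- Euclidean distance.
- A fixed k.
- Brute-force search.
- Hypothetical helper names `knn` and `reversible_neighbors`.

It does not reproduce the paper's multi-modal subspace, its adaptive neighbor selection, or its out-of-sample prediction scheme.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(features: List[Sequence[float]], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of item i, excluding i itself.

    Ties in distance are broken by index so the result is deterministic.
    """
    others = [j for j in range(len(features)) if j != i]
    others.sort(key=lambda j: (euclidean(features[i], features[j]), j))
    return others[:k]


def reversible_neighbors(features: List[Sequence[float]], i: int, k: int) -> List[int]:
    """Keep only those k-NN of item i whose own k-NN list contains i.

    Hypothetical helper: a brute-force reciprocal-neighbor check,
    O(k * n log n) per query, intended for illustration only.
    """
    return [j for j in knn(features, i, k) if i in knn(features, j, k)]


if __name__ == "__main__":
    # Four 1-D "documents"; the last one is an isolated outlier.
    points = [[0.0], [1.0], [2.0], [10.0]]
    print(knn(points, 3, 2))                   # plain k-NN of the outlier
    print(reversible_neighbors(points, 3, 2))  # reciprocal neighbors of the outlier
    print(reversible_neighbors(points, 2, 2))  # reciprocal neighbors of a clustered point
```

For example, take the points 0, 1, 2, and 10 with k = 2. Plain k-NN assigns the outlier at 10 the neighbors at indices 2 and 1. Neither of those points lists the outlier among its own two nearest neighbors, so the outlier's reversible-neighbor set is empty. This filtering of non-reciprocal matches is the intuition behind using reversibility to make similarity estimates more reliable.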


Keywords: Neighbourhood reversibility verifying · Multi-modal retrieval · Adaptive strategy



This work was supported in part by the National Natural Science Foundation of China (No. 61572065, No. 61532005), the Joint Fund of the Ministry of Education of China and China Mobile (No. MCM20160102), and the Fundamental Research Funds for the Central Universities (No. 2015JBM028, No. 2015JBZ002).



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Institute of Information Science, Beijing Jiaotong University, Beijing, China
  2. School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China
  3. Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
