World Wide Web, Volume 22, Issue 2, pp 771–789

Global-view hashing: harnessing global relations in near-duplicate video retrieval

  • Weizhen Jing
  • Xiushan Nie
  • Chaoran Cui
  • Xiaoming Xi
  • Gongping Yang
  • Yilong Yin
Part of the following topical collections:
  1. Special Issue on Deep vs. Shallow: Learning for Emerging Web-scale Data Computing and Applications


Abstract

Multi-view features are often used in video hashing for near-duplicate video retrieval because they complement and reinforce one another. However, most methods consider only the locally available information in multiple features, such as individual or pairwise structural relations, and thus do not fully exploit the dependencies among multiple features. We therefore propose a global-view hashing (GVH) framework that addresses this issue by harnessing the global relations among samples characterized by multiple features. In the proposed framework, the multiple features of all videos are jointly used to learn a common Hamming space, where the hash functions are obtained by comprehensively exploiting both intra-view and inter-view relations among objects. In addition, the hash functions obtained with GVH can learn multi-bit hash codes in a single iteration. Compared with existing video hashing schemes, GVH not only considers relations globally, yielding more precise retrieval with short hash codes, but also achieves multi-bit learning in a single iteration. We conduct extensive experiments on the CC_WEB_VIDEO and UQ_VIDEO datasets, and the results show that the proposed method outperforms state-of-the-art methods. As a side contribution, we will release the code to facilitate further research.
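The paper's learned hash functions are not reproduced here, but the retrieval setting the abstract describes can be sketched generically: map real-valued video features to short binary codes in a common Hamming space, then rank database items by Hamming distance to the query. In the minimal sketch below, a random sign projection stands in for the learned GVH hash functions; the projection, dimensions, and toy data are all illustrative assumptions, not the authors' method.

```python
# Illustrative sketch of Hamming-space near-duplicate retrieval.
# NOTE: the random projection below is a hypothetical stand-in for
# learned hash functions (e.g., those produced by GVH); it is NOT
# the method proposed in the paper.
import numpy as np

rng = np.random.default_rng(0)

def hash_codes(features, projection):
    """Binarize real-valued features into {0,1} codes via the sign of a projection."""
    return (features @ projection > 0).astype(np.uint8)

def hamming(a, b):
    """Hamming distance between two binary code vectors."""
    return int(np.count_nonzero(a != b))

# Toy database: 5 "videos", each a 16-dim fused feature; 64-bit hash codes.
db_features = rng.standard_normal((5, 16))
projection = rng.standard_normal((16, 64))   # stand-in for a learned hash function
db_codes = hash_codes(db_features, projection)

# A near-duplicate query: database item 2 plus small perturbation
# (mimicking re-encoding or minor editing of the same video).
query = db_features[2] + 0.05 * rng.standard_normal(16)
query_code = hash_codes(query[None, :], projection)[0]

# Rank database items by Hamming distance to the query code;
# the near-duplicate should land at (or near) distance 0.
distances = [hamming(query_code, code) for code in db_codes]
best = int(np.argmin(distances))
print(best, distances)
```

Binary codes make this ranking cheap at web scale: the distance is an XOR followed by a popcount, which is why short, discriminative codes (the goal of hashing methods like GVH) matter for near-duplicate video retrieval.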


Keywords: Video hashing · Near-duplicate video retrieval · Global view · Multi-bit learning



Acknowledgements

This work is supported by the National Natural Science Foundation of China (61671274, 61573219, 61701281, 61701280), the China Postdoctoral Science Foundation (2016M592190), the Shandong Provincial Key Research and Development Plan (2017CXGC1504), the Fostering Project of Dominant Discipline and Talent Team of Shandong Province Higher Education Institutions, and the Fostering Project of Dominant Discipline and Talent Team of SDUFE.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer Science and Technology, Shandong University, Jinan, China
  2. School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China
