Multimedia Tools and Applications, Volume 78, Issue 1, pp 311–336

Multiscale video sequence matching for near-duplicate detection and retrieval

  • Yuanyuan Yang
  • Yonghong Tian
  • Tiejun Huang


As one of the key technologies in content-based near-duplicate detection and video retrieval, video sequence matching is used to determine whether two videos contain duplicate or near-duplicate segments. Despite considerable research effort in recent years, precisely and efficiently matching sequences among videos (which may be subject to complex audio-visual transformations) from a large-scale database remains a challenging task. To address this problem, this paper proposes a multiscale video sequence matching (MS-VSM) method, which gradually detects and locates the similar segments between videos from coarse to fine scales. At the coarse scale, it uses the Maximum Weight Matching (MWM) algorithm to rapidly select several candidate reference videos from the database for a given query. Then, for each candidate video, the segment most similar to the query is obtained at the middle scale by the Constrained Longest Ascending Matching Subsequence (CLAMS) algorithm, and this segment is used to judge whether the candidate contains a near-duplicate. If so, the precise locations of the near-duplicate segments in both the query and reference videos are determined at the fine scale by bi-directional scanning, which checks the matching similarity at the segments’ boundaries. As such, the MS-VSM method achieves high near-duplicate detection accuracy and localization precision with high processing efficiency. Extensive experiments show that it clearly outperforms several state-of-the-art methods on several benchmarks.
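To make the middle-scale idea concrete, the following is a minimal sketch of a longest ascending matching subsequence with a simple gap constraint. It is not the paper's CLAMS algorithm, whose exact constraints are not given in this abstract. The function name `longest_ascending_matches`, the `max_gap` parameter, and the representation of matches as (query frame, reference frame) index pairs are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (query frame index, reference frame index).
Match = Tuple[int, int]


def longest_ascending_matches(matches: List[Match], max_gap: int = 5) -> List[Match]:
    """Return the longest chain of matches in which both the query and the
    reference indices strictly increase, and consecutive matches are at most
    `max_gap` frames apart in each video (an assumed constraint)."""
    pairs = sorted(set(matches))  # sort by query index, then reference index
    if not pairs:
        return []

    best_len = [1] * len(pairs)   # best chain length ending at each pair
    prev = [-1] * len(pairs)      # back-pointer to the previous pair in the chain

    # O(n^2) dynamic programme over all earlier pairs.
    for i, (qi, ri) in enumerate(pairs):
        for j in range(i):
            qj, rj = pairs[j]
            ascending = 0 < qi - qj <= max_gap and 0 < ri - rj <= max_gap
            if ascending and best_len[j] + 1 > best_len[i]:
                best_len[i] = best_len[j] + 1
                prev[i] = j

    # Walk the back-pointers from the end of the longest chain.
    end = max(range(len(pairs)), key=best_len.__getitem__)
    chain = []
    while end != -1:
        chain.append(pairs[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    frame_matches = [(0, 10), (1, 11), (2, 12), (3, 40), (4, 14), (2, 3)]
    print(longest_ascending_matches(frame_matches, max_gap=3))
```

In this sketch the first and last pairs of the returned chain give a rough estimate of the matched segment in each video. A boundary-refinement step such as the bi-directional scanning described above would then adjust those end points.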


Near-duplicate video detection; Video sequence matching; Multiscale matching; Constrained longest ascending matching subsequence; Bi-directional scanning



This work is partially supported by grants from the National Key R&D Program of China under grant 2017YFB1002401, the National Natural Science Foundation of China under contract No. U1611461, No. 61390515, and No. 61425025.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Beijing University of Posts and Telecommunications, Beijing, China
  2. Peking University, Beijing, China
