Multiscale video sequence matching for near-duplicate detection and retrieval


As one of the key technologies in content-based near-duplicate detection and video retrieval, video sequence matching is used to judge whether two videos contain duplicate or near-duplicate segments. Despite considerable research effort in recent years, precisely and efficiently matching sequences among videos (which may have undergone complex audio-visual transformations) in a large-scale database remains a challenging task. To address this problem, this paper proposes a multiscale video sequence matching (MS-VSM) method, which progressively detects and locates similar segments between videos from coarse to fine scales. At the coarse scale, it uses the Maximum Weight Matching (MWM) algorithm to rapidly select several candidate reference videos from the database for a given query. At the middle scale, the Constrained Longest Ascending Matching Subsequence (CLAMS) algorithm finds, for each candidate video, the segment most similar to the query; this segment is then used to judge whether the candidate contains a near-duplicate. If so, the precise locations of the near-duplicate segments in both the query and reference videos are determined at the fine scale by bi-directional scanning, which checks the matching similarity at the segments' boundaries. In this way, the MS-VSM method achieves excellent near-duplicate detection accuracy and localization precision with very high processing efficiency. Extensive experiments show that it substantially outperforms several state-of-the-art methods on several benchmarks.
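To make the middle-scale idea concrete, the following is a minimal sketch of a longest ascending matching subsequence with a gap constraint. It is not the paper's CLAMS algorithm, whose exact constraints are not given in this abstract. The function name `longest_ascending_matches`, the `(query_index, reference_index)` pair representation, and the `max_gap` parameter are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (query_frame_index, reference_frame_index)
Match = Tuple[int, int]


def longest_ascending_matches(matches: List[Match], max_gap: int) -> List[Match]:
    """Return the longest chain of matches whose query and reference
    indices both strictly ascend, with consecutive matches at most
    `max_gap` frames apart in each video (an assumed constraint)."""
    ordered = sorted(set(matches))
    n = len(ordered)
    if n == 0:
        return []

    best_len = [1] * n   # best chain length ending at match i
    prev = [-1] * n      # back-pointer to the previous match in that chain

    # Simple O(n^2) dynamic programme over the sorted matches.
    for i in range(n):
        qi, ri = ordered[i]
        for j in range(i):
            qj, rj = ordered[j]
            ascending = qj < qi and rj < ri
            close = (qi - qj) <= max_gap and (ri - rj) <= max_gap
            if ascending and close and best_len[j] + 1 > best_len[i]:
                best_len[i] = best_len[j] + 1
                prev[i] = j

    # Walk the back-pointers from the end of the longest chain.
    end = max(range(n), key=best_len.__getitem__)
    chain: List[Match] = []
    while end != -1:
        chain.append(ordered[end])
        end = prev[end]
    chain.reverse()
    return chain


if __name__ == "__main__":
    frame_matches = [(0, 10), (1, 11), (2, 5), (3, 13), (4, 14), (9, 40)]
    print(longest_ascending_matches(frame_matches, max_gap=3))
```

In this sketch the first and last pairs of the returned chain give a candidate segment in the query and in the reference video. A detector could then compare the chain length against a minimum-length threshold before refining the boundaries at the fine scale.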




  1.

    Legally, only videos in which identical or similar content lasts longer than a pre-defined duration (e.g., 10 seconds) can be treated as duplicates or near-duplicates. Under our sampling method, a 10-second video yields 6 sampled frames, so we set ζ1 = 6 in our experiments.
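    The arithmetic behind this threshold can be sketched as follows. The sampling rate of 3/5 frames per second is not stated in the note; it is inferred here from the 6 frames reported for a 10-second video. The function name `min_matched_frames` is hypothetical.

```python
import math
from fractions import Fraction


def min_matched_frames(min_duration_s: int, sampled_fps: Fraction) -> int:
    """Smallest number of sampled frames covering `min_duration_s` seconds.

    Exact rational arithmetic avoids floating-point rounding in the product.
    """
    return math.ceil(Fraction(min_duration_s) * sampled_fps)


# Assumed rate: 6 sampled frames per 10 seconds, i.e. 3/5 frames per second.
ASSUMED_SAMPLED_FPS = Fraction(6, 10)

if __name__ == "__main__":
    print(min_matched_frames(10, ASSUMED_SAMPLED_FPS))
```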




This work is partially supported by the National Key R&D Program of China under grant 2017YFB1002401 and by the National Natural Science Foundation of China under contracts No. U1611461, No. 61390515, and No. 61425025.

Author information



Corresponding author

Correspondence to Yuanyuan Yang.


About this article


Cite this article

Yang, Y., Tian, Y. & Huang, T. Multiscale video sequence matching for near-duplicate detection and retrieval. Multimed Tools Appl 78, 311–336 (2019).



  • Near-duplicate video detection
  • Video sequence matching
  • Multiscale matching
  • Constrained longest ascending matching subsequence
  • Bi-directional scanning