
Finding Near-Duplicate Videos in Large-Scale Collections

Video Verification in the Fake News Era

Abstract

This chapter discusses the problem of Near-Duplicate Video Retrieval (NDVR). The main objective of a typical NDVR approach is: given a query video, retrieve all near-duplicate videos in a video repository and rank them based on their similarity to the query. Several approaches have been introduced in the literature, which can be roughly classified into three categories based on the level of video matching: (i) video-level, (ii) frame-level, and (iii) filter-and-refine matching. Two methods based on video-level matching are presented in this chapter. The first is an unsupervised scheme that relies on a modified Bag-of-Words (BoW) video representation. The second is a supervised method based on Deep Metric Learning (DML). In both methods, features extracted from the intermediate layers of Convolutional Neural Networks are leveraged as frame descriptors, since they offer a compact and informative image representation and lead to increased system efficiency. Extensive evaluation has been conducted on publicly available benchmark datasets, where the presented methods are compared with state-of-the-art approaches and achieve the best results in all evaluation setups.
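The two video-level ingredients named in the abstract can be illustrated with a minimal sketch. The codebook size, descriptor dimensionality, and noise model below are illustrative assumptions, not the chapter's actual configuration: frame descriptors are quantised against a codebook to form a BoW video signature, and a triplet hinge loss of the kind used in metric learning requires a near-duplicate (positive) to lie closer to the anchor than a dissimilar (negative) video by some margin.

```python
import numpy as np

def bow_signature(frames, codebook):
    """Assign each frame descriptor to its nearest codeword and return
    an L2-normalised histogram -- a video-level BoW signature."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    hist = np.bincount(dists.argmin(axis=1), minlength=len(codebook)).astype(float)
    n = np.linalg.norm(hist)
    return hist / n if n else hist

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss used in metric learning: the positive must be closer
    to the anchor than the negative by at least `margin`."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 128))   # stand-in for an offline-learned codebook
video_a = rng.normal(size=(30, 128))    # 30 frames, 128-dim frame descriptors
video_b = video_a + 0.01 * rng.normal(size=video_a.shape)  # near-duplicate of video_a
video_c = rng.normal(size=(30, 128))    # unrelated video

sig_a, sig_b, sig_c = (bow_signature(v, codebook) for v in (video_a, video_b, video_c))
print(sig_a @ sig_b > sig_a @ sig_c)    # near-duplicate should rank higher
```

Ranking a repository then reduces to sorting videos by the dot product of their signatures with the query's signature, which is what makes video-level matching efficient at scale.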


Notes

  1. https://blogs.wsj.com/metropolis/2011/04/28/weather-journal-clouds-gathered-but-no-tornado-damage/ (accessed March 2019).

  2. https://images.google.com/ (accessed March 2019).

  3. https://www.tineye.com/ (accessed March 2019).

  4. https://www.youtube.com/yt/about/press/ (accessed March 2019).

  5. https://github.com/BVLC/caffe/wiki/Model-Zoo (accessed March 2019).

  6. http://spark.apache.org (accessed March 2019).

  7. The features were learned on the ImageNet dataset, since pretrained networks are used. However, ImageNet is a comprehensive dataset, so the learned features can be applied to other computer vision tasks (e.g., NDVR) without retraining.
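To make the transfer-learning point above concrete, one common way to turn an intermediate-layer activation of a pretrained network into a compact frame descriptor is spatial max pooling. The layer shape below is an illustrative assumption, not the chapter's actual configuration; a (channels, height, width) activation map collapses into an L2-normalised channel-dimensional vector:

```python
import numpy as np

def frame_descriptor(activation):
    """Collapse an intermediate conv-layer activation map of shape
    (channels, height, width) into a compact, L2-normalised
    channel-dimensional frame descriptor via spatial max pooling."""
    v = activation.reshape(activation.shape[0], -1).max(axis=1)
    n = np.linalg.norm(v)
    return v / n if n else v

# Illustrative shape only: e.g. a 256-channel, 14x14 activation map
# produced by a pretrained network on a single video frame.
activation = np.random.default_rng(1).random((256, 14, 14))
desc = frame_descriptor(activation)
print(desc.shape)
```

Because the pooled descriptor depends only on which channels fire, not where, it stays compact regardless of input resolution, which is part of why pretrained intermediate features transfer so readily.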


Author information

Correspondence to Giorgos Kordopatis-Zilos.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., Kompatsiaris, I. (2019). Finding Near-Duplicate Videos in Large-Scale Collections. In: Mezaris, V., Nixon, L., Papadopoulos, S., Teyssou, D. (eds) Video Verification in the Fake News Era. Springer, Cham. https://doi.org/10.1007/978-3-030-26752-0_4


  • Print ISBN: 978-3-030-26751-3

  • Online ISBN: 978-3-030-26752-0

