
Finding Near-Duplicate Videos in Large-Scale Collections

Video Verification in the Fake News Era

Abstract

This chapter discusses the problem of Near-Duplicate Video Retrieval (NDVR). The main objective of a typical NDVR approach is: given a query video, retrieve all near-duplicate videos in a video repository and rank them based on their similarity to the query. Several approaches have been introduced in the literature, which can be roughly classified into three categories based on the level of video matching: (i) video-level, (ii) frame-level, and (iii) filter-and-refine matching. Two methods based on video-level matching are presented in this chapter. The first is an unsupervised scheme that relies on a modified Bag-of-Words (BoW) video representation. The second is a supervised method based on Deep Metric Learning (DML). In both methods, features extracted from the intermediate layers of Convolutional Neural Networks are leveraged as frame descriptors, since they offer a compact and informative image representation and lead to increased system efficiency. Extensive evaluation has been conducted on publicly available benchmark datasets, where the presented methods are compared with state-of-the-art approaches and achieve the best results in all evaluation setups.
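The two video-level ingredients named in the abstract can be illustrated with a minimal sketch. The codebook size, descriptor dimensionality, and noise model below are illustrative assumptions, not the chapter's actual configuration: frame descriptors are quantised against a codebook to form a BoW video signature, and a triplet hinge loss of the kind used in metric learning requires a near-duplicate (positive) to lie closer to the anchor than a dissimilar (negative) video by some margin.

```python
import numpy as np

def bow_signature(frames, codebook):
    """Assign each frame descriptor to its nearest codeword and return
    an L2-normalised histogram -- a video-level BoW signature."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    hist = np.bincount(dists.argmin(axis=1), minlength=len(codebook)).astype(float)
    n = np.linalg.norm(hist)
    return hist / n if n else hist

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss used in metric learning: the positive must be closer
    to the anchor than the negative by at least `margin`."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 128))   # stand-in for an offline-learned codebook
video_a = rng.normal(size=(30, 128))    # 30 frames, 128-dim frame descriptors
video_b = video_a + 0.01 * rng.normal(size=video_a.shape)  # near-duplicate of video_a
video_c = rng.normal(size=(30, 128))    # unrelated video

sig_a, sig_b, sig_c = (bow_signature(v, codebook) for v in (video_a, video_b, video_c))
print(sig_a @ sig_b > sig_a @ sig_c)    # near-duplicate should rank higher
```

Ranking a repository then reduces to sorting videos by the dot product of their signatures with the query's signature, which is what makes video-level matching efficient at scale.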


Notes

  1. https://blogs.wsj.com/metropolis/2011/04/28/weather-journal-clouds-gathered-but-no-tornado-damage/ (accessed March 2019).

  2. https://images.google.com/ (accessed March 2019).

  3. https://www.tineye.com/ (accessed March 2019).

  4. https://www.youtube.com/yt/about/press/ (accessed March 2019).

  5. https://github.com/BVLC/caffe/wiki/Model-Zoo (accessed March 2019).

  6. http://spark.apache.org (accessed March 2019).

  7. The features were learned on the ImageNet dataset, since pretrained networks are used. However, ImageNet is a comprehensive dataset, so the learned features can be applied to other computer vision tasks (e.g., NDVR) without retraining.
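To make the transfer-learning point above concrete, one common way to turn an intermediate-layer activation of a pretrained network into a compact frame descriptor is spatial max pooling. The layer shape below is an illustrative assumption, not the chapter's actual configuration; a (channels, height, width) activation map collapses into an L2-normalised channel-dimensional vector:

```python
import numpy as np

def frame_descriptor(activation):
    """Collapse an intermediate conv-layer activation map of shape
    (channels, height, width) into a compact, L2-normalised
    channel-dimensional frame descriptor via spatial max pooling."""
    v = activation.reshape(activation.shape[0], -1).max(axis=1)
    n = np.linalg.norm(v)
    return v / n if n else v

# Illustrative shape only: e.g. a 256-channel, 14x14 activation map
# produced by a pretrained network on a single video frame.
activation = np.random.default_rng(1).random((256, 14, 14))
desc = frame_descriptor(activation)
print(desc.shape)
```

Because the pooled descriptor depends only on which channels fire, not where, it stays compact regardless of input resolution, which is part of why pretrained intermediate features transfer so readily.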


Author information

Correspondence to Giorgos Kordopatis-Zilos.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., Kompatsiaris, I. (2019). Finding Near-Duplicate Videos in Large-Scale Collections. In: Mezaris, V., Nixon, L., Papadopoulos, S., Teyssou, D. (eds) Video Verification in the Fake News Era. Springer, Cham. https://doi.org/10.1007/978-3-030-26752-0_4


  • Print ISBN: 978-3-030-26751-3

  • Online ISBN: 978-3-030-26752-0

