Abstract
This chapter discusses the problem of Near-Duplicate Video Retrieval (NDVR). The main objective of a typical NDVR approach is, given a query video, to retrieve all near-duplicate videos in a video repository and rank them based on their similarity to the query. Several approaches have been introduced in the literature, which can be roughly classified into three categories based on the level of video matching: (i) video-level, (ii) frame-level, and (iii) filter-and-refine matching. Two methods based on video-level matching are presented in this chapter. The first is an unsupervised scheme that relies on a modified Bag-of-Words (BoW) video representation. The second is a supervised method based on Deep Metric Learning (DML). For the development of both methods, features are extracted from the intermediate layers of Convolutional Neural Networks (CNNs) and leveraged as frame descriptors, since they offer a compact and informative image representation and lead to increased system efficiency. Extensive evaluation has been conducted on publicly available benchmark datasets, and the presented methods are compared with state-of-the-art approaches, achieving the best results in all evaluation setups.
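To make the video-level BoW idea concrete, below is a minimal sketch, assuming NumPy and scikit-learn and per-frame descriptors that have already been computed (one way to obtain such descriptors from a pretrained CNN is sketched after note 7 below). The function names and codebook size are illustrative; the chapter's method uses a modified BoW representation rather than this plain variant.

```python
# Illustrative BoW pipeline over per-frame descriptors (NumPy and scikit-learn
# are tooling assumptions; names and the codebook size are hypothetical).
# This is a plain BoW sketch, not the chapter's modified BoW scheme.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_codebook(sample_descriptors: np.ndarray, n_words: int = 1000) -> MiniBatchKMeans:
    """Learn a visual vocabulary by clustering frame descriptors sampled from many videos."""
    return MiniBatchKMeans(n_clusters=n_words).fit(sample_descriptors)

def bow_vector(frame_descriptors: np.ndarray, codebook: MiniBatchKMeans) -> np.ndarray:
    """Quantise each frame to its nearest visual word and build an L2-normalised histogram."""
    words = codebook.predict(frame_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float64)
    return hist / (np.linalg.norm(hist) + 1e-12)

def rank_videos(query_bow: np.ndarray, db_bows: np.ndarray) -> np.ndarray:
    """Rank database videos by cosine similarity to the query (vectors are L2-normalised)."""
    return np.argsort(-(db_bows @ query_bow))
```

The supervised DML method follows the same retrieval interface but, instead of a fixed histogram-and-cosine comparison, learns an embedding of the video-level descriptors so that near-duplicates end up closer than dissimilar videos, and ranks by similarity in that learned space.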
Notes
1. https://blogs.wsj.com/metropolis/2011/04/28/weather-journal-clouds-gathered-but-no-tornado-damage/ (accessed March 2019).
2. https://images.google.com/ (accessed March 2019).
3. https://www.tineye.com/ (accessed March 2019).
4. https://www.youtube.com/yt/about/press/ (accessed March 2019).
5. https://github.com/BVLC/caffe/wiki/Model-Zoo (accessed March 2019).
6. http://spark.apache.org (accessed March 2019).
7. The features have been learned on the ImageNet dataset, since pretrained networks are utilized. However, ImageNet is a comprehensive dataset, so the learned features can be used in other computer vision tasks (e.g., NDVR) without the need for retraining, as sketched below.
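As a rough illustration of this off-the-shelf use, the sketch below (assuming PyTorch/torchvision and a VGG-16, with illustrative layer indices; the chapter's experiments relied on Caffe and Theano models) turns intermediate-layer activations of an ImageNet-pretrained network into a frame descriptor without any retraining.

```python
# Hypothetical sketch (not the chapter's exact pipeline): use an ImageNet-pretrained
# network off-the-shelf as a frame descriptor, with no retraining involved.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

# Capture activations of a few intermediate layers via forward hooks.
# The layer indices below are illustrative choices, not the chapter's.
activations = []
def _hook(_module, _inputs, output):
    # Max-pool each feature map over its spatial dimensions -> one value per channel.
    activations.append(output.amax(dim=(2, 3)))

for idx in (15, 22, 29):
    model.features[idx].register_forward_hook(_hook)

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def frame_descriptor(frame: Image.Image) -> torch.Tensor:
    """Concatenate channel-wise max-pooled intermediate activations into one L2-normalised vector."""
    activations.clear()
    with torch.no_grad():
        model(preprocess(frame).unsqueeze(0))
    vec = torch.cat(activations, dim=1).squeeze(0)
    return vec / vec.norm()
```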
Cite this chapter
Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., Kompatsiaris, I. (2019). Finding Near-Duplicate Videos in Large-Scale Collections. In: Mezaris, V., Nixon, L., Papadopoulos, S., Teyssou, D. (eds) Video Verification in the Fake News Era. Springer, Cham. https://doi.org/10.1007/978-3-030-26752-0_4