Finding Semantically Related Videos in Closed Collections

Markatopoulou, Foteini; Zampoglou, Markos; Apostolidis, Evlampios; Papadopoulos, Symeon; Mezaris, Vasileios; Patras, Ioannis; Kompatsiaris, Ioannis

doi:10.1007/978-3-030-26752-0_5

Abstract

Modern newsroom tools offer advanced functionality for automatic and semi-automatic collection of content from the web and social media sources to accompany news stories. However, content collected in this way is often unstructured and may include irrelevant items. An important step in the verification process is to organize this content, both with respect to what it shows and with respect to its origin. This chapter presents our efforts in this direction, which resulted in two components. The first detects semantic concepts in video shots to support the annotation and organization of content collections; it is a deep learning system that incorporates a number of advances and adaptations of existing algorithms to increase performance on this task. The second detects logos in videos in order to identify their provenance; we present our progress from a keypoint-based detection system to a system based on deep learning.
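
As a rough illustration of how per-shot concept scores could help organize a collection, the sketch below builds an inverted index from concept to ranked shots. It is a minimal example under stated assumptions, not the chapter's implementation: the shot identifiers, concept names, score values, the `index_by_concept` helper, and the 0.5 threshold are all hypothetical, and the concept detector itself is not shown.

```python
from collections import defaultdict

# Hypothetical per-shot concept scores, as a concept detector might emit them.
# Shot IDs, concept names, and score values are invented for illustration.
shot_scores = {
    "video1_shot3": {"crowd": 0.91, "fire": 0.12, "vehicle": 0.40},
    "video2_shot1": {"crowd": 0.08, "fire": 0.87, "vehicle": 0.66},
    "video2_shot7": {"crowd": 0.73, "fire": 0.05, "vehicle": 0.10},
}


def index_by_concept(scores, threshold=0.5):
    """Build an inverted index: concept -> shots ranked by descending score.

    A shot is listed under a concept only if its score reaches `threshold`
    (an assumed cut-off, not a value taken from the chapter).
    """
    index = defaultdict(list)
    for shot_id, concepts in scores.items():
        for concept, score in concepts.items():
            if score >= threshold:
                index[concept].append((shot_id, score))
    # Rank the shots of each concept from most to least confident.
    return {
        concept: sorted(hits, key=lambda hit: hit[1], reverse=True)
        for concept, hits in index.items()
    }


concept_index = index_by_concept(shot_scores)
```

Looking up `concept_index["crowd"]` then returns the shots annotated with that concept, most confident first. Raising the threshold trades recall for fewer irrelevant items in each group.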

Notes

1. https://www.nationsencyclopedia.com/WorldStats/CIA-Television-broadcast-stations.html, accessed 08 April 2019.
2. https://github.com/markatopoulou/fvmtl-ccelc.
3. https://github.com/rbgirshick/py-faster-rcnn/.
4. https://github.com/tensorflow/models/tree/master/research/object_detection/.
5. http://press.liacs.nl/mirflickr/.
6. http://opencv.org/.
7. http://www.cs.ubc.ca/research/flann/.
8. https://github.com/rbgirshick/py-faster-rcnn.
9. See footnote 4.

Author information

Authors and Affiliations

Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Foteini Markatopoulou, Markos Zampoglou, Evlampios Apostolidis, Symeon Papadopoulos, Vasileios Mezaris & Ioannis Kompatsiaris
School of Electronic Engineering and Computer Science, Queen Mary University, London, UK
Evlampios Apostolidis & Ioannis Patras

Corresponding author

Correspondence to Markos Zampoglou.

Editor information

Editors and Affiliations

Centre for Research and Technology Hellas, Information Technologies Institute, Thermi, Thessaloniki, Greece
Vasileios Mezaris
MODUL Technology GmbH, MODUL University Vienna, Vienna, Austria
Lyndon Nixon
Centre for Research and Technology Hellas, Information Technologies Institute, Thermi, Thessaloniki, Greece
Symeon Papadopoulos
Agence France-Presse, Paris, France
Denis Teyssou

About this chapter

Cite this chapter

Markatopoulou, F. et al. (2019). Finding Semantically Related Videos in Closed Collections. In: Mezaris, V., Nixon, L., Papadopoulos, S., Teyssou, D. (eds) Video Verification in the Fake News Era. Springer, Cham. https://doi.org/10.1007/978-3-030-26752-0_5

DOI: https://doi.org/10.1007/978-3-030-26752-0_5
Published: 18 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26751-3
Online ISBN: 978-3-030-26752-0
eBook Packages: Computer Science, Computer Science (R0)
