Abstract
Modern newsroom tools offer advanced functionality for automatic and semi-automatic collection of content from the web and social media to accompany news stories. However, content collected in this way tends to be unstructured and may include irrelevant items. An important step in the verification process is to organize this content, both with respect to what it shows and with respect to its origin. This chapter presents our efforts in this direction, which resulted in two components. The first detects semantic concepts in video shots, to support the annotation and organization of content collections. We implement a system based on deep learning, featuring a number of advances and adaptations of existing algorithms that increase performance on this task. The second detects logos in videos in order to identify their provenance. We present our progress from a keypoint-based detection system to a system based on deep learning.
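To make the first component more concrete, the following is a minimal sketch of how per-shot concept scores could be used to organize a collection. It is not the chapter's system: the shot identifiers, concept names, raw scores, and the 0.5 threshold are all hypothetical, and the function `index_shots` is introduced here purely for illustration. It assumes a multi-label detector that emits one raw score (logit) per concept for each shot.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    """Map a raw detector score (logit) to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def index_shots(
    shot_logits: Dict[str, Dict[str, float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the chapter
) -> Dict[str, List[Tuple[str, float]]]:
    """Build a concept -> [(shot_id, score), ...] index, best shots first.

    Each concept is scored independently (multi-label), so a single shot
    may be filed under several concepts or under none.
    """
    index: Dict[str, List[Tuple[str, float]]] = {}
    for shot_id, logits in shot_logits.items():
        for concept, logit in logits.items():
            score = sigmoid(logit)
            if score >= threshold:
                index.setdefault(concept, []).append((shot_id, score))
    for hits in index.values():
        hits.sort(key=lambda hit: hit[1], reverse=True)
    return index


# Hypothetical detector outputs for two shots and three concepts.
shot_logits = {
    "shot_001": {"crowd": 2.0, "outdoor": 1.0, "vehicle": -3.0},
    "shot_002": {"crowd": -1.0, "outdoor": 3.0, "vehicle": 0.5},
}
index = index_shots(shot_logits)
for concept, hits in sorted(index.items()):
    print(concept, [shot_id for shot_id, _ in hits])
```

Scoring each concept with its own sigmoid, rather than a softmax across concepts, reflects the assumption that concepts are not mutually exclusive within a shot.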
Notes
- 1. https://www.nationsencyclopedia.com/WorldStats/CIA-Television-broadcast-stations.html, accessed 08 April 2019.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Markatopoulou, F. et al. (2019). Finding Semantically Related Videos in Closed Collections. In: Mezaris, V., Nixon, L., Papadopoulos, S., Teyssou, D. (eds) Video Verification in the Fake News Era. Springer, Cham. https://doi.org/10.1007/978-3-030-26752-0_5
DOI: https://doi.org/10.1007/978-3-030-26752-0_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26751-3
Online ISBN: 978-3-030-26752-0
eBook Packages: Computer Science, Computer Science (R0)