Finding Semantically Related Videos in Closed Collections

Video Verification in the Fake News Era

Abstract

Modern newsroom tools offer advanced functionality for automatically and semi-automatically collecting content from the web and social media to accompany news stories. However, content collected in this way is often unstructured and may include irrelevant items. An important step in the verification process is to organize this content, both with respect to what it shows and with respect to its origin. This chapter presents our efforts in this direction, which resulted in two components. The first detects semantic concepts in video shots, to help annotate and organize content collections. We implement a system based on deep learning, featuring a number of advances and adaptations of existing algorithms to increase performance on this task. The second detects logos in videos in order to identify their provenance. We present our progress from a keypoint-based detection system to a system based on deep learning.

Author information

Corresponding author

Correspondence to Markos Zampoglou.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Markatopoulou, F. et al. (2019). Finding Semantically Related Videos in Closed Collections. In: Mezaris, V., Nixon, L., Papadopoulos, S., Teyssou, D. (eds) Video Verification in the Fake News Era. Springer, Cham. https://doi.org/10.1007/978-3-030-26752-0_5

  • DOI: https://doi.org/10.1007/978-3-030-26752-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26751-3

  • Online ISBN: 978-3-030-26752-0

  • eBook Packages: Computer Science, Computer Science (R0)
