
Finding Semantically Related Videos in Closed Collections

  • Foteini Markatopoulou
  • Markos Zampoglou (corresponding author)
  • Evlampios Apostolidis
  • Symeon Papadopoulos
  • Vasileios Mezaris
  • Ioannis Patras
  • Ioannis Kompatsiaris
Chapter

Abstract

Modern newsroom tools offer advanced functionality for the automatic and semi-automatic collection of content from web and social media sources to accompany news stories. However, content collected in this way is often unstructured and may include irrelevant items. An important step in the verification process is to organize this content, both with respect to what it shows and with respect to its origin. This chapter presents our efforts in this direction, which resulted in two components. The first detects semantic concepts in video shots to support the annotation and organization of content collections; we implement it as a deep learning-based system that adapts and extends existing algorithms to improve performance on this task. The second detects logos in videos in order to identify their provenance; we describe our progress from a keypoint-based detection system to one based on deep learning.
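To make the keypoint-based stage of logo detection more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes that binary keypoint descriptors (for example 32-byte ORB descriptors) have already been extracted from a logo template and from a video keyframe by some external tool. It matches them by Hamming distance, keeps only matches that pass a nearest-neighbour ratio test, and reports the logo as present when enough matches survive. The function names and the `ratio` and `min_matches` defaults are hypothetical, illustrative choices.

```python
import numpy as np


def hamming_distances(query: np.ndarray, train: np.ndarray) -> np.ndarray:
    """Pairwise Hamming distances between packed binary descriptors.

    query: (Q, B) uint8 array, train: (T, B) uint8 array; each row is one
    descriptor packed into B bytes (B = 32 for 256-bit ORB descriptors).
    Returns a (Q, T) integer array of differing-bit counts.
    """
    # XOR each query row against each train row, then count the set bits.
    xor = np.bitwise_xor(query[:, None, :], train[None, :, :])
    return np.unpackbits(xor, axis=2).sum(axis=2)


def ratio_test_matches(logo_desc, frame_desc, ratio=0.75):
    """Return (logo_idx, frame_idx) pairs that pass the nearest-neighbour ratio test."""
    if frame_desc.shape[0] < 2:
        raise ValueError("the ratio test needs at least two frame descriptors")
    dist = hamming_distances(logo_desc, frame_desc)
    # Indices of the nearest and second-nearest frame descriptor per logo descriptor.
    order = np.argsort(dist, axis=1)[:, :2]
    matches = []
    for i, (best, second) in enumerate(order):
        # Keep a match only if it is clearly better than the runner-up,
        # which discards ambiguous correspondences.
        if dist[i, best] < ratio * dist[i, second]:
            matches.append((i, int(best)))
    return matches


def logo_present(logo_desc, frame_desc, ratio=0.75, min_matches=10):
    """Hypothetical decision rule: enough unambiguous matches counts as a detection."""
    return len(ratio_test_matches(logo_desc, frame_desc, ratio)) >= min_matches


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logo = rng.integers(0, 256, size=(20, 32), dtype=np.uint8)
    # A synthetic "frame": the logo's descriptors plus 50 random distractors.
    frame = np.vstack([logo, rng.integers(0, 256, size=(50, 32), dtype=np.uint8)])
    print(logo_present(logo, frame))
```

A practical detector would normally add geometric verification, for example fitting a homography with RANSAC to the surviving matches, before accepting a detection. A raw match count is sensitive to repetitive texture and to logos that share visual elements.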


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Foteini Markatopoulou (1)
  • Markos Zampoglou (1), corresponding author
  • Evlampios Apostolidis (1, 2)
  • Symeon Papadopoulos (1)
  • Vasileios Mezaris (1)
  • Ioannis Patras (2)
  • Ioannis Kompatsiaris (1)

  1. Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
  2. School of Electronic Engineering and Computer Science, Queen Mary University, London, UK
