Classifying advertising video by topicalizing high-level semantic concepts


The recent proliferation of videos has driven research into applications ranging from video analysis to indexing and retrieval, all of which benefit greatly from domain knowledge about the videos. For advertising (ad) video, a special kind of video, classification is a key task because it allows videos to be organized automatically by category or genre, which in turn enables ad video indexing and retrieval. However, classifying ad video is challenging because of its unconstrained content and distinctive expression. While many studies focus on selecting ads relevant to target videos, to the best of our knowledge few focus on ad video classification. To classify ad video, we propose a novel video representation that aims to capture the latent semantics of ad video in an unsupervised manner. In particular, this paper integrates the posterior occurrence probability between brand/logo information and high-level object information into a unified latent Dirichlet allocation learning paradigm, named ppLDA. The proposed method yields a topical representation of ad video that can support category-related tasks. Our experiments on 10,111 real-world ad videos downloaded from the Internet demonstrate that the proposed method can effectively differentiate ad videos.
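To make the idea of a topical representation concrete, the following is a minimal sketch of plain latent Dirichlet allocation fitted by collapsed Gibbs sampling. It is not ppLDA: the brand/logo posterior occurrence term described above is omitted, because its exact form is not given here. The function name `lda_gibbs`, the toy concept vocabulary, the concept ids, and the hyperparameters `alpha` and `beta` are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (NOT the paper's ppLDA).

    docs: list of documents, each a list of integer concept ids.
    Returns one topic distribution (list of floats summing to 1) per document.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n(k, w)
    topic_total = [0] * n_topics                              # n(k)
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions: the topical representation.
    thetas = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        thetas.append([(doc_topic[d][t] + alpha) / denom for t in range(n_topics)])
    return thetas


# Toy "ad videos": each is a bag of detected concept ids (hypothetical vocabulary).
vocab = ["car", "road", "wheel", "lipstick", "face", "mirror"]
videos = [
    [0, 1, 2, 0, 1, 2, 0],
    [0, 2, 1, 1, 0, 2],
    [3, 4, 5, 3, 4, 3],
    [4, 5, 3, 3, 5, 4, 4],
]
thetas = lda_gibbs(videos, n_topics=2, vocab_size=len(vocab))
```

Each entry of `thetas` is a fixed-length vector of topic proportions for one video, which could then be passed to any standard classifier or clustering method. In the sketch the topics are learned without category labels, matching the unsupervised setting described above.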



S. Hou would like to thank Cheng Liang for her substantial effort in the revision, including the design and implementation of experiments and the proofreading of the manuscript. This work was supported by the major project of the Natural Science Foundation of Shandong Province (ZR2016FQ20), the Postdoctoral Science Foundation of China (2017M612338), the Fundamental Science and Frontier Technology Research of Chongqing CSTC (cstc2015jcyjBX0124), the Natural Science Foundation of China (NSFC) (61702313, 61572300), the Natural Science Foundation of Shandong Province in China (ZR2014FM001), and the Taishan Scholar Program of Shandong Province in China (TSHW201502038).

Author information

Corresponding authors

Correspondence to Shangbo Zhou or Yuanjie Zheng.

About this article

Cite this article

Hou, S., Zhou, S., Liu, W. et al. Classifying advertising video by topicalizing high-level semantic concepts. Multimed Tools Appl 77, 25475–25511 (2018).
