Classifying advertising video by topicalizing high-level semantic concepts

Hou, Sujuan; Zhou, Shangbo; Liu, Wenjie; Zheng, Yuanjie

doi:10.1007/s11042-018-5801-3

Published: 10 March 2018

Volume 77, pages 25475–25511, (2018)

Multimedia Tools and Applications

Sujuan Hou^1,2,
Shangbo Zhou^3,
Wenjie Liu^4 &
…
Yuanjie Zheng^1,2,5

Abstract

The recent proliferation of videos has driven research into a range of applications, from video analysis to indexing and retrieval. These applications benefit greatly from domain knowledge of videos. Advertising (ad) videos are a special kind of video, and classifying them is a key task because it allows videos to be organized automatically by category or genre, which in turn enables ad video indexing and retrieval. However, classifying ad videos is challenging because of their unconstrained content and distinctive expression. While many studies focus on selecting ads relevant to target videos, to the best of our knowledge few focus on ad video classification. To classify ad videos, we propose a novel video representation that aims to capture their latent semantics in an unsupervised manner. In particular, this paper integrates the posterior occurrence probability between brand/logo information and high-level object information into a unified latent Dirichlet allocation learning paradigm, named ppLDA. The proposed method yields a topical representation of ad videos that can support category-related tasks. Our experiments on 10,111 real-world ad videos downloaded from the Internet demonstrate that the proposed method can effectively differentiate ad videos.
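
To make the idea of a topical representation concrete, the following is a minimal sketch of plain latent Dirichlet allocation fitted by collapsed Gibbs sampling, in which each video is treated as a bag of detected concept ids. It is not the ppLDA model: the abstract does not specify how the brand/logo posterior probabilities enter the model, so that part is omitted. The function name `lda_gibbs`, the toy vocabulary, the hyperparameters `alpha` and `beta`, and the example videos are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iter=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative, not ppLDA).

    docs: list of lists of integer concept ids, one list per video.
    Returns one topic distribution (list of probabilities) per video.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # counts n_dk
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # counts n_kw
    topic_total = [0] * n_topics                              # counts n_k
    assign = []

    # Random initial topic assignment for every concept occurrence.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-video topic mixture: the "topical representation".
    thetas = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        thetas.append([(doc_topic[d][t] + alpha) / denom
                       for t in range(n_topics)])
    return thetas


# Hypothetical concept vocabulary and four toy "videos".
vocab = ["car", "road", "wheel", "lipstick", "face", "mirror"]
videos = [
    [0, 1, 2, 0, 2, 1],
    [0, 0, 1, 2, 2],
    [3, 4, 5, 3, 4],
    [4, 5, 3, 3, 5, 4],
]
thetas = lda_gibbs(videos, n_topics=2, vocab_size=len(vocab))
for theta in thetas:
    print([round(p, 3) for p in theta])
```

Each returned vector is a probability distribution over topics and could be passed to any standard classifier as a feature vector for category-related tasks. The sampler is kept deliberately small; a practical system would more likely use an established topic-modeling library.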

Acknowledgements

S. Hou would like to thank Cheng Liang for her substantial effort in the revision, including the design and implementation of experiments as well as proofreading the manuscript. This work was made possible through support from the major project of the Natural Science Foundation of Shandong Province (ZR2016FQ20), the Postdoctoral Science Foundation of China (2017M612338), the Fundamental Science and Frontier Technology Research of Chongqing CSTC (cstc2015jcyjBX0124), the Natural Science Foundation of China (NSFC) (61702313, 61572300), the Natural Science Foundation of Shandong Province in China (ZR2014FM001), and the Taishan Scholar Program of Shandong Province in China (TSHW201502038).

Author information

Authors and Affiliations

School of Information Science and Engineering, Shandong Normal University, Jinan, 250014, China
Sujuan Hou & Yuanjie Zheng
Institute of Life Sciences, Shandong Normal University, Jinan, 250014, China
Sujuan Hou & Yuanjie Zheng
School of Computer Science, Chongqing University, Chongqing, 400030, China
Shangbo Zhou
School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing, People’s Republic of China
Wenjie Liu
Key Laboratory of Intelligent Information Processing, Shandong Normal University, Jinan, 250014, China
Yuanjie Zheng

Corresponding authors

Correspondence to Shangbo Zhou or Yuanjie Zheng.

About this article

Cite this article

Hou, S., Zhou, S., Liu, W. et al. Classifying advertising video by topicalizing high-level semantic concepts. Multimed Tools Appl 77, 25475–25511 (2018). https://doi.org/10.1007/s11042-018-5801-3

Received: 22 November 2016
Revised: 26 January 2018
Accepted: 14 February 2018
Published: 10 March 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s11042-018-5801-3
