Accurate online video tagging via probabilistic hybrid modeling

Shen, Jialie; Wang, Meng; Chua, Tat-Seng

doi:10.1007/s00530-014-0399-4

Special Issue Paper
Published: 13 August 2014

Volume 22, pages 99–113, (2016)
Multimedia Systems

Jialie Shen¹,
Meng Wang² &
Tat-Seng Chua³

Abstract

Accurate video tagging has become increasingly crucial for online video management and search. This article presents a novel framework called comprehensive video tagger (CVTagger) to facilitate accurate tag-based video annotation. The system exploits both multimodal and temporal properties, combined with a novel hierarchical classification framework based on a multilayer concept model and regression analysis. This architecture enables effective incorporation of both video concept dependency and temporal dynamics. A set of empirical studies has been carried out on a large-scale test collection containing 50,000 YouTube videos, and the experimental results demonstrate various advantages of CVTagger over state-of-the-art techniques.

Notes

http://www.youtube.com.
http://www.metacafe.com.
http://www-nlpir.nist.gov/projects/tv2010/tv2010.html.
The algorithm can be applied to estimate both.
This paper uses AVT and RT to denote the approaches presented in [27] and [32], respectively.

References

[1] Bertino, E., Fan, J., Ferrari, E., Hacid, M.S., Elmagarmid, A.K., Zhu, X.: A hierarchical access control model for video database systems. ACM Trans. Inf. Syst. 21(2), 155–191 (2003)
[2] Chang, S.F., Ellis, D., Jiang, W., Lee, K., Yanagawa, A., Loui, A.C., Luo, J.: Large-scale multimodal semantic concept detection for consumer video. In: Proceedings of ACM International Workshop on Multimedia Information Retrieval, pp. 255–264 (2007)
[3] Chen, L., Xu, D., Tsang, I.W.H., Luo, J.: Tag-based web photo retrieval improved by batch mode re-tagging. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3440–3446 (2010)
[4] Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the em algorithm. J. R. Stat. Soc. 39(1), 1–38 (1977)
[5] Duda, R., Hart, P., Stork, D.: Pattern Classification. Wiley, New York (2001)
[6] Fan, J., Elmagarmid, A.K., Zhu, X., Aref, W.G., Wu, L.: Classview: hierarchical video shot classification, indexing, and accessing. IEEE Trans. Multimed. 6(1), 70–86 (2004)
[7] Figueiredo, M., Jain, A.K.: Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 24(3), 381–396 (2002)
[8] Filippova, K., Hall, K.B.: Improved video categorization from text metadata and user comments. In: Proceedings of ACM SIGIR conference, pp. 835–842 (2011)
[9] Gao, Y., Wang, F., Luan, H.B., Chua, T.S.: Brand data gathering from live social media streams. In: Proceedings of ACM ICMR, p. 169 (2014)
[10] Gao, Y., Wang, M., Zha, Z., Shen, J., Li, X., Wu, X.: Visual-textual joint relevance learning for tag-based social image search. IEEE Trans. Image Process. 22(1), 363–376 (2013)
[11] Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Prentice Hall, Upper Saddle River (2002)
[12] Hauptmann, A., Christel, M.G., Rong, Y.: Video retrieval based on semantic concepts. Proc. IEEE 96(4), 602–622 (2008)
[13] Heymann, P., Ramage, D., Garcia-Molina, H.: Social tag prediction. In: Proceedings of ACM SIGIR conference (2008)
[14] Jiang, W., Cotton, C., Chang, S.F., Ellis, D., Loui, A.C.: Short-term audio-visual atoms for generic video concept classification. In: Proceedings of ACM International Conference on Multimedia (2009)
[15] Jiang, Y.G., Yang, J., Ngo, C.W., Hauptmann, A.G.: Representations of keypoint-based semantic concept detection: a comprehensive survey. IEEE Trans. Multimed. 12(1), 42–53 (2010)
[16] Kender, J.R., Naphade, M.R.: Video news shot labelling refinement via shot rhythm models. In: Proceedings of IEEE International Conference on Multimedia and Expo (2006)
[17] Liu, K.H., Weng, M.F., Tseng, C.Y., Chuang, Y.Y., Chen, M.S.: Association and temporal rule mining for post-filtering of semantic concept detection in video. IEEE Trans. Multimed. 10(2), 240–251 (2008)
[18] Logan, B.: Mel frequency cepstral coefficients for music modeling. In: Proceedings of the ISMIR (2000)
[19] Lu, L., Liu, D., Zhang, H.: Automatic mood detection and tracking of music audio signals. IEEE Trans. Acoust. Speech Signal 14(1), 5–18 (2006)
[20] Miller, G.A.: Wordnet: a lexical database for english. Commun. ACM 38(11), 39–41 (1995)
[21] Naphade, M.R., Smith, J.R.: On the detection of semantic concepts at trecvid. In: Proceedings of ACM Multimedia (2004)
[22] Naphade, M.R., Smith, J.R., Tesic, J., Chang, S.F., Hsu, W., Kennedy, L., Hauptmann, A., Curtis, J.: A large-scale concept ontology for multimedia. IEEE Multimed. 13(3), 86–91 (2006)
[23] Scholkopf, B., Burges, C., Smola, A.: Advances in Kernel Methods: Support Vector Machines. MIT Press, Cambridge (1999)
[24] Shen, J., Cheng, Z.: Personalized video similarity measure. Multimed. Syst. 17(5), 421–433 (2011)
[25] Shen, J., Meng, W., Yan, S., Pang, H., Hua, X.: Effective music tagging through advanced statistical modelling. In: Proceedings of ACM SIGIR Conference, pp. 635–642 (2010)
[26] Shen, J., Wang, M., Yan, S., Hua, X.S.: Multimedia tagging: past, present and future. In: Proceedings of ACM Multimedia, pp. 639–640 (2011)
[27] Siersdorfer, S., Pedro, J.S., Sanderson, M.: Automatic video tagging using content redundancy. In: Proceedings of ACM SIGIR (2009)
[28] Smeaton, A.F., Over, P., Kraaij, W.: Evaluation campaigns and trecvid. In: MIR ’06: Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, pp. 321–330 (2006)
[29] Snoek, C., Worring, M.: Concept-based video retrieval. Found. Trends Inf. Retr. 2(4), 215–322 (2009)
[30] Snoek, C.G., Worring, M., van Gemert, J.C., Geusebroek, J.M., Smeulders, A.W.: The challenge problem for automated detection of 101 semantic concepts in multimedia. In: Proceedings of ACM International Conference on Multimedia (2006)
[31] Song, Y., Hua, X.S., Dai, L.R., Wang, M.: Semi-automatic video annotation based on active learning with multiple complementary predictors. In: Proceedings of ACM International Workshop on Multimedia Information Retrieval (2005)
[32] Toderici, G., Aradhye, H., Pasca, M., Sbaiz, L., Yagnik, J.: Finding meaning on YouTube: tag recommendation and category discovery. In: CVPR (2010)
[33] Truong, B.T., Venkatesh, S.: Video abstraction: a systematic review and classification. ACM TOMCCAP 3(1), Article 3 (2007)
[34] Tzanetakis, G., Cook, P.: Musical genre classification of audio signals. IEEE Trans. Speech Audio Process. 10(5), 293–302 (2002)
[35] Wang, D., Liu, X., Luo, L., Li, J., Zhang, B.: Video diver: generic video indexing with diverse features. In: Proceedings of ACM International Workshop on Multimedia Information Retrieval (2007)
[36] Wang, M., Hua, X.S., Hong, R., Tang, J., Qi, G.J., Song, Y.: Unified video annotation via multi-graph learning. IEEE Trans. Circuits Syst. Video Technol. 19(5), 733–746 (2009)
[37] Yang, J., Hauptmann, A.G.: Exploring temporal consistency for video analysis and retrieval. In: Proceedings of ACM International Workshop on Multimedia Information Retrieval (2006)
[38] Zhao, W.L., Wu, X., Ngo, C.W.: On the annotation of web videos by efficient near-duplicate search. IEEE Trans. Multimed. 12(5), 448–461 (2010)
[39] Zhu, X., Elmagarmid, A.K., Xue, X., Wu, L., Catlin, A.C.: Insightvideo: toward hierarchical video content organization for efficient browsing, summarization and retrieval. IEEE Trans. Multimed. 7(4), 648–666 (2005)

Acknowledgments

Jialie Shen is supported by the Academic Research Fund (AcRF) Tier-2 (MOE2013-T2-2-156), Ministry of Education (MOE), Singapore.

Author information

Authors and Affiliations

School of Information Systems, Singapore Management University, Singapore, 178902, Singapore
Jialie Shen
Hefei University of Technology, Hefei, China
Meng Wang
Department of Computer Science, National University of Singapore, Kent Ridge, Singapore, 117543, Singapore
Tat-Seng Chua

Corresponding author

Correspondence to Jialie Shen.

About this article

Cite this article

Shen, J., Wang, M. & Chua, TS. Accurate online video tagging via probabilistic hybrid modeling. Multimedia Systems 22, 99–113 (2016). https://doi.org/10.1007/s00530-014-0399-4

Issue Date: February 2016
