Abstract
Accurate video tagging has become increasingly crucial for online video management and search. This article presents a novel framework, the comprehensive video tagger (CVTagger), to facilitate accurate tag-based video annotation. The system exploits both multimodal and temporal properties of video, combined with a novel hierarchical classification framework based on a multilayer concept model and regression analysis. This architecture enables effective incorporation of both video concept dependency and temporal dynamics. Empirical studies on a large-scale test collection of 50,000 YouTube videos demonstrate various advantages of CVTagger over state-of-the-art techniques.
Notes
The algorithm can be applied to estimate both.
Acknowledgments
Jialie Shen is supported by Academic Research Fund (AcRF) Tier-2 (MOE2013-T2-2-156), Ministry of Education (MOE), Singapore.
Cite this article
Shen, J., Wang, M. & Chua, TS. Accurate online video tagging via probabilistic hybrid modeling. Multimedia Systems 22, 99–113 (2016). https://doi.org/10.1007/s00530-014-0399-4
DOI: https://doi.org/10.1007/s00530-014-0399-4