Multimedia Systems, Volume 22, Issue 1, pp 99–113

Accurate online video tagging via probabilistic hybrid modeling

  • Jialie Shen
  • Meng Wang
  • Tat-Seng Chua
Special Issue Paper


Accurate video tagging has become increasingly crucial for online video management and search. This article presents a novel framework called comprehensive video tagger (CVTagger) to facilitate accurate tag-based video annotation. The system exploits both multimodal and temporal properties and combines them with a novel hierarchical classification framework based on a multilayer concept model and regression analysis. This architecture enables effective incorporation of both video concept dependency and temporal dynamics. A set of empirical studies was carried out on a large-scale test collection containing 50,000 YouTube videos, and the experimental results demonstrate various advantages of CVTagger over state-of-the-art techniques.


Online video, Social multimedia, Tagging



Jialie Shen is supported by Academic Research Fund (AcRF) Tier-2 (MOE2013-T2-2-156), Ministry of Education (MOE), Singapore.



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. School of Information Systems, Singapore Management University, Singapore, Singapore
  2. Hefei University of Technology, Hefei, China
  3. Department of Computer Science, National University of Singapore, Singapore, Singapore
