
Accurate online video tagging via probabilistic hybrid modeling

  • Special Issue Paper
Multimedia Systems

Abstract

Accurate video tagging has become increasingly crucial for online video management and search. This article presents a novel framework called comprehensive video tagger (CVTagger) to facilitate accurate tag-based video annotation. The system exploits both multimodal and temporal properties of video and combines them with a novel hierarchical classification framework based on a multilayer concept model and regression analysis. This architecture enables effective incorporation of both video concept dependency and temporal dynamics. A set of empirical studies has been carried out on a large-scale test collection containing 50,000 YouTube videos, and the experimental results demonstrate various advantages of CVTagger over state-of-the-art techniques.

Notes

  1. http://www.youtube.com.

  2. http://www.metacafe.com.

  3. http://www-nlpir.nist.gov/projects/tv2010/tv2010.html.

  4. The algorithm can be applied to estimate both.

  5. This paper uses AVT and RT to denote the approaches presented in [27] and [32], respectively.

References

  1. Bertino, E., Fan, J., Ferrari, E., Hacid, M.S., Elmagarmid, A.K., Zhu, X.: A hierarchical access control model for video database systems. ACM Trans. Inf. Syst. 21(2), 155–191 (2003)

  2. Chang, S.F., Ellis, D., Jiang, W., Lee, K., Yanagawa, A., Loui, A.C., Luo, J.: Large-scale multimodal semantic concept detection for consumer video. In: Proceedings of ACM International Workshop on Multimedia Information Retrieval, pp. 255–264 (2007)

  3. Chen, L., Xu, D., Tsang, I.W.H., Luo, J.: Tag-based web photo retrieval improved by batch mode re-tagging. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3440–3446 (2010)

  4. Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39(1), 1–38 (1977)

  5. Duda, R., Hart, P., Stork, D.: Pattern Classification. Wiley, New York (2001)

  6. Fan, J., Elmagarmid, A.K., Zhu, X., Aref, W.G., Wu, L.: ClassView: hierarchical video shot classification, indexing, and accessing. IEEE Trans. Multimed. 6(1), 70–86 (2004)

  7. Figueiredo, M., Jain, A.K.: Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 24(3), 381–396 (2002)

  8. Filippova, K., Hall, K.B.: Improved video categorization from text metadata and user comments. In: Proceedings of ACM SIGIR conference, pp. 835–842 (2011)

  9. Gao, Y., Wang, F., Luan, H.B., Chua, T.S.: Brand data gathering from live social media streams. In: Proceedings of ACM ICMR, p. 169 (2014)

  10. Gao, Y., Wang, M., Zha, Z., Shen, J., Li, X., Wu, X.: Visual-textual joint relevance learning for tag-based social image search. IEEE Trans. Image Process. 22(1), 363–376 (2013)

  11. Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Prentice Hall, Upper Saddle River (2002)

  12. Hauptmann, A., Christel, M.G., Rong, Y.: Video retrieval based on semantic concepts. Proc. IEEE 96(4), 602–622 (2008)

  13. Heymann, P., Ramage, D., Garcia-Molina, H.: Social tag prediction. In: Proceedings of ACM SIGIR conference (2008)

  14. Jiang, W., Cotton, C., Chang, S.F., Ellis, D., Loui, A.C.: Short-term audio-visual atoms for generic video concept classification. In: Proceedings of ACM International Conference on Multimedia (2009)

  15. Jiang, Y.G., Yang, J., Ngo, C.W., Hauptmann, A.G.: Representations of keypoint-based semantic concept detection: a comprehensive survey. IEEE Trans. Multimed. 12(1), 42–53 (2010)

  16. Kender, J.R., Naphade, M.R.: Video news shot labelling refinement via shot rhythm models. In: Proceedings of IEEE International Conference on Multimedia and Expo (2006)

  17. Liu, K.H., Weng, M.F., Tseng, C.Y., Chuang, Y.Y., Chen, M.S.: Association and temporal rule mining for post-filtering of semantic concept detection in video. IEEE Trans. Multimed. 10(2), 240–251 (2008)

  18. Logan, B.: Mel frequency cepstral coefficients for music modeling. In: Proceedings of the ISMIR (2000)

  19. Lu, L., Liu, D., Zhang, H.: Automatic mood detection and tracking of music audio signals. IEEE Trans. Audio Speech Lang. Process. 14(1), 5–18 (2006)

  20. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)

  21. Naphade, M.R., Smith, J.R.: On the detection of semantic concepts at TRECVID. In: Proceedings of ACM Multimedia (2004)

  22. Naphade, M.R., Smith, J.R., Tesic, J., Chang, S.F., Hsu, W., Kennedy, L., Hauptmann, A., Curtis, J.: A large-scale concept ontology for multimedia. IEEE Multimed. 13(3), 86–91 (2006)

  23. Schölkopf, B., Burges, C., Smola, A.: Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge (1999)

  24. Shen, J., Cheng, Z.: Personalized video similarity measure. Multimed. Syst. 17(5), 421–433 (2011)

  25. Shen, J., Meng, W., Yan, S., Pang, H., Hua, X.: Effective music tagging through advanced statistical modelling. In: Proceedings of ACM SIGIR Conference, pp. 635–642 (2010)

  26. Shen, J., Wang, M., Yan, S., Hua, X.S.: Multimedia tagging: past, present and future. In: Proceedings of ACM Multimedia, pp. 639–640 (2011)

  27. Siersdorfer, S., Pedro, J.S., Sanderson, M.: Automatic video tagging using content redundancy. In: Proceedings of ACM SIGIR (2009)

  28. Smeaton, A.F., Over, P., Kraaij, W.: Evaluation campaigns and TRECVid. In: MIR ’06: Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, pp. 321–330 (2006)

  29. Snoek, C., Worring, M.: Concept-based video retrieval. Found. Trends Inf. Retr. 2(4), 215–322 (2009)

  30. Snoek, C.G., Worring, M., van Gemert, J.C., Geusebroek, J.M., Smeulders, A.W.: The challenge problem for automated detection of 101 semantic concepts in multimedia. In: Proceedings of ACM International Conference on Multimedia (2006)

  31. Song, Y., Hua, X.S., Dai, L.R., Wang, M.: Semi-automatic video annotation based on active learning with multiple complementary predictors. In: Proceedings of ACM International Workshop on Multimedia Information Retrieval (2005)

  32. Toderici, G., Aradhye, H., Pasca, M., Sbaiz, L., Yagnik, J.: Finding meaning on YouTube: tag recommendation and category discovery. In: CVPR (2010)

  33. Truong, B.T., Venkatesh, S.: Video abstraction: a systematic review and classification. ACM TOMCCAP 3(1), Article 3 (2007)

  34. Tzanetakis, G., Cook, P.: Musical genre classification of audio signals. IEEE Trans. Speech Audio Process. 10(5), 293–302 (2002)

  35. Wang, D., Liu, X., Luo, L., Li, J., Zhang, B.: Video diver: generic video indexing with diverse features. In: Proceedings of ACM International Workshop on Multimedia Information Retrieval (2007)

  36. Wang, M., Hua, X.S., Hong, R., Tang, J., Qi, G.J., Song, Y.: Unified video annotation via multi-graph learning. IEEE Trans. Circuits Syst. Video Technol. 19(5), 733–746 (2009)

  37. Yang, J., Hauptmann, A.G.: Exploring temporal consistency for video analysis and retrieval. In: Proceedings of ACM International Workshop on Multimedia Information Retrieval (2006)

  38. Zhao, W.L., Wu, X., Ngo, C.W.: On the annotation of web videos by efficient near-duplicate search. IEEE Trans. Multimed. 12(5), 448–461 (2010)

  39. Zhu, X., Elmagarmid, A.K., Xue, X., Wu, L., Catlin, A.C.: InsightVideo: toward hierarchical video content organization for efficient browsing, summarization and retrieval. IEEE Trans. Multimed. 7(4), 648–666 (2005)

Acknowledgments

Jialie Shen is supported by Academic Research Fund (AcRF) Tier-2 (MOE2013-T2-2-156), Ministry of Education (MOE), Singapore.

Author information

Corresponding author

Correspondence to Jialie Shen.

About this article

Cite this article

Shen, J., Wang, M. & Chua, TS. Accurate online video tagging via probabilistic hybrid modeling. Multimedia Systems 22, 99–113 (2016). https://doi.org/10.1007/s00530-014-0399-4

  • DOI: https://doi.org/10.1007/s00530-014-0399-4
