A Novel Video Classification Method Based on Hybrid Generative/Discriminative Models

  • Zhi Zeng
  • Wei Liang
  • Heping Li
  • Shuwu Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5342)


We consider the problem of automatically classifying videos into predefined categories based on analysis of their audio content. Specifically, given a set of labeled videos (such as news, sitcoms, and sports), our objective is to assign a new video to one of these categories. To solve this problem, we propose a novel audio-feature-based video classification method that combines an unsupervised generative model, probabilistic Latent Semantic Analysis (pLSA), with a multi-class discriminative classifier. Because general audio signals usually have a complicated distribution in the feature space, the k-means clustering method is first used to group temporal signal segments with similar low-level features into natural clusters, which are adopted as “audio words”. The audio stream of a video is then decomposed into a bag of “audio words”. To classify the bags of “audio words” extracted from videos, latent “topics” are discovered by pLSA, and a multi-class classifier is subsequently trained on the “topic” distribution vector of each video. Encouraging classification results have been achieved in our experiments.
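
As a rough illustration of the pipeline described in the abstract, the sketch below quantizes per-segment feature vectors against a codebook of "audio words" and then fits pLSA by EM to obtain a topic distribution for each video. It is a minimal sketch under stated assumptions, not the authors' implementation: the codebook is assumed to come from a prior k-means run, and the function names (`bag_of_audio_words`, `plsa`), the toy feature vectors, the number of topics, and the iteration count are all hypothetical. The multi-class classifier is omitted because the abstract does not specify it.

```python
import random

def nearest_word(frame, codebook):
    """Index of the closest codebook centroid (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((f - c) ** 2 for f, c in zip(frame, codebook[k])))

def bag_of_audio_words(frames, codebook):
    """Histogram of audio-word occurrences for one video."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_word(frame, codebook)] += 1
    return counts

def normalize(row):
    """Scale a non-negative row to sum to 1 (uniform if the row is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM; counts[d][w] = n(d, w). Returns (P(z|d), P(w|z))."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    for _ in range(n_iter):
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(w | z) * P(z | d)
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                norm = sum(joint)
                for z in range(n_topics):
                    resp = n_dw * joint[z] / norm
                    new_zd[d][z] += resp
                    new_wz[z][w] += resp
        # M-step: renormalize expected counts (tiny smoothing keeps P(w|z) > 0)
        p_z_d = [normalize(row) for row in new_zd]
        p_w_z = [normalize([x + 1e-12 for x in row]) for row in new_wz]
    return p_z_d, p_w_z

# Toy data (assumed): a 3-word codebook and two short "videos".
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
video_a = [[0.1, 0.0], [0.9, 1.1], [0.2, -0.1]]
video_b = [[5.1, 4.9], [4.8, 5.2], [1.0, 0.9]]
counts = [bag_of_audio_words(v, codebook) for v in (video_a, video_b)]
topics, word_given_topic = plsa(counts, n_topics=2)
```

Each row of `topics` is the per-video topic distribution vector that, in the approach outlined above, would serve as the input feature for the discriminative classifier.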


Video classification, pLSA, audio content mining



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Zhi Zeng (1)
  • Wei Liang (1)
  • Heping Li (1)
  • Shuwu Zhang (1)
  1. Digital Content Technology Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China
