Multimedia Systems

Volume 20, Issue 4, pp. 429–445

Graph regularized GM-pLSA and its applications to video content analysis

Regular Paper

Abstract

Standard probabilistic latent semantic analysis (pLSA) is oriented to discrete quantities only. pLSA with Gaussian mixtures (GM-pLSA) transfers it to continuous feature spaces by using a Gaussian mixture model to describe the feature distribution under each latent aspect. However, inheriting from pLSA, GM-pLSA still overlooks the intrinsic interdependence between terms, which is an important clue for performance improvement. In this paper, we present a graph regularized GM-pLSA (GRGM-pLSA) model, an extension of GM-pLSA that embeds this term correlation information into model learning. Specifically, grounded in the manifold regularization principle, a graph regularizer is introduced to characterize the correlation between terms; by imposing it on the objective function of GM-pLSA, the model parameters of GRGM-pLSA are derived via a corresponding expectation-maximization (EM) algorithm. Furthermore, two applications to video content analysis are devised. One is video categorization, where GRGM-pLSA serves for feature mapping with two kinds of sub-shot correlations incorporated, respectively; the other provides a new perspective on video concept detection, transforming the detection task into a GRGM-pLSA-based visual-to-textual feature conversion problem. Extensive experiments and comparisons with GM-pLSA and several state-of-the-art approaches in both applications demonstrate the effectiveness of GRGM-pLSA.
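To fix ideas, here is a minimal sketch of the kind of objective the abstract describes, combining the GM-pLSA likelihood with a manifold-style graph regularizer. This is an illustration assembled from standard GM-pLSA and manifold regularization conventions, not the paper's exact formulation; the symbols (the mixing weights P(z|d), the GMM parameters pi_{zk}, mu_{zk}, Sigma_{zk}, the term-affinity matrix W, and the trade-off weight lambda) are assumptions for illustration.

    % GM-pLSA log-likelihood: each latent aspect z emits continuous
    % features x through a Gaussian mixture.
    \mathcal{L} = \sum_{d} \sum_{x \in d} \log \sum_{z} P(z \mid d)\, p(x \mid z),
    \quad p(x \mid z) = \sum_{k} \pi_{zk}\, \mathcal{N}(x; \mu_{zk}, \Sigma_{zk})

    % Graph regularizer: W_{ij} encodes the correlation between terms
    % i and j, f_i is the model's representation of term i, and
    % L = D - W is the graph Laplacian (D the degree matrix).
    \mathcal{R} = \tfrac{1}{2} \sum_{i,j} W_{ij} \| f_i - f_j \|^{2}
                = \operatorname{Tr}(F^{\top} L F)

    % Regularized objective maximized by EM; lambda balances data fit
    % against smoothness over the term graph.
    \max_{\Theta} \; \mathcal{L}(\Theta) - \lambda\, \mathcal{R}(\Theta)

Minimizing Tr(F^T L F) pulls the representations of strongly correlated terms together, which is the manifold-regularization effect the abstract invokes. A small Python sketch of the regularizer itself follows; the function name and toy matrices are hypothetical, not taken from the paper:

    import numpy as np

    def graph_regularizer(F, W):
        """Tr(F^T L F) with L = D - W: penalizes representations F whose
        rows differ across strongly correlated terms (large W_ij)."""
        D = np.diag(W.sum(axis=1))   # degree matrix
        L = D - W                    # unnormalized graph Laplacian
        return float(np.trace(F.T @ L @ F))

    # Toy check: identical representations of correlated terms cost nothing.
    W = np.array([[0.0, 1.0], [1.0, 0.0]])  # two perfectly correlated terms
    F = np.ones((2, 3))                     # identical 3-d representations
    assert graph_regularizer(F, W) == 0.0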

Keywords

Probabilistic latent semantic analysis with Gaussian mixtures · Term correlation · Graph regularizer · Video categorization · Video concept detection

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61273274, 4123104), the National 973 Key Research Program of China (2011CB302203), the Ph.D. Programs Foundation of the Ministry of Education of China (20100009110004), the National Key Technology R&D Program of China (2012BAH01F03), and the Tsinghua-Tencent Joint Lab for IIT.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

1. Institute of Information Science, Beijing Jiaotong University, Beijing, China