Advertisement

Modeling Video Dynamics with Deep Dynencoder

  • Xing Yan
  • Hong Chang
  • Shiguang Shan
  • Xilin Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8692)

Abstract

Videos always exhibit various pattern motions, which can be modeled according to dynamics between adjacent frames. Previous methods based on linear dynamic system can model dynamic textures but have limited capacity of representing sophisticated nonlinear dynamics. Inspired by the nonlinear expression power of deep autoencoders, we propose a novel model named dynencoder which has an autoencoder at the bottom and a variant of it at the top (named as dynpredictor). It generates hidden states from raw pixel inputs via the autoencoder and then encodes the dynamic of state transition over time via the dynpredictor. Deep dynencoder can be constructed by proper stacking strategy and trained by layer-wise pre-training and joint fine-tuning. Experiments verify that our model can describe sophisticated video dynamics and synthesize endless video texture sequences with high visual quality. We also design classification and clustering methods based on our model and demonstrate the efficacy of them on traffic scene classification and motion segmentation.

Keywords

Video Dynamics Deep Model Autoencoder Time Series Dynamic Textures 

References

  1. 1.
    Baktashmotlagh, M., Harandi, M., Bigdeli, A., Lovell, B., Salzmann, M.: Non-linear stationary subspace analysis with application to video classification. In: International Conference on Machine Learning, pp. 450–458 (2013)Google Scholar
  2. 2.
    Basharat, A., Shah, M.: Time series prediction by chaotic modeling of nonlinear dynamical systems. In: IEEE International Conference on Computer Vision, pp. 1941–1948 (2009)Google Scholar
  3. 3.
    Bengio, Y.: Learning deep architectures for AI. Foundations and Trends in Machine Learning 2(1), 1–127 (2009)CrossRefzbMATHMathSciNetGoogle Scholar
  4. 4.
    Boots, B., Gordon, G.J., Siddiqi, S.M.: A constraint generation approach to learning stable linear dynamical systems. In: Advances in Neural Information Processing Systems 20, pp. 1329–1336 (2008)Google Scholar
  5. 5.
    Chan, A., Vasconcelos, N.: Probabilistic kernels for the classification of auto-regressive visual processes. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 846–851 (2005)Google Scholar
  6. 6.
    Chan, A., Vasconcelos, N.: Classifying video with kernel dynamic textures. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–6 (2007)Google Scholar
  7. 7.
    Chan, A., Vasconcelos, N.: Modeling, clustering, and segmenting video with mixtures of dynamic textures. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(5), 909–926 (2008)CrossRefGoogle Scholar
  8. 8.
    Chang, C.C., Lin, C.J.: Libsvm: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2(3), 27 (2011)CrossRefGoogle Scholar
  9. 9.
    Doretto, G., Chiuso, A., Wu, Y., Soatto, S.: Dynamic textures. International Journal of Computer Vision 51(2), 91–109 (2003)CrossRefzbMATHGoogle Scholar
  10. 10.
    Doretto, G., Cremers, D., Favaro, P., Soatto, S.: Dynamic texture segmentation. In: IEEE International Conference on Computer Vision (October 2003)Google Scholar
  11. 11.
    Erhan, D., Bengio, Y., Courville, A., Manzagol, P.A., Vincent, P., Bengio, S.: Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research 11, 625–660 (2010)zbMATHMathSciNetGoogle Scholar
  12. 12.
    Le, Q.V., Zou, W.Y., Yeung, S.Y., Ng, A.Y.: Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In: IEEE Conference on Computer Vision and Pattern Recognition (2011)Google Scholar
  13. 13.
    Liu, C.B., Lin, R.S., Ahuja, N., Yang, M.H.: Dynamic textures synthesis as nonlinear manifold learning and traversing. In: BMVC, pp. 859–868 (2006)Google Scholar
  14. 14.
    Martin, R.: A metric for ARMA processes. IEEE Transactions on Signal Processing 48(4), 1164–1170 (2000)CrossRefzbMATHMathSciNetGoogle Scholar
  15. 15.
    Masiero, A., Chiuso, A.: Nonlinear temporal textures synthesis: a monte carlo approach. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006, Part II. LNCS, vol. 3952, pp. 283–294. Springer, Heidelberg (2006)Google Scholar
  16. 16.
    Péteri, R., Fazekas, S., Huiskes, M.: Dyntex: A comprehensive database of dynamic textures. Pattern Recognition Letters 31(12), 1627–1632 (2010)CrossRefGoogle Scholar
  17. 17.
    Ravichandran, A., Chaudhry, R., Vidal, R.: Categorizing dynamic textures using a bag of dynamical systems. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(2), 342–353 (2013)CrossRefGoogle Scholar
  18. 18.
    Saisan, P., Doretto, G., Wu, Y.N., Soatto, S.: Dynamic texture recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 2 (2001)Google Scholar
  19. 19.
    Sankaranarayanan, A.C., Turaga, P.K., Baraniuk, R.G., Chellappa, R.: Compressive acquisition of dynamic scenes. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part I. LNCS, vol. 6311, pp. 129–142. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  20. 20.
    Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 888–905 (2000)CrossRefGoogle Scholar
  21. 21.
    Taylor, G.W., Fergus, R., LeCun, Y., Bregler, C.: Convolutional learning of spatio-temporal features. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part VI. LNCS, vol. 6316, pp. 140–153. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  22. 22.
    Taylor, G.W., Hinton, G.E.: Factored conditional restricted boltzmann machines for modeling motion style. In: Annual International Conference on Machine Learning, pp. 1025–1032. ACM (2009)Google Scholar
  23. 23.
    Taylor, G.W., Hinton, G.E., Roweis, S.T.: Modeling human motion using binary latent variables. In: Advances in Neural Information Processing Systems 19, p. 1345 (2007)Google Scholar
  24. 24.
    Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11, 3371–3408 (2010)zbMATHMathSciNetGoogle Scholar
  25. 25.
    Wolf, L., Hassner, T., Taigman, Y.: The one-shot similarity kernel. In: IEEE International Conference on Computer Vision, pp. 897–902 (2009)Google Scholar
  26. 26.
    Xie, J., Xu, L., Chen, E.: Image denoising and inpainting with deep neural networks. In: Advances in Neural Information Processing Systems, pp. 350–358 (2012)Google Scholar
  27. 27.
    Yan, X., Chang, H., Chen, X.: Temporally multiple dynamic textures synthesis using piecewise linear dynamic systems. In: IEEE International Conference on Image Processing (2013)Google Scholar
  28. 28.
    Yuan, L., Wen, F., Liu, C., Shum, H.-Y.: Synthesizing dynamic texture with closed-loop linear dynamic system. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3022, pp. 603–616. Springer, Heidelberg (2004)CrossRefGoogle Scholar

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Xing Yan
    • 1
  • Hong Chang
    • 1
  • Shiguang Shan
    • 1
  • Xilin Chen
    • 1
  1. 1.Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS)Institute of Computing Technology, CASBeijingChina

Personalised recommendations