DYAN: A Dynamical Atoms-Based Network for Video Prediction

  • Wenqian Liu
  • Abhishek Sharma
  • Octavia Camps
  • Mario Sznaier
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11216)


The ability to anticipate the future is essential for making real-time critical decisions, provides valuable information for understanding dynamic natural scenes, and can help unsupervised video representation learning. State-of-the-art video prediction is based on complex architectures that need to learn large numbers of parameters, are potentially hard to train, are slow to run, and may produce blurry predictions. In this paper, we introduce DYAN, a novel network that has very few parameters and is easy to train, and that produces accurate, high-quality frame predictions faster than previous approaches. DYAN owes its good qualities to its encoder and decoder, which are designed following concepts from systems identification theory and exploit the dynamics-based invariants of the data. Extensive experiments using several standard video datasets show that DYAN is superior at generating frames and that it generalizes well across domains.


Keywords: Video autoencoder, Sparse coding, Video prediction



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Electrical and Computer Engineering, Northeastern University, Boston, USA
