Making a Case for Learning Motion Representations with Phase

  • S. L. Pintea
  • J. C. van Gemert
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9915)


This work advocates Eulerian motion representation learning over the current standard Lagrangian optical flow model. Eulerian motion is well captured by phase, as obtained by decomposing the image through a complex steerable pyramid. We discuss the gain of Eulerian motion in a set of practical use cases: (i) action recognition, (ii) motion prediction in static images, (iii) motion transfer in static images, and (iv) motion transfer in video. For each task we motivate the phase-based direction and provide a possible approach.
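The phase-based view of motion can be illustrated with a toy example. The sketch below is a 1-D simplification, not the complex steerable pyramid decomposition referred to above; the filter size, wavelength, and test signal are illustrative choices. It convolves two frames of a translating sinusoid with a complex Gabor filter: the sub-pixel shift shows up as a near-constant change in local phase, which is the Eulerian motion signal, read off per position rather than by tracking pixels.

```python
import numpy as np

def complex_gabor(size=25, wavelength=8.0):
    """1-D complex Gabor filter: Gaussian window times a complex exponential."""
    x = np.arange(size) - size // 2
    gauss = np.exp(-x**2 / (2.0 * (size / 6.0) ** 2))
    return gauss * np.exp(1j * 2.0 * np.pi * x / wavelength)

def local_phase(signal, filt):
    """Local phase: the argument of the complex filter response."""
    return np.angle(np.convolve(signal, filt, mode="same"))

# Two frames of a sinusoid, the second translated right by half a pixel.
n, wavelength, shift = 256, 8.0, 0.5
x = np.arange(n)
frame0 = np.sin(2.0 * np.pi * x / wavelength)
frame1 = np.sin(2.0 * np.pi * (x - shift) / wavelength)

filt = complex_gabor(wavelength=wavelength)
# Wrapped phase difference between the frames; away from the borders it is
# roughly the constant -2*pi*shift/wavelength.
dphase = np.angle(
    np.exp(1j * (local_phase(frame1, filt) - local_phase(frame0, filt)))
)
```

Dividing the phase difference by the filter's spatial frequency recovers the half-pixel displacement, which is the core observation behind phase-based optical flow (Fleet and Jepson) and Eulerian video magnification.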


Keywords: Optical Flow · Static Image · Action Recognition · Motion Prediction · Motion Representation



This work is part of the research programme Technology in Motion (TIM [628.004.001]), financed by the Netherlands Organisation for Scientific Research (NWO).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Computer Vision Lab, Delft University of Technology, Delft, Netherlands
