Action Recognition from Optical Flow Visualizations
Optical flow is an important computer vision technique used for motion estimation, object tracking, and activity recognition. In this paper, we study the effectiveness of the optical flow feature for recognizing simple actions by using only its RGB visualizations as input to a deep neural network. Feeding only the optical flow visualizations, rather than the raw video content, ensures that a single motion feature serves as the sole classification criterion. We treat human action recognition as a multi-class classification problem: to categorize an action, we train an AlexNet-like Convolutional Neural Network (CNN) on Farneback optical flow visualizations of the action videos. We use the KTH data set, which contains videos of six action types: walking, running, boxing, jogging, hand-clapping, and hand-waving. The accuracy obtained on the test set is 84.72%; this is naturally below the state of the art, since only a single motion feature is used for classification, but it is high enough to demonstrate that optical flow visualization is an effective distinguishing criterion for action recognition. The CNN was trained in Caffe on two NVIDIA Quadro K4200 GPU cards, and the Farneback optical flow features were computed with the OpenCV library.
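The paper does not list its exact preprocessing code; in OpenCV, dense Farneback flow is obtained with `cv2.calcOpticalFlowFarneback`, and the resulting (dx, dy) field is typically rendered as an RGB image by mapping flow direction to hue and flow magnitude to value. The NumPy-only sketch below illustrates that visualization step under this common convention; the function name, the hue offset, and the magnitude normalization are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np

def flow_to_rgb(flow):
    """Render a dense optical-flow field of shape (H, W, 2) as an RGB image:
    flow direction -> hue, flow magnitude -> value (brightness)."""
    fx, fy = flow[..., 0], flow[..., 1]
    mag = np.sqrt(fx ** 2 + fy ** 2)
    ang = np.arctan2(fy, fx)            # direction in [-pi, pi]
    hue = (ang + np.pi) / (2 * np.pi)   # map direction to hue in [0, 1]
    sat = np.ones_like(hue)             # full saturation everywhere
    maxm = mag.max()
    val = mag / maxm if maxm > 0 else np.zeros_like(mag)  # magnitude -> value

    # Vectorized HSV -> RGB conversion (standard sextant formula).
    i = (hue * 6).astype(int) % 6
    f = hue * 6 - np.floor(hue * 6)
    p = val * (1 - sat)
    q = val * (1 - f * sat)
    t = val * (1 - (1 - f) * sat)
    sextants = [(val, t, p), (q, val, p), (p, val, t),
                (p, q, val), (t, p, val), (val, p, q)]
    rgb = np.zeros(flow.shape[:2] + (3,))
    for k, (r, g, b) in enumerate(sextants):
        mask = i == k
        rgb[mask] = np.stack([r[mask], g[mask], b[mask]], axis=-1)
    return (rgb * 255).astype(np.uint8)

# Example: a uniform rightward motion field maps to a single flat color,
# since every pixel shares the same direction (hue) and magnitude (value).
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0          # dx = 1 everywhere, dy = 0
img = flow_to_rgb(flow)
```

In the full pipeline described by the abstract, `flow` would come from `cv2.calcOpticalFlowFarneback` on consecutive grayscale frames, and the resulting images would be saved and fed to the CNN in place of the raw video frames.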
Keywords: Optical flow · Convolutional Neural Networks · KTH data set · Action recognition
- 1. Aaron F. Bobick. Action Recognition Using Temporal Templates. Journal of Chemical Information and Modeling, 53(9):1689–1699, 2013.
- 4. Simon Baker, Daniel Scharstein, J. P. Lewis, Stefan Roth, Michael J. Black, and Richard Szeliski. A database and evaluation methodology for optical flow. International Journal of Computer Vision, 92(1):1–31, 2011.
- 5. G. Bradski. The OpenCV Library. Dr. Dobb's Journal of Software Tools, 2000.
- 6. Thomas Brox, Nils Papenberg, and Joachim Weickert. High Accuracy Optical Flow Estimation Based on a Theory for Warping. In Computer Vision - ECCV 2004, pages 25–36, 2004.
- 7. Gunnar Farnebäck. Two-frame Motion Estimation Based on Polynomial Expansion. In Proceedings of the 13th Scandinavian Conference on Image Analysis, SCIA'03, pages 363–370, Berlin, Heidelberg, 2003. Springer-Verlag.
- 8. Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazirbas, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, and Thomas Brox. FlowNet: Learning optical flow with convolutional networks. CoRR, arXiv:1504.06852, 2015.
- 9. David Fleet and Yair Weiss. Optical Flow Estimation. In Mathematical Models for Computer Vision: The Handbook, pages 239–257, 2005.
- 10. Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3354–3361, 2012.
- 11. Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), 9:249–256, 2010.
- 14. Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093, 2014.
- 15. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems (NIPS), pages 1–9, 2012.
- 16. Ivan Laptev, Marcin Marszałek, Cordelia Schmid, and Benjamin Rozenfeld. Learning realistic human actions from movies. In 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2008.
- 17. J. Liu and M. Shah. Learning human actions via information maximization. In Conference on Computer Vision and Pattern Recognition, pages 2971–2978, 2008.
- 18. B. D. Lucas and T. Kanade. An Iterative Image Registration Technique with an Application to Stereo Vision. In IJCAI, 130:121–129, 1981.
- 19. Upal Mahbub, Hafiz Imtiaz, and Md Atiqur Rahman Ahad. An optical flow based approach for action recognition. In 14th International Conference on Computer and Information Technology, ICCIT 2011, pages 646–651, 2011.
- 20. Pol Rosello. Predicting Future Optical Flow from Static Video Frames. 2016.
- 21. Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
- 22. Christian Schuldt, Ivan Laptev, and Barbara Caputo. Recognizing human actions: A local SVM approach. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), Volume 3, pages 32–36, Washington, DC, USA, 2004. IEEE Computer Society.
- 23. Karen Simonyan and Andrew Zisserman. Two-Stream Convolutional Networks for Action Recognition in Videos. arXiv preprint arXiv:1406.2199, pages 1–11, 2014.
- 24. Michalis Vrigkas, Christophoros Nikou, and Ioannis A. Kakadiaris. A Review of Human Activity Recognition Methods. Frontiers in Robotics and AI, 2:1–28, November 2015.
- 25. Heng Wang, Muhammad Muneeb Ullah, Alexander Klaser, Ivan Laptev, and Cordelia Schmid. Evaluation of local spatio-temporal features for action recognition. In BMVC 2009 - British Machine Vision Conference, pages 124.1–124.11, 2009.
- 26. Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, and Cordelia Schmid. DeepFlow: Large displacement optical flow with deep matching. In Proceedings of the IEEE International Conference on Computer Vision, pages 1385–1392, 2013.
- 27. Zoran Zivkovic. Improved adaptive Gaussian mixture model for background subtraction. In Proceedings of the 17th International Conference on Pattern Recognition, Volume 2, pages 28–31, 2004.