Abstract
A novel framework is developed for action recognition in video using empirical covariance matrices of bags of low-dimensional feature vectors. The feature vectors are extracted from segments of silhouette tunnels of moving objects and coarsely capture their shapes. The matrix logarithm maps the segment covariance matrices, which lie on a nonlinear Riemannian manifold, to the vector space of symmetric matrices. A recently developed sparse linear representation framework for dictionary-based classification is then applied to the log-covariance matrices: the log-covariance matrix of a query segment is approximated by a sparse linear combination of the log-covariance matrices of training segments, and the sparse coefficients determine the action label of the query segment. The approach is tested on the Weizmann and UT-Tower human action datasets, attaining segment-level classification rates of 96.74% and 96.15%, respectively. The proposed method is also computationally and memory efficient and easy to implement.
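The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`log_covariance`, `ista`, `classify`) are hypothetical, the l1 problem is solved with plain iterative soft-thresholding (ISTA) as a stand-in for whatever solver the paper uses, and the minimum class-wise residual rule is one common way of turning sparse coefficients into a label, assumed here because the abstract does not spell out the decision rule.

```python
import numpy as np

def log_covariance(features, eps=1e-6):
    """Vectorized matrix logarithm of the covariance of a bag of features.

    features: array of shape (n_samples, d), one feature vector per row.
    eps is an assumed small ridge that keeps the covariance positive definite.
    """
    d = features.shape[1]
    cov = np.cov(features, rowvar=False) + eps * np.eye(d)
    # Matrix logarithm of a symmetric positive definite matrix via eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(cov)
    log_cov = (eigvecs * np.log(eigvals)) @ eigvecs.T
    # Keep the upper triangle; scale off-diagonals by sqrt(2) so the
    # Euclidean norm of the vector equals the Frobenius norm of the matrix.
    rows, cols = np.triu_indices(d)
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return weights * log_cov[rows, cols]

def ista(D, y, lam=0.01, n_iter=500):
    """Approximately minimize 0.5*||y - D x||^2 + lam*||x||_1 (assumed solver)."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - step * (D.T @ (D @ x - y))
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return x

def classify(D, labels, y, lam=0.01):
    """Label a query by the class whose training atoms best reconstruct it.

    D: (dim, n_train) dictionary of training log-covariance vectors (columns).
    labels: (n_train,) array with the class label of each column.
    """
    x = ista(D, y, lam=lam)
    best_label, best_residual = None, np.inf
    for c in np.unique(labels):
        x_c = np.where(labels == c, x, 0.0)  # keep only coefficients of class c
        residual = np.linalg.norm(y - D @ x_c)
        if residual < best_residual:
            best_label, best_residual = c, residual
    return best_label
```

In use, each training segment would contribute one column of `D` (here assumed to be normalized to unit length), and each query segment is labeled independently, which matches the segment-level evaluation reported above.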
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guo, K., Ishwar, P., Konrad, J. (2010). Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels. In: Ünay, D., Çataltepe, Z., Aksoy, S. (eds) Recognizing Patterns in Signals, Speech, Images and Videos. ICPR 2010. Lecture Notes in Computer Science, vol 6388. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17711-8_30
DOI: https://doi.org/10.1007/978-3-642-17711-8_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17710-1
Online ISBN: 978-3-642-17711-8
eBook Packages: Computer Science (R0)