Abstract
This paper proposes a new framework for cross-view action recognition. Spatio-temporal patches are extracted as low-level features, and each patch is represented as a linear dynamical system (LDS). A bag of dynamical systems (BODS) is employed as the middle-level representation. To bridge different views, we transform BODS pairs into a bilingual BODS through transferable dictionary pairs. Bilingual dictionaries are learned for the source and target views, which guarantees that the same action seen from the two views has the same high-level representation. A support vector machine (SVM) is employed as the classifier. Experimental results on the IXMAS multi-view dataset show the effectiveness of the proposed algorithm compared with other methods. Performance on the top view is also excellent.
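The abstract does not say how an LDS is fitted to each patch. As a rough illustration only, the sketch below uses the SVD-based identification commonly applied to dynamic textures, which may differ from the chapter's actual procedure. The function name `fit_lds`, the model order, and the patch size are all hypothetical choices made for this example.

```python
import numpy as np

def fit_lds(patch, order=3):
    """Fit x_{t+1} = A x_t, y_t = C x_t + mean to a patch of shape (T, H, W).

    Assumption: SVD-based suboptimal identification, not necessarily
    the method used in the chapter.
    """
    T = patch.shape[0]
    # Stack each frame as a column: Y has shape (pixels, T).
    Y = patch.reshape(T, -1).T.astype(float)
    mean = Y.mean(axis=1, keepdims=True)
    # The leading left singular vectors form the observation matrix C.
    U, s, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    C = U[:, :order]
    # Hidden state sequence, shape (order, T).
    X = s[:order, None] * Vt[:order]
    # Least-squares estimate of the state transition matrix A.
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    return A, C, mean

# Illustrative input: a random 20-frame, 8x8-pixel patch (not real video data).
rng = np.random.default_rng(0)
patch = rng.standard_normal((20, 8, 8))
A, C, mean = fit_lds(patch, order=3)
```

Under these assumptions each patch is summarized by the pair (A, C). A bag-of-systems codebook would then be built over such pairs using a distance between dynamical systems rather than a Euclidean distance between raw descriptors.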
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, C., Yang, S., Gan, Z. (2013). Cross-View Action Recognition via Bilingual Bag of Dynamical Systems. In: Sun, C., Fang, F., Zhou, ZH., Yang, W., Liu, ZY. (eds) Intelligence Science and Big Data Engineering. IScIDE 2013. Lecture Notes in Computer Science, vol 8261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-42057-3_25
DOI: https://doi.org/10.1007/978-3-642-42057-3_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-42056-6
Online ISBN: 978-3-642-42057-3
eBook Packages: Computer Science, Computer Science (R0)