Abstract
Over the last decade, the evaluation of local visual features for images has attracted great interest. Such evaluations give researchers guidance when selecting the best approaches for new applications and datasets. Most state-of-the-art features have been extended to the temporal domain, so that video retrieval and categorization can use techniques similar to those used for images. However, no comprehensive evaluation of these extensions exists. We provide the first comparative evaluation based on isolated and well-defined alterations of video data. We select the three most promising approaches, namely the Harris3D, Hessian3D, and Gabor detectors and the HOG/HOF, SURF3D, and HOG3D descriptors. To evaluate the detectors, we treat the videos as 3D volumes and measure detector repeatability on the challenges. To evaluate the robustness of the spatio-temporal descriptors, we propose a principled classification pipeline in which the increasingly altered videos form the set of queries. This allows for an in-depth analysis of local detectors and descriptors and their combinations.
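As a rough illustration of what measuring repeatability on 3D volumes can look like, the sketch below counts how many detections in an original video reappear in its altered version. It is not the paper's protocol: the cuboid representation of a detection, the intersection-over-union overlap criterion, the 0.5 threshold, the greedy one-to-one matching, and the normalisation by the smaller detection count (borrowed from common practice in image detector evaluation) are all assumptions, and the names `iou_3d` and `repeatability` are hypothetical.

```python
from typing import List, Tuple

# Assumption: a detection is an axis-aligned cuboid (x, y, t, sx, sy, st),
# i.e. a centre in space-time plus the full extent along each axis.
Cuboid = Tuple[float, float, float, float, float, float]


def overlap_1d(c1: float, e1: float, c2: float, e2: float) -> float:
    """Length of the overlap of two intervals given by centre and full extent."""
    lo = max(c1 - e1 / 2, c2 - e2 / 2)
    hi = min(c1 + e1 / 2, c2 + e2 / 2)
    return max(0.0, hi - lo)


def iou_3d(a: Cuboid, b: Cuboid) -> float:
    """Intersection over union of two space-time cuboids."""
    inter = 1.0
    for axis in range(3):
        inter *= overlap_1d(a[axis], a[axis + 3], b[axis], b[axis + 3])
    union = a[3] * a[4] * a[5] + b[3] * b[4] * b[5] - inter
    return inter / union if union > 0 else 0.0


def repeatability(original: List[Cuboid], altered: List[Cuboid],
                  min_iou: float = 0.5) -> float:
    """Fraction of detections that reappear after the alteration.

    Each original detection is greedily matched to the unused altered
    detection with the highest overlap, provided it reaches min_iou.
    """
    if not original or not altered:
        return 0.0
    unused = set(range(len(altered)))
    matches = 0
    for det in original:
        best, best_iou = None, min_iou
        for j in unused:
            value = iou_3d(det, altered[j])
            if value >= best_iou:
                best, best_iou = j, value
        if best is not None:
            unused.remove(best)
            matches += 1
    # Assumed normalisation: the smaller of the two detection counts.
    return matches / min(len(original), len(altered))
```

In such a setup, `repeatability` would be computed once per alteration level, so that the decay of the score over increasingly altered videos can be compared across detectors.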
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stöttinger, J., Goras, B.T., Pönitz, T., Hanbury, A., Sebe, N., Gevers, T. (2011). Systematic Evaluation of Spatio-Temporal Features on Comparative Video Challenges. In: Koch, R., Huang, F. (eds) Computer Vision – ACCV 2010 Workshops. ACCV 2010. Lecture Notes in Computer Science, vol 6468. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22822-3_35
DOI: https://doi.org/10.1007/978-3-642-22822-3_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22821-6
Online ISBN: 978-3-642-22822-3
eBook Packages: Computer Science, Computer Science (R0)