Multimedia Tools and Applications, Volume 48, Issue 1, pp 69–87

Video event classification using string kernels

  • Lamberto Ballan
  • Marco Bertini
  • Alberto Del Bimbo
  • Giuseppe Serra

Abstract

Event recognition is a crucial task for providing a high-level semantic description of video content. The bag-of-words (BoW) approach has proven successful for categorizing objects and scenes in images, but it cannot model the temporal information between consecutive frames. In this paper we present a method that introduces temporal information into the BoW approach for video event recognition. Each event is modeled as a sequence of histograms of visual features, one histogram computed from each frame using the traditional BoW. These sequences are treated as strings (phrases) in which each histogram plays the role of a character. The sequences have variable length, depending on the duration of the video clips, and are classified using SVM classifiers with a string kernel based on the Needleman-Wunsch edit distance. Experiments on two domains, soccer videos and a subset of the TRECVID 2005 news videos, demonstrate the validity of the proposed approach.
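The abstract compresses the whole pipeline into a few sentences. The following Python sketch (standard library only) illustrates one plausible reading of it. Each frame becomes a BoW histogram, a clip becomes a sequence of histograms, two sequences are compared with a cost-minimizing Needleman-Wunsch alignment, and the resulting distance is turned into a kernel value. It is not the authors' implementation. The codebook is assumed to be given (for instance from k-means over local descriptors), and the chi-square substitution cost, the constant gap penalty `gap`, the kernel form K = exp(-gamma * d), and all function names are illustrative assumptions.

```python
import math
from typing import List, Sequence

Histogram = List[float]  # one L1-normalized BoW histogram per frame (a "character")


def bow_histogram(descriptors: Sequence[Sequence[float]],
                  codebook: Sequence[Sequence[float]]) -> Histogram:
    """Assign each local descriptor of a frame to its nearest codeword
    (squared Euclidean distance) and return the L1-normalized counts."""
    counts = [0.0] * len(codebook)
    for desc in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, codebook[k])))
        counts[nearest] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def chi2_distance(h1: Histogram, h2: Histogram) -> float:
    """Chi-square distance; ASSUMED here as the cost of substituting one frame for another."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def needleman_wunsch_distance(x: Sequence[Histogram], y: Sequence[Histogram],
                              gap: float = 1.0) -> float:
    """Minimum global alignment cost between two histogram sequences,
    computed by dynamic programming with a constant (ASSUMED) gap cost."""
    n, m = len(x), len(y)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # align the first i frames of x against gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # align the first j frames of y against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j - 1] + chi2_distance(x[i - 1], y[j - 1]),  # match/substitute
                           dp[i - 1][j] + gap,   # frame of x aligned to a gap
                           dp[i][j - 1] + gap)   # frame of y aligned to a gap
    return dp[n][m]


def string_kernel(x: Sequence[Histogram], y: Sequence[Histogram],
                  gamma: float = 1.0, gap: float = 1.0) -> float:
    """Hypothetical kernel K(x, y) = exp(-gamma * d(x, y)); not guaranteed to be PSD."""
    return math.exp(-gamma * needleman_wunsch_distance(x, y, gap))


def gram_matrix(seqs: Sequence[Sequence[Histogram]],
                gamma: float = 1.0, gap: float = 1.0) -> List[List[float]]:
    """Pairwise kernel values between clips, e.g. for an SVM with a precomputed kernel."""
    return [[string_kernel(a, b, gamma, gap) for b in seqs] for a in seqs]


if __name__ == "__main__":
    clip_a = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # 3-frame clip, 2-word vocabulary
    clip_b = [[1.0, 0.0], [0.0, 1.0]]              # 2-frame clip
    print(needleman_wunsch_distance(clip_a, clip_b))  # 1.0: the middle frame of clip_a is aligned to a gap
    print(gram_matrix([clip_a, clip_b]))
```

Because the alignment absorbs differences in clip length, clips of different duration can be compared directly, which is what allows sequences of variable length to be classified. An edit-distance kernel of this form is in general not positive semi-definite, so the Gram matrix may be indefinite. SVM solvers that accept precomputed kernels (for example LIBSVM's precomputed-kernel mode or scikit-learn's `SVC(kernel="precomputed")`) can still be run on it, but the usual convexity guarantees of SVM training no longer strictly hold.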

Keywords

Video annotation; Event classification; Bag-of-words; String kernel; Edit distance

Notes

Acknowledgements

This work is partially supported by the EU IST VidiVideo Project (Contract FP6-045547) and IM3I Project (Contract FP7-222267). The authors thank Filippo Amendola for his support in the preparation of the experiments.

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Lamberto Ballan (1)
  • Marco Bertini (1)
  • Alberto Del Bimbo (1)
  • Giuseppe Serra (1)
  1. Media Integration and Communication Center, University of Florence, Florence, Italy
