Multimedia Tools and Applications, Volume 74, Issue 24, pp 11073–11098

A camera motion histogram descriptor for video shot classification

  • Muhammad Abul Hasan
  • Min Xu
  • Xiangjian He
  • Yi Wang


In this paper, a novel camera motion descriptor is proposed for video shot classification. In the proposed method, raw motion information is extracted from consecutive video frames by computing the motion vector of each macroblock to form motion vector fields (MVFs). Next, a motion consistency analysis is applied to the MVFs to eliminate inconsistent motion vectors. The MVFs are then divided into nine (3 × 3) local regions, and singular value decomposition (SVD) is applied to the motion vectors extracted from each local region along the temporal direction. The consistent motion vectors of several MVFs are compactly represented at a time to characterize temporal camera motion. Accordingly, each local region of the whole video shot is represented by a sequence of compact vectors. Finally, this sequence of vectors is converted into a histogram that describes the camera motion of each local region. The combination of all the local histograms serves as the camera motion descriptor of a video shot. The shot descriptors are fed to a classifier to classify video shots; in this work, we use a support vector machine (SVM) for classification. The experimental results show that the proposed camera motion descriptor has strong discriminative capability and effectively classifies different camera motion patterns in professionally captured video shots. We also show that the proposed approach outperforms two state-of-the-art video shot classification methods.
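The pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the window length, the form of the compact vector, or the histogram binning, so the `window`, `n_bins`, and `static_thresh` parameters, the rank-1 SVD summary, and the direction-based bins are all assumptions chosen for illustration. The motion consistency analysis and the SVM stage are omitted.

```python
import numpy as np


def region_histogram(region_mvs, window=8, n_bins=8, static_thresh=0.5):
    """Histogram of SVD-compacted motion for one local region.

    region_mvs: array of shape (T, N, 2), i.e. T motion vector fields,
    N macroblock vectors per field, each vector (dx, dy).
    All parameter defaults are illustrative assumptions.
    """
    hist = np.zeros(n_bins + 1)  # last bin counts near-static windows
    for start in range(0, region_mvs.shape[0] - window + 1, window):
        # One row per macroblock vector in this temporal window.
        m = region_mvs[start:start + window].reshape(-1, 2)
        u, s, vt = np.linalg.svd(m, full_matrices=False)
        # Mean row of the rank-1 approximation s[0] * u[:, 0] * vt[0].
        # The sign ambiguity of the SVD cancels in this product.
        compact = s[0] * u[:, 0].mean() * vt[0]
        if np.hypot(compact[0], compact[1]) < static_thresh:
            hist[n_bins] += 1
        else:
            # Direction bins centred on multiples of 2*pi / n_bins.
            angle = np.arctan2(compact[1], compact[0])
            hist[int(np.round(angle / (2 * np.pi) * n_bins)) % n_bins] += 1
    total = hist.sum()
    return hist / total if total > 0 else hist


def shot_descriptor(mvfs, grid=3, **kwargs):
    """Concatenate the histograms of grid x grid local regions.

    mvfs: array of shape (T, H, W, 2) holding one (dx, dy) per macroblock.
    """
    t, h, w, _ = mvfs.shape
    rows = np.array_split(np.arange(h), grid)
    cols = np.array_split(np.arange(w), grid)
    hists = []
    for r in rows:
        for c in cols:
            region = mvfs[:, r[0]:r[-1] + 1, c[0]:c[-1] + 1].reshape(t, -1, 2)
            hists.append(region_histogram(region, **kwargs))
    return np.concatenate(hists)


# Synthetic shot: 16 fields of 6 x 6 macroblocks, all moving right by 2 px.
pan = np.zeros((16, 6, 6, 2))
pan[..., 0] = 2.0
desc = shot_descriptor(pan)
print(desc.shape)  # (81,)
```

Each of the nine regions contributes `n_bins + 1` values, so the default settings give an 81-dimensional descriptor. Such a vector could then be passed to any multiclass classifier, for example an SVM as in the paper.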


Camera motion descriptor; Motion characterization; Shot classification; Singular value decomposition



We would like to thank the reviewers for their valuable comments. This work is partly supported by a UTS International Research Scholarship and by the Specialized Research Fund for the Doctoral Program of Higher Education of China (20120041120050).



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Muhammad Abul Hasan (1)
  • Min Xu (1)
  • Xiangjian He (1)
  • Yi Wang (2)

  1. Research Centre for Innovation in IT Services and Applications (iNEXT), University of Technology, Sydney, Sydney, Australia
  2. School of Software, Dalian University of Technology, Dalian, China
