Multimedia Tools and Applications, Volume 72, Issue 2, pp 1773–1802

Object segmentation and key-pose based summarization for motion video

  • Zhiqiang Tian (email author)
  • Jianru Xue
  • Xuguang Lan
  • Ce Li
  • Nanning Zheng
Article

Abstract

This paper proposes a key-pose based video summarization system for a video shot, built on a video object segmentation method. First, we detect the camera motion and extract video objects with a 3D graph-based algorithm. Once the objects are obtained, each is represented by a shape descriptor. Second, to find representative frames that preserve the scene content as accurately as possible, the proposed method calculates the difference between pairs of frames based on the shape descriptors of the objects in the video shot. Finally, key-poses (representative frames) are extracted in a global manner by clustering these shapes. Experimental results on motion video shots show that the proposed method outputs satisfactory summarizations.
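
The last two stages described above (pairwise shape differences, then global clustering) can be illustrated with a minimal sketch. It assumes that per-frame binary object masks are already available from the segmentation stage. The abstract does not name the shape descriptor or the clustering algorithm, so the sketch substitutes the first two Hu moment invariants as the descriptor and a plain k-medoids loop as the clustering step. The function names `shape_descriptor`, `pairwise_distances`, and `select_key_poses` are hypothetical and are not taken from the paper.

```python
import numpy as np


def shape_descriptor(mask):
    """First two Hu moment invariants of a binary mask.

    Assumption: a stand-in descriptor, not necessarily the paper's choice.
    """
    ys, xs = np.nonzero(mask)
    m00 = float(len(xs))
    if m00 == 0:
        return np.zeros(2)
    dx = xs - xs.mean()
    dy = ys - ys.mean()
    # Normalized second-order central moments: eta_pq = mu_pq / m00^2.
    eta20 = (dx ** 2).sum() / m00 ** 2
    eta02 = (dy ** 2).sum() / m00 ** 2
    eta11 = (dx * dy).sum() / m00 ** 2
    h1 = eta20 + eta02
    h2 = (eta20 - eta02) ** 2 + 4.0 * eta11 ** 2
    return np.array([h1, h2])


def pairwise_distances(descriptors):
    """Euclidean distance between every pair of frame descriptors."""
    diff = descriptors[:, None, :] - descriptors[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def select_key_poses(dist, k, n_iter=20):
    """Pick k representative frames with a simple k-medoids loop.

    Assumption: k-medoids is a placeholder for the paper's clustering step.
    """
    # Farthest-point initialization, starting from the most central frame.
    medoids = [int(dist.sum(axis=1).argmin())]
    while len(medoids) < k:
        medoids.append(int(dist[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        # Assign every frame to its nearest medoid.
        labels = dist[:, medoids].argmin(axis=1)
        updated = medoids.copy()
        for c in range(k):
            members = np.nonzero(labels == c)[0]
            if len(members) == 0:
                continue
            # The new medoid minimizes the total distance within its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            updated[c] = members[within.argmin()]
        if np.array_equal(updated, medoids):
            break
        medoids = updated
    return sorted(int(m) for m in medoids)
```

Given a list `masks` of per-frame binary arrays, calling `select_key_poses(pairwise_distances(np.stack([shape_descriptor(m) for m in masks])), k)` returns the indices of `k` candidate key-pose frames. Because the medoids are chosen from the distance matrix of the whole shot, the selection is global in the sense the abstract describes, rather than a frame-by-frame threshold.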

Keywords

Graph cuts · Key-poses · Shape clustering · Spatio-temporal · Video object segmentation · Video summarization

Notes

Acknowledgements

This work was supported in part by the National Basic Research Program of China (973 Program) under Grant No. 2010CB327902, and the NSFC Nos. 90920301, 61273252, and 61175010.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Zhiqiang Tian (1, email author)
  • Jianru Xue (1)
  • Xuguang Lan (1)
  • Ce Li (1)
  • Nanning Zheng (1)

  1. Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China