Multimedia Tools and Applications, Volume 76, Issue 5, pp 6309–6331

Unsupervised, efficient and scalable key-frame selection for automatic summarization of surveillance videos



Recent years have witnessed dramatic growth in the deployment of vision-based surveillance in public spaces, so automatic summarization of surveillance videos (ASOSV) is becoming increasingly desirable in many real-world applications. To this end, this paper proposes a novel frame-selection framework with three properties: 1) unsupervised operation: it requires no supervised learning or training; 2) efficiency: it runs fast, with experiments showing faster-than-real-time processing; and 3) scalability: it supports a hierarchical analysis and overview of video content. The performance of the proposed framework is systematically evaluated and compared with various state-of-the-art frame-selection techniques on a set of collected video sequences and the publicly available ViSOR dataset. The experimental results demonstrate promising performance and good applicability to real-world problems.
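The abstract does not spell out the selection procedure, but the keywords name a martingale test. As a hedged illustration only, the sketch below applies a generic randomized power martingale for testing exchangeability to a stream of per-frame feature vectors (assumed to be precomputed, e.g. colour histograms). Each new frame receives a conformal p-value from its distance to the mean of the current segment, and a key frame is emitted when the martingale exceeds a threshold. The names `strangeness` and `select_key_frames`, the distance-to-mean strangeness measure, and the parameters `epsilon`, `threshold` and `seed` are hypothetical choices, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def strangeness(window: Sequence[Vector]) -> List[float]:
    """Hypothetical strangeness: distance of each vector to the window mean."""
    n = len(window)
    mean = [sum(v[d] for v in window) / n for d in range(len(window[0]))]
    return [math.dist(v, mean) for v in window]


def select_key_frames(
    features: Sequence[Vector],
    epsilon: float = 0.92,
    threshold: float = 20.0,
    seed: int = 0,
) -> List[int]:
    """Flag frames where a randomized power martingale exceeds `threshold`.

    Frame 0 always opens the first segment. After each detection the
    window and the martingale are reset, so every flagged frame opens a
    new segment and serves as its key frame (an assumed design choice).
    """
    if not features:
        return []
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie in (0, 1)")
    rng = random.Random(seed)
    log_threshold = math.log(threshold)
    key_frames = [0]
    window = [features[0]]
    log_m = 0.0  # log of the martingale, which starts at 1
    for t in range(1, len(features)):
        window.append(features[t])
        alphas = strangeness(window)
        a_new = alphas[-1]
        greater = sum(1 for a in alphas if a > a_new)
        equal = sum(1 for a in alphas if a == a_new)  # counts the new frame too
        theta = 1.0 - rng.random()  # uniform on (0, 1], so p is never 0
        p = (greater + theta * equal) / len(window)
        # Power martingale step M_t = M_{t-1} * eps * p**(eps - 1), in log space
        log_m += math.log(epsilon) + (epsilon - 1.0) * math.log(p)
        if log_m > log_threshold:
            key_frames.append(t)
            window = [features[t]]
            log_m = 0.0
    return key_frames
```

Because randomized conformal p-values are uniform while frames stay exchangeable, Ville's inequality bounds the chance of a spurious key frame inside a stationary segment by `1 / threshold`. To mimic a hierarchical overview, one could (hypothetically) rerun `select_key_frames` with increasing thresholds, for example `1e2`, `1e4`, `1e6`. A larger threshold demands stronger evidence of change and typically yields fewer, coarser key frames. This sketch recomputes the strangeness over the whole window at every frame, so its per-frame cost grows with the segment length. A faster version would update the mean incrementally.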


Keywords: Video summarization · Martingale test · Key frame selection · Surveillance videos



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Guoliang Lu (1, 2)
  • Yiqi Zhou (1, 2)
  • Xueyong Li (1, 2)
  • Peng Yan (1, 2)
  1. School of Mechanical Engineering, Shandong University, Jinan, China
  2. Key Laboratory of High-efficiency and Clean Mechanical Manufacture (Shandong University), Ministry of Education, Beijing, China
