
SurVizor: visualizing and understanding the key content of surveillance videos


With the rapid development of society, video surveillance has progressively expanded into many areas of life, such as transportation, security inspection, and banking. Large numbers of cameras are being replaced or newly deployed in settings such as safe cities, smart campuses, and smart buildings, which leads to a huge amount of video data, slow retrieval when examining video, and low efficiency in understanding the complete picture of the videos. In this paper, we propose SurVizor, a visual analysis system for understanding the key content of surveillance videos. We integrate multiple image features and employ time series analysis methods to explore key temporal patterns in the features. We integrate multiple visualization views at three levels (video, feature, and frame) to support exploration, analysis, and understanding of video content. We evaluate the proposed system through a case study based on real-world surveillance videos from multiple cameras and through a user study. The results demonstrate the usability and effectiveness of our system in analyzing and understanding the key content of surveillance videos.
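
As a rough illustration of the general idea of combining several per-frame image features into one time series and looking for key temporal patterns, the following Python sketch standardizes two hypothetical features, averages them, and flags frames where the fused score peaks. The feature names (`motion`, `object_count`), the equal-weight fusion, the peak rule, and the threshold are all assumptions made for illustration; they are not taken from the SurVizor system itself.

```python
from statistics import mean, pstdev


def zscore(series):
    """Standardize one feature series so features on different scales are comparable."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        # A constant feature carries no temporal pattern; map it to zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def fuse_features(features):
    """Average the standardized per-frame features into one score per frame.

    `features` maps a feature name to a list with one value per frame.
    Equal weighting is an assumption of this sketch.
    """
    standardized = [zscore(series) for series in features.values()]
    return [mean(values) for values in zip(*standardized)]


def key_frames(scores, threshold=1.0):
    """Return indices of local maxima whose fused score exceeds the threshold."""
    keys = []
    for i in range(1, len(scores) - 1):
        is_peak = scores[i] >= scores[i - 1] and scores[i] >= scores[i + 1]
        if is_peak and scores[i] > threshold:
            keys.append(i)
    return keys


# Hypothetical per-frame features for an 8-frame clip (made-up values).
features = {
    "motion": [0.1, 0.1, 0.2, 0.9, 0.2, 0.1, 0.1, 0.1],
    "object_count": [1, 1, 1, 5, 1, 1, 1, 1],
}

scores = fuse_features(features)
print(key_frames(scores))  # frame 3 stands out in both features
```

Standardizing before fusing keeps a feature with a large numeric range, such as an object count, from dominating one with a small range, such as a normalized motion score.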





This work is partly supported by the National Natural Science Foundation of China (62036009, 61972356) and the Fundamental Research Funds for the Provincial Universities of Zhejiang (RF-A2020001).

Author information



Corresponding author

Correspondence to Ronghua Liang.


About this article


Cite this article

Sun, G., Li, T. & Liang, R. SurVizor: visualizing and understanding the key content of surveillance videos. J Vis (2021).



Keywords

  • Surveillance video
  • Multi-feature
  • Time series
  • Visual analysis