How close are we to solving the problem of automated visual surveillance?

A review of real-world surveillance, scientific progress and evaluative mechanisms

Abstract

The problem of automated visual surveillance has spawned a lively research area, with 2005 seeing three conferences or workshops and special issues of two major journals devoted to the topic. These alone are responsible for somewhere in the region of 240 papers and posters on automated visual surveillance, before we begin to count those presented in more general fora. Many of these systems and algorithms perform one small sub-part of the surveillance task, such as motion detection. But even with low-level image processing tasks it is often difficult to compare systems on the basis of published results alone. This review paper aims to answer the difficult question “How close are we to developing surveillance-related systems which are really useful?” The first section of this paper considers the question of surveillance in the real world: installations, systems and practices. The main body of the paper then considers existing computer vision techniques, with an emphasis on higher-level processes such as behaviour modelling and event detection. We conclude with a review of the evaluative mechanisms that have grown from within the computer vision community in an attempt to provide some form of robust evaluation and cross-system comparability.

Author information

Corresponding author

Correspondence to Hannah M. Dee.

About this article

Cite this article

Dee, H.M., Velastin, S.A. How close are we to solving the problem of automated visual surveillance?. Machine Vision and Applications 19, 329–343 (2008). https://doi.org/10.1007/s00138-007-0077-z

Keywords

  • Computer Vision
  • Ground Truth
  • Minimum Description Length
  • Computer Vision System
  • Visual Surveillance