Machine Vision and Applications, Volume 19, Issue 5–6, pp 329–343

How close are we to solving the problem of automated visual surveillance?

A review of real-world surveillance, scientific progress and evaluative mechanisms
Special Issue Paper

Abstract

The problem of automated visual surveillance has spawned a lively research area, with 2005 seeing three conferences or workshops and special issues of two major journals devoted to the topic. These alone are responsible for somewhere in the region of 240 papers and posters on automated visual surveillance, before we begin to count those presented in more general fora. Many of these systems and algorithms perform one small sub-part of the surveillance task, such as motion detection. But even with low-level image processing tasks it is often difficult to compare systems on the basis of published results alone. This review paper aims to answer the difficult question “How close are we to developing surveillance-related systems which are really useful?” The first section of this paper considers the question of surveillance in the real world: installations, systems and practices. The main body of the paper then considers existing computer vision techniques, with an emphasis on higher-level processes such as behaviour modelling and event detection. We conclude with a review of the evaluative mechanisms that have grown from within the computer vision community in an attempt to provide some form of robust evaluation and cross-system comparability.


Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  1. School of Computing, University of Leeds, Leeds, UK
  2. Digital Imaging Research Centre, Kingston University, Kingston-upon-Thames, UK
