Multimedia Tools and Applications, Volume 70, Issue 1, pp 573–598

Requirements for multimedia metadata schemes in surveillance applications for security

  • J. van Rest
  • F. A. Grootjen
  • M. Grootjen
  • R. Wijn
  • O. Aarts
  • M. L. Roelofs
  • G. J. Burghouts
  • H. Bouma
  • L. Alic
  • W. Kraaij


Surveillance for security requires communication between systems and humans, involves behavioural and multimedia research, and demands objective benchmarking of the performance of system components. Metadata representation schemes are essential for facilitating (system) interoperability and for defining ground truth annotations for surveillance research and benchmarks. Surveillance places specific requirements on these metadata representation schemes. This paper offers a clear and coherent terminology and uses it to present these requirements and to evaluate them in three ways: their fitness in breadth for surveillance design patterns, their fitness in depth for a specific surveillance scenario, and their realism on the basis of existing schemes. The evaluation also shows that no existing metadata representation scheme fulfils all requirements. Guidelines are offered to those who wish to select or create a metadata scheme for surveillance for security.


Keywords: Surveillance; Human behaviour; Annotation; Metadata representation scheme; Event; Action; Multimodal; Multi-sensor; ONVIF; MPEG-7; PETS



This work was performed as independent research within the applied research programme Dutch Top sector High Tech Systems & Materials: Roadmap Security, Passive Sensors [9]. The authors thank Aart Beukers (Eye-D Security Experts) and the Amsterdam police for kindly providing the instruction video.

Supplementary material


(MPG 7337 kb)


  1. Alexander C (1977) A Pattern Language: Towns, Buildings, Construction
  2. Annesley J, Colombo A, Orwell J, Velastin S (2007) A profile of MPEG-7 for visual surveillance. IEEE Int. Conf. AVSS, pp 482–487
  3. Bouma H, Vogels J, Aarts O, Kruszynski C, Wijn R, Burghouts G (2013) Behavioral profiling in CCTV cameras by combining multiple subtle suspicious observations of different surveillance operators. Proc. SPIE 8745
  4. Burghouts GJ, Marck J (2011) Reasoning about threats: from observables to situation assessment. IEEE Trans Syst Man Cybern 41(5):608–616
  5. Buschmann F, Meunier R, Rohnert H, Sommerlad P (1996) Pattern-Oriented Software Architecture, Volume 1: A System of Patterns. John Wiley & Sons
  6. CAVIAR: Context aware vision using image-based active recognition.
  7. CVML: Computer vision markup language.
  8. Doermann D, Mihalcik D (2000) Tools and techniques for video performance evaluation. ICPR 4:167–170
  9. Dutch top sector high tech systems & materials: Roadmap security, passive sensors.
  10. ETISEO: Video understanding evaluation.
  11. Fisher RB (2004) The PETS04 surveillance ground-truth data sets. Proc. 6th IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance, pp 1–5
  12. Francois AR, Nevatia R, Hobbs J, Bolles RC, Smith JR (2005) VERL: an ontology framework for representing and annotating video events. IEEE Multimedia 12(4):76–86
  13.
  14. I-LIDS: Imagery library for intelligent detection systems. Home Office, UK
  15. INCOSE, a consensus of the INCOSE fellows.
  16. Kester LJHM (2008) Designing networked adaptive interactive hybrid systems. IEEE Multisensor Fusion and Integration for Intelligent Systems, MFI 2008, pp 516–521
  17. Kipp M (2013) Anvil: the video research annotation tool. Accessed January 4th 2013
  18. Kipp M (2013) Anvil 4.0 Annotation of video and spoken language
  19. La Vigne NG (2011) Evaluating the use of public surveillance cameras for crime control and prevention
  20. Lenat DB, Guha RV (1990) Building large knowledge-based systems: representation and inference in the CYC project. Addison–Wesley, Reading
  21. List T, Fisher RB (2004) CVML - an XML-based computer vision markup language. Int Conf Pattern Recog (ICPR) 1:789–792
  22. Lyon D (2007) Surveillance studies: an overview. Polity Press, Cambridge
  23. Mariano VY, Min J, Park J-H, Kasturi R, Mihalcik D, Li H et al (2002) Performance evaluation of object detection algorithms. ICPR 3:965–969
  24. Masolo C, Borgo S, Gangemi A, Guarino N, Oltramari A (2003) Ontology library (final). IST Project 2001–33052 WonderWeb Deliverable D18
  25.
  26. MPEG-7: Moving pictures expert group
  27. Neely H (2010) Modeling Threat Behaviors in Surveillance Video Metadata for Detection using an Analogical Reasoner. IEEE Aerospace conference
  28. Nghiem AT, Bremond F, Thonnat M, Valentin V (2007) ETISEO, performance evaluation for video surveillance systems. IEEE Conference On Advanced Video and Signal Based Surveillance, AVSS 2007, pp 476–481
  29. Niles I, Pease A (2001) Towards a Standard Upper Ontology. In: Welty C, Smith B (eds) Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), Ogunquit, Maine, October 17–19, 2001
  30.
  31. Online resource pickpocket video
  32. ONVIF: Open network video interface forum.
  33. Over P, Awad G, Fiscus J, Antonishek B, Michel M, Smeaton AF, et al (2011) Proceedings of TRECVID 2010: An overview of the goals, tasks, data, evaluation mechanisms, and metrics. Gaithersburg, Md., USA
  34. PETS: Performance evaluation of tracking and surveillance.
  35. PSIA: Physical security interoperability alliance.
  36. SanMiguel JC, Martinez JM, Garcia A (2009) An ontology for event detection and its application in surveillance video. IEEE Int. Conf. AVSS, pp 220–225
  37. Schallauer P, Bailer W, Hofmann A, Mörzinger R (2009) SAM: An interoperable metadata model for multimodal surveillance applications. Proc. SPIE 7344
  38. Sowa JF (1976) Conceptual graphs for a database interface. IBM J Res Dev 20(4):336–357
  39. Sowa JF (1984) Conceptual graphs. Information Processing in Mind and Machine, 39–44
  40. Steinberg AN, Bowman CL, White FE (1999) Revisions to the JDL data fusion model. Environmental Research Institute of Michigan, Arlington VA
  41. Surveillance of Unattended Baggage and the Identification and Tracking of the Owner (SUBITO) consortium (December 2011) SUBITO Deliverable D100.2: Final Report
  42. Suzić R (2005) A generic model of tactical plan recognition for threat assessment. Proc. SPIE
  43. TRECVID: TREC video retrieval evaluation.
  44. UK Home Office, Invitation to Tender Efficient Archive Retrieval & Auto Searching (EARS) CONTEST Project. Accessed June 2012
  45.
  46. Westermann U, Jain R (2007) Toward a common event model for multimedia applications. IEEE Multimedia 14(1):19–29

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • J. van Rest
    • 1
  • F. A. Grootjen
    • 2
  • M. Grootjen
    • 1
  • R. Wijn
    • 1
  • O. Aarts
    • 1
  • M. L. Roelofs
    • 1
  • G. J. Burghouts
    • 1
  • H. Bouma
    • 1
  • L. Alic
    • 1
  • W. Kraaij
    • 1
  1. TNO, The Hague, The Netherlands
  2. Radboud University, Nijmegen, The Netherlands
