
Ground truth annotation of traffic video data


This paper presents a software application for generating ground-truth data from video files recorded by traffic surveillance cameras used in Intelligent Transportation Systems (ITS). The computer vision system to be evaluated measures the intensity (the number of vehicles that cross a line per unit of time), the average speed, and the occupancy. The main goal of the visual interface presented in this paper is to be easy to use without requiring any specialized hardware: it runs on a standard laptop or desktop computer with a jog shuttle wheel. The setup is efficient and comfortable because the annotator keeps one hand on the space key of the keyboard almost all the time while the other hand rests on the jog shuttle wheel. The mean time required to annotate a video file ranges from 1 to 5 times its duration (per lane), depending on the content. Compared with a general-purpose annotation tool, annotation is about 7 times faster.
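As a rough illustration of the quantities named in the abstract, the sketch below computes intensity and occupancy for one lane from a list of annotated line crossings. The `Crossing` record, its field names, and the definition of occupancy as the fraction of the observation window during which the line is covered are assumptions made for this example; they are not taken from the paper's annotation format.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Crossing:
    """Hypothetical ground-truth record for one vehicle in one lane."""
    t_enter: float  # time (s) at which the vehicle starts covering the line
    t_exit: float   # time (s) at which the vehicle stops covering the line


def intensity(crossings: Sequence[Crossing], window_s: float) -> float:
    """Vehicles crossing the line per second over the observation window."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return len(crossings) / window_s


def occupancy(crossings: Sequence[Crossing], window_s: float) -> float:
    """Fraction of the window during which the line is covered.

    Assumes the crossings of a single lane do not overlap in time.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    busy_s = sum(c.t_exit - c.t_enter for c in crossings)
    return busy_s / window_s


if __name__ == "__main__":
    # Three made-up crossings inside a 10-second window.
    sample = [Crossing(1.0, 1.5), Crossing(4.0, 5.0), Crossing(8.0, 8.5)]
    print(f"intensity: {intensity(sample, 10.0):.2f} vehicles/s")
    print(f"occupancy: {occupancy(sample, 10.0):.2f}")
```

Average speed is left out on purpose: estimating it would need extra information, such as a second reference line at a known distance, which the abstract does not specify.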






The authors thank Etra I+D and Ruth López and Jaime Benlloch from the Local Traffic Authority of Valencia, Spain, for providing the video files. Thanks also go to the VIGTA 2012 conference organizers, its participants, and the anonymous reviewers of this journal for their valuable advice.

This work was funded by the Spanish Government project MARTA under the CENIT program and CICYT contract TEC2009-09146.

Author information



Corresponding author

Correspondence to Jose M. Mossi.


About this article

Cite this article

Mossi, J.M., Albiol, A., Albiol, A. et al. Ground truth annotation of traffic video data. Multimed Tools Appl 70, 461–474 (2014).



  • Traffic
  • Ground truth
  • Vehicle
  • Video
  • Intelligent transportation systems