GAT: a Graphical Annotation Tool for semantic regions


This article presents GAT, a Graphical Annotation Tool based on a region-based hierarchical representation of images. The proposed solution uses Partition Trees to navigate through the image segments which are automatically defined at different spatial scales. Moreover, the system focuses on the navigation through ontologies for a semantic annotation of objects and of the parts that compose them. The tool has been designed under usability criteria to minimize the user interaction by trying to predict the future selection of regions and semantic classes. The implementation uses MPEG-7/XML input and output data to allow interoperability with any type of Partition Tree. This tool is publicly available and its source code can be downloaded under a free software license.
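The navigation idea in the abstract can be illustrated with a minimal sketch. It assumes a binary Partition Tree in which each leaf is a region of the finest segmentation and each internal node is the union of its two children, so moving towards the root yields regions at coarser spatial scales. The names Node, merge, leaf_at and expand are hypothetical and are not taken from GAT's source code or from its MPEG-7/XML format.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple

Pixel = Tuple[int, int]  # (row, column); an assumption made for this sketch


@dataclass(eq=False)  # compare nodes by identity, not by recursive field equality
class Node:
    label: int
    pixels: FrozenSet[Pixel]
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)


def merge(label: int, left: Node, right: Node) -> Node:
    """Create the parent region that covers both children."""
    node = Node(label, left.pixels | right.pixels, [left, right])
    left.parent = node
    right.parent = node
    return node


def leaf_at(root: Node, pixel: Pixel) -> Optional[Node]:
    """Return the finest-scale region containing the pixel, or None."""
    if pixel not in root.pixels:
        return None
    node = root
    while node.children:
        # Exactly one child contains the pixel because siblings are disjoint.
        node = next(c for c in node.children if pixel in c.pixels)
    return node


def expand(node: Node) -> Node:
    """Move one scale up: select the parent region (the root stays the root)."""
    return node.parent if node.parent is not None else node
```

In this sketch a click on a pixel would select the leaf returned by leaf_at, and each call to expand would grow the selection to the next coarser region, which is one way a user could reach an object or one of its parts without drawing its contour.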


Acknowledgments

This work was partially funded by the Catalan Broadcasting Corporation (CCMA) and Mediapro S.L. through the Spanish project CENIT-2007-1012 i3media, by the TEC2007-66858/TCM PROVEC project of the Spanish Government, and by a grant from the Commissioner for Universities and Research of the Innovation, Universities and Industry Department of the Catalan Government.

Copyright warnings

The “TV anchor” and “Formula 1” key-frames used in this paper belong to TVC, Televisió de Catalunya, and are copyright protected. These key-frames have been provided by TVC for the sole purpose of research under the framework of the i3media project.

The “soccer” key-frame used in this paper belongs to MEDIAPRO, S.L., and is copyright protected. This key-frame has been provided by MEDIAPRO, S.L. for the sole purpose of research under the framework of the i3media project.

Author information

Correspondence to Xavier Giro-i-Nieto.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM Video 1

(MPG 12328 kb)

ESM Video 2

(MPG 8570 kb)

ESM Video 3

(MPG 7994 kb)

ESM Video 4

(MPG 7814 kb)

About this article

Cite this article

Giro-i-Nieto, X., Camps, N. & Marques, F. GAT: a Graphical Annotation Tool for semantic regions. Multimed Tools Appl 46, 155–174 (2010).

Keywords


  • Annotation
  • Region
  • Ontology
  • Navigation
  • Semantic
  • Hierarchical