Multimedia Tools and Applications, Volume 46, Issue 2–3, pp 155–174

GAT: a Graphical Annotation Tool for semantic regions



This article presents GAT, a Graphical Annotation Tool based on a region-based hierarchical representation of images. The proposed solution uses Partition Trees to navigate through image segments that are automatically defined at different spatial scales. The system also supports navigation through ontologies for the semantic annotation of objects and of the parts that compose them. The tool has been designed following usability criteria: it minimizes user interaction by trying to predict the next selection of regions and semantic classes. The implementation uses MPEG-7/XML for input and output data to allow interoperability with any type of Partition Tree. The tool is publicly available and its source code can be downloaded under a free software license.
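As a rough illustration of what navigating a Partition Tree can look like, the sketch below models regions as nodes whose parents are the union of their children, so moving up the tree selects a coarser region and moving down selects a finer one. It is a minimal sketch under stated assumptions and is not taken from GAT's source code: the names `Region`, `merge`, `expand` and `refine`, the pixel-set storage, and the toy regions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Pixel = Tuple[int, int]


@dataclass(eq=False)  # identity comparison; avoids recursing through parent links
class Region:
    """One node of a hypothetical partition tree (not GAT's actual data model)."""
    label: str
    pixels: Set[Pixel] = field(default_factory=set)
    children: List["Region"] = field(default_factory=list)
    parent: Optional["Region"] = field(default=None, repr=False)


def merge(label: str, *parts: Region) -> Region:
    """Create a parent region covering the union of its children's pixels."""
    node = Region(label, set().union(*(p.pixels for p in parts)), list(parts))
    for p in parts:
        p.parent = node
    return node


def expand(region: Region) -> Region:
    """Move one level up the tree (coarser scale); the root stays put."""
    return region.parent if region.parent is not None else region


def refine(region: Region, pixel: Pixel) -> Region:
    """Move one level down to the child containing `pixel` (finer scale)."""
    for child in region.children:
        if pixel in child.pixels:
            return child
    return region  # leaf, or pixel outside this region


# Toy tree: two fine regions merge into "person", which merges with "background".
head = Region("head", {(0, 0)})
torso = Region("torso", {(1, 0)})
background = Region("background", {(0, 1), (1, 1)})
person = merge("person", head, torso)
image = merge("image", person, background)

selection = refine(image, (0, 0))      # first click selects the coarse "person" region
selection = refine(selection, (0, 0))  # refining again reaches the "head" region
```

In this sketch a selection is just a reference to a node, so growing or shrinking it never requires re-segmenting the image; the tree is assumed to be computed beforehand.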


Keywords: Annotation, Region, Ontology, Navigation, Semantic, Hierarchical



This work was partially funded by the Catalan Broadcasting Corporation (CCMA) and Mediapro S.L. through the Spanish project CENIT-2007-1012 i3media, by the TEC2007-66858/TCM PROVEC project of the Spanish Government, and by a grant from the Commissioner for Universities and Research of the Innovation, Universities and Industry Department of the Catalan Government.

Copyright warnings

The “TV anchor” and “Formula 1” key-frames used in this paper belong to TVC, Televisió de Catalunya, and are copyright protected. These key-frames have been provided by TVC solely for research purposes within the framework of the i3media project.

The “soccer” key-frame used in this paper belongs to MEDIAPRO, S.L., and is copyright protected. This key-frame has been provided by MEDIAPRO, S.L. solely for research purposes within the framework of the i3media project.

Supplementary material

ESM Video 1 (MPG 12328 kb)

ESM Video 2 (MPG 8570 kb)

ESM Video 3 (MPG 7994 kb)

ESM Video 4 (MPG 7814 kb)



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Xavier Giro-i-Nieto (1)
  • Neus Camps (1)
  • Ferran Marques (1)
  1. Technical University of Catalonia (UPC), Barcelona, Spain
