
CASAM: collaborative human-machine annotation of multimedia


Abstract

The CASAM multimedia annotation system implements a model of cooperative annotation between a human annotator and automated components, with the aim that they work asynchronously but together. The system focuses on the areas where automated recognition and reasoning are most effective, leaving the user free to work in the areas where their unique skills are required. The system’s reasoning is influenced by the annotations provided by the user and, similarly, the user can see the system’s work and modify and, implicitly, direct it. The CASAM system interacts with the user by providing a window onto the current state of the annotation, and by generating requests for information which are important to the final annotation or which constrain its reasoning. The user can modify the annotation, respond to requests and add their own annotations. The objective is that the human annotator’s time is used more effectively, and that the result is an annotation of higher quality, produced more quickly. This is especially important where the annotator has a very restricted amount of time in which to annotate the document. In this paper we describe our prototype system. We expand upon the techniques used for automatically analysing the multimedia document, for reasoning over the annotations generated, and for generating an effective interaction with the end user. We also present the results of evaluations undertaken with media professionals in order to validate the approach and to gain feedback to drive further research.
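The cooperative loop the abstract describes — automated components annotating continuously while the user answers targeted requests and adds annotations of their own — can be sketched in code. The following is a minimal illustrative sketch in Python, not CASAM’s actual implementation; all names, structures and the confidence threshold are hypothetical, and they stand in for the system’s multimedia analysis and ontology-based reasoning.

```python
# A hypothetical sketch of a cooperative annotation loop: the machine records
# its own annotations over a shared state and queues requests to the user
# where its confidence is low; the user answers requests and may also
# annotate freely. None of these names come from CASAM itself.

from dataclasses import dataclass, field
from queue import Empty, Queue


@dataclass
class Annotation:
    segment: str              # e.g. a time range within the video
    label: str                # proposed concept, e.g. "interview"
    confidence: float         # machine confidence in [0, 1]
    source: str = "machine"   # "machine" or "human"


@dataclass
class SharedState:
    annotations: list = field(default_factory=list)
    requests: Queue = field(default_factory=Queue)


def machine_pass(state, candidates, ask_threshold=0.6):
    """Record automatic annotations; queue a request when confidence is low."""
    for ann in candidates:
        state.annotations.append(ann)
        if ann.confidence < ask_threshold:
            state.requests.put(ann)          # ask the user to confirm/correct


def human_pass(state):
    """Answer one pending request; a real user could also annotate freely."""
    try:
        ann = state.requests.get_nowait()
    except Empty:
        return
    ann.confidence = 1.0                     # the user confirms the label
    ann.source = "human"


if __name__ == "__main__":
    state = SharedState()
    machine_pass(state, [
        Annotation("00:00-00:10", "studio shot", 0.92),
        Annotation("00:10-00:25", "interview", 0.41),  # uncertain: ask user
    ])
    human_pass(state)
    for a in state.annotations:
        print(f"{a.segment}: {a.label} ({a.confidence:.2f}, {a.source})")
```

In the full system the two passes run asynchronously rather than in sequence, and requests are prioritised by their importance to the final annotation or their value in constraining the system’s reasoning.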



Acknowledgements

This work was supported by the European Commission and partly funded through project FP7-217061. We would like to thank all members of the CASAM project team who contributed to the results of this work, and all the users who gave their time and comments.

Author information

Corresponding author

Correspondence to Russell Beale.


About this article

Cite this article

Hendley, R.J., Beale, R., Bowers, C.P. et al. CASAM: collaborative human-machine annotation of multimedia. Multimed Tools Appl 70, 1277–1308 (2014). https://doi.org/10.1007/s11042-012-1255-1
