Tools for Multimodal Annotation

Chapter in the Handbook of Linguistic Annotation

Abstract

Researchers interested in the sounds of speech or the physical gestures of speakers make use of audio and video recordings in their work. Annotating these recordings presents a different set of requirements from the annotation of text. Special-purpose tools have been developed to display video and audio signals and to allow the creation of time-aligned annotations. This chapter reviews the most widely used of these tools for both manual and automatic generation of annotations on multimodal data.
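
The central notion throughout the chapter is the time-aligned annotation: a label attached to an interval of the audio or video timeline, typically organised into parallel tiers (words, phones, gestures, and so on). The sketch below is purely illustrative and does not reproduce the data model of any particular tool reviewed in the chapter; all class and field names are invented for this example, and the phone labels use SAMPA-style symbols.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Interval:
    start: float   # start time in seconds, relative to the recording
    end: float     # end time in seconds
    label: str     # annotation content, e.g. a word or phone symbol

@dataclass
class Tier:
    name: str                                   # e.g. "words", "phones", "gesture"
    intervals: List[Interval] = field(default_factory=list)

# Two tiers aligned against the same audio/video timeline (times are invented).
words = Tier("words", [Interval(0.00, 0.42, "hello"),
                       Interval(0.42, 0.95, "world")])
phones = Tier("phones", [Interval(0.00, 0.08, "h"),
                         Interval(0.08, 0.18, "@"),
                         Interval(0.18, 0.26, "l"),
                         Interval(0.26, 0.42, "@U")])

for tier in (words, phones):
    for iv in tier.intervals:
        print(f"{tier.name:>6} {iv.start:5.2f}-{iv.end:5.2f} {iv.label}")
```

Real annotation tools store considerably richer structures (links between tiers, speaker attribution, references to the media files), but the anchoring of each label to a stretch of the shared timeline shown here is the common core.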

Notes

  1. http://transag.sourceforge.net/.
  2. http://www.r-project.org/.
  3. https://clarin.phonetik.uni-muenchen.de/BASWebServices/.
  4. http://www.ling.upenn.edu/phonetics/p2fa/.
  5. http://fave.ling.upenn.edu/.
  6. http://latlcui.unige.ch/phonetique/easyalign.php.
  7. https://github.com/bbcrd/diarize-jruby.
  8. http://icar.univ-lyon2.fr/projets/corinte/documents/2013_Conv_ICOR_250313.pdf.

Author information

Correspondence to Steve Cassidy.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Cassidy, S., Schmidt, T. (2017). Tools for Multimodal Annotation. In: Ide, N., Pustejovsky, J. (eds.) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_7

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_7

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences (R0)
