Tools for Multimodal Annotation

  • Steve Cassidy
  • Thomas Schmidt
Chapter

Abstract

Researchers interested in the sounds of speech or the physical gestures of speakers make use of audio and video recordings in their work. Annotating these recordings presents a different set of requirements from the annotation of text. Special-purpose tools have been developed to display video and audio signals and to allow the creation of time-aligned annotations. This chapter reviews the most widely used of these tools for both manual and automatic generation of annotations on multimodal data.
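The core data structure shared by the tools surveyed here is the time-aligned annotation: a label attached to an interval of the recording's timeline, grouped into tiers. As a minimal illustration (the class and field names below are hypothetical, not drawn from any particular tool), such a tier can be sketched as:

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """A label spanning an interval of the media timeline, in seconds."""
    start: float
    end: float
    label: str

@dataclass
class Tier:
    """A named layer of non-text annotations (e.g. words, gestures)."""
    name: str
    annotations: list = field(default_factory=list)

    def add(self, start: float, end: float, label: str) -> None:
        self.annotations.append(Annotation(start, end, label))

    def at(self, t: float) -> list:
        """Return all annotations whose interval covers time t."""
        return [a for a in self.annotations if a.start <= t < a.end]

# A word tier aligned to the first second of a recording.
words = Tier("words")
words.add(0.00, 0.35, "hello")
words.add(0.35, 0.80, "world")
print([a.label for a in words.at(0.5)])  # -> ['world']
```

Real tools layer further structure on top of this (speaker attribution, hierarchical links between tiers, point events), but time-stamped labelled intervals remain the common denominator that distinguishes multimodal annotation from text annotation.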

Keywords

Speech · Video · Annotation · Multimodal · Survey


Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  1. Department of Computing, Macquarie University, Sydney, Australia
  2. SFB Multilingualism, University of Hamburg, Hamburg, Germany
