Tools for Multimodal Annotation

Chapter in the Handbook of Linguistic Annotation

Abstract

Researchers interested in the sounds of speech or the physical gestures of speakers make use of audio and video recordings in their work. Annotating these recordings presents a different set of requirements from the annotation of text. Special-purpose tools have been developed to display video and audio signals and to allow the creation of time-aligned annotations. This chapter reviews the most widely used of these tools for both manual and automatic generation of annotations on multimodal data.
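
The central notion throughout the chapter is the time-aligned annotation: a label attached to an interval of the audio or video timeline, typically organised into parallel tiers (words, phones, gestures, and so on). The sketch below is purely illustrative and does not reproduce the data model of any particular tool reviewed in the chapter; all class and field names are invented for this example, and the phone labels use SAMPA-style symbols.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Interval:
    start: float   # start time in seconds, relative to the recording
    end: float     # end time in seconds
    label: str     # annotation content, e.g. a word or phone symbol

@dataclass
class Tier:
    name: str                                   # e.g. "words", "phones", "gesture"
    intervals: List[Interval] = field(default_factory=list)

# Two tiers aligned against the same audio/video timeline (times are invented).
words = Tier("words", [Interval(0.00, 0.42, "hello"),
                       Interval(0.42, 0.95, "world")])
phones = Tier("phones", [Interval(0.00, 0.08, "h"),
                         Interval(0.08, 0.18, "@"),
                         Interval(0.18, 0.26, "l"),
                         Interval(0.26, 0.42, "@U")])

for tier in (words, phones):
    for iv in tier.intervals:
        print(f"{tier.name:>6} {iv.start:5.2f}-{iv.end:5.2f} {iv.label}")
```

Real annotation tools store considerably richer structures (links between tiers, speaker attribution, references to the media files), but the anchoring of each label to a stretch of the shared timeline shown here is the common core.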

Notes

  1. http://transag.sourceforge.net/.
  2. http://www.r-project.org/.
  3. https://clarin.phonetik.uni-muenchen.de/BASWebServices/.
  4. http://www.ling.upenn.edu/phonetics/p2fa/.
  5. http://fave.ling.upenn.edu/.
  6. http://latlcui.unige.ch/phonetique/easyalign.php.
  7. https://github.com/bbcrd/diarize-jruby.
  8. http://icar.univ-lyon2.fr/projets/corinte/documents/2013_Conv_ICOR_250313.pdf.

Author information

Correspondence to Steve Cassidy.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Cassidy, S., Schmidt, T. (2017). Tools for Multimodal Annotation. In: Ide, N., Pustejovsky, J. (eds.) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_7

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_7

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences (R0)
