Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus

Carletta, Jean

doi:10.1007/s10579-007-9040-x

Published: 10 October 2007

Volume 41, pages 181–190 (2007)
Language Resources and Evaluation

Jean Carletta¹

Abstract

The AMI Meeting Corpus contains 100 hours of meetings captured using many synchronized recording devices, and is designed to support work in speech and video processing, language engineering, corpus linguistics, and organizational psychology. It has been transcribed orthographically, with annotated subsets for everything from named entities, dialogue acts, and summaries to simple gaze and head movement. In this written version of an LREC conference keynote address, I describe the data and how it was created. If this is “killer” data, that presupposes a platform that it will “sell”; in this case, that is the NITE XML Toolkit, which allows a distributed set of users to create, store, browse, and search annotations for the same base data that are both time-aligned against signal and related to each other structurally.

Acknowledgements

I thank the large number of researchers involved in the creation of the NITE XML Toolkit, both during the NITE project and afterwards, and in the collection, transcription, and annotation of the AMI Meeting Corpus, without whom these more personal reflections would not be possible. This work was funded by the European Union 6th FWP IST Integrated Project AMI (Augmented Multi-party Interaction, FP6-506811).

Author information

Authors and Affiliations

University of Edinburgh, Edinburgh, EH8 9LW, UK
Jean Carletta

Corresponding author

Correspondence to Jean Carletta.

Additional information

This paper is an extended version of a Keynote Address presented at the Language Resources & Evaluation Conference, Genoa, May 2006.

About this article

Cite this article

Carletta, J. Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus. Lang Resources & Evaluation 41, 181–190 (2007). https://doi.org/10.1007/s10579-007-9040-x

Received: 19 December 2006
Accepted: 06 September 2007
Published: 10 October 2007
Issue Date: May 2007
DOI: https://doi.org/10.1007/s10579-007-9040-x
