
Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus

Language Resources and Evaluation

Abstract

The AMI Meeting Corpus contains 100 h of meetings captured using many synchronized recording devices, and is designed to support work in speech and video processing, language engineering, corpus linguistics, and organizational psychology. It has been transcribed orthographically, with annotated subsets for everything from named entities, dialogue acts, and summaries to simple gaze and head movement. In this written version of an LREC conference keynote address, I describe the data and how it was created. If this is “killer” data, that presupposes a platform that it will “sell”; in this case, that is the NITE XML Toolkit, which allows a distributed set of users to create, store, browse, and search annotations for the same base data that are both time-aligned against signal and related to each other structurally.
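As a rough illustration of what it means for annotations to be both time-aligned against the signal and structurally related to each other, the Python sketch below models two hypothetical annotation layers. Words carry their own start and end times. Dialogue acts carry no times of their own: they point to the words they contain and inherit their timing from them. This is not the NITE XML Toolkit's API or storage format, which is XML-based. The class names, the `overlapping` helper, and the example labels are all invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical time-aligned layer: each word has its own timing."""
    id: str
    text: str
    start: float  # seconds from the start of the recording
    end: float


@dataclass
class DialogueAct:
    """Hypothetical structural layer: a dialogue act points to its words
    and derives its time span from them instead of storing one itself."""
    id: str
    act_type: str
    children: list[Word]

    @property
    def start(self) -> float:
        return min(w.start for w in self.children)

    @property
    def end(self) -> float:
        return max(w.end for w in self.children)


def overlapping(items, t0: float, t1: float):
    """Return the annotations (from any layer) whose span overlaps [t0, t1)."""
    return [x for x in items if x.start < t1 and x.end > t0]


# Illustrative data only, not taken from the AMI Meeting Corpus.
w1 = Word("w1", "okay", 0.0, 0.4)
w2 = Word("w2", "let's", 0.5, 0.8)
w3 = Word("w3", "start", 0.8, 1.2)
words = [w1, w2, w3]
da1 = DialogueAct("da1", "inform", [w2, w3])

print(da1.start, da1.end)               # span inherited from the words
print(overlapping(words, 0.0, 0.45))    # time-based search over one layer
```

Deriving the dialogue act's span from its children, rather than storing a separate time, keeps the two layers consistent if the word timings are later corrected. Because both layers expose the same start and end times, a single time-based query works across them.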





Acknowledgements

I thank the large number of researchers involved in the creation of the NITE XML Toolkit, both during the NITE project and afterwards, and in the collection, transcription, and annotation of the AMI Meeting Corpus, without whom these more personal reflections would not be possible. This work was funded by the European Union 6th FWP IST Integrated Project AMI (Augmented Multi-party Interaction, FP6-506811).

Author information


Correspondence to Jean Carletta.

Additional information

This paper is an extended version of a Keynote Address presented at the Language Resources & Evaluation Conference, Genoa, May 2006.


About this article

Cite this article

Carletta, J. Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus. Lang Resources & Evaluation 41, 181–190 (2007). https://doi.org/10.1007/s10579-007-9040-x


  • DOI: https://doi.org/10.1007/s10579-007-9040-x
