The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms

Mostefa, Djamel; Moreau, Nicolas; Choukri, Khalid; Potamianos, Gerasimos; Chu, Stephen M.; Tyagi, Ambrish; Casas, Josep R.; Turmo, Jordi; Cristoforetti, Luca; Tobia, Francesco; Pnevmatikakis, Aristodemos; Mylonakis, Vassilis; Talantzis, Fotios; Burger, Susanne; Stiefelhagen, Rainer; Bernardin, Keni; Rochet, Cedrick

doi:10.1007/s10579-007-9054-4

Published: 16 January 2008

Language Resources and Evaluation, Volume 41, pages 389–407 (2007)

Djamel Mostefa¹,
Nicolas Moreau¹,
Khalid Choukri¹,
Gerasimos Potamianos²,
Stephen M. Chu²,
Ambrish Tyagi²,³,
Josep R. Casas⁴,
Jordi Turmo⁴,
Luca Cristoforetti⁵,
Francesco Tobia⁵,
Aristodemos Pnevmatikakis⁶,
Vassilis Mylonakis⁶,
Fotios Talantzis⁶,
Susanne Burger⁷,
Rainer Stiefelhagen⁸,
Keni Bernardin⁸ &
Cedrick Rochet⁸

Abstract

The analysis of lectures and meetings inside smart rooms has recently attracted much interest in the literature, being the focus of international projects and technology evaluations. A key enabler for progress in this area is the availability of appropriate multimodal and multi-sensory corpora, annotated with rich human activity information during lectures and meetings. This paper is devoted to exactly such a corpus, developed in the framework of the European project CHIL, “Computers in the Human Interaction Loop”. The resulting data set has the potential to drastically advance the state-of-the-art, by providing numerous synchronized audio and video streams of real lectures and meetings, captured in multiple recording sites over the past 4 years. It particularly overcomes typical shortcomings of other existing databases that may contain limited sensory or monomodal data, exhibit constrained human behavior and interaction patterns, or lack data variability. The CHIL corpus is accompanied by rich manual annotations of both its audio and visual modalities. These provide a detailed multi-channel verbatim orthographic transcription that includes speaker turns and identities, acoustic condition information, and named entities, as well as video labels in multiple camera views that provide multi-person 3D head and 2D facial feature location information. Over the past 3 years, the corpus has been crucial to the evaluation of a multitude of audiovisual perception technologies for human activity analysis in lecture and meeting scenarios, demonstrating its utility during internal evaluations of the CHIL consortium, as well as at the recent international CLEAR and Rich Transcription evaluations. The CHIL corpus is publicly available to the research community.

References

AMI—Augmented Multiparty Interaction. http://www.amiproject.org
Burger, S., McLaren, V., & Yu, H. (2002). The ISL meeting corpus: The impact of meeting type on speech style. In Proceedings of International Conference on Spoken Language Processing, Denver, USA.
CALO—Cognitive Agent that Learns and Organizes. http://www.caloproject.sri.com/
CHIL—Computers in the Human Interaction Loop. http://www.chil.server.de
Classification of Events, Activities, and Relationships Evaluation and Workshop. http://www.clear-evaluation.org
ELRA Catalogue of Language Resources. http://www.catalog.elra.info
Janin, A., Baron, D., Edwards, J., Ellis, D., Gelbart, D., Morgan, N., Peskin, B., Pfau, T., Shriberg, E., Stolcke, A., & Wooters, C. (2003). The ICSI meeting corpus. In Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hong Kong, China.
Mostefa, D., et al. (2005). Chil Public Deliverable D7.6: Exploitation material for CHIL evaluation campaign 1. http://www.chil.server.de/servlet/is/8063/
Mostefa, D., Garcia, M.-N., & Choukri, K. (2006). Evaluation of multimodal components within CHIL. In Proceedings of the 5th International Language Resources and Evaluations Conference (LREC), Genoa, Italy.
Stiefelhagen, R., Bernardin, K., Bowers, R., Garofolo, J., Mostefa, D., & Soundararajan, P. (2007). The CLEAR 2006 evaluation. In R. Stiefelhagen & J. Garofolo (Eds.), Multimodal Technologies for Perception of Humans. Proceedings of the First International CLEAR Evaluation Workshop, CLEAR 2006, number 4122 in Springer Lecture Notes in Computer Science, pp. 1–45.
Stiefelhagen, R., & Garofolo, J. (Eds). (2007). Multimodal Technologies for Perception of Humans, First International Evaluation Workshop on Classification of Events, Activities and Relationships, CLEAR’06. Number 4122 in Lecture Notes in Computer Science, Springer.
The AGTK Annotation Tool. http://www.agtk.sourceforge.net
The CLEF Website. http://www.clef-campaign.org/
The NIST MarkIII Microphone Array. http://www.nist.gov/smartspace/cmaiii.html
The NIST Smart Space Project. http://www.nist.gov/smartspace/
The Rich Transcription 2006 Spring Meeting Recognition Evaluation Website. http://www.nist.gov/speech/tests/rt/rt2006/spring
The Transcriber Tool Home Page. http://www.trans.sourceforge.net
VACE—Video Analysis and Content Extraction. https://www.control.nist.gov/dto/twiki/bin/view/Main/WebHome

Acknowledgments

The work presented here was partly funded by the European Union under the integrated project CHIL, “Computers in the Human Interaction Loop” (Grant Number IST-506909).

Author information

Ambrish Tyagi
Present address: Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA

Authors and Affiliations

Evaluations and Language Resources Distribution Agency (ELDA), 55–57 rue Brillat Savarin, 75013, Paris, France
Djamel Mostefa, Nicolas Moreau & Khalid Choukri
IBM T.J. Watson Research Center, Yorktown Heights, NY, 10598, USA
Gerasimos Potamianos, Stephen M. Chu & Ambrish Tyagi
Universitat Politècnica de Catalunya, Barcelona, Spain
Josep R. Casas & Jordi Turmo
ITC-IRST, Via Sommarive 18, 38050, Povo, Italy
Luca Cristoforetti & Francesco Tobia
Athens Information Technology, Markopoulou Ave, 19002, Peania, Greece
Aristodemos Pnevmatikakis, Vassilis Mylonakis & Fotios Talantzis
Interactive Systems Labs, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Susanne Burger
Interactive Systems Labs, Universität Karlsruhe (TH), Karlsruhe, Germany
Rainer Stiefelhagen, Keni Bernardin & Cedrick Rochet

Corresponding author

Correspondence to Djamel Mostefa.

Additional information

Ambrish Tyagi has contributed to this work during two summer internships with the IBM T.J. Watson Research Center.

About this article

Cite this article

Mostefa, D., Moreau, N., Choukri, K. et al. The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms. Lang Resources & Evaluation 41, 389–407 (2007). https://doi.org/10.1007/s10579-007-9054-4

Received: 04 January 2007
Accepted: 12 December 2007
Published: 16 January 2008
Issue Date: December 2007
DOI: https://doi.org/10.1007/s10579-007-9054-4
