Evaluating human corrections in a computer-assisted speaker diarization system

Broux, Pierre-Alexandre; Petitrenaud, Simon; Meignier, Sylvain; Carrive, Jean; Doukhan, David

doi:10.1007/s10579-020-09493-6

Original Paper
Published: 06 July 2020

Language Resources and Evaluation, Volume 55, pages 151–172 (2021)

Pierre-Alexandre Broux¹,² (ORCID: orcid.org/0000-0003-4232-4751), Simon Petitrenaud¹, Sylvain Meignier¹, Jean Carrive² & David Doukhan²

Abstract

In this paper, we present a framework for evaluating human corrections of the output of a speaker diarization system. We define four elementary actions for correcting a diarization (“Create a boundary”, “Delete a boundary”, “Create a speaker label” and “Change the speaker label”) and an automaton that simulates the correction sequence. We also describe a metric for evaluating the correction cost. The framework is evaluated on French broadcast news drawn from the REPERE, ESTER and ETAPE campaigns.
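To make the idea of counting elementary corrections concrete, here is a minimal sketch in Python. It is not the automaton or the metric of the paper, whose details are not given in this abstract. Only the four action names come from the abstract; the segment representation (contiguous `(start, end, speaker)` tuples in integer frames), the function names, the exact-match rule for boundaries and the label-adoption rule are all assumptions made for illustration.

```python
from collections import Counter

# Action names are taken from the abstract; everything else is hypothetical.
CREATE_BOUNDARY = "Create a boundary"
DELETE_BOUNDARY = "Delete a boundary"
CREATE_LABEL = "Create a speaker label"
CHANGE_LABEL = "Change the speaker label"


def internal_boundaries(segments):
    """Return the end frames of all segments except the last one."""
    return {end for (_, end, _) in segments[:-1]}


def label_at(segments, frame):
    """Return the speaker label covering the given frame."""
    for start, end, speaker in segments:
        if start <= frame < end:
            return speaker
    raise ValueError(f"frame {frame} is not covered by any segment")


def simulate_corrections(reference, hypothesis):
    """Count the elementary actions a simulated annotator would perform.

    Assumption: boundaries match only if they fall on the same frame, and
    after boundary correction each reference segment carries the hypothesis
    label found at its first frame.
    """
    actions = Counter()
    ref_b = internal_boundaries(reference)
    hyp_b = internal_boundaries(hypothesis)
    actions[CREATE_BOUNDARY] += len(ref_b - hyp_b)
    actions[DELETE_BOUNDARY] += len(hyp_b - ref_b)

    adopted = {}   # hypothesis label -> reference speaker it was accepted for
    known = set()  # reference speakers that already own a label
    for start, _, ref_spk in reference:
        hyp_spk = label_at(hypothesis, start)
        if adopted.get(hyp_spk) == ref_spk:
            continue  # label is already correct, no action needed
        if ref_spk not in known and hyp_spk not in adopted:
            adopted[hyp_spk] = ref_spk  # accept the automatic label as is
            known.add(ref_spk)
        elif ref_spk in known:
            actions[CHANGE_LABEL] += 1  # reuse a label that already exists
        else:
            actions[CREATE_LABEL] += 1  # speaker seen for the first time
            known.add(ref_spk)
    return +actions  # unary plus drops zero counts


reference = [(0, 10, "A"), (10, 20, "B"), (20, 30, "A")]
hypothesis = [(0, 12, "s1"), (12, 20, "s2"), (20, 30, "s3")]
counts = simulate_corrections(reference, hypothesis)
naive_cost = sum(counts.values())  # unit weight per action, an assumption
```

In this toy example the simulated annotator performs one action of each type, so the unit-weight cost is 4. A realistic setup would need a tolerance around boundaries and action weights reflecting the actual human effort, neither of which is modelled here.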

Notes

The modified source code of Transcriber is available at https://git-lium.univ-lemans.fr/broux/transcriber-log.

Author information

Authors and Affiliations

Computer Science Laboratory, Le Mans University (LIUM - EA 4023), Le Mans, France
Pierre-Alexandre Broux, Simon Petitrenaud & Sylvain Meignier
French National Audiovisual Institute (INA), Paris, France
Pierre-Alexandre Broux, Jean Carrive & David Doukhan

Corresponding author

Correspondence to Pierre-Alexandre Broux.

About this article

Cite this article

Broux, PA., Petitrenaud, S., Meignier, S. et al. Evaluating human corrections in a computer-assisted speaker diarization system. Lang Resources & Evaluation 55, 151–172 (2021). https://doi.org/10.1007/s10579-020-09493-6

Published: 06 July 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s10579-020-09493-6
