Evaluating human corrections in a computer-assisted speaker diarization system

  • Original Paper
  • Language Resources and Evaluation

Abstract

In this paper, we present a framework for evaluating human corrections of the output of a speaker diarization system. We define four elementary correction actions (“Create a boundary”, “Delete a boundary”, “Create a speaker label” and “Change the speaker label”) and propose an automaton to simulate the correction sequence. We also describe a metric for evaluating the correction cost. The framework is evaluated on French broadcast news drawn from the REPERE, ESTER and ETAPE campaigns.
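
To make the idea of counting elementary correction actions concrete, the following is a minimal sketch and not the automaton or metric defined in the paper. Only the four action names come from the abstract. The function names, the segment representation as (start, end, label) tuples, the exact-match boundary comparison, the midpoint rule for picking a hypothesis label, and the unit cost per action are all assumptions made for illustration.

```python
from collections import Counter

# The four action names are taken from the abstract; all logic below is
# a hypothetical simplification, not the paper's automaton.

def boundaries(segments):
    """Internal boundaries of a sorted, contiguous list of (start, end, label)."""
    return {start for start, _end, _label in segments[1:]}

def label_at(segments, t):
    """Label of the segment covering time t, or None if t is uncovered."""
    for start, end, label in segments:
        if start <= t < end:
            return label
    return None

def simulate_corrections(hyp, ref):
    """Count the actions a simulated annotator needs to turn hyp into ref.

    Assumes hyp and ref cover the same time span and that boundaries
    match only when their times are exactly equal (no tolerance).
    """
    actions = Counter()
    hyp_b, ref_b = boundaries(hyp), boundaries(ref)
    actions["Create a boundary"] += len(ref_b - hyp_b)
    actions["Delete a boundary"] += len(hyp_b - ref_b)

    confirmed = {}   # hypothesis label -> reference speaker it was named as
    named = set()    # reference speakers that already have a label
    for start, end, ref_spk in ref:
        hyp_label = label_at(hyp, (start + end) / 2)
        if confirmed.get(hyp_label) == ref_spk:
            continue  # label is already correct, no action needed
        if ref_spk in named:
            actions["Change the speaker label"] += 1
        else:
            actions["Create a speaker label"] += 1
            named.add(ref_spk)
            confirmed.setdefault(hyp_label, ref_spk)
    return actions

def correction_cost(actions):
    """Total cost, assuming every action costs one unit."""
    return sum(actions.values())

hyp = [(0, 10, "S0"), (10, 20, "S1"), (20, 30, "S1")]
ref = [(0, 10, "alice"), (10, 20, "bob"), (20, 30, "alice")]
actions = simulate_corrections(hyp, ref)
print(dict(actions), correction_cost(actions))
```

In this toy example the boundaries already agree, so the simulated annotator only names the two speakers and then changes the label of the last segment. A weighted sum could replace the unit costs if some actions are considered more expensive than others.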

Notes

  1. The modified source code of Transcriber is available at https://git-lium.univ-lemans.fr/broux/transcriber-log.

Author information

Corresponding author

Correspondence to Pierre-Alexandre Broux.

About this article

Cite this article

Broux, PA., Petitrenaud, S., Meignier, S. et al. Evaluating human corrections in a computer-assisted speaker diarization system. Lang Resources & Evaluation 55, 151–172 (2021). https://doi.org/10.1007/s10579-020-09493-6

  • DOI: https://doi.org/10.1007/s10579-020-09493-6
