Improvement of Speaker Number Estimation by Applying an Overlapped Speech Detector

Timofeeva, Elena; Evseeva, Elena; Zaluskaia, Valeriia; Kapranova, Vlada; Astapov, Sergei; Kabarov, Vladimir

doi:10.1007/978-3-030-87802-3_62

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

The efficiency of modern automatic meeting transcription suffers from the difficulty of diarizing speakers during overlapping speech segments. The problem can be tackled if each segment of a recording is marked with the number of active speakers. However, overlapped speech with more than two simultaneous speakers remains a weak point of speaker number estimation. The problem becomes even more complicated when the speaker number estimation system deals with far-field recordings of multiple speakers acquired by a distant microphone. In this paper we propose improving speaker number estimation by combining it with an overlapped speech detector. We apply different configurations of speaker number estimation and overlapped speech detector models, trained and evaluated on the AMI and LibriSpeech datasets with several types of signal representation. Experimental evaluation based on model fusion yields an improvement in speaker number estimation performance of up to 10% in terms of the F1-score compared with the base speaker number estimation model.
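The abstract does not specify how the two models are fused, so the following is only a minimal illustrative sketch of one possible late-fusion rule, not the authors' method. It assumes a speaker number estimation (SNE) model that outputs a posterior over speaker counts and an overlapped speech detector (OSD) that outputs the probability that a segment contains two or more simultaneous speakers. The function names, the weighting rule, and the choice of a macro-averaged F1-score are all assumptions made for illustration.

```python
from typing import List, Sequence


def fuse_speaker_count(sne_probs: Sequence[float], osd_prob: float) -> int:
    """Return a fused speaker count for one segment (hypothetical rule).

    sne_probs[k] is the assumed SNE posterior for k active speakers.
    osd_prob is the assumed OSD posterior that the segment is overlapped,
    i.e. that it contains two or more simultaneous speakers.
    """
    # Weight overlapped classes (k >= 2) by the OSD overlap probability
    # and non-overlapped classes (k < 2) by its complement.
    weighted: List[float] = [
        p * (osd_prob if k >= 2 else 1.0 - osd_prob)
        for k, p in enumerate(sne_probs)
    ]
    if sum(weighted) == 0.0:
        # Degenerate case: fall back to the SNE decision alone.
        weighted = list(sne_probs)
    # Pick the class with the largest fused score.
    return max(range(len(weighted)), key=weighted.__getitem__)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Macro-averaged F1-score over speaker-count classes (assumed metric variant)."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that never appears in labels or predictions scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes
```

In this sketch, an SNE posterior that narrowly favours one speaker can be overturned by a confident OSD: with `sne_probs = [0.05, 0.45, 0.40, 0.10]`, the SNE alone picks one speaker, while `fuse_speaker_count(sne_probs, 0.8)` picks two. The rule is deliberately simple; a learned fusion layer or a threshold on the OSD output would be equally plausible alternatives.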

Acknowledgments

This research was financially supported by ITMO University.

Author information

Authors and Affiliations

ITMO University, Kronverksky prospekt 49A, St. Petersburg, 197101, Russia
Elena Timofeeva, Elena Evseeva, Valeriia Zaluskaia, Vlada Kapranova, Sergei Astapov & Vladimir Kabarov
Speech Technology Center, Vyborgskaya Embankment 45, St. Petersburg, 194044, Russia
Valeriia Zaluskaia & Vlada Kapranova

Corresponding author

Correspondence to Valeriia Zaluskaia.

Editor information

Editors and Affiliations

St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Timofeeva, E., Evseeva, E., Zaluskaia, V., Kapranova, V., Astapov, S., Kabarov, V. (2021). Improvement of Speaker Number Estimation by Applying an Overlapped Speech Detector. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_62

DOI: https://doi.org/10.1007/978-3-030-87802-3_62
Published: 22 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science, Computer Science (R0)
