Skip to main content

Experiments with Segmentation in an Online Speaker Diarization System

  • Conference paper
  • First Online:
Text, Speech, and Dialogue (TSD 2017)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10415))

Included in the following conference series:

Abstract

In offline speaker diarization systems, particularly those aimed at telephone speech, the accuracy of the initial segmentation of a conversation is often a secondary concern. Imprecise segment boundaries are typically corrected during resegmentation, which is performed as the final step of the diarization process. However, such resegmentation is generally not possible in online systems, where past decisions are usually unchangeable. In such situations, correct segmentation becomes critical. In this paper, we evaluate several different segmentation approaches in the context of online diarization by comparing the overall performance of an i-vector-based diarization system set to operate in a sequential manner.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Bozonnet, S., Evans, N.W., Fredouille, C.: The LIA-EURECOM RT 2009 speaker diarization system: enhancements in speaker modelling and cluster purification. In: Proceedings ICASSP, pp. 4958–4961. IEEE (2010)

    Google Scholar 

  2. Canavan, A., Graff, D., Zipperlen, G.: CALLHOME American English speech, LDC97S42. In: LDC Catalog, Linguistic Data Consortium, Philadelphia (1997)

    Google Scholar 

  3. Church, K., Zhu, W., Vopicka, J., Pelecanos, J., Dimitriadis, D., Fousek, P.: Speaker diarization: a perspective on challenges and opportunities from theory to practice. In: Proceedings ICASSP, pp. 4950–4954 (2017)

    Google Scholar 

  4. Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., Ouellet, P.: Front-end factor analysis for speaker verification. IEEE Trans. Audio Speech Lang. Process. 19(4), 788–798 (2011)

    Article  Google Scholar 

  5. Fergani, B., Davy, M., Houacine, A.: Speaker diarization using one-class support vector machines. Speech Commun. 50(5), 355–365 (2008)

    Article  Google Scholar 

  6. Garcia-Romero, D., Snyder, D., Sell, G., Povey, D., McCree, A.: Speaker diarization using deep neural network embedings. In: Proceedings ICASSP, pp. 4930–4934 (2017)

    Google Scholar 

  7. Gupta, V.: Speaker change point detection using deep neural nets. In: Proceedings ICASSP, pp. 4420–4424 (2015)

    Google Scholar 

  8. Hrúz, M., Zajíc, Z.: Convolutional neural network for speaker change detection in telephone speaker diarization system. In: Proceedings ICASSP, pp. 4945–4949 (2017)

    Google Scholar 

  9. Lapidot, I., Bonastre, J.F.: On the importance of efficient transition modeling for speaker diarization. In: Proceedings Interspeech, 08–12 September 2016, pp. 2190–2193 (2016)

    Google Scholar 

  10. NIST: The 2009 (RT-09) rich transcription meeting recognition evaluation plan (2009). http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf

  11. Rouvier, M., Dupuy, G., Gay, P., Khoury, E., Merlin, T., Meignier, S.: An open-source state-of-the-art toolbox for broadcast news diarization. In: Proceedings Interspeech, pp. 1477–1481 (2013)

    Google Scholar 

  12. Sell, G., Garcia-Romero, D.: Speaker diarization with PLDA i-vector scoring and unsupervised calibration. In: IEEE Spoken Language Technology Workshop, pp. 413–417 (2014)

    Google Scholar 

  13. Senoussaoui, M., Kenny, P., Stafylakis, T., Dumouchel, P.: A study of the cosine distance-based mean shift for telephone speech diarization. IEEE/ACM Trans. Audio Speech Lang. Process. 22(1), 217–227 (2014)

    Article  Google Scholar 

  14. Shum, S., Dehak, N., Chuangsuwanich, E., Reynolds, D., Glass, J.: Exploiting intra-conversation variability for speaker diarization. In: Proceedings Interspeech, pp. 945–948 (2011)

    Google Scholar 

  15. Wang, R., Gu, M., Li, L., Xu, M., Zheng, T.F.: Speaker segmentation using deep speaker vectors for fast speaker change scenarios. In: Proceedings ICASSP, pp. 5420–5424 (2017)

    Google Scholar 

  16. Zajíc, Z., Kunešová, M., Radová, V.: Investigation of segmentation in i-vector based speaker diarization of telephone speech. In: Ronzhin, A., Potapova, R., Németh, G. (eds.) SPECOM 2016. LNCS (LNAI), vol. 9811, pp. 411–418. Springer, Cham (2016). doi:10.1007/978-3-319-43958-7_49

    Chapter  Google Scholar 

  17. Zajíc, Z., Machlica, L., Müller, L.: Initialization of fMLLR with sufficient statistics from similar speakers. In: Habernal, I., Matoušek, V. (eds.) TSD 2011. LNCS (LNAI), vol. 6836, pp. 187–194. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23538-2_24

    Chapter  Google Scholar 

  18. Zajíc, Z., Hrúz, M., Müller, L.: Speaker diarization using convolutional neural network for statistics accumulation refinement. In: Proceedings Interspeech (2017, in press)

    Google Scholar 

  19. Zhu, W., Pelecanos, J.: Online speaker diarization using adapted i-vector transforms. In: Proceedings ICASSP, pp. 5045–5049. IEEE (2016)

    Google Scholar 

Download references

Acknowledgments

This research was supported by the Ministry of Culture of the Czech Republic, project No. DG16P02B009.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Marie Kunešová .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kunešová, M., Zajíc, Z., Radová, V. (2017). Experiments with Segmentation in an Online Speaker Diarization System. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science(), vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_48

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-64206-2_48

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64205-5

  • Online ISBN: 978-3-319-64206-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics