Speaker Diarization Using Direction of Arrival Estimate and Acoustic Feature Information: The I2R-NTU Submission for the NIST RT 2007 Evaluation

Koh, Eugene Chin Wei; Sun, Hanwu; Nwe, Tin Lay; Nguyen, Trung Hieu; Ma, Bin; Chng, Eng-Siong; Li, Haizhou; Rahardja, Susanto

doi:10.1007/978-3-540-68585-2_45

Eugene Chin Wei Koh¹,², Hanwu Sun², Tin Lay Nwe², Trung Hieu Nguyen¹, Bin Ma², Eng-Siong Chng¹, Haizhou Li² & Susanto Rahardja²

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4625)

Abstract

This paper describes the I²R/NTU system submitted to the Multiple Distant Microphone (MDM) task of the NIST Rich Transcription 2007 (RT-07) Meeting Recognition evaluation. The system performs speaker turn detection and clustering using Direction of Arrival (DOA) information. The resulting speaker clusters are then purified by GMM modeling of acoustic features. As a final step, non-speech and silence segments are removed. The system achieved a competitive overall diarization error rate (DER) of 15.32% on the RT-07 evaluation task.
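The abstract does not say how the DOA estimate is obtained, so the following is only an illustrative sketch of one common approach, not the authors' method: estimating the time delay between two microphones with GCC-PHAT and converting it to an angle under a far-field assumption. The function names, the two-microphone geometry, and the speed-of-sound constant are all assumptions introduced for this example.

```python
import numpy as np


def gcc_phat_delay(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x, in seconds, using GCC-PHAT.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(x) + len(y)  # zero-pad so circular correlation equals linear correlation
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs


def delay_to_angle(tau, mic_distance, speed_of_sound=343.0):
    """Convert a delay to a far-field arrival angle in degrees (0 = broadside).

    Assumes a single microphone pair spaced mic_distance metres apart.
    """
    s = np.clip(speed_of_sound * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    x = rng.standard_normal(4096)
    y = np.concatenate((np.zeros(5), x[:-5]))  # y is x delayed by 5 samples
    tau = gcc_phat_delay(x, y, fs)
    print("delay (samples):", round(tau * fs))
    print("angle (degrees):", delay_to_angle(tau, mic_distance=0.2))
```

In a diarization setting, per-frame angles of this kind could be grouped so that frames arriving from a similar direction are assigned to the same speaker; how the submitted system actually does this is described in the full paper, not in this abstract.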

Author information

Authors and Affiliations

School of Computer Engineering, Nanyang Technological University (NTU), Singapore, 639798
Eugene Chin Wei Koh, Trung Hieu Nguyen & Eng-Siong Chng
Human Language Technology Department, Institute for Infocomm Research (I2R), Singapore, 119613
Eugene Chin Wei Koh, Hanwu Sun, Tin Lay Nwe, Bin Ma, Haizhou Li & Susanto Rahardja

Editor information

Rainer Stiefelhagen, Rachel Bowers, Jonathan Fiscus

Cite this paper

Koh, E.C.W. et al. (2008). Speaker Diarization Using Direction of Arrival Estimate and Acoustic Feature Information: The I²R-NTU Submission for the NIST RT 2007 Evaluation. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT CLEAR 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_45

DOI: https://doi.org/10.1007/978-3-540-68585-2_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer Science, Computer Science (R0)
