Speaker-Clustered Acoustic Models Evaluated on GPU for On-line Subtitling of Parliament Meetings

Psutka, Josef V.; Vaněk, Jan; Psutka, Josef

doi:10.1007/978-3-642-23538-2_36


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6836)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

This paper describes the building of speaker-clustered acoustic models as part of a real-time LVCSR system that has been used for more than a year by Czech TV for automatic subtitling of parliament meetings broadcast on the channel ČT24. Speaker-clustered acoustic models are more acoustically homogeneous and therefore give better recognition performance than a single gender-independent model or even gender-dependent models. Frequent speaker changes and the direct connection of the LVCSR system to the audio channel require automatic switching/fusion of models as quickly as possible. An important part of the solution is the real-time likelihood evaluation of all clustered acoustic models, which takes advantage of a fast GPU (Graphics Processing Unit). The proposed method achieved a relative WER reduction of more than 2.34% over the baseline gender-independent model, with more than 2M Gaussian mixtures evaluated in real time.
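To make the idea of evaluating every clustered model and then switching between them concrete, the following is a minimal CPU-only sketch, not the authors' implementation. It assumes a single diagonal-covariance Gaussian mixture per cluster (a real LVCSR system has one mixture per HMM state) and a simple "highest total log-likelihood over a window of frames" rule; the abstract does not specify the actual switching/fusion rule. All names (`gmm_log_likelihood`, `select_cluster`, `CLUSTER_MODELS`) and the toy numbers are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    acc = 0.0
    for xi, mi, vi in zip(x, mean, var):
        acc += math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi
    return -0.5 * acc


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a mixture given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp keeps the sum numerically stable
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def select_cluster(frames, cluster_models):
    """Score every cluster model on a window of frames and pick the best one.

    Assumption: the selection rule is the maximum summed log-likelihood.
    """
    scores = {
        name: sum(gmm_log_likelihood(x, gmm) for x in frames)
        for name, gmm in cluster_models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy two-dimensional models; real acoustic features have far more dimensions.
CLUSTER_MODELS = {
    "cluster_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "cluster_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
FRAMES = [[5.2, 4.9], [5.8, 6.1], [5.5, 5.4]]

best, scores = select_cluster(FRAMES, CLUSTER_MODELS)
print(best, scores)
```

Each (frame, Gaussian) term in this sketch is independent of the others, which is the property that makes this kind of likelihood computation a natural fit for the massively parallel GPU evaluation the abstract mentions.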

Author information

Authors and Affiliations

Department of Cybernetics, West Bohemia University, Pilsen, Czech Republic
Josef V. Psutka, Jan Vaněk & Josef Psutka

Editor information

Editors and Affiliations

Department of Computer Sciences, University of West Bohemia, Univerzitní 22, 306 14, Pilsen, Czech Republic
Ivan Habernal
Faculty of Applied Sciences, Dept. of Computer Science and Engineering, University of West Bohemia, Univerzitni 8, 306 14, Pilsen, Czech Republic
Václav Matoušek

About this paper

Cite this paper

Psutka, J.V., Vaněk, J., Psutka, J. (2011). Speaker-Clustered Acoustic Models Evaluated on GPU for On-line Subtitling of Parliament Meetings. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_36

DOI: https://doi.org/10.1007/978-3-642-23538-2_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23537-5
Online ISBN: 978-3-642-23538-2
eBook Packages: Computer Science, Computer Science (R0)
