Abstract
It has been demonstrated experimentally that optimised multi-stream speech recognisers can perform substantially better than the corresponding conventional systems on certain recognition tasks. Typically, those applications present critical robustness problems, with the speech signal affected by noise that is localised in the acoustic space. In general, substantial recognition improvement is obtained by extracting multiple feature streams that encode highly complementary information about the speech signal. The main goal of this experimental study is to assess the potential of the multi-stream statistical formalism on standard clean-speech recognition tasks that are not particularly favourable to this approach and, in addition, to do so with intentionally highly correlated feature streams. Nevertheless, it is demonstrated here that a careful design of the stream recombination model, which adapts the local influence of each stream on the decoding process according to several information sources, can lead to significant performance gains compared with the corresponding single-stream systems.
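As a rough illustration of what adapting the local influence of each stream can mean, the sketch below combines per-stream log-likelihoods for a single frame and state using a weighted log-linear rule. This is a generic textbook-style formulation, not the recombination model of the paper, whose details the abstract does not give. The function names and the inverse-entropy weighting heuristic are assumptions introduced purely for illustration.

```python
import math


def combine_streams(stream_log_likes, weights):
    """Weighted log-linear recombination for one frame and one state.

    stream_log_likes: per-stream log-likelihoods log p(x_k | state).
    weights: non-negative stream weights, normalised here to sum to 1.
    Hypothetical helper; not taken from the paper.
    """
    if len(stream_log_likes) != len(weights):
        raise ValueError("one weight is needed per stream")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w / total * ll for w, ll in zip(weights, stream_log_likes))


def inverse_entropy_weights(stream_posteriors):
    """Assumed reliability heuristic: a stream whose state posteriors are
    sharper (lower entropy) receives a larger weight for this frame."""
    raw = []
    for posterior in stream_posteriors:
        entropy = -sum(p * math.log(p) for p in posterior if p > 0)
        raw.append(1.0 / (entropy + 1e-6))  # small constant avoids division by zero
    total = sum(raw)
    return [r / total for r in raw]


# Two streams for one frame: the first is confident, the second is not.
frame_weights = inverse_entropy_weights([[0.9, 0.05, 0.05], [0.4, 0.3, 0.3]])
frame_score = combine_streams([-2.0, -4.0], frame_weights)
```

Because the weights are recomputed for every frame, the more confident stream dominates the combined score locally, which is one simple way to let the influence of each stream vary during decoding.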
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pera, V., Martens, J.-P. (2003). Hard-Testing the Multi-stream Approach to Automatic Speech Recognition. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_45
DOI: https://doi.org/10.1007/978-3-540-39398-6_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20024-6
Online ISBN: 978-3-540-39398-6
eBook Packages: Springer Book Archive