Unsupervised Speaker Adaptation for Phonetic Transcription Based Voice Dialing
Because a voice dialing system based on speaker-independent phoneme HMMs stores only the phoneme transcription of each input sentence, its storage requirements are greatly reduced. However, its performance is worse than that of a speaker-dependent system because of the phoneme recognition errors that arise when speaker-independent models are used. To address this problem, a new method is presented that jointly estimates the transformation vectors (biases) and the transcriptions for speaker adaptation. The biases and transcriptions are estimated iteratively from each user's training data using a maximum likelihood approach to stochastic matching with speaker-independent phoneme models. Experimental results show that the proposed method is superior to the conventional method that uses transcriptions only.
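The alternating estimation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes, purely for illustration, that each phoneme is a single diagonal-covariance Gaussian rather than an HMM, that decoding is done frame by frame rather than by Viterbi search, and that the bias is a single additive vector in feature space. All names (`decode`, `estimate_bias`, `adapt`, `models`) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical model store: phoneme label -> (mean vector, diagonal variance vector).
Models = Dict[str, Tuple[Vector, Vector]]


def log_gauss(x: Vector, mean: Vector, var: Vector) -> float:
    """Diagonal-Gaussian log-likelihood, up to an additive constant."""
    return -0.5 * sum(math.log(v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def decode(frames: List[Vector], bias: Vector, models: Models) -> List[str]:
    """Label each bias-compensated frame with its most likely phoneme.

    Assumption: frame-wise labeling stands in for HMM Viterbi decoding.
    """
    labels = []
    for x in frames:
        y = [xi - bi for xi, bi in zip(x, bias)]
        labels.append(max(models, key=lambda p: log_gauss(y, *models[p])))
    return labels


def estimate_bias(frames: List[Vector], labels: List[str],
                  models: Models) -> Vector:
    """Maximum likelihood additive bias given a fixed frame labeling.

    For diagonal Gaussians this is the variance-weighted mean of (x - mean).
    """
    dim = len(frames[0])
    num = [0.0] * dim
    den = [0.0] * dim
    for x, p in zip(frames, labels):
        mean, var = models[p]
        for d in range(dim):
            num[d] += (x[d] - mean[d]) / var[d]
            den[d] += 1.0 / var[d]
    return [n / d for n, d in zip(num, den)]


def collapse(labels: List[str]) -> List[str]:
    """Merge runs of identical frame labels into a phoneme transcription."""
    out: List[str] = []
    for p in labels:
        if not out or out[-1] != p:
            out.append(p)
    return out


def adapt(frames: List[Vector], models: Models,
          iterations: int = 5) -> Tuple[Vector, List[str]]:
    """Alternate between bias estimation and re-decoding until labels settle."""
    bias = [0.0] * len(frames[0])
    labels = decode(frames, bias, models)
    for _ in range(iterations):
        bias = estimate_bias(frames, labels, models)
        new_labels = decode(frames, bias, models)
        if new_labels == labels:
            break
        labels = new_labels
    return bias, collapse(labels)
```

In this sketch, `adapt` would be called once per user on that user's enrollment frames, and the returned bias and transcription would be stored in place of acoustic templates. A real system would replace `decode` with a phoneme-loop HMM recognizer and accumulate the bias statistics over state occupancies.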
Keywords: Speech Recognition, Transformation Vector, Word Error Rate, Input Sentence, Speaker Adaptation