Unsupervised Speaker Adaptation Using Reference Speaker Weighting
Recently, we revisited the fast adaptation method called reference speaker weighting (RSW) and suggested a few modifications. We then showed that the algorithmically simplest technique actually outperformed conventional adaptation techniques such as MAP and MLLR for 5- or 10-second supervised adaptation on the Wall Street Journal 5K task. In this paper, we further investigate the performance of RSW in unsupervised adaptation mode, which is the more natural way of doing adaptation in practice. We also carry out various analyses of the reference speakers computed by the method.
Keywords: Test Utterance, Maximum Likelihood Linear Regression, Speaker Adaptation, Speak Language Processing, Test Speaker