Unsupervised Speaker Adaptation Using Reference Speaker Weighting

  • Tsz-Chung Lai
  • Brian Mak
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4274)

Abstract

Recently, we revisited the fast adaptation method called reference speaker weighting (RSW) and suggested a few modifications. We then showed that RSW, although algorithmically the simplest, outperformed conventional adaptation techniques such as MAP and MLLR for 5- or 10-second supervised adaptation on the Wall Street Journal 5K task. In this paper, we further investigate the performance of RSW in unsupervised adaptation mode, which is the more natural way of doing adaptation in practice. We also carry out various analyses of the reference speakers computed by the method.

Keywords

Test Utterance, Maximum Likelihood Linear Regression, Speaker Adaptation, Speak Language Processing, Test Speaker
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Hazen, T.J., Glass, J.R.: A comparison of novel techniques for instantaneous speaker adaptation. In: Proceedings of the European Conference on Speech Communication and Technology, pp. 2047–2050 (1997)
  2. Hazen, T.J.: A comparison of novel techniques for rapid speaker adaptation. Speech Communication 31, 15–33 (2000)
  3. Kuhn, R., Nguyen, P., Junqua, J.-C., et al.: Eigenvoices for speaker adaptation. In: Proceedings of the International Conference on Spoken Language Processing, vol. 5, pp. 1771–1774 (1998)
  4. Botterweck, H.: Very fast adaptation for large vocabulary continuous speech recognition using eigenvoices. In: Proceedings of the International Conference on Spoken Language Processing, vol. 4, pp. 354–357 (2000)
  5. Kosaka, T., Matsunaga, S., Sagayama, S.: Speaker-independent speech recognition based on tree-structured speaker clustering. Journal of Computer Speech and Language 10, 55–74 (1996)
  6. Mak, B., Lai, T.-C., Hsiao, R.: Improving reference speaker weighting adaptation by the use of maximum-likelihood reference speakers. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Toulouse, France, May 14-19 (2006)
  7. Gauvain, J.L., Lee, C.H.: Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing 2(2), 291–298 (1994)
  8. Leggetter, C.J., Woodland, P.C.: Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Journal of Computer Speech and Language 9, 171–185 (1995)
  9. Chen, K.T., Liau, W.W., Wang, H.M., Lee, L.S.: Fast speaker adaptation using eigenspace-based maximum likelihood linear regression. In: Proceedings of the International Conference on Spoken Language Processing, vol. 3, pp. 742–745 (2000)
  10. Paul, D.B., Baker, J.M.: The design of the Wall Street Journal-based CSR corpus. In: Proceedings of the DARPA Speech and Natural Language Workshop (February 1992)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Tsz-Chung Lai (1)
  • Brian Mak (1)
  1. Department of Computer Science and Engineering, The Hong Kong University of Science & Technology, Hong Kong
