Convolutional Neural Network for Refinement of Speaker Adaptation Transformation

  • Zbyněk Zajíc
  • Jan Zelinka
  • Jan Vaněk
  • Luděk Müller
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8773)


The aim of this work is to propose a refinement of the shift-MLLR (shift Maximum Likelihood Linear Regression) adaptation of an acoustic model for the case of a limited amount of adaptation data, which can lead to ill-conditioned transformation matrices. We try to suppress the influence of badly estimated transformation parameters using an Artificial Neural Network (ANN), in particular a Convolutional Neural Network (CNN) with a bottleneck layer at the end. The badly estimated shift-MLLR transformation is propagated through an ANN (suitably trained beforehand), and the output of the network is used as the new, refined transformation. To train the ANN, the well-conditioned and badly conditioned shift-MLLR transformations are used as the outputs and inputs of the ANN, respectively.
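The training scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the convolutional layers are omitted, and a plain bottleneck network trained by batch gradient descent stands in for the CNN. The data are synthetic, and all names (`make_pairs`, `init_net`, `train`, `refine`) and sizes are assumptions made for illustration only. A "badly estimated" shift vector is modelled here as a "well estimated" one plus noise.

```python
import numpy as np

def make_pairs(n_speakers, dim, rank, noise, rng):
    """Synthetic stand-in data (assumption): 'good' shift vectors lie in a
    low-rank subspace, 'bad' ones are the same vectors plus estimation noise."""
    basis = rng.standard_normal((rank, dim)) / np.sqrt(rank)
    good = rng.standard_normal((n_speakers, rank)) @ basis
    bad = good + noise * rng.standard_normal(good.shape)
    return bad, good

def init_net(dim, bottleneck, rng):
    """One tanh bottleneck layer followed by a linear output layer."""
    return {
        "W1": 0.1 * rng.standard_normal((dim, bottleneck)),
        "b1": np.zeros(bottleneck),
        "W2": 0.1 * rng.standard_normal((bottleneck, dim)),
        "b2": np.zeros(dim),
    }

def forward(net, x):
    h = np.tanh(x @ net["W1"] + net["b1"])   # bottleneck activations
    return h @ net["W2"] + net["b2"], h      # refined shift, hidden state

def refine(net, bad_shift):
    """Propagate a badly estimated shift through the trained network."""
    return forward(net, bad_shift)[0]

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def train(net, bad, good, lr=0.005, epochs=2000):
    """Batch gradient descent on the squared error between the network
    output (input: bad shift) and the target (good shift)."""
    n = bad.shape[0]
    for _ in range(epochs):
        out, h = forward(net, bad)
        err = (out - good) / n                       # dL/d(out)
        dh = (err @ net["W2"].T) * (1.0 - h ** 2)    # back through tanh
        net["W2"] -= lr * (h.T @ err)
        net["b2"] -= lr * err.sum(axis=0)
        net["W1"] -= lr * (bad.T @ dh)
        net["b1"] -= lr * dh.sum(axis=0)
    return net

rng = np.random.default_rng(0)
bad, good = make_pairs(n_speakers=200, dim=12, rank=3, noise=0.5, rng=rng)
net = init_net(dim=12, bottleneck=3, rng=rng)
loss_before = mse(refine(net, bad), good)
net = train(net, bad, good)
loss_after = mse(refine(net, bad), good)
print(f"training MSE before: {loss_before:.3f}, after: {loss_after:.3f}")
```

The narrow hidden layer forces the network to keep only a few degrees of freedom of the input shift, which is the intuition behind using a bottleneck to suppress poorly estimated parameters. Whether the refined shift actually improves recognition depends on real adaptation data and is not something this toy example can show.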


Keywords: ASR, adaptation, shift-MLLR, ANN, CNN, bottleneck





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Zbyněk Zajíc (1)
  • Jan Zelinka (1)
  • Jan Vaněk (1)
  • Luděk Müller (1)
  1. Faculty of Applied Sciences, New Technologies for the Information Society, University of West Bohemia in Pilsen, Pilsen, Czech Republic
