Fast Discriminative Training

  • Qi (Peter) Li
Part of the Signals and Communication Technology book series (SCT)


A good training algorithm for pattern recognition needs to satisfy two criteria. First, the objective function must be tied to the desired performance; second, the parameter estimation process derived from that objective must be easy to compute with the available computational resources and must converge in the required time. For example, the expectation-maximization (EM) algorithm guarantees convergence, but its objective is not to minimize the error rate, which is what most applications want. On the other hand, many newer objective functions are defined to relate directly to the desired performance, but they are often too computationally expensive to produce the desired results in a reasonable amount of time. Therefore, in real applications, defining an objective and deriving its estimation algorithm form a joint design process. This chapter presents an example in which a discriminative objective was defined together with its fast training algorithm.


Feature Vector, Hidden Markov Model, Gaussian Mixture Model, Speaker Recognition, Posteriori Probability
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Li Creative Technologies (LcT), Inc., Florham Park, USA
