Basis-Based Speaker Adaptation Using Partitioned HMM Mean Parameters of Training Speaker Models

Journal of Signal Processing Systems

Abstract

This paper presents a basis-based speaker adaptation method that encompasses approaches based on principal component analysis (PCA) and two-dimensional PCA (2DPCA). The proposed method partitions the hidden Markov model (HMM) mean vectors of the training speaker models into subvectors of smaller dimension. The sample covariance matrix computed from the partitioned mean vectors therefore has a dimension determined by the dimension of the subvectors. Basis vectors are constructed from the eigendecomposition of this sample covariance matrix, so their dimension varies with that of the covariance matrix, and the proposed method includes both the PCA- and 2DPCA-based approaches. We present the adaptation equations in both the maximum likelihood (ML) and maximum a posteriori (MAP) frameworks. We perform continuous speech recognition experiments on the Wall Street Journal (WSJ) corpus. The results show that models with basis vectors whose dimension lies between those of the PCA- and 2DPCA-based approaches give good overall performance. In the MAP framework, the proposed approach yields a further improvement over its ML counterpart when the number of adaptation parameters is large but the amount of available adaptation data is small. Furthermore, the MAP-based approach is less sensitive to the choice of model order than its ML counterpart.
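
The basis construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function name `partitioned_basis`, the contiguous partition order, and the 1/S normalization of the covariance are assumptions made here for illustration. Each training speaker is assumed to be represented by a mean supervector (all Gaussian mean vectors concatenated), which is cut into equal-length subvectors before the covariance is accumulated.

```python
import numpy as np

def partitioned_basis(supervectors, sub_dim, n_basis):
    """Hypothetical helper: basis vectors from partitioned speaker mean supervectors.

    supervectors: array of shape (S, N), one mean supervector per training speaker.
    sub_dim:      subvector dimension d; must divide N (assumption of this sketch).
    n_basis:      number of leading eigenvectors to keep.
    Returns (mean, basis) with mean of shape (N // d, d) and basis of shape (d, n_basis).
    """
    X = np.asarray(supervectors, dtype=float)
    n_speakers, total_dim = X.shape
    if total_dim % sub_dim != 0:
        raise ValueError("sub_dim must divide the supervector dimension")
    n_parts = total_dim // sub_dim

    # Cut every supervector into n_parts contiguous subvectors of length sub_dim.
    parts = X.reshape(n_speakers, n_parts, sub_dim)

    # Center each subvector position across the training speakers.
    mean = parts.mean(axis=0)
    centered = parts - mean

    # d x d sample covariance accumulated over all speakers and subvector positions
    # (normalized by the number of speakers; an assumption, not taken from the paper).
    cov = np.einsum("spi,spj->ij", centered, centered) / n_speakers

    # Symmetric eigendecomposition; keep eigenvectors of the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_basis]
    return mean, eigvecs[:, order]

# Toy data: 20 speakers, 4 Gaussians with 3-dimensional means, so N = 12.
rng = np.random.default_rng(0)
speakers = rng.normal(size=(20, 12))

_, pca_like = partitioned_basis(speakers, sub_dim=12, n_basis=5)   # full supervector
_, mid_size = partitioned_basis(speakers, sub_dim=6, n_basis=3)    # intermediate dimension
_, twod_like = partitioned_basis(speakers, sub_dim=3, n_basis=2)   # one Gaussian mean each
print(pca_like.shape, mid_size.shape, twod_like.shape)
```

Under these assumptions, setting `sub_dim` to the full supervector dimension reduces to ordinary PCA on the supervectors, and setting it to the feature dimension gives a covariance of the 2DPCA kind, with intermediate values falling between the two. The adaptation step itself (estimating the basis weights under the ML or MAP criterion) is not shown, since the abstract does not give the equations.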

Author information

Corresponding author

Correspondence to Yongwon Jeong.

About this article

Cite this article

Jeong, Y. Basis-Based Speaker Adaptation Using Partitioned HMM Mean Parameters of Training Speaker Models. J Sign Process Syst 82, 303–310 (2016). https://doi.org/10.1007/s11265-015-0996-2

  • DOI: https://doi.org/10.1007/s11265-015-0996-2
