Abstract
Nonparametric Bayesian models have become increasingly popular in speech recognition because they can discover the underlying structure of data in an iterative manner. Dirichlet process mixtures (DPMs) are a widely used nonparametric method that does not require a priori assumptions about the structure of the data. DPMs, however, require an infinite number of parameters, so approximate inference algorithms are needed to make posterior calculations tractable. The focus of this work is an evaluation of three variational inference algorithms for acoustic modeling: Accelerated Variational Dirichlet Process Mixtures (AVDPM), Collapsed Variational Stick Breaking (CVSB), and Collapsed Dirichlet Priors (CDP).
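A DPM is defined over infinitely many components, so variational methods typically work with a finite truncation of the stick-breaking representation of the Dirichlet process. The sketch below is illustrative only and is not taken from the paper. It assumes NumPy and uses a hypothetical helper, `stick_breaking_weights`, to show how T sticks drawn from Beta(1, α) produce T mixture weights that sum to one once the last stick is fixed to 1. This is the truncation used, for example, in Blei and Jordan's variational DPM. The sketch does not reproduce AVDPM, CVSB, or CDP, each of which handles the truncation in its own way.

```python
import numpy as np


def stick_breaking_weights(alpha, truncation, rng):
    """Hypothetical helper: draw weights from a truncated stick-breaking prior.

    v_k ~ Beta(1, alpha) for each of the T sticks; the last stick is set to 1
    so that the T resulting weights sum exactly to one.
    """
    v = rng.beta(1.0, alpha, size=truncation)
    v[-1] = 1.0  # truncation: the final component absorbs the remaining stick
    # pi_k = v_k * prod_{j<k} (1 - v_j); the first component has nothing before it
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=1.0, truncation=20, rng=rng)
print(round(float(weights.sum()), 12))  # prints 1.0
```

With a small concentration α, most of the mass tends to fall on the first few sticks. This is one intuition for how a DPM can use fewer effective components than a fixed-size mixture.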
A phoneme classification task was chosen to assess the viability of these algorithms for acoustic modeling more clearly. Evaluations were conducted on the CALLHOME English and Mandarin corpora, which represent two languages that, from a human perspective, are phonologically very different. We show that these inference algorithms yield error rates comparable to those of a baseline Gaussian mixture model (GMM) while using a factor of 20 fewer mixture components. AVDPM is the most attractive choice because it delivers the most compact models and is computationally efficient, which makes it applicable to big data problems.
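The following sketch gives a rough sense of what a factor of 20 fewer mixture components means for model size. It counts free parameters in a diagonal-covariance GMM: K means and K variance vectors of dimension D, plus K − 1 free mixture weights. The feature dimension (39), the component counts (100 versus 5), and the diagonal covariances are all hypothetical choices for illustration. They are not the configurations reported in the paper.

```python
def diag_gmm_param_count(num_components, dim):
    """Free parameters in a diagonal-covariance GMM (an assumed model form):
    K mean vectors and K variance vectors of length D, plus K - 1 free weights
    (the weights must sum to one, so one of them is determined by the rest)."""
    return num_components * 2 * dim + (num_components - 1)


DIM = 39          # hypothetical feature dimension (e.g., 13 MFCCs + deltas + delta-deltas)
BASELINE_K = 100  # hypothetical baseline GMM size, not a value from the paper
COMPACT_K = 5     # a factor of 20 fewer components

baseline = diag_gmm_param_count(BASELINE_K, DIM)
compact = diag_gmm_param_count(COMPACT_K, DIM)
print(baseline, compact, round(baseline / compact, 1))  # prints: 7899 394 20.0
```

The parameter count scales almost linearly with the number of components, so a 20-fold reduction in components gives roughly a 20-fold smaller model under these assumptions.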
Acknowledgements
This research was supported in part by the National Science Foundation through Major Research Instrumentation Grant No. CNS-09-58854. Our research group would also like to thank the Linguistic Data Consortium (LDC) for awarding a data scholarship to this project and providing the lexicon and transcripts for CALLHOME Mandarin.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Steinberg, J., Harati, A., Picone, J. (2015). A Comparative Analysis of Bayesian Nonparametric Inference Algorithms for Acoustic Modeling in Speech Recognition. In: Sobh, T., Elleithy, K. (eds) Innovations and Advances in Computing, Informatics, Systems Sciences, Networking and Engineering. Lecture Notes in Electrical Engineering, vol 313. Springer, Cham. https://doi.org/10.1007/978-3-319-06773-5_61
DOI: https://doi.org/10.1007/978-3-319-06773-5_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06772-8
Online ISBN: 978-3-319-06773-5
eBook Packages: Engineering, Engineering (R0)