A Comparative Analysis of Bayesian Nonparametric Inference Algorithms for Acoustic Modeling in Speech Recognition

Steinberg, John; Harati, Amir; Picone, Joseph

doi:10.1007/978-3-319-06773-5_61

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 313)

Abstract

Nonparametric Bayesian models have become increasingly popular in speech recognition for their ability to discover the underlying structure of data iteratively. Dirichlet process mixtures (DPMs) are a widely used nonparametric method that does not require a priori assumptions about the structure of the data. DPMs, however, require an infinite number of parameters, so inference algorithms are needed to make posterior calculations tractable. The focus of this work is an evaluation of three variational inference algorithms for acoustic modeling: Accelerated Variational Dirichlet Process Mixtures (AVDPM), Collapsed Variational Stick Breaking (CVSB), and Collapsed Dirichlet Priors (CDP).
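The idea that a DPM has "an infinite number of parameters" yet can still be handled in finite computation is easiest to see through the stick-breaking construction of the Dirichlet process, the representation that Collapsed Variational Stick Breaking is named after. The sketch below draws mixture weights from that prior, truncated at a finite level K, which is the usual device variational DPM methods rely on. It is an illustration of the prior only, not an implementation of AVDPM, CVSB, or CDP; the function names, the truncation level of 50, and the concentration values are illustrative assumptions.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw K = truncation mixture weights from a truncated stick-breaking prior.

    Illustrative sketch: this samples the DP prior; it is not one of the
    inference algorithms evaluated in the paper.
    Each break fraction v_k ~ Beta(1, alpha) takes a piece of whatever stick
    remains, so pi_k = v_k * prod_{j<k} (1 - v_j). Fixing the last fraction
    v_K = 1 assigns all leftover mass to component K, so the weights sum to 1.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for k in range(truncation):
        v = 1.0 if k == truncation - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def components_for_mass(weights, mass=0.99):
    """Smallest number of components whose largest weights cover `mass`."""
    total = 0.0
    for count, w in enumerate(sorted(weights, reverse=True), start=1):
        total += w
        if total >= mass:
            return count
    return len(weights)


if __name__ == "__main__":
    rng = random.Random(0)
    for alpha in (0.5, 2.0, 10.0):
        w = stick_breaking_weights(alpha, truncation=50, rng=rng)
        print(f"alpha={alpha}: {components_for_mass(w)} of 50 components hold 99% of the mass")
```

A smaller concentration parameter alpha tends to put most of the mass on a few components, which is the mechanism that lets a DPM-based model settle on far fewer components than a fixed-size GMM; the exact counts printed depend on the random draw.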

A phoneme classification task was chosen to assess more clearly the viability of these algorithms for acoustic modeling. Evaluations were conducted on the CALLHOME English and Mandarin corpora, which cover two languages that, from a human perspective, are phonologically very different. In this work, we show that these inference algorithms yield error rates comparable to those of a baseline Gaussian mixture model (GMM) while using a factor of 20 fewer mixture components. AVDPM is the most attractive choice because it delivers the most compact models and is computationally efficient, enabling its application to big data problems.
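The abstract does not spell out the classifier itself. A common setup for phoneme classification, assumed here rather than taken from the paper, trains one mixture model per phoneme and assigns each feature vector the label whose model scores it highest. The sketch below does this for diagonal-covariance mixtures in plain Python; `DiagGMM`, `classify`, `error_rate`, and the toy labels and parameters are hypothetical. Under this framing, a DPM-based model and the GMM baseline plug into the same decision rule and differ in how many components each class model carries and how its parameters are estimated.

```python
import math
from dataclasses import dataclass


@dataclass
class DiagGMM:
    """Hypothetical per-phoneme mixture model with diagonal covariances."""
    weights: list[float]          # mixture weights, all > 0, summing to 1
    means: list[list[float]]      # one mean vector per component
    variances: list[list[float]]  # one diagonal variance vector per component

    def log_likelihood(self, x):
        """log p(x) = log sum_k w_k N(x; mu_k, diag(var_k)), via log-sum-exp."""
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            terms.append(ll)
        top = max(terms)  # subtract the max before exponentiating for stability
        return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(x, models):
    """Return the phoneme label whose model gives x the highest log-likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(x))


def error_rate(samples, models):
    """Fraction of (features, label) pairs that are misclassified."""
    errors = sum(1 for x, label in samples if classify(x, models) != label)
    return errors / len(samples)


if __name__ == "__main__":
    # Toy 2-D models; real systems would use acoustic feature vectors.
    models = {
        "aa": DiagGMM([1.0], [[0.0, 0.0]], [[1.0, 1.0]]),
        "iy": DiagGMM([0.5, 0.5], [[5.0, 5.0], [6.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]),
    }
    print(classify([0.1, -0.2], models))
    print(sum(len(m.weights) for m in models.values()), "components in total")
```

Counting components across all class models, as in the last line, is the kind of quantity behind the abstract's comparison of model size.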

Acknowledgements

This research was supported in part by the National Science Foundation through Major Research Instrumentation Grant No. CNS-09-58854. Our research group would also like to thank the Linguistic Data Consortium (LDC) for awarding a data scholarship to this project and providing the lexicon and transcripts for CALLHOME Mandarin.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Temple University, Philadelphia, PA, USA
John Steinberg, Amir Harati & Joseph Picone

Corresponding author

Correspondence to John Steinberg.

Editor information

Editors and Affiliations

Engineering and Computer Science, University of Bridgeport, Bridgeport, Connecticut, USA
Tarek Sobh
Computer Science and Engineering, University of Bridgeport, Bridgeport, Connecticut, USA
Khaled Elleithy

About this paper

Cite this paper

Steinberg, J., Harati, A., Picone, J. (2015). A Comparative Analysis of Bayesian Nonparametric Inference Algorithms for Acoustic Modeling in Speech Recognition. In: Sobh, T., Elleithy, K. (eds) Innovations and Advances in Computing, Informatics, Systems Sciences, Networking and Engineering. Lecture Notes in Electrical Engineering, vol 313. Springer, Cham. https://doi.org/10.1007/978-3-319-06773-5_61

DOI: https://doi.org/10.1007/978-3-319-06773-5_61
Published: 16 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06772-8
Online ISBN: 978-3-319-06773-5
eBook Packages: Engineering, Engineering (R0)
