A Comparative Analysis of Bayesian Nonparametric Inference Algorithms for Acoustic Modeling in Speech Recognition

  • Conference paper
Innovations and Advances in Computing, Informatics, Systems Sciences, Networking and Engineering

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 313)

Abstract

Nonparametric Bayesian models have become increasingly popular in speech recognition for their ability to discover the underlying structure of data in an iterative manner. Dirichlet process mixtures (DPMs) are a widely used nonparametric method that does not require a priori assumptions about the structure of the data. DPMs, however, involve an infinite number of parameters, so approximate inference algorithms are needed to make posterior calculations tractable. The focus of this work is an evaluation of three variational inference algorithms for acoustic modeling: Accelerated Variational Dirichlet Process Mixtures (AVDPM), Collapsed Variational Stick Breaking (CVSB), and Collapsed Dirichlet Priors (CDP).
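
To illustrate why an infinite mixture needs some form of approximation, the sketch below draws mixture weights from a truncated stick-breaking construction, one common way of representing a Dirichlet process with finitely many components. It is only an illustration under stated assumptions and is not the AVDPM, CVSB, or CDP implementation evaluated in the paper; the function name, the concentration value alpha, and the truncation level are hypothetical choices made for the example.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw mixture weights from a truncated stick-breaking construction.

    alpha      -- DP concentration parameter (illustrative value, an assumption)
    truncation -- number of components kept; the last one absorbs the leftover
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)  # v_k ~ Beta(1, alpha)
        weights.append(remaining * fraction)    # pi_k = v_k * prod_{j<k}(1 - v_j)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # final component takes whatever is left
    return weights


# Hypothetical settings: 20 components, concentration 1.0.
weights = stick_breaking_weights(alpha=1.0, truncation=20)
print(sum(weights))
```

Because the weights decay quickly for small alpha, most of the probability mass falls on a handful of components, which is the intuition behind fitting compact models with a finite truncation.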

A phoneme classification task was chosen to more clearly assess the viability of these algorithms for acoustic modeling. Evaluations were conducted on the CALLHOME English and Mandarin corpora, which cover two languages that, from a human perspective, are phonologically very different. In this work, we show that these inference algorithms yield error rates comparable to a baseline Gaussian mixture model (GMM) but with a factor of 20 fewer mixture components. AVDPM is the most attractive choice because it delivers the most compact models and is computationally efficient, enabling its application to big data problems.

Acknowledgements

This research was supported in part by the National Science Foundation through Major Research Instrumentation Grant No. CNS-09-58854. Our research group would also like to thank the Linguistic Data Consortium (LDC) for awarding a data scholarship to this project and providing the lexicon and transcripts for CALLHOME Mandarin.

Author information

Corresponding author

Correspondence to John Steinberg.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Steinberg, J., Harati, A., Picone, J. (2015). A Comparative Analysis of Bayesian Nonparametric Inference Algorithms for Acoustic Modeling in Speech Recognition. In: Sobh, T., Elleithy, K. (eds) Innovations and Advances in Computing, Informatics, Systems Sciences, Networking and Engineering. Lecture Notes in Electrical Engineering, vol 313. Springer, Cham. https://doi.org/10.1007/978-3-319-06773-5_61

  • DOI: https://doi.org/10.1007/978-3-319-06773-5_61

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06772-8

  • Online ISBN: 978-3-319-06773-5

  • eBook Packages: Engineering, Engineering (R0)
