An Overview of Automatic Speech Recognition

Rabiner, L. R.; Juang, B.-H.; Lee, C.-H.

doi:10.1007/978-1-4613-1367-0_1

Part of the book series: The Kluwer International Series in Engineering and Computer Science (SECS, volume 355)

Abstract

For the past two decades, research in speech recognition has been carried out intensively worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Speech recognition systems have been developed for a wide variety of applications, ranging from small-vocabulary keyword recognition over dial-up telephone lines, to medium-size-vocabulary voice-interactive command and control systems on personal computers, to large-vocabulary speech dictation, spontaneous speech understanding, and limited-domain speech translation. In this chapter we review some of the key advances in several areas of automatic speech recognition. We also briefly discuss the requirements for designing successful real-world applications and address the technical challenges that must be met to reach the ultimate goal of providing an easy-to-use, natural, and flexible voice interface between people and machines.

Author information

Authors and Affiliations

AT&T Bell Laboratories, Murray Hill, NJ, 07974, USA
L. R. Rabiner, B.-H. Juang & C.-H. Lee

Editor information

Editors and Affiliations

AT&T Bell Laboratories, Murray Hill, NJ, 07974, USA
Chin-Hui Lee & Frank K. Soong
School of Microelectronic Engineering, Griffith University, Australia
Kuldip K. Paliwal

About this chapter

Cite this chapter

Rabiner, L.R., Juang, B.-H., Lee, C.-H. (1996). An Overview of Automatic Speech Recognition. In: Lee, C.-H., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_1

DOI: https://doi.org/10.1007/978-1-4613-1367-0_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4612-8590-8
Online ISBN: 978-1-4613-1367-0
eBook Packages: Springer Book Archive
