Abstract
There are essentially two application modes for automatic speech recognition (ASR): using speech as spoken input, or using it as a knowledge source. Spoken input addresses applications such as dictation and navigation (transactional) systems, while speech as a knowledge source has applications such as multimedia indexing. The chapter presents the stages of the speech recognition process, ASR resources, the role and functions of a speech engine (e.g., the Julius speech recognition engine), voice-over-web resources, ASR algorithms, and language and acoustic models such as hidden Markov models (HMMs). Introductions to several open-source tools (the Kaldi speech recognition toolkit, CMU Sphinx, HTK, and Deep Speech) and guidelines for their usage are also presented; these tools have interfaces with high-level languages such as C/C++ and Python. The chapter closes with a summary and a set of exercises.
Notes
- 1.
For example, the word “speech” can also be pronounced as “spee...ech”, repeating the sound of ‘e’, which creates a self-loop.
- 2.
Chanting of Aum (IPA: /ɐwm/) is a common practice during meditation and yoga, in which the sound of ‘m’ is repeated.
- 3.
Lombard effect: the involuntary tendency of speakers to increase their vocal effort, particularly when speaking against loud background noise, to enhance the audibility of their voice. Due to the Lombard effect, not only does loudness increase, but so do other acoustic features such as pitch, rate, and duration of syllables. The Lombard effect also results in an increase in the signal-to-noise ratio of the speaker’s signal.
Exercises
- 1.
Consider the alphabet set \(\Sigma = \{a, b, c, d\}\). Create finite automata (recognizers) for the following:
  - a. All strings that start with the letter a.
  - b. All strings that end with the letter d.
  - c. All strings in which every c is followed by the letter d.
  - d. All strings that have an odd number of c’s.
- 2.
Answer the following in brief, giving suitable examples:
  - a. What is the difference between a phoneme and a morpheme?
  - b. What is the difference between a language and a dialect?
- 3.
Write an equation to compute the trigram probability.
- 4.
Text processing algorithms are usually written in Python, while ASR algorithms, which produce the same text, are written in C/C++. Explain what could be the reason behind this.
- 5.
What is the fundamental difference between the language model and the acoustic model? Why is this so?
Copyright information
© 2020 Springer Nature India Private Limited
About this chapter
Cite this chapter
Chowdhary, K.R. (2020). Automatic Speech Recognition. In: Fundamentals of Artificial Intelligence. Springer, New Delhi. https://doi.org/10.1007/978-81-322-3972-7_20
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-3970-3
Online ISBN: 978-81-322-3972-7
eBook Packages: Computer Science (R0)