Automatic Speech Recognition

  • Chapter
Fundamentals of Artificial Intelligence

Abstract

There are basically two application modes for automatic speech recognition (ASR): using speech as spoken input, or using speech as a knowledge source. Spoken input addresses applications such as dictation systems and navigation (transactional) systems, while speech as a knowledge source has applications such as multimedia indexing systems. The chapter presents the stages of the speech recognition process, the resources of ASR, and the role and functions of the speech engine, such as the Julius speech recognition engine, voice-over-web resources, ASR algorithms, the language model, and acoustic models such as hidden Markov models (HMMs). Introductions to, and usage guidelines for, many open-source tools, such as the Kaldi speech recognition toolkit, CMU Sphinx, HTK, and Deep Speech, are also presented. These tools have interfaces to high-level languages such as C/C++ and Python. The chapter closes with a summary and a set of exercises.


Notes

  1. For example, the word “speech” can also be pronounced as “spee...ech”, repeating the sound of ‘e’, which creates a self-loop.
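In HMM terms, this prolongation corresponds to a state transitioning back to itself. A minimal sketch of such a self-loop (the loop probability and function names are illustrative, not from the chapter):

```python
import random

# Toy HMM fragment for one phoneme state: with probability 0.6 the state
# loops back to itself (prolonging the sound), otherwise it moves on.
SELF_LOOP_P = 0.6

def emit_phoneme(symbol, rng):
    """Emit a phoneme, repeating it for as long as the self-loop is taken."""
    frames = [symbol]
    while rng.random() < SELF_LOOP_P:
        frames.append(symbol)  # self-loop: same state, same sound
    return frames

rng = random.Random(0)
print("".join(emit_phoneme("e", rng)))  # e.g. "e" or "eee" -- duration varies per run
```

Because the number of loop iterations is random, the same state can account for phonemes of varying duration, which is exactly why HMM topologies for speech include self-loops.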

  2. Chanting of Aum (IPA: /ɐwm/) is a common practice during meditation and yoga, where the sound of ‘m’ is repeated.

  3. Lombard effect: the involuntary tendency of speakers to increase their vocal effort when speaking against a loud background, to enhance the audibility of their voice. Under the Lombard effect, not only does loudness increase, but so do other acoustic features such as pitch, speaking rate, and syllable duration. The effect also increases the signal-to-noise ratio of the speaker’s signal.
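The signal-to-noise ratio mentioned above is conventionally measured in decibels as ten times the base-10 logarithm of the signal-to-noise power ratio. A minimal sketch with synthetic sample values (not data from the chapter):

```python
import math

def snr_db(signal, noise):
    """SNR in dB: 10 * log10(signal power / noise power)."""
    p_sig = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_sig / p_noise)

# A louder (Lombard) voice over the same noise yields a higher SNR.
noise = [0.1, -0.1, 0.1, -0.1]
quiet = [0.5, -0.5, 0.5, -0.5]
loud  = [1.0, -1.0, 1.0, -1.0]
print(snr_db(quiet, noise))  # ~13.98 dB
print(snr_db(loud, noise))   # 20.0 dB
```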

References

  1. Hannun A et al (2014) Deep Speech: Scaling up end-to-end speech recognition. https://arxiv.org/abs/1412.5567. Accessed Dec 19, 2017

  2. http://julius.osdn.jp/book/Julius-3.2-book-e.pdf. Accessed Dec 19, 2017

  3. http://kaldi.sf.net/. Accessed Dec 19, 2017

  4. Padmanabhan M, Picheny M (2002) Large-vocabulary speech recognition algorithms. Computer 35(4):42–50


  5. Povey D et al (2011) The Kaldi speech recognition toolkit. In: IEEE workshop on automatic speech recognition and understanding (ASRU), Hawaii. IEEE Signal Processing Society


  6. Cole R et al (eds) (1997) Survey of the state of the art in human language technology. Studies in natural language processing. Cambridge University Press


  7. Srinivasan S, Brown E (2002) Is speech recognition becoming mainstream? Computer 35(4):38–41


  8. http://www.w3.org/Voice/. Accessed Dec 19, 2017


Author information


Corresponding author

Correspondence to K. R. Chowdhary.

Exercises

  1. Consider the alphabet \(\Sigma = \{a, b, c, d\}\). Construct finite automata (recognizers) for the following languages.

    a. All strings that start with the letter a.

    b. All strings that end with the letter d.

    c. All strings in which every c is immediately followed by the letter d.

    d. All strings with an odd number of c’s.
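As a hint on how such recognizers can be encoded programmatically, here is a minimal DFA simulator in Python (state names are illustrative); the machine shown solves only part (a):

```python
# A DFA as a transition function over named states; 'q_acc' marks acceptance.
# This machine recognizes strings over {a, b, c, d} that start with 'a'.
START, ACCEPT, DEAD = "q0", "q_acc", "q_dead"

def delta(state, ch):
    if state == START:
        return ACCEPT if ch == "a" else DEAD
    return state  # accept and dead states absorb all further input

def accepts(s):
    state = START
    for ch in s:
        state = delta(state, ch)
    return state == ACCEPT

print(accepts("abcd"))  # True
print(accepts("bacd"))  # False
print(accepts(""))      # False
```

The other parts differ only in the transition table and which states are accepting.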

  2. Answer the following briefly, giving suitable examples.

    a. What is the difference between a phoneme and a morpheme?

    b. What is the difference between a language and a dialect?

  3. Write an equation to compute the trigram probability.
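For orientation (not the answer itself), maximum-likelihood n-gram probabilities are estimated from corpus counts. A bigram version as a sketch, with the trigram case left to the exercise; the corpus and function name are illustrative:

```python
from collections import Counter

def bigram_prob(tokens, w1, w2):
    """MLE bigram probability P(w2 | w1) = count(w1 w2) / count(w1)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return bigrams[(w1, w2)] / unigrams[w1]

corpus = "the cat sat on the mat".split()
print(bigram_prob(corpus, "the", "cat"))  # 0.5: 'the' occurs twice, once before 'cat'
```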

  4. Text-processing algorithms are usually written in Python, while ASR algorithms, which produce the same text, are written in C/C++. Explain what could be the reason behind this.

  5. What is the fundamental difference between the language model and the acoustic model? Why are both needed in ASR?
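As a hint, recall the standard noisy-channel formulation of ASR decoding, in which the two models enter as separate factors:

```latex
\hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} P(A \mid W)\, P(W)
```

Here \(A\) is the acoustic observation sequence and \(W\) a candidate word sequence: \(P(A \mid W)\) is supplied by the acoustic model and \(P(W)\) by the language model.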


Copyright information

© 2020 Springer Nature India Private Limited

About this chapter


Cite this chapter

Chowdhary, K.R. (2020). Automatic Speech Recognition. In: Fundamentals of Artificial Intelligence. Springer, New Delhi. https://doi.org/10.1007/978-81-322-3972-7_20


  • DOI: https://doi.org/10.1007/978-81-322-3972-7_20

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-3970-3

  • Online ISBN: 978-81-322-3972-7

  • eBook Packages: Computer Science; Computer Science (R0)
