Time-Domain Representation of Phones

Datta, Asoke Kumar

doi:10.1007/978-981-13-2303-4_5

Abstract

The three basic groups of speech signals (quasi-periodic, quasi-random, and quiescent) are defined by time period, amplitude, and complexity. The time-domain representation of the first two parameters has long been textbook material. For complexity, on the other hand, the spectral-domain representation is so well studied that it too has become almost standard textbook knowledge. Traditionally, phones are divided into vowels (which also include glides, trills, and laterals) and consonants (plosives, affricates, and sibilants). The quasi-periodicity of vowels arises from the flapping of the mucosal cover, which introduces nonlinear dynamics. This generates random perturbations, e.g., in time period (jitter), in amplitude (shimmer), and in complexity (complexity perturbation). Five time-domain parameters are introduced for the classification of vowel sounds. The articulatory mechanisms for generating different Bangla consonants, and the consequent signatures for labeling them, are discussed in the necessary detail. The developed algorithm is tested on a standard colloquial Bengali (SCB) database containing speech signals for 850 sentences spoken by 12 native Bangla informants of both sexes, all aged 20–50 years. A detailed labeling scheme with a 94% recognition rate for labeling speech signals into five manner classes is described. These classes are: S (sibilants), P (unaspirated plosives), F (aspirated plosives), A (voiced plosives and affricates), L (laterals and nasal murmurs), and V (vowels and glides). This labeling was used to partition the Bangla pronunciation dictionary into well-defined cohorts. The properties of these cohorts and their usefulness in ASR are discussed. It is shown that expert systems can be generated automatically for each cohort, ultimately leading to a 95% recognition rate in ASR using only vowel recognition. A section is devoted to time-domain features for identifying the vowels. These features showed a potential of almost 85% recognition in the all-vowel situation.
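
As a rough illustration of the jitter and shimmer perturbations mentioned in the abstract, the sketch below computes a cycle-to-cycle perturbation measure from a sequence of pitch periods or peak amplitudes. It uses the commonly cited "local" definition (mean absolute difference between consecutive cycles, divided by the mean value), which is an assumption here and not necessarily the measure defined in the chapter. The function name `local_perturbation` and the sample values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, relative to the mean.

    Assumption: this is the common 'local' jitter/shimmer definition,
    not necessarily the one used in the chapter.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    # Absolute change from each cycle to the next.
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_value = sum(values) / len(values)
    return (sum(diffs) / len(diffs)) / mean_value


# Hypothetical per-cycle measurements for a short voiced segment.
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]            # pitch periods
peak_amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]  # per-cycle peak amplitude

jitter = local_perturbation(periods_ms)        # perturbation in time period
shimmer = local_perturbation(peak_amplitudes)  # perturbation in amplitude
print(f"jitter = {jitter:.4f}, shimmer = {shimmer:.4f}")
```

A perfectly periodic signal would give zero for both measures; the small nonzero values here reflect the quasi-periodic character attributed to vowels.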

Author information

Authors and Affiliations

(emeritus) Indian Statistical Institute, Kolkata, West Bengal, India
Asoke Kumar Datta

Corresponding author

Correspondence to Asoke Kumar Datta.

About this chapter

Cite this chapter

Datta, A.K. (2018). Time-Domain Representation of Phones. In: Time Domain Representation of Speech Sounds. Springer, Singapore. https://doi.org/10.1007/978-981-13-2303-4_5

DOI: https://doi.org/10.1007/978-981-13-2303-4_5
Published: 04 November 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2302-7
Online ISBN: 978-981-13-2303-4
eBook Packages: Computer Science, Computer Science (R0)
