Abstract
The three basic groups of speech signals (quasi-periodic, quasi-random, and quiescent) are defined by time period, amplitude, and complexity. The time-domain representation of the first two parameters has long been textbook material. For complexity, on the other hand, the spectral-domain representation is so well studied that it too has become almost standard textbook knowledge. Traditionally, phones are divided into vowels (which also include glides, trills, and laterals) and consonants (plosives, affricates, and sibilants). The quasi-periodicity of vowels arises from the flapping of the mucosal cover, which introduces nonlinear dynamics. This generates random perturbations in time period (jitter), in amplitude (shimmer), and in complexity (complexity perturbation). Five time-domain parameters are introduced for the classification of vowel sounds. The articulatory mechanisms that generate the different Bangla consonants, and the resulting signatures used to label them, are discussed in detail. The developed algorithm is tested on an SCB (standard colloquial Bengali) database containing speech signals for 850 sentences spoken by 12 native Bangla informants of both sexes, all aged 20–50 years. A detailed labeling scheme with a 94% recognition rate for labeling the speech signal into six manner classes is described. These classes are: S (sibilants), P (unaspirated plosives), F (aspirated plosives), A (voiced plosives and affricates), L (laterals and nasal murmurs), and V (vowels and glides). This labeling is used to partition the Bangla pronunciation dictionary into well-defined cohorts. The properties of these cohorts and their usefulness in automatic speech recognition (ASR) are discussed. It is shown that expert systems can be generated automatically for each cohort, ultimately leading to a 95% recognition rate in ASR using only vowel recognition. A section is devoted to time-domain features for identifying the vowels. These features show a potential of almost 85% recognition in the all-vowel case.
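To make the jitter and shimmer terms concrete, the following is a minimal sketch of one common way to quantify them: the mean absolute difference between consecutive cycles, divided by the mean value. This "local" definition is an assumption here and may differ from the measures used in the chapter. The function name `relative_perturbation` and the sample period and amplitude values are hypothetical, chosen only for illustration.

```python
from statistics import mean


def relative_perturbation(values):
    """Mean absolute cycle-to-cycle difference divided by the mean value.

    Applied to pitch periods this gives a local jitter figure; applied to
    per-cycle peak amplitudes it gives a local shimmer figure.
    (Assumed definition, not necessarily the one used in the chapter.)
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    # Absolute difference between each cycle and the next one.
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


# Hypothetical per-cycle measurements from a voiced segment.
periods_ms = [5.0, 5.1, 4.9, 5.0]        # duration of each pitch period
peak_amplitudes = [1.0, 0.9, 1.1, 1.0]   # peak amplitude of each period

jitter = relative_perturbation(periods_ms)
shimmer = relative_perturbation(peak_amplitudes)
print(f"jitter = {jitter:.4f}, shimmer = {shimmer:.4f}")
```

A perfectly periodic signal gives zero for both measures, so nonzero values reflect the random cycle-to-cycle perturbation described above.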
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Datta, A.K. (2018). Time-Domain Representation of Phones. In: Time Domain Representation of Speech Sounds. Springer, Singapore. https://doi.org/10.1007/978-981-13-2303-4_5
DOI: https://doi.org/10.1007/978-981-13-2303-4_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2302-7
Online ISBN: 978-981-13-2303-4
eBook Packages: Computer Science (R0)