Generative Phoneme-Threephone Model for ASR
A so-called generalised phoneme recognition problem for a two-level speech understanding system is solved: under free phoneme order, the N ≫ 1 best phoneme-sequence recognition responses are found. The method is based on a constructive description of the diverse realisations of a speech signal. A stochastic generative automaton grammar, designed to synthesise speech signal prototypes, serves this purpose. The grammar composes all possible speech signal prototypes, allowing for the non-linear rate of pronunciation in general and of individual phonemes in particular, for co-articulation and reduction of sounds, and for non-linear variation of the speech signal intensity along the time axis. To extend the earlier research, phoneme-threephone (PT) signal prototypes are introduced. The rule for joining PT signal prototypes into sequences is straightforward: the output phoneme of one PT must coincide with the input phoneme of the PT that follows it. The problem is solved using a new dynamic programming computational scheme which, to substantially reduce both memory and computation requirements, is based on the concepts of potentially optimal index and phoneme response.
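The joining rule for PT prototypes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `PT` record, its field names (`inp`, `core`, `out`), and the helper functions are hypothetical, and the sketch assumes only what the abstract states, namely that two PTs may be concatenated when the output phoneme of the first equals the input phoneme of the second.

```python
from typing import List, NamedTuple, Sequence


class PT(NamedTuple):
    """Hypothetical phoneme-threephone record (field names are assumptions)."""
    inp: str   # input phoneme: where the prototype starts
    core: str  # central phoneme of the prototype
    out: str   # output phoneme: where the prototype ends


def can_join(a: PT, b: PT) -> bool:
    """Joining rule from the abstract: output of `a` must equal input of `b`."""
    return a.out == b.inp


def is_valid_chain(chain: Sequence[PT]) -> bool:
    """A sequence is admissible if every adjacent pair satisfies the rule."""
    return all(can_join(a, b) for a, b in zip(chain, chain[1:]))


def successors(pt: PT, inventory: Sequence[PT]) -> List[PT]:
    """All PTs in the inventory that may directly follow `pt`."""
    return [q for q in inventory if can_join(pt, q)]
```

Under these assumptions, a search over phoneme sequences would only need to expand `successors(pt, inventory)` at each step, which is how a free phoneme order remains constrained by the PT inventory.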
Keywords: Speech Signal, Automatic Speech Recognition, Continuous Speech, Generative Grammar, Speech Understanding