Abstract
An approach is proposed for automatically splitting words into phonetically homogeneous parts, in which the boundaries of the parts are found by solving a multiparameter optimization problem. The approach is intended to ensure the maximum difference in phonetic material between adjacent parts and the maximum similarity within each part. The adopted measure of similarity and difference is based on the correlation between the columns of the parametric portrait matrix of the word, which is obtained by a time-spectral transform of an audio recording of the word. The problem is solved numerically by an algorithm that is a modification of a dynamic programming technique. Experimental results for several Russian words are presented to support the assumptions made and the viability of the proposed algorithms.
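The idea of placing boundaries by correlation between columns can be illustrated with a minimal sketch. This is not the authors' modified algorithm, which the abstract does not specify. It is a textbook dynamic programming segmentation under several assumptions: the number of parts `n_parts` is given in advance, each column of the matrix is a list of spectral values for one time frame, and a part is scored by the sum of Pearson correlations over all pairs of its columns divided by its length. The names `pearson` and `segment_word` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def segment_word(columns, n_parts):
    """Split frames into n_parts contiguous parts; return interior boundary indices.

    Assumed objective (not taken from the paper): maximize the sum over parts of
    (sum of pairwise column correlations inside the part) / (part length).
    """
    n = len(columns)
    if not 1 <= n_parts <= n:
        raise ValueError("n_parts must be between 1 and the number of frames")

    corr = [[pearson(columns[s], columns[t]) for t in range(n)] for s in range(n)]

    # 2-D prefix sums so that any square block of corr can be summed in O(1).
    pre = [[0.0] * (n + 1) for _ in range(n + 1)]
    for s in range(n):
        for t in range(n):
            pre[s + 1][t + 1] = corr[s][t] + pre[s][t + 1] + pre[s + 1][t] - pre[s][t]

    def score(i, j):
        # Similarity score of the part covering frames i .. j-1.
        block = pre[j][j] - pre[i][j] - pre[j][i] + pre[i][i]
        return block / (j - i)

    neg_inf = float("-inf")
    # best[k][j]: best total score for splitting the first j frames into k parts.
    best = [[neg_inf] * (n + 1) for _ in range(n_parts + 1)]
    back = [[0] * (n + 1) for _ in range(n_parts + 1)]
    best[0][0] = 0.0
    for k in range(1, n_parts + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                if best[k - 1][i] == neg_inf:
                    continue
                cand = best[k - 1][i] + score(i, j)
                if cand > best[k][j]:
                    best[k][j] = cand
                    back[k][j] = i

    # Trace back the start index of each part, last part first.
    starts = []
    j = n
    for k in range(n_parts, 0, -1):
        j = back[k][j]
        starts.append(j)
    starts.reverse()
    return starts[1:]  # drop the leading 0; keep interior boundaries only


if __name__ == "__main__":
    # Synthetic frames: three spectral shapes with small perturbations.
    frames = [
        [1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 3.2, 4.1], [0.9, 2.1, 2.9, 4.0],
        [4.0, 3.0, 2.0, 1.0], [4.2, 3.1, 1.9, 1.0],
        [1.0, 3.0, 1.0, 3.0], [1.1, 2.9, 0.9, 3.2],
    ]
    print(segment_word(frames, 3))
```

The run time of this sketch is cubic in the number of frames for a fixed number of parts, which is acceptable for single words. A real spectrogram would replace the synthetic `frames`, and the scoring function is the part most likely to differ from the published method.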
Additional information
Original Russian Text © O.N. Korsun, A.V. Poliev, 2016, published in Izvestiya Akademii Nauk, Teoriya i Sistemy Upravleniya, 2016, No. 4, pp. 115–124.
Cite this article
Korsun, O.N., Poliev, A.V. Automated definition of phonetically homogeneous sections of words in a natural language based on multiparameter optimization. J. Comput. Syst. Sci. Int. 55, 609–618 (2016). https://doi.org/10.1134/S1064230716040080
DOI: https://doi.org/10.1134/S1064230716040080