Abstract
The speech recognition sector of the Computing Center of the Russian Academy of Sciences has participated in the development of speech technologies since they first appeared in the Soviet Union in the 1960s. Over this period, several generations of researchers have worked in the field, and the approaches to speech recognition problems have repeatedly undergone fundamental changes: methods that recognized individual sounds and parts of words using sets of expert-derived rules have given way to methods for the recognition and semantic interpretation of natural continuous speech based on mathematical models trained on large data sets. The review of the results begins with a description of the approach to building speech recognition systems proposed by the head of the sector, V.N. Trunin-Donskoy. This hardware-software approach was long the “calling card” of the team and played a significant role in popularizing speech recognition and in promoting awareness of the benefits of developing speech technologies in the Soviet Union. Solutions that combine software with specialized hardware are now standard, but they were new at that time. As the field developed and the problems grew more complex with the transition to discrete speech recognition with large dictionaries, recognition methods based on systems of expert rules were replaced by methods based on classical optimization algorithms, which were proposed and improved by the staff and graduate students of the team. A distinctive feature of research at the Computing Center of the USSR Academy of Sciences was an understanding of the importance of collecting, classifying, and annotating representative arrays of speech data. The work of the team members was significantly ahead of not only domestic but also contemporary foreign research in this area.
A relevant area of applied work in the 1970s–1980s was the use of modern methods of digital speech processing and the creation of problem-oriented tools and language means that made it possible to simulate, in real time, the operation of components of speech recognition and signal processing systems, in particular interactive methods for filtering speech signals. The steady growth in computing power and in the size of data corpora made it possible to move to probabilistic speech modeling technologies and to formulate and solve natural speech recognition problems. Methods, data corpora, and software systems have been developed for recognizing spontaneous speech, separating voices, determining gender, spotting keywords in a speech stream, and classifying the topic of a speech message. Recent studies concern the development of computationally efficient neural network methods and models for speech recognition and processing intended for use on mobile devices.
Funding
This work was supported by ongoing institutional funding. No additional grants to carry out or direct this particular research were obtained.
Ethics declarations
The authors of this work declare that they have no conflicts of interest.
Additional information
Vladimir Yakovlevich Chuchupal. Graduated from Lenin Moscow State Pedagogical Institute in 1976 with a degree in mathematics; in 1983, finished postgraduate studies at the Computing Center of the USSR Academy of Sciences with a degree in mathematics and software for system computers; in 1985, received the Candidate of Physical and Mathematical Sciences degree. Since 1984 he has been working at the Computing Center of the USSR Academy of Sciences (CC RAS). Main area of interest: recognition and processing of speech signals. Leading researcher at the Dorodnitsyn Computing Center of the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences.
Konstantin Aleksandrovich Makovkin. Graduated from Bauman Moscow State Technical University in 1990 with a degree in Automated Control Systems; since 1990 he has been working at the Computing Center of the USSR Academy of Sciences (CC RAS). Main area of interest: recognition and processing of speech signals; programmer at the Dorodnitsyn Computing Center of the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences.
Translated by L.A. Solovyova
Publisher’s Note.
Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Chuchupal, V.Y., Makovkin, K.A. Development of Speech Technologies at Trunin-Donskoy’s School: From Sound Recognition to Natural Speech Recognition. Pattern Recognit. Image Anal. 33, 888–901 (2023). https://doi.org/10.1134/S1054661823040120
DOI: https://doi.org/10.1134/S1054661823040120