Development of Speech Technologies at Trunin-Donskoy’s School: From Sound Recognition to Natural Speech Recognition

Chuchupal, V. Ya.; Makovkin, K. A.

doi:10.1134/S1054661823040120

SCIENTIFIC SCHOOLS OF THE FEDERAL RESEARCH CENTER “COMPUTER SCIENCE AND CONTROL” OF THE RUSSIAN ACADEMY OF SCIENCES, MOSCOW, THE RUSSIAN FEDERATION
V.N. Trunin-Donskoy’s Scientific School
Published: 20 March 2024

Volume 33, pages 888–901, (2023)
Pattern Recognition and Image Analysis

V. Ya. Chuchupal¹ &
K. A. Makovkin¹

Abstract

The team of the speech recognition sector of the Computing Center of the Russian Academy of Sciences has participated in the development of speech technologies since their appearance in the Soviet Union in the 1960s. Over this period, several generations of researchers have worked in the field, and the approaches to solving speech recognition problems have repeatedly undergone fundamental changes: methods based on recognizing individual sounds and parts of words using expert-derived sets of rules have given way to methods for the recognition and semantic interpretation of natural continuous speech based on mathematical models trained on large data sets. The review of the results begins with a description of the approach to building speech recognition systems proposed by the head of the sector, V.N. Trunin-Donskoy. This hardware-software approach was long the “calling card” of the team and played a significant role in popularizing speech recognition and realizing the benefits of developing speech technologies in the Soviet Union. Solutions that combine software with specialized hardware are now standard but were new at that time. As the field developed and the problems grew more complex with the transition to discrete speech recognition with large dictionaries, recognition methods based on systems of expert rules were replaced by methods based on classical optimization algorithms, which were proposed and improved by the staff and graduate students of the team. A distinctive feature of research at the Computing Center of the USSR Academy of Sciences was an understanding of the importance of collecting, classifying, and annotating representative arrays of speech data. In this area, the work of the team members was significantly ahead of not only domestic but also contemporary foreign research.
A relevant area of applied work in the 1970s–1980s was the use of modern methods of digital speech processing and the creation of problem-oriented tools and language means that made it possible to simulate in real time the operation of components of speech recognition and signal processing systems, in particular interactive methods for filtering speech signals. The steady growth of computing power and of data corpus sizes made it possible to move to probabilistic speech modeling technologies and to formulate and solve natural speech recognition problems. Methods, data corpora, and software systems have been developed for recognizing spontaneous speech, separating voices, determining gender, identifying key words in a speech stream, and classifying the subject of a speech message. Recent studies concern the development of computationally efficient neural network methods and models for speech recognition and processing intended for use on mobile devices.

Funding

This work was supported by ongoing institutional funding. No additional grants to carry out or direct this particular research were obtained.

Author information

Authors and Affiliations

Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 119333, Moscow, Russian Federation
V. Ya. Chuchupal & K. A. Makovkin

Corresponding author

Correspondence to V. Ya. Chuchupal.

Ethics declarations

The authors of this work declare that they have no conflicts of interest.

Additional information

Vladimir Yakovlevich Chuchupal. Graduated from Lenin Moscow State Pedagogical Institute in 1976 with a degree in mathematics; in 1983, finished postgraduate studies at the Computing Center of the USSR Academy of Sciences with a degree in mathematics and software for system computers; in 1985, received the Candidate of Physical and Mathematical Sciences degree. Since 1984 he has been working at the Computing Center of the USSR Academy of Sciences (CC RAS). Main area of interest: recognition and processing of speech signals. Leading researcher at the Dorodnitsyn Computing Center of the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences.

Konstantin Aleksandrovich Makovkin. Graduated from Bauman Moscow State Technical University in 1990 with a degree in Automated Control Systems. Since 1990 he has been working at the Computing Center of the USSR Academy of Sciences (CC RAS). Main area of interest: recognition and processing of speech signals. Programmer at the Dorodnitsyn Computing Center of the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences.

Translated by L.A. Solovyova

Publisher’s Note.

Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chuchupal, V.Y., Makovkin, K.A. Development of Speech Technologies at Trunin-Donskoy’s School: From Sound Recognition to Natural Speech Recognition. Pattern Recognit. Image Anal. 33, 888–901 (2023). https://doi.org/10.1134/S1054661823040120

Received: 16 September 2022
Revised: 16 September 2022
Accepted: 16 September 2022
Published: 20 March 2024
Issue Date: December 2023
DOI: https://doi.org/10.1134/S1054661823040120
