Development of Speech Technologies at Trunin-Donskoy’s School: From Sound Recognition to Natural Speech Recognition

  • SCIENTIFIC SCHOOLS OF THE FEDERAL RESEARCH CENTER “COMPUTER SCIENCE AND CONTROL” OF THE RUSSIAN ACADEMY OF SCIENCES, MOSCOW, THE RUSSIAN FEDERATION
  • V.N. Trunin-Donskoy’s Scientific School
Pattern Recognition and Image Analysis

Abstract

The speech recognition sector of the Computing Center of the Russian Academy of Sciences has taken part in the development of speech technologies since they first appeared in the Soviet Union in the 1960s. Several generations of researchers have worked in this field since then, and the approaches to solving speech recognition problems have repeatedly undergone fundamental changes: methods based on recognizing individual sounds and parts of words with sets of expert-derived rules have given way to methods for the recognition and semantic interpretation of natural continuous speech based on mathematical models trained on large data sets. The review of results begins with the approach to building speech recognition systems proposed by the head of the sector, V.N. Trunin-Donskoy. This hardware-software approach was long the team’s “calling card” and played a significant role in popularizing speech recognition and in realizing the benefits of developing speech technologies in the Soviet Union. Solutions that combine software with specialized hardware are now standard but were new at that time. As the field developed and problems grew more complex with the transition to discrete speech recognition with large dictionaries, recognition methods based on systems of expert rules were replaced by methods based on classical optimization algorithms, which were proposed and improved by the staff and graduate students of the team. A distinctive feature of research at the Computing Center of the USSR Academy of Sciences was an understanding of the importance of collecting, classifying, and annotating representative arrays of speech data. In this area, the work of the team members was significantly ahead of not only domestic but also contemporary foreign research.
A relevant area of applied work in the 1970s–1980s was the use of modern methods of digital speech processing and the creation of problem-oriented tools and language facilities that made it possible to simulate, in real time, the operation of components of speech recognition and signal processing systems, in particular interactive methods for filtering speech signals. The steady growth in computing power and in the size of data corpora has made it possible to move to probabilistic speech modeling technologies and to formulate and solve natural speech recognition problems. Methods, data corpora, and software systems have been developed for recognizing spontaneous speech, separating voices, determining gender, identifying key words in a speech stream, and classifying the subject of a speech message. Recent studies concern the development of computationally efficient neural network methods and models for speech recognition and processing intended for use on mobile devices.

Funding

This work was supported by ongoing institutional funding. No additional grants to carry out or direct this particular research were obtained.

Author information

Corresponding author

Correspondence to V. Ya. Chuchupal.

Ethics declarations

The authors of this work declare that they have no conflicts of interest.

Additional information

Vladimir Yakovlevich Chuchupal. Graduated from Lenin Moscow State Pedagogical Institute in 1976 with a degree in mathematics; in 1983, completed postgraduate studies at the Computing Center of the USSR Academy of Sciences with a degree in mathematics and software for system computers; in 1985, received the Candidate of Physical and Mathematical Sciences degree. Since 1984, he has worked at the Computing Center of the USSR Academy of Sciences (now CC RAS). Main area of interest: recognition and processing of speech signals. Leading researcher at the Dorodnitsyn Computing Center of the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences.

Konstantin Aleksandrovich Makovkin. Graduated from Bauman Moscow State Technical University in 1990 with a degree in automated control systems. Since 1990, he has worked at the Computing Center of the USSR Academy of Sciences (now CC RAS). Main area of interest: recognition and processing of speech signals. Programmer at the Dorodnitsyn Computing Center of the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences.

Translated by L.A. Solovyova

Publisher’s Note.

Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chuchupal, V.Y., Makovkin, K.A. Development of Speech Technologies at Trunin-Donskoy’s School: From Sound Recognition to Natural Speech Recognition. Pattern Recognit. Image Anal. 33, 888–901 (2023). https://doi.org/10.1134/S1054661823040120

  • DOI: https://doi.org/10.1134/S1054661823040120
