Robustness of Speech Recognition System of Isolated Speech in Macedonian

  • Daniel Spasovski
  • Goran Peshanski
  • Gjorgji Madjarov
  • Dejan Gjorgjevikj
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 311)

Abstract

For more than five decades, scientists have attempted to design machines that accurately transcribe spoken words. Even though satisfactory accuracy has been achieved, machines cannot recognize every voice, in any environment, from any speaker. In this paper we tackle the problem of robustness of Automatic Speech Recognition for isolated Macedonian speech in noisy environments. The goal is to overcome the problem of changing background noise types. Five different types of noise were artificially added to the audio recordings, and the models were trained and evaluated for each one. The worst-case scenario for the speech recognition systems turned out to be babble noise, which reaches an 81.10% error rate at the higher noise levels. It is shown that the error rate increases as the noise increases, and that the model trained with clean speech gives considerably better results at lower noise levels.
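The abstract does not say how the noise was mixed into the recordings. One common way to add noise artificially at a chosen signal-to-noise ratio is sketched below, as an illustration rather than the authors' procedure. The function names `mix_at_snr` and `measure_snr_db`, the synthetic tone standing in for speech, and the SNR levels in the demo are all assumptions made for this example.

```python
import math
import random


def power(signal):
    """Mean power (average squared amplitude) of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def measure_snr_db(speech, noise):
    """Signal-to-noise ratio in decibels for a speech/noise pair."""
    return 10 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the mix has the requested SNR (in dB).

    Assumption: both inputs are non-empty lists of float samples at the
    same sampling rate. Returns (mixed, scaled_noise).
    """
    # Loop or truncate the noise so it covers the whole utterance.
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = power(speech)
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)); solve for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled)]
    return mixed, scaled


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-ins: a 440 Hz tone for speech, white noise for background.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
    for target in (20, 10, 0):  # illustrative SNR levels, not the paper's
        mixed, scaled = mix_at_snr(speech, noise, target)
        print(target, round(measure_snr_db(speech, scaled), 6))
```

Lower SNR values correspond to the "higher levels of noise" mentioned above. In a real experiment the speech and noise samples would come from recorded audio files instead of synthetic signals.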

Keywords

speech recognition, robustness, isolated speech, signal-to-noise ratio, background noise

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Daniel Spasovski (1)
  • Goran Peshanski (1)
  • Gjorgji Madjarov (2)
  • Dejan Gjorgjevikj (2)
  1. Netcetera, Skopje, Macedonia
  2. Faculty of Computer Science and Engineering, Skopje, Macedonia
