WFT – Context-Sensitive Speech Signal Representation

  • Jakub Gałka
  • Michał Kępiński
Part of the Advances in Soft Computing book series (AINSC, volume 35)

Abstract

Progress in the development of automatic speech recognition (ASR) systems comes, among other things, from signal representations that are sensitive to increasingly sophisticated features. This paper gives an overview of our investigation of a new context-sensitive speech signal representation based on the wavelet-Fourier transform (WFT), and proposes measures of its quality. The paper is divided into five sections, which cover in turn: phonetic-acoustic contextuality in speech, the basics of WFT, the WFT speech signal feature space, feature space quality measures, and finally the conclusions from our work.

Keywords

Speech Signal; Packet Wavelet; Automatic Speech Recognition; Decomposition Tree; Automatic Speech Recognition System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer 2006

Authors and Affiliations

  • Jakub Gałka (1)
  • Michał Kępiński (2)
  1. Department of Electronics, AGH University of Science and Technology, Kraków, Poland
  2. Computer Center, AGH University of Science and Technology, Kraków, Poland