Polish Speech Dictation System as an Application of Voice Interfaces

  • Grażyna Demenko
  • Robert Cecko
  • Marcin Szymański
  • Mariusz Owsianny
  • Piotr Francuzik
  • Marek Lange
Part of the Communications in Computer and Information Science book series (CCIS, volume 287)

Abstract

This paper presents the results of a project carried out at PSNC and supported by the Polish Ministry of Science and Higher Education, “Integrated system of automatic speech-to-text conversion based on linguistic modeling designed in the environment of the analysis and legal documentation workflow for the needs of homeland security”. The project aimed to develop a Polish speech dictation (Large Vocabulary Continuous Speech Recognition, LVCSR) system built using a phonetically controlled large-vocabulary speech corpus and large text corpora. The functions of the resulting system are outlined, the software architecture is briefly presented, example applications are demonstrated, and the recognition results are discussed.

Keywords

speech recognition, dictation, voice interfaces

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Grażyna Demenko (1, 2)
  • Robert Cecko (1)
  • Marcin Szymański (1)
  • Mariusz Owsianny (1)
  • Piotr Francuzik (1)
  • Marek Lange (1)

  1. Poznań Supercomputing and Networking Center, Polish Academy of Sciences, Poznań, Poland
  2. The Institute of Linguistics, Adam Mickiewicz University, Poznań, Poland
