Rules for Automatic Grapheme-to-Allophone Transcription in Slovene

  • Jerneja Gros
  • France Mihelič
  • Simon Dobrišek
  • Tomaž Erjavec
  • Mario Žganec
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1902)


The domain of spoken language technologies ranges from speech input and output systems to complex understanding and generation systems, including multi-modal systems of widely differing complexity (such as automatic dictation machines) and multilingual systems (for example, automatic dialogue and translation systems). The definition of standards and evaluation methodologies for such systems involves the specification and development of highly specific spoken language corpus and lexicon resources, and measurement and evaluation tools [5]. This paper presents the MobiLuz spoken resources of the Slovene language, which will be made freely available for research purposes in speech technology and linguistics.


Keywords: Speech Signal, Speech Data, Dialogue System, Language Resource, Speech Corpus
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Brants, T. (2000): TnT - A Statistical Part-of-Speech Tagger. Proceedings of the ANLP-NAACL, in print, Seattle.
  2. Dobrišek, S., Kačič, Z., Gros, J., Horvat, B. and Mihelič, F. (1996): An Initiative for Standardisation of Phonetic Transcription of Slovenian Speech. Proceedings of the Fifth Electrotechnical and Computer Science Conference ERK’96, pp. 247–250, Portorož, Slovenia.
  3. Dimitrova, L., Erjavec, T., Ide, N., Kaalep, H.J., Petkevič, V. and Tufis, D. (1998): Multext-East: Parallel and Comparable Corpora and Lexicons for Six Central and Eastern European Languages. COLING-ACL’98 Proceedings, pp. 315–319.
  4. Dobrišek, S., Gros, J., Mihelič, F. and Pavešić, N. (1998): Recording and Labelling of the GOPOLIS Slovenian Speech Database. Proceedings of the First International Conference on Language Resources and Evaluation, pp. 1089–1096, Granada, Spain.
  5. EAGLES Handbook (1997): Handbook of Standards and Resources for Spoken Language Systems. Editors D. Gibbon, Roger Moore and Richard Winski. Berlin: Mouton de Gruyter.
  6. Erjavec, T. (1998): The MULTEXT-East Slovene Lexicon. Proceedings of the ERK’98 Conference, Portorož, Slovenia, pp. 189–192.
  7. Gros, J., Mihelič, F. and Pavešić, N. (1995): Sentence Hypothesisation Using Ng-Gram Models. In Proceedings of the Fourth European Conference on Speech Communication and Technology, pp. 1759–1762, Madrid, Spain.
  8. Gros, J., Ipšić, I., Mihelič, F. and Pavešić, N. (1996): Segmentation and labelling of Slovenian diphone inventories. COLING’96, pp. 298–303, Copenhagen, Denmark.
  9. Gros, J., Pavešić, N. and Mihelič, F. (1997): Text-to-speech synthesis: a complete system for the Slovenian language. Journal of Computing and Information Technology, 5(1), pp. 11–19.
  10. Ide, N., Tufis, D. and Erjavec, T. (1998): Development and Assessment of Common Lexical Specifications for Six Central and Eastern European Languages. Proceedings of the First International Conference on Language Resources and Evaluation, LREC’98, Granada, pp. 233–240.
  11. Ipšić, I., Mihelič, F., Dobrišek, S., Gros, J. and Pavešić, N. (1998): An overview of the spoken queries in European languages: the Slovenian spoken dialogue system. Proceedings of the scientific conference Artificial Intelligence in Industry from Theory to Practice and 3rd SQEL Workshop on Multi-Lingual Information Retrieval Dialogues, High Tatras, Slovakia, pp. 431–438.
  12. Kačič, Z., Horvat, B. and Derlič, R. (1994): Zasnova baze izgovorjav slovenskega jezika SNABI. Proceedings of the ERK’94, Portorož, Slovenia.
  13. Kačič, Z. and Horvat, B. (1998): Izgradnja infrastrukture, potrebne za razvoj govorne tehnologije za slovenski jezik. Proceedings of the Conference on Language Technologies for the Slovene Language, Ljubljana, pp. 100–104.
  14. Kaiser, J. and Kačič, Z. (1998): Development of Slovenian SpeechDat Database. Proceedings of the Workshop on Speech Database Development for Central and Eastern European Languages, Granada, Spain.
  15. Sperberg-McQueen, C.M. and Burnard, L., eds. (1994): Guidelines for Electronic Text Encoding and Interchange. Chicago and Oxford.
  16. Šuštaršič, R., Komar, S. and Petek, B. (1998): Slovene IPA Symbols, Illustrations of the IPA.
  17. Zemljak, M., Kačič, Z., Dobrišek, S. and Gros, J. (2000): A Machine-readable Phonetic Transcription of the Slovene Speech, in preparation.

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Jerneja Gros (1)
  • France Mihelič (1)
  • Simon Dobrišek (1)
  • Tomaž Erjavec (1)
  • Mario Žganec (1)

  1. Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia
