Language Resources and Evaluation, Volume 47, Issue 4, pp 1031–1048

Compilation, transcription and usage of a reference speech corpus: the case of the Slovene corpus GOS

  • Darinka Verdonik
  • Iztok Kosem
  • Ana Zwitter Vitez
  • Simon Krek
  • Marko Stabej
Original Paper

DOI: 10.1007/s10579-013-9216-5

Cite this article as:
Verdonik, D., Kosem, I., Vitez, A.Z. et al. Lang Resources & Evaluation (2013) 47: 1031. doi:10.1007/s10579-013-9216-5

Abstract

In recent years, building reference speech corpora has been an important part of the activities providing the necessary linguistic infrastructure in many European countries, both for languages with many speakers (e.g., French, German, Spanish, Italian) and for those with fewer speakers (e.g., Swedish, Dutch, Czech, Slovak). This paper describes the creation of a reference speech corpus and its distribution to potential users, using the Slovene corpus GOS as a case study. It covers the corpus structure, fieldwork experiences with recording, the labelling system, and the two levels of transcription (pronunciation-based and standardized), as well as the main characteristics of the corpus interface (a web concordancer) and the availability of the original corpus files.

Keywords

Spoken language, Discourse, Recordings, Transcription conventions, Web concordancer

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Darinka Verdonik (1)
  • Iztok Kosem (2)
  • Ana Zwitter Vitez (2)
  • Simon Krek (3)
  • Marko Stabej (4)

  1. University of Maribor, Maribor, Slovenia
  2. Trojina, Institute for Applied Slovene Studies, Škofja Loka, Slovenia
  3. Amebis, d.o.o., Kamnik, Slovenia
  4. University of Ljubljana, Ljubljana, Slovenia
