Since its beginnings in the mid-1990s, the availability of electronic text corpora of Slovenian, all with an Internet user interface, has grown to a level comparable to that of many European languages with a long history of quantitative linguistic research. There are two established corpora with 100 million running words: an academic one, which is freely accessible, and a commercial one, prepared by industrial and academic partners. The two are complemented by a sizeable collection of works of fiction, available for reading in a free virtual library, and by several specialized corpora compiled for the needs of particular institutions. The majority of Slovenian newspapers are also accessible online, at least in the form of selected articles.
© 2007 Springer
About this chapter
Cite this chapter
Jakopin, P. (2007). On Text Corpora, Word Lengths, and Word Frequencies in Slovenian. In: Grzybek, P. (eds) Contributions to the Science of Text and Language. Text, Speech and Language Technology, vol 31. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-4068-9_6
DOI: https://doi.org/10.1007/978-1-4020-4068-9_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-4067-2
Online ISBN: 978-1-4020-4068-9
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)