On Text Corpora, Word Lengths, and Word Frequencies in Slovenian
Since its beginnings in the mid-1990s, the availability of electronic text corpora in Slovenian, all with an Internet user interface, has grown to a level comparable to that of many European languages with a long history of quantitative linguistic research. There are two established corpora of 100 million running words each: an academic one, which is freely accessible, and a commercial one, prepared by industrial and academic partners. These two are complemented by a sizeable collection of works of fiction, available for reading in a free virtual library, and by several specialized corpora compiled for the needs of particular institutions. Most Slovenian newspapers are also accessible online, at least in the form of selected articles.
Keywords: Word Frequency; Word Length; Word Form; Total Frequency; Text Corpus