Sentiment Classification of the Slovenian News Texts

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 403)


This paper deals with automatic two classdocument-level sentiment classification. We retrieved textual documents with political, business, economic and financial content from five Slovenian web media. By annotating a sample of 10,427 documents, we obtained a labelled corpus in the Slovenian language. Five classifiers were evaluated on this corpus: multinomial naïve Bayes, support vector machines, random forest, k-nearest neighbour and naïve Bayes, out of which the first three were used also in the assessment of the pre-processing options. Among the selected classifiers, multinomial naïve Bayes outperforms the naïve Bayes, k-nearest neighbour, random forest and support vector machines classifier in terms of classification accuracy. The best selection of pre-processing options achieves more than 95 % classification accuracy with Naïve Bayes Multinomial and more than 85 % with support vector machines and random forest classifier.


Sentiment analysis Document classification Machine learning Slovenian language Corpus 

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. 1.Faculty of Information StudiesLaboratory of Data TechnologiesNovo mestoSlovenia
  2. 2.Department of Knowledge TechnologiesJožef Stefan InstituteLjubljanaSlovenia

Personalised recommendations