In this paper, we present an ensemble modeling approach for sentiment analysis using machine learning algorithms. The main goal of sentiment analysis is to develop estimators that can identify the sentiment orientation (positive, negative, or neutral) of sentences found in arbitrary sources. The novel approach presented here relies on analyzing the words found in sentences and on forming large sets of heterogeneous models, i.e., binary as well as multi-class classification models that are trained with different machine learning methods; these models represent the relationship between the presence of given words (or combinations of words) and sentiments. All models trained during the learning phase are applied during the test phase, and the final sentiment assessment is annotated with a confidence value that specifies how reliable the models are regarding the presented decision. In the empirical part of this paper, we show results achieved using a German corpus of Amazon reviews and a set of machine learning methods (decision trees and adaptive boosting, Gaussian processes, random forests, k-nearest neighbor classification, support vector machines and artificial neural networks with evolutionary feature and parameter optimization, and genetic programming). Using a heterogeneous model ensemble learning approach that combines multi-class classifiers and binary classifiers, the classification accuracy can be increased significantly, and the ratio of totally wrongly classified samples (i.e., those assigned to the completely opposite sentiment orientation) can be decreased significantly.
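The idea of combining binary and multi-class models into one decision with a confidence value can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the keyword-based toy models stand in for trained classifiers, the rule that a non-firing binary model splits its vote over the remaining labels is one possible choice, and the confidence is simply the winning label's share of all votes. The abstract does not specify how the paper's ensemble aggregates votes or computes confidence.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

LABELS = ("negative", "neutral", "positive")
Votes = Dict[str, float]
Model = Callable[[FrozenSet[str]], Votes]


def multiclass_model(words: FrozenSet[str]) -> Votes:
    """Hypothetical three-class model: one full vote for a single label."""
    if words & {"great", "excellent"}:
        return {"positive": 1.0}
    if words & {"bad", "broken"}:
        return {"negative": 1.0}
    return {"neutral": 1.0}


def make_binary_model(target: str, cues: FrozenSet[str]) -> Model:
    """Hypothetical one-vs-rest model: votes for `target` if a cue word is
    present, otherwise splits its vote evenly over the remaining labels."""
    rest = [label for label in LABELS if label != target]

    def model(words: FrozenSet[str]) -> Votes:
        if words & cues:
            return {target: 1.0}
        return {label: 1.0 / len(rest) for label in rest}

    return model


def ensemble_predict(models: List[Model], sentence: str) -> Tuple[str, float]:
    """Sum the votes of all models; confidence is the winner's vote share."""
    words = frozenset(sentence.lower().split())
    totals = {label: 0.0 for label in LABELS}
    for model in models:
        for label, weight in model(words).items():
            totals[label] += weight
    # Ties are broken in favor of the label listed first in LABELS.
    winner = max(LABELS, key=lambda label: totals[label])
    return winner, totals[winner] / sum(totals.values())


# A heterogeneous ensemble: one multi-class model and two binary models.
MODELS: List[Model] = [
    multiclass_model,
    make_binary_model("positive", frozenset({"great", "love"})),
    make_binary_model("negative", frozenset({"bad", "awful"})),
]

if __name__ == "__main__":
    print(ensemble_predict(MODELS, "the battery is great"))
```

In this sketch, a sentence on which all models agree receives a confidence close to 1, while disagreement between the binary and multi-class models lowers the winner's vote share and thus flags the decision as less reliable.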
This work emerged from the research projects OPMIN 2.0 and SENOMWEB. SENOMWEB is funded by the European Regional Development Fund (EFRE, Regio 13); OPMIN 2.0 is funded under the program COIN (Cooperation & Innovation), a joint initiative launched by the Austrian Federal Ministry for Transport, Innovation and Technology (BMVIT) and the Austrian Federal Ministry of Economy, Family and Youth (BMWFJ).
S. Winkler and S. Schaller are co-first authors.
Communicated by V. Loia.
Cite this article
Winkler, S., Schaller, S., Dorfer, V. et al. Data-based prediction of sentiments using heterogeneous model ensembles. Soft Comput 19, 3401–3412 (2015). https://doi.org/10.1007/s00500-014-1325-6
Keywords
- Sentiment analysis
- Machine learning
- Heterogeneous model ensembles
- Evolutionary computation