Evaluating Industrial and Research Sentiment Analysis Engines on Multiple Sources

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10640)


Sentiment analysis plays a fundamental role in analyzing users' opinions in all kinds of textual sources. Accurately computing the sentiment expressed in huge amounts of textual data is a key task in high demand on the market, and industrial engines now offer ready-to-use APIs for sentiment analysis-related tasks. However, building sentiment engines that achieve high accuracy on structurally different textual sources (e.g. reviews, tweets, blogs) is not a trivial task. Papers on cross-source evaluation lack a comparison with industrial engines, which are instead specifically designed to deal with multiple sources.

In this paper, we compare research and industrial engines in an extensive experimental evaluation, considering the document-level polarity detection task performed on different textual sources: tweets, app reviews and general product reviews, in both English and Italian. The results help the reader quantify the performance gap between industrial and research sentiment engines when both are tested on heterogeneous textual sources and on different languages (English/Italian). Finally, we present the results of our multi-source solution X2Check. Considering an overall cross-source average F-score over all of the results, X2Check shows a performance that is 9.1% and 5.1% higher than Google CNL on the Italian and English benchmarks, respectively. Compared to the research engines, X2Check shows an F-score that is always higher than that of tools not specifically trained on the test set under evaluation; compared to the best research tools specifically trained on the target source, it is lower by at most 3.4% on the Italian and 11.6% on the English benchmarks.
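As a rough illustration of what a cross-source average F-score could look like, the sketch below computes a macro-averaged F1 per source and then takes the unweighted mean over sources. The abstract does not state how the averaging is done, so the binary positive/negative label set, the macro averaging over classes, the equal weighting of sources, and all function names are assumptions made for illustration. The toy labels are invented and are not results from the paper.

```python
from statistics import mean


def f1_for_class(gold, pred, label):
    """F1 of a single polarity class, from gold and predicted label lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=("positive", "negative")):
    """Unweighted mean of per-class F1 (assumed label set, not from the paper)."""
    return mean(f1_for_class(gold, pred, label) for label in labels)


def cross_source_average(per_source):
    """Unweighted mean of macro F1 over sources.

    per_source maps a source name to a (gold, pred) pair of label lists.
    """
    return mean(macro_f1(gold, pred) for gold, pred in per_source.values())


# Invented toy data, for illustration only.
toy_results = {
    "tweets": (
        ["positive", "negative", "positive", "negative"],
        ["positive", "negative", "negative", "negative"],
    ),
    "app_reviews": (
        ["positive", "negative"],
        ["positive", "negative"],
    ),
}

if __name__ == "__main__":
    for source, (gold, pred) in toy_results.items():
        print(source, round(macro_f1(gold, pred), 4))
    print("cross-source average", round(cross_source_average(toy_results), 4))
```

Weighting every source equally keeps a large benchmark from dominating the average; weighting by the number of documents per source would be an equally plausible reading of the abstract.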


Keywords: Sentiment analysis, Natural language processing, Machine learning, Experimental evaluation, Industrial and research tools comparison, Cross-domain sentiment classification



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Head of Artificial Intelligence at Finsa s.p.a., Genoa, Italy
  2. Research Scientist at Finsa s.p.a., Genoa, Italy
