Erratum to: Cogn Comput DOI 10.1007/s12559-016-9415-7

Unfortunately, the original version of the article was published with a few errors in the Abstract, Conclusion, Acknowledgment, and References.

Also, Dr. Erik Cambria is the co-corresponding author of the article.

The corrected versions of the sections are given below.

Abstract

With the advent of the Internet, people actively express their opinions about products, services, events, political parties, etc., on social media, blogs, and website comments. The amount of research on sentiment analysis is growing explosively. However, most research efforts are devoted to English-language data, while a large share of information is available in other languages. We present a state-of-the-art review of multilingual sentiment analysis. More importantly, we compare our own implementations of existing state-of-the-art approaches on common data. The precision observed in our experiments is typically lower than that reported by the original authors, which we attribute to a lack of detail in the original presentation of those approaches. Thus, we compare the existing works by what they really offer to the reader, including whether they allow for accurate implementation and for reliable reproduction of the reported results.

Conclusion

We gave an overview of state-of-the-art multilingual sentiment analysis methods. We described data pre-processing, typical features, and the main resources used for multilingual sentiment analysis. Then, we discussed different approaches applied by their authors to English and other languages. We have classified these approaches into corpus-based, lexicon-based, and hybrid ones.

The real value of any sentiment analysis technique for the research community lies in the results that can be reproduced with it, not in the results its original authors reportedly obtained with it. To evaluate this real value, we implemented eleven selected approaches as closely as we could, based on their descriptions in the original papers, and tested them on the same two corpora. In most cases, we obtained lower results than those reported by the corresponding authors. We attribute this mainly to the incompleteness of the descriptions in the original papers. In some cases, though, the methods were developed for a specific domain, so comparing them on our test corpora may not be fair. One lesson learned is that, for a method to be useful to the research community, its authors should provide enough detail to allow the reader to implement it correctly.

According to our results, the approach proposed by Singh et al. [52] outperforms the other approaches. However, this approach is computationally expensive and has been tested only on English-language data. The least accurate of the approaches we considered were those proposed by Zhu et al. [73], Habernal et al. [23], and Mizumoto et al. [34].

The main problem in multilingual sentiment analysis is the lack of lexical resources [18]. In future work, we plan to develop a multilingual corpus that will include Persian, Arabic, Turkish, and English data, and to compare a range of state-of-the-art methods on it.