Artificial Intelligence Review, Volume 48, Issue 4, pp 499–527

Multilingual sentiment analysis: from formal to informal and scarce resource languages

  • Siaw Ling Lo
  • Erik Cambria
  • Raymond Chiong
  • David Cornforth

Abstract

The ability to analyse online user-generated content related to sentiments (e.g., thoughts and opinions) on products or policies has become a de-facto skillset for many companies and organisations. Besides the challenge of understanding formal textual content, it is also necessary to take into consideration the informal and mixed linguistic nature of online social media languages, which are often coupled with localised slang as a way to express ‘true’ feelings. Due to the multilingual nature of social media data, analysis based on a single official language may carry the risk of not capturing the overall sentiment of online content. While efforts have been made to understand multilingual sentiment analysis based on a range of informal languages, no significant electronic resource has been built for these localised languages. This paper reviews the various current approaches and tools used for multilingual sentiment analysis, identifies challenges along this line of research, and provides several recommendations including a framework that is particularly applicable for dealing with scarce resource languages.

Keywords

Multilingual analysis; Sentiment analysis; Scarce resource languages; Social media


Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  • Siaw Ling Lo (1)
  • Erik Cambria (2)
  • Raymond Chiong (1)
  • David Cornforth (1)

  1. School of Design, Communication and Information Technology, The University of Newcastle, Callaghan, Australia
  2. School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
