Abstract
Sentiment analysis lets researchers and analysts extract people's opinions about products, services, events and other entities. Two developments have made this possible: a rapid rise in the amount of text available on the Internet, in English and in many regional languages, and recent advances in machine learning and deep learning. Deep learning models have been observed to produce state-of-the-art prediction results without the domain expertise or handcrafted feature engineering that traditional machine learning algorithms require. This chapter focuses on sentiment analysis of English and of low-resource languages, which have limited sentiment analysis resources such as annotated datasets, word embeddings and sentiment lexicons. It also covers techniques to refine word embeddings for sentiment analysis and to improve word embedding coverage in low-resource languages. Finally, we discuss the major challenges in multilingual sentiment analysis and explain novel deep learning-based solutions to overcome them.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Nankani, H., Dutta, H., Shrivastava, H., Rama Krishna, P.V.N.S., Mahata, D., Shah, R.R. (2020). Multilingual Sentiment Analysis. In: Agarwal, B., Nayak, R., Mittal, N., Patnaik, S. (eds) Deep Learning-Based Approaches for Sentiment Analysis. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-1216-2_8
DOI: https://doi.org/10.1007/978-981-15-1216-2_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1215-5
Online ISBN: 978-981-15-1216-2
eBook Packages: Engineering, Engineering (R0)