Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models

Cosme, Camilla Johnine; De Leon, Marlene M.

doi:10.1007/978-981-99-8349-0_11

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 834)

Included in the following conference series:

World Conference on Information Systems for Business Management

Abstract

Bilingual individuals already outnumber monolinguals, yet most of the available resources for research in natural language processing (NLP) are for high-resource single languages. A recent area of interest in NLP research for low-resource languages is code-switching, a phenomenon in both written and spoken communication marked by the use of at least two languages in one utterance. This work presented two novel contributions to NLP research for low-resource languages. First, it introduced the first sentiment-annotated corpus of Filipino-English Reviews with Code-Switching (FiReCS), with more than 10k instances of product and service reviews. Second, it developed sentiment analysis models for Filipino-English text using pre-trained Transformers-based large language models (LLMs) and introduced benchmark results for zero-shot sentiment analysis on code-switched text using OpenAI’s GPT-3 series models. The performance of the Transformers-based sentiment analysis models was compared against that of existing lexicon-based sentiment analysis tools designed for monolingual text. The fine-tuned XLM-RoBERTa model achieved the highest accuracy and weighted average F1-score, both 0.84, with F1-scores of 0.89, 0.86, and 0.78 in the Positive, Negative, and Neutral sentiment classes, respectively. The poor performance of the lexicon-based sentiment analysis tools exemplifies the limitations of systems designed for a single language when they are applied to bilingual text involving code-switching.
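The abstract reports accuracy, per-class F1-scores, and a weighted average F1-score over three sentiment classes. As a rough illustration of how these metrics relate, the following is a minimal sketch in plain Python. The function names and the toy labels are assumptions made for this example; they are not taken from the chapter or from the FiReCS data, and the authors' own evaluation code may differ.

```python
from collections import Counter

# Class names as given in the abstract.
LABELS = ("Positive", "Negative", "Neutral")


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label equals the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Average of per-class F1, weighted by each class's share of gold labels."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        support[label] / total * per_class_f1(y_true, y_pred, label)
        for label in labels
    )


# Hypothetical toy labels, for illustration only (not FiReCS data).
y_true = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]
y_pred = ["Positive", "Positive", "Negative", "Neutral", "Neutral", "Positive"]

if __name__ == "__main__":
    print("accuracy:", round(accuracy(y_true, y_pred), 4))
    for label in LABELS:
        print(label, "F1:", round(per_class_f1(y_true, y_pred, label), 4))
    print("weighted F1:", round(weighted_f1(y_true, y_pred), 4))
```

Because the weighted average depends on how many gold instances each class has, the reported per-class scores of 0.89, 0.86, and 0.78 cannot be combined into the overall 0.84 without the class counts, which the abstract does not give.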

Notes

1. The data set is made available to the research community through the following link: https://huggingface.co/datasets/ccosme/FiReCS.

Author information

Authors and Affiliations

Ateneo de Manila University, Metro Manila, Philippines
Camilla Johnine Cosme & Marlene M. De Leon

Corresponding author

Correspondence to Camilla Johnine Cosme.

Editor information

Editors and Affiliations

University of Cantabria, Santander, Spain
Andres Iglesias
University of Aizu, Fukushima, Japan
Jungpil Shin
Knowledge Chamber of Commerce and Industry, Ahmedabad, Gujarat, India
Bharat Patel
Global Knowledge Research Foundation, Ahmedabad, Gujarat, India
Amit Joshi

Cite this paper

Cosme, C.J., De Leon, M.M. (2024). Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models. In: Iglesias, A., Shin, J., Patel, B., Joshi, A. (eds) Proceedings of World Conference on Information Systems for Business Management. ISBM 2023. Lecture Notes in Networks and Systems, vol 834. Springer, Singapore. https://doi.org/10.1007/978-981-99-8349-0_11

DOI: https://doi.org/10.1007/978-981-99-8349-0_11
Published: 29 February 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8348-3
Online ISBN: 978-981-99-8349-0
eBook Packages: Intelligent Technologies and Robotics (R0)
