Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models

  • Conference paper
Proceedings of World Conference on Information Systems for Business Management (ISBM 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 834)

Abstract

Bilingual individuals already outnumber monolinguals, yet most of the available resources for research in natural language processing (NLP) are for high-resource single languages. A recent area of interest in NLP research for low-resource languages is code-switching, a phenomenon in both written and spoken communication marked by the use of at least two languages in one utterance. This work presented two novel contributions to NLP research for low-resource languages. First, it introduced the first sentiment-annotated corpus of Filipino-English Reviews with Code-Switching (FiReCS), with more than 10k instances of product and service reviews. Second, it developed sentiment analysis models for Filipino-English text using pre-trained Transformers-based large language models (LLMs) and introduced benchmark results for zero-shot sentiment analysis on text with code-switching using OpenAI’s GPT-3 series models. The performance of the Transformers-based sentiment analysis models was compared against that of existing lexicon-based sentiment analysis tools designed for monolingual text. The fine-tuned XLM-RoBERTa model achieved the highest accuracy and weighted average F1-score of 0.84, with F1-scores of 0.89, 0.86, and 0.78 for the Positive, Negative, and Neutral sentiment classes, respectively. The poor performance of the lexicon-based sentiment analysis tools illustrates the limitations of systems designed for a single language when they are applied to bilingual text involving code-switching.
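As an illustration of the metric named in the abstract, the following minimal sketch shows how a weighted average F1-score is typically derived from per-class F1-scores and class supports. It is not the authors' evaluation code: the function names and the ten toy labels are hypothetical and serve only to make the arithmetic concrete.

```python
from collections import Counter

LABELS = ["Positive", "Negative", "Neutral"]


def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1} computed from paired gold and predicted labels."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def weighted_f1(y_true, y_pred, labels):
    """Average the per-class F1-scores, weighting each by its gold support."""
    support = Counter(y_true)
    f1 = per_class_f1(y_true, y_pred, labels)
    total = sum(support[label] for label in labels)
    return sum(f1[label] * support[label] for label in labels) / total


# Hypothetical toy labels (NOT taken from FiReCS): 4 Positive, 3 Negative, 3 Neutral.
y_true = ["Positive"] * 4 + ["Negative"] * 3 + ["Neutral"] * 3
y_pred = ["Positive", "Positive", "Positive", "Neutral",
          "Negative", "Negative", "Neutral",
          "Neutral", "Neutral", "Positive"]

class_scores = per_class_f1(y_true, y_pred, LABELS)
overall = weighted_f1(y_true, y_pred, LABELS)
```

Because each class contributes in proportion to its number of gold instances, a weaker score on a smaller class lowers the weighted average less than it would lower an unweighted (macro) average. scikit-learn's `f1_score` with `average="weighted"` computes the same quantity.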

Notes

  1. The data set is made available to the research community through the following link: https://huggingface.co/datasets/ccosme/FiReCS.
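A minimal sketch of how the corpus could be retrieved with the Hugging Face `datasets` library, assuming the repository at the link above loads with default settings. The helper name `load_firecs` is hypothetical, and the split and column names are not stated here, so they should be inspected after loading rather than assumed.

```python
# Repository identifier taken from the link in the note above.
DATASET_ID = "ccosme/FiReCS"


def load_firecs():
    """Download the corpus from the Hugging Face Hub (requires network access).

    Assumes the repository loads without an extra configuration name.
    """
    # Imported lazily so that defining this helper does not require the library.
    from datasets import load_dataset

    return load_dataset(DATASET_ID)
```

Calling `load_firecs()` and printing the result would list the available splits and their columns, which is a reasonable first check before any training or evaluation.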

Author information

Correspondence to Camilla Johnine Cosme.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Cosme, C.J., De Leon, M.M. (2024). Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models. In: Iglesias, A., Shin, J., Patel, B., Joshi, A. (eds) Proceedings of World Conference on Information Systems for Business Management. ISBM 2023. Lecture Notes in Networks and Systems, vol 834. Springer, Singapore. https://doi.org/10.1007/978-981-99-8349-0_11
