Topic Modeling for Mining Opinion Aspects from a Customer Feedback Corpus

  • NATURAL LANGUAGE PROCESSING
Automatic Documentation and Mathematical Linguistics

Abstract

The paper introduces a methodology for extracting opinion aspects from text, that is, the parameters of a given object (a product or service) that customers evaluate. These parameters form the basis of a customer's attitude toward the product or service. The proposed approach uses topic modeling to identify classes of vocabulary whose semantics correspond to the parameters that shape the customer's opinion of the object. Specifically, the study explores the BERTopic model as the topic modeling tool for this task. The methodology comprises several sequential steps. First, the text is preprocessed: stopwords are removed, the text is converted to lowercase, and words are lemmatized. Next, particular attention is paid to the different lexical forms that opinion aspects take: nominal, verbal, and adjectival phrases, both single-word and multicomponent, are extracted from the corpus. The corpus sentences are then represented as vectors in a feature space defined by the extracted words and phrases. Finally, the BERTopic model is applied to the customer review corpus using these sentence vectors. The experiments are conducted on a domain-specific Russian-language corpus of customer reviews of airline services collected from review websites. The resulting topic distribution is compared with a manually constructed conceptual model of the domain. The comparison shows that the automatically obtained topics align with the conceptual structure of the domain, with a precision of 0.955 and a recall of 0.875. These results support the use of BERTopic for corpus-based mining of opinion aspects.
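The preprocessing, phrase-extraction, and sentence-vectorization steps described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the stopword list, lemma dictionary, part-of-speech lexicon, and phrase patterns (`STOPWORDS`, `LEMMAS`, `POS`, `PATTERNS`) are hypothetical toy resources, the example sentences are invented English ones, and multicomponent phrases are limited to two words. For the Russian corpus studied here, a morphological analyzer such as pymorphy2 would supply lemmas and part-of-speech tags in place of the hand-written dictionaries.

```python
import re
from collections import Counter

# Hypothetical toy resources for illustration only; a real pipeline for Russian
# would use a full stopword list and a morphological analyzer for lemmas and POS.
STOPWORDS = {"the", "a", "an", "and", "was", "were", "very", "our", "by", "of", "is"}
LEMMAS = {"seats": "seat", "delayed": "delay", "hours": "hour", "attendants": "attendant"}
POS = {"seat": "N", "flight": "N", "hour": "N", "attendant": "N",
       "delay": "V", "uncomfortable": "A", "polite": "A"}  # N=noun, V=verb, A=adjective
# Assumed phrase patterns: single nouns, verbs, adjectives and two-word combinations.
PATTERNS = {("N",), ("V",), ("A",), ("A", "N"), ("N", "N"), ("V", "N")}


def preprocess(sentence):
    """Lowercase, tokenize, remove stopwords, and lemmatize one sentence."""
    tokens = re.findall(r"\w+", sentence.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS]


def extract_phrases(lemmas, max_len=2):
    """Return every span of 1..max_len lemmas whose POS sequence matches a pattern."""
    phrases = []
    for n in range(1, max_len + 1):
        for i in range(len(lemmas) - n + 1):
            span = lemmas[i:i + n]
            # Unknown words get the tag "?" and therefore never match a pattern.
            if tuple(POS.get(w, "?") for w in span) in PATTERNS:
                phrases.append(" ".join(span))
    return phrases


def vectorize(sentences):
    """Represent each sentence as phrase counts over the extracted-phrase vocabulary."""
    counts = [Counter(extract_phrases(preprocess(s))) for s in sentences]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[p] for p in vocab] for c in counts]


if __name__ == "__main__":
    reviews = [
        "The seats were uncomfortable",
        "Our flight was delayed by three hours",
        "Very polite attendants and uncomfortable seats",
    ]
    vocab, matrix = vectorize(reviews)
    print(vocab)
    for row in matrix:
        print(row)
```

Removing stopwords before pattern matching can make words that were not adjacent in the original sentence look adjacent; the sketch accepts this simplification. If such sentence vectors were passed to BERTopic, the `bertopic` package accepts precomputed document vectors through the `embeddings` argument of `BERTopic.fit_transform`; that the author's setup used exactly this route is an assumption.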
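In the BERTopic library's default pipeline, document vectors are reduced with UMAP, clustered with HDBSCAN, and each resulting cluster is described by its most characteristic terms using class-based TF-IDF (c-TF-IDF); this is how topics become "classes of vocabulary" that can be read as opinion aspects. The sketch below is a simplified, standard-library version of the c-TF-IDF weight given in the BERTopic paper, W(t, c) = tf(t, c) · log(1 + A / f(t)), where tf(t, c) is the frequency of term t in cluster c, f(t) is its frequency across all clusters, and A is the average number of words per cluster. The clusters are hypothetical, and the library itself additionally normalizes term frequencies, so its exact weights may differ.

```python
import math
from collections import Counter


def c_tf_idf(clusters):
    """Class-based TF-IDF in the form W[t, c] = tf[t, c] * log(1 + A / f[t]).

    clusters: dict mapping a topic id to its documents, each a list of tokens.
    Returns a dict mapping each topic id to a {term: weight} dict.
    """
    # Treat all documents of one topic as a single "class document".
    class_tf = {k: Counter(t for doc in docs for t in doc) for k, docs in clusters.items()}
    # f[t]: frequency of each term across all classes.
    total_tf = Counter()
    for tf in class_tf.values():
        total_tf.update(tf)
    # A: average number of words per class.
    avg_words = sum(total_tf.values()) / len(class_tf)
    return {
        k: {t: n * math.log(1 + avg_words / total_tf[t]) for t, n in tf.items()}
        for k, tf in class_tf.items()
    }


def top_terms(weights, n=3):
    """Return the n highest-weighted terms, breaking ties alphabetically."""
    return sorted(weights, key=lambda t: (-weights[t], t))[:n]


if __name__ == "__main__":
    # Hypothetical clusters of preprocessed review sentences.
    clusters = {
        0: [["seat", "uncomfortable", "airline"], ["seat", "legroom"]],
        1: [["flight", "delay", "airline"], ["delay", "hour"]],
    }
    weights = c_tf_idf(clusters)
    for topic, w in weights.items():
        print(topic, top_terms(w))
    # prints:
    # 0 ['seat', 'legroom', 'uncomfortable']
    # 1 ['delay', 'flight', 'hour']
```

The term `airline`, which occurs in both clusters, receives a lower weight than equally frequent cluster-specific terms such as `legroom`, so shared domain vocabulary is pushed down and aspect-specific vocabulary rises to the top of each topic.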
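The precision and recall figures compare the automatic topics with the manually constructed conceptual model. One plausible reading of these measures, stated here as an assumption rather than the author's definition, is that precision is the share of automatically obtained topics that correspond to some concept of the conceptual model, and recall is the share of the model's concepts covered by at least one topic. The sketch below computes both under that assumption; the function name and the topic-to-concept mapping are hypothetical, and in practice the mapping would come from manual comparison.

```python
def topic_precision_recall(topic_to_concept, concepts):
    """Compare automatic topics with a manually built conceptual model.

    topic_to_concept: dict mapping each topic id to the concept it was judged to
        express, or None if it matches no concept (a hypothetical manual mapping).
    concepts: set of all concept names in the conceptual model.

    Assumed definitions (not taken from the paper):
        precision = topics matching a concept / all topics
        recall    = concepts covered by at least one topic / all concepts
    """
    if not topic_to_concept or not concepts:
        raise ValueError("need at least one topic and one concept")
    # Keep only labels that actually belong to the conceptual model.
    labels = [c for c in topic_to_concept.values() if c in concepts]
    precision = len(labels) / len(topic_to_concept)
    recall = len(set(labels)) / len(concepts)
    return precision, recall


if __name__ == "__main__":
    model = {"seating", "punctuality", "staff", "food"}  # hypothetical concepts
    topics = {0: "seating", 1: "punctuality", 2: "staff", 3: "staff", 4: None}
    print(topic_precision_recall(topics, model))  # (0.8, 0.75)
```

Under these definitions, two topics that map to the same concept both count toward precision but only once toward recall, which is one way the two measures can diverge.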

Funding

This work was supported by ongoing institutional funding. No additional grants to carry out or direct this particular research were obtained.

Author information

Correspondence to O. I. Babina.

Ethics declarations

The author of this work declares that she has no conflicts of interest.

Additional information

Publisher’s Note.

Allerton Press remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Babina, O.I. Topic Modeling for Mining Opinion Aspects from a Customer Feedback Corpus. Autom. Doc. Math. Linguist. 58, 63–79 (2024). https://doi.org/10.3103/S0005105524010060

  • DOI: https://doi.org/10.3103/S0005105524010060
