Topic Modeling for Mining Opinion Aspects from a Customer Feedback Corpus

Babina, O. I.

doi:10.3103/S0005105524010060

NATURAL LANGUAGE PROCESSING
Published: 02 April 2024

Volume 58, pages 63–79, (2024)

Automatic Documentation and Mathematical Linguistics

O. I. Babina ORCID: orcid.org/0000-0002-1733-6075¹

Abstract

The paper introduces a methodology for extracting opinion aspects from textual content by identifying the customer-evaluated parameters regarding a given object. These parameters form the foundation for shaping the customer’s attitudes toward the product or service. The proposed approach leverages topic modeling tools to delineate classes of vocabulary exhibiting semantics aligned with the parameters influencing the customer’s opinion about the object. Our study specifically explores the application of the BERTopic model as a topic modeling tool to address this challenge. The outlined methodology encompasses several sequential steps, including the preprocessing of textual data involving the removal of stopwords, conversion to lowercase characters, and lemmatization. Additionally, special consideration is given to the distinct lexical manifestations of opinion aspects, obtained as a result of the extraction of nominal, verbal, and adjectival single- and multicomponent phrases from the corpus. Subsequently, the corpus sentences are represented as vectors in a feature space expressed by the extracted words and phrases. The final step involves the application of topic modeling using the BERTopic model on the customer review corpus, utilizing the vector representations of corpus sentences. The experimental inquiry is conducted on a domain-specific Russian-language corpus comprising customer feedback on airline services gathered from customer review websites. The resultant topic distribution is then juxtaposed against a manually constructed conceptual model of the domain. The comparative analysis reveals that the automatic topic distribution aligns with the conceptual structure of the domain, demonstrating a precision of 0.955 and a recall of 0.875. These findings affirm the efficacy of employing the BERTopic model to address the problem of the corpus-based mining of opinion aspects.
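
The preprocessing and vectorization steps described above can be illustrated with a minimal sketch. It is not the author's implementation: the stopword list, the lemma lookup table, the feature list, and the function names `preprocess`, `count_phrase`, and `vectorize` are all hypothetical, and English toy sentences stand in for the Russian corpus, which would require a real stopword list and a morphological lemmatizer. The sketch stops at the sentence vectors and does not call BERTopic itself.

```python
import re

# Toy English resources, all hypothetical. The paper's corpus is Russian and
# would need a real stopword list and a morphological lemmatizer instead.
STOPWORDS = {"the", "a", "and", "but", "was", "were"}
LEMMAS = {"flights": "flight", "delayed": "delay", "seats": "seat"}

# Assumed feature space: single words plus one multicomponent phrase.
FEATURES = ["flight", "delay", "seat", "cabin crew"]


def preprocess(sentence):
    """Lowercase, tokenize, drop stopwords, then lemmatize by table lookup."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in STOPWORDS]


def count_phrase(tokens, phrase):
    """Count occurrences of a (possibly multiword) phrase as a token n-gram."""
    parts = phrase.split()
    n = len(parts)
    return sum(tokens[i:i + n] == parts for i in range(len(tokens) - n + 1))


def vectorize(tokens, features):
    """Represent one sentence as counts over the extracted words and phrases."""
    return [count_phrase(tokens, feat) for feat in features]


if __name__ == "__main__":
    sentences = [
        "The flight was delayed and the seats were cramped.",
        "Cabin crew were friendly, but flights were delayed.",
    ]
    for sentence in sentences:
        print(vectorize(preprocess(sentence), FEATURES))
```

Matching phrases as token n-grams rather than substrings keeps a feature such as "seat" from firing inside a longer unrelated word. In the methodology described in the abstract, vectors of this kind are what the topic model receives as the representation of each corpus sentence.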


Funding

This work was supported by ongoing institutional funding. No additional grants to carry out or direct this particular research were obtained.

Author information

Authors and Affiliations

South Ural State University (National Research University), Chelyabinsk, Russia
O. I. Babina

Corresponding author

Correspondence to O. I. Babina.

Ethics declarations

The author of this work declares that she has no conflicts of interest.

Additional information

Publisher’s Note.

Allerton Press remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Babina, O.I. Topic Modeling for Mining Opinion Aspects from a Customer Feedback Corpus. Autom. Doc. Math. Linguist. 58, 63–79 (2024). https://doi.org/10.3103/S0005105524010060

Received: 02 October 2023
Published: 02 April 2024
Issue Date: February 2024
DOI: https://doi.org/10.3103/S0005105524010060
