
Low-Dimensional Text Representations for Sentiment Analysis NLP Tasks

  • Original Research
  • Published in: SN Computer Science

Abstract

Natural Language Processing (NLP) is currently one of the most rapidly growing research fields. Sentiment analysis is a popular NLP problem that aims at the automatic identification of polarity in user reviews, tweets, blog posts, comments, forum discussions, and so on. Unfortunately, the natural sparseness of text, along with its inherently high dimensionality, renders the direct application of machine and deep learning models problematic. For this reason, the literature contains a wealth of state-of-the-art dimensionality reduction methods that confront these issues. In this paper, we conduct an experimental study of the effects of dimensionality reduction on sentiment classification. More specifically, we consider multiple feature selection and feature extraction techniques and investigate their impact on the effectiveness and efficiency of seven state-of-the-art classifiers. The experimental evaluation includes accuracy and execution time measurements on four benchmark datasets under various degrees of reduction aggressiveness. The results indicate that, in most cases, dimensionality reduction indeed improves running times, while the accuracy sacrifices are usually small. However, we also identify, highlight, and discuss several exceptions where this observation does not hold.
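
To make the pipeline under study concrete, the sketch below shows how sparse TF-IDF vectors can be reduced either by feature extraction (Truncated SVD) or by feature selection (a chi-squared filter) before a classifier is trained and timed on the result. It is a minimal illustration in Python with scikit-learn; the toy reviews, the target dimensionality K, and the LinearSVC classifier are assumptions for exposition, not the paper's exact configuration.

    import time

    from sklearn.decomposition import TruncatedSVD            # feature extraction
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.feature_selection import SelectKBest, chi2   # feature selection
    from sklearn.metrics import accuracy_score
    from sklearn.svm import LinearSVC

    # Toy corpus standing in for a real benchmark dataset.
    reviews = ["great movie, loved it", "terrible plot and acting",
               "an absolute masterpiece", "boring, a waste of time",
               "wonderful performances", "awful, would not recommend"]
    labels = [1, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative

    X = TfidfVectorizer().fit_transform(reviews)  # sparse, high-dimensional

    K = 4  # target dimensionality; illustrative, not the paper's setting

    # Feature extraction: project the TF-IDF vectors onto K latent dimensions.
    X_svd = TruncatedSVD(n_components=K).fit_transform(X)

    # Feature selection: keep the K terms most correlated with the labels.
    X_chi = SelectKBest(chi2, k=K).fit_transform(X, labels)
    print("reduced shapes:", X_svd.shape, X_chi.shape)

    # Train and time one classifier on the extracted representation.
    start = time.perf_counter()
    clf = LinearSVC().fit(X_svd, labels)
    elapsed = time.perf_counter() - start
    print(f"training time: {elapsed:.4f}s, train accuracy: "
          f"{accuracy_score(labels, clf.predict(X_svd)):.2f}")

Sweeping K corresponds to varying the "reduction aggressiveness" described in the abstract, and the accuracy/time trade-off is read off measurements like the ones printed above.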

Data availability

The IMDb dataset is publicly available here: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews. The Amazon Reviews dataset is publicly available here: https://jmcauley.ucsd.edu/data/amazon/. The Twitter US Airline dataset is publicly available here: https://www.kaggle.com/crowdflower/twitter-airline-sentiment. The Financial Tweets Sentiment dataset is publicly available here: https://www.kaggle.com/vivekrathi055/sentiment-analysis-on-financial-tweets. The code that we developed to conduct the experiments is publicly available here: https://github.com/lakritidis/SentimentAnalysis.
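
As a starting point for reproduction, a minimal loading sketch for the IMDb dataset is given below; the file name "IMDB Dataset.csv" and the column names "review" and "sentiment" are assumptions taken from the Kaggle listing and may need adjusting to the downloaded copy.

    import pandas as pd

    # Assumed file and column names from the Kaggle listing (adjust if they differ).
    df = pd.read_csv("IMDB Dataset.csv")
    texts = df["review"].tolist()
    labels = (df["sentiment"] == "positive").astype(int).tolist()
    print(f"loaded {len(texts)} labelled reviews")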

Acknowledgements

This research is co-financed by Greece and the European Union (European Social Fund, ESF) through the Operational Programme "Human Resources Development, Education and Lifelong Learning 2014-2020", in the context of the project "Support for International Actions of the International Hellenic University" (MIS 5154651).

Author information

Corresponding author

Correspondence to Leonidas Akritidis.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Machine Learning Modeling Techniques and Applications” guest edited by Lazaros Iliadis, Elias Pimenidis and Chrisina Jayne.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Akritidis, L., Bozanis, P. Low-Dimensional Text Representations for Sentiment Analysis NLP Tasks. SN COMPUT. SCI. 4, 474 (2023). https://doi.org/10.1007/s42979-023-01913-y

