Abstract
Dimensionality reduction is a well-known technique for limiting the size of the feature space and for discovering meaningful latent variables in the input data. It is particularly valuable when the raw data is sparse and processing it with machine learning algorithms becomes computationally expensive. Sentiment analysis, in turn, refers to a collection of text classification methods that identify the polarity of user opinions in blog posts, reviews, tweets, etc. Because text is naturally very sparse, training classification models is often intractable, which makes dimensionality reduction even more important. In this paper we study the impact of dimensionality reduction on sentiment analysis classification tasks. Through extensive experimentation with traditional algorithms and benchmark datasets, we verify the general intuition that dimensionality reduction methods significantly shorten data preprocessing and model training times while sacrificing only a small amount of accuracy. At the same time, we highlight several exceptions to this rule, where training times actually increase and the accuracy losses are significant.
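To make the sparsity argument concrete, the following minimal Python sketch shrinks a bag-of-words representation to a fixed number of dimensions. The abstract does not name the reduction methods that were evaluated, so the hashing trick used here is only an illustrative stand-in, chosen because it needs nothing beyond the standard library. The toy reviews, the `N_BUCKETS` value, and the helper names `tokenize` and `hashed_features` are all hypothetical.

```python
import hashlib

# Hypothetical reduced dimensionality, chosen only for this toy example.
N_BUCKETS = 8


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def hashed_features(tokens, n_buckets=N_BUCKETS):
    """Project a token list onto n_buckets dimensions with the hashing trick.

    Each token is hashed to a bucket index and a sign, so the output length
    is fixed regardless of how large the vocabulary is.
    """
    vector = [0.0] * n_buckets
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % n_buckets
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vector[index] += sign
    return vector


# Made-up reviews, not taken from any benchmark dataset.
reviews = [
    "the film was great and i loved every minute",
    "terrible plot and boring acting throughout",
    "a wonderful heartfelt story with superb acting",
]

# A plain bag-of-words model needs one dimension per distinct token.
vocabulary = sorted({tok for review in reviews for tok in tokenize(review)})

# The hashed representation always has N_BUCKETS dimensions.
reduced = [hashed_features(tokenize(review)) for review in reviews]

print("bag-of-words dimensions:", len(vocabulary))
print("hashed dimensions:", len(reduced[0]))
```

On a real corpus the vocabulary grows with the number of documents while the hashed width stays fixed. Distinct tokens can collide in the same bucket, which is one simple way to see how a reduced representation can lose some accuracy.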
Copyright information
© 2022 IFIP International Federation for Information Processing
About this paper
Cite this paper
Akritidis, L., Bozanis, P. (2022). How Dimensionality Reduction Affects Sentiment Analysis NLP Tasks: An Experimental Study. In: Maglogiannis, I., Iliadis, L., Macintyre, J., Cortez, P. (eds) Artificial Intelligence Applications and Innovations. AIAI 2022. IFIP Advances in Information and Communication Technology, vol 647. Springer, Cham. https://doi.org/10.1007/978-3-031-08337-2_25
DOI: https://doi.org/10.1007/978-3-031-08337-2_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08336-5
Online ISBN: 978-3-031-08337-2
eBook Packages: Computer Science, Computer Science (R0)