Abstract
Social media platforms and micro-blogging websites have grown in popularity in recent years. People use these platforms to express their thoughts and feelings about products, people, and events, producing a massive amount of textual data that can be exploited. Sentiment analysis is one of the tools used to take advantage of this data: it classifies text into classes such as positive, negative, and neutral, or into star ratings. It has been investigated by many researchers in several languages. Deep learning approaches such as CNN, RNN, and LSTM applied to balanced datasets have given more effective results than classical machine learning approaches such as SVM, NB, and LR. Furthermore, the emergence of BERT has revolutionized the field of text classification, including sentiment analysis. The main problem is that, in datasets collected from social media platforms, certain classes dominate others; that is, the datasets are imbalanced. As a result, classifiers lose efficiency. This paper addresses this issue by introducing an ensemble of mathematical balancing techniques to increase the efficiency of BERT-based sentiment analysis models. The obtained results are significant: our two main metrics, AVG-Recall and F1-PN, are 17% and 19% higher, respectively, than the results of the classifiers applied to the imbalanced dataset.
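The abstract names AVG-Recall and F1-PN without defining them. The sketch below assumes the definitions commonly used in SemEval-style sentiment evaluation: AVG-Recall as the unweighted mean of per-class recall, and F1-PN as the mean of the F1 scores of the positive and negative classes only. The function names, label strings, and toy data are illustrative assumptions, not taken from the paper.

```python
from typing import Hashable, Sequence, Tuple


def _counts(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
            label: Hashable) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def recall(y_true, y_pred, label) -> float:
    tp, _, fn = _counts(y_true, y_pred, label)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1(y_true, y_pred, label) -> float:
    tp, fp, fn = _counts(y_true, y_pred, label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def avg_recall(y_true, y_pred, labels) -> float:
    """Assumed definition: unweighted mean of per-class recall."""
    return sum(recall(y_true, y_pred, lab) for lab in labels) / len(labels)


def f1_pn(y_true, y_pred, positive="positive", negative="negative") -> float:
    """Assumed definition: mean F1 over the positive and negative classes."""
    return (f1(y_true, y_pred, positive) + f1(y_true, y_pred, negative)) / 2


# Toy imbalanced set (hypothetical): 8 positive, 1 neutral, 1 negative.
y_true = ["positive"] * 8 + ["neutral", "negative"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["positive"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                                   # 0.8
print(avg_recall(y_true, y_pred,
                 ["positive", "neutral", "negative"]))
print(f1_pn(y_true, y_pred))
```

Under these assumed definitions, the always-majority classifier reaches 80% accuracy on the toy data but an AVG-Recall of only one third, which illustrates why class-averaged metrics are preferred when classes are imbalanced.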
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mahmoudi, L., Salem, M. (2023). Improving Multi-class Text Classification Using Balancing Techniques. In: Salem, M., Merelo, J.J., Siarry, P., Bachir Bouiadjra, R., Debakla, M., Debbat, F. (eds) Artificial Intelligence: Theories and Applications. ICAITA 2022. Communications in Computer and Information Science, vol 1769. Springer, Cham. https://doi.org/10.1007/978-3-031-28540-0_21
DOI: https://doi.org/10.1007/978-3-031-28540-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28539-4
Online ISBN: 978-3-031-28540-0
eBook Packages: Computer Science, Computer Science (R0)