Improving Multi-class Text Classification Using Balancing Techniques

Mahmoudi, Laouni; Salem, Mohammed

doi:10.1007/978-3-031-28540-0_21

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1769)

Included in the following conference series:

International Conference on Artificial Intelligence: Theories and Applications

Abstract

Social media platforms and micro-blogging websites have grown in popularity in recent years. People use these platforms to express their thoughts and feelings about items, people, and events, producing a massive amount of textual data that should be exploited. Sentiment analysis is one of the tools used to take advantage of this data: it classifies text into classes such as positive, negative, and neutral, or into a number of star ratings. It has been investigated by many researchers in several languages. Deep learning approaches such as CNN, RNN, and LSTM, applied to balanced datasets, have given better results than classical machine learning approaches such as SVM, NB, and LR. Furthermore, the advent of BERT has revolutionized the text classification field, including sentiment analysis tasks. The main problem is that in datasets collected from social media platforms, certain classes dominate others; that is, the datasets are imbalanced. As a result, classifiers lose efficiency. This paper addresses this issue by introducing an ensemble of mathematical balancing techniques to increase the efficiency of BERT-based sentiment analysis models. The results are significant: our two main metrics, AVG-Recall and F1-PN, are 17% and 19% higher, respectively, than those of the same classifiers applied to the imbalanced dataset.
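The abstract does not say which balancing techniques are used, nor does it define AVG-Recall and F1-PN. As a rough illustration only, the sketch below assumes random oversampling as a stand-in balancing step and assumes the definitions common in sentiment analysis shared tasks: AVG-Recall as recall averaged over all classes, and F1-PN as the mean of the F1 scores of the positive and negative classes. The function names and the label strings "positive" and "negative" are hypothetical.

```python
import random


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class samples until every class matches the largest one.

    Assumption: a simple stand-in for the paper's balancing techniques,
    which the abstract does not specify.
    """
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def recall_and_f1(y_true, y_pred, label):
    """Return (recall, F1) for a single class treated one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


def avg_recall(y_true, y_pred):
    """Assumed definition: recall averaged over all classes present in y_true."""
    classes = sorted(set(y_true))
    return sum(recall_and_f1(y_true, y_pred, c)[0] for c in classes) / len(classes)


def f1_pn(y_true, y_pred, positive="positive", negative="negative"):
    """Assumed definition: mean of the positive-class and negative-class F1."""
    f1_pos = recall_and_f1(y_true, y_pred, positive)[1]
    f1_neg = recall_and_f1(y_true, y_pred, negative)[1]
    return (f1_pos + f1_neg) / 2
```

If a step like this is used, it should be applied to the training split only; resampling the evaluation data would distort the reported metrics.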

Author information

Authors and Affiliations

Mustapha Stambouli University of Mascara, 29000, Mascara, Algeria
Laouni Mahmoudi & Mohammed Salem

Corresponding author

Correspondence to Laouni Mahmoudi.

Editor information

Editors and Affiliations

University of Mascara, Mascara, Algeria
Mohammed Salem
University of Granada, Granada, Spain
Juan Julián Merelo
Université Paris-Est Créteil, Créteil, France
Patrick Siarry
University of Mascara, Mascara, Algeria
Rochdi Bachir Bouiadjra
University of Mascara, Mascara, Algeria
Mohamed Debakla
University of Mascara, Mascara, Algeria
Fatima Debbat

Cite this paper

Mahmoudi, L., Salem, M. (2023). Improving Multi-class Text Classification Using Balancing Techniques. In: Salem, M., Merelo, J.J., Siarry, P., Bachir Bouiadjra, R., Debakla, M., Debbat, F. (eds) Artificial Intelligence: Theories and Applications. ICAITA 2022. Communications in Computer and Information Science, vol 1769. Springer, Cham. https://doi.org/10.1007/978-3-031-28540-0_21

DOI: https://doi.org/10.1007/978-3-031-28540-0_21
Published: 18 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28539-4
Online ISBN: 978-3-031-28540-0
eBook Packages: Computer Science, Computer Science (R0)
