Efficacy of Oversampling Over Machine Learning Algorithms in Case of Sentiment Analysis

Chatterjee, Deb Prakash; Mukhopadhyay, Sabyasachi; Goswami, Saptarsi; Panigrahi, Prasanta K.

doi:10.1007/978-981-15-5619-7_17

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1175)

Abstract

Text classification is an important problem in artificial intelligence, and a large part of it within natural language processing is sentiment analysis. Sentiment analysis extracts the tone or emotion of the writer by interpreting the text sequence. Understanding the sentiment of a text is considered a boon for customer management systems, and the approach can readily be applied to social media sites such as Twitter or e-commerce websites such as Amazon to collect and analyze customer reviews. Sentiment analysis can be binary or multiclass; in our approach we consider both, through a comparative study of long short-term memory (LSTM), random forest, support vector machine (SVM), and XGBoost, to check whether the other models can match LSTM in any case. Because we found the class distribution of our datasets to be imbalanced, we also apply oversampling to balance it.
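The abstract does not say which oversampling technique was used. As a minimal sketch, assuming plain random oversampling (duplicating minority-class examples with replacement until every class matches the majority class), the balancing step applied before training any of the four classifiers could look like this in Python. The function name `random_oversample`, its `seed` parameter, and the toy reviews are hypothetical illustrations, not the authors' code or data; synthetic methods such as SMOTE would be an alternative.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(
    samples: Sequence[T],
    labels: Sequence[Hashable],
    seed: int = 0,
) -> Tuple[List[T], List[Hashable]]:
    """Hypothetical helper: randomly duplicate minority-class samples
    (with replacement) until every class has as many samples as the
    largest class."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    rng = random.Random(seed)  # seeded for reproducible resampling

    # Group sample indices by their class label.
    by_class: Dict[Hashable, List[int]] = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)

    # Target size is the count of the majority class.
    target = max((len(idx) for idx in by_class.values()), default=0)

    # Keep all original samples, then append random copies from each smaller class.
    out_samples: List[T] = list(samples)
    out_labels: List[Hashable] = list(labels)
    for label, idx in by_class.items():
        for _ in range(target - len(idx)):
            j = rng.choice(idx)
            out_samples.append(samples[j])
            out_labels.append(label)
    return out_samples, out_labels


# Usage with hypothetical toy reviews (not data from the paper).
texts = ["great flight", "lost my bag", "delayed again", "rude staff", "ok trip"]
labels = ["positive", "negative", "negative", "negative", "neutral"]
balanced_texts, balanced_labels = random_oversample(texts, labels, seed=42)
print(Counter(labels))           # imbalanced: 3 negative, 1 positive, 1 neutral
print(Counter(balanced_labels))  # balanced: 3 samples per class
```

Because the helper works on raw texts, it can run before any vectorization, so the same balanced set could feed both the LSTM and the feature-based models. A common precaution, independent of this paper, is to oversample only the training split after the train/test split, so duplicated examples cannot appear in both training and evaluation data and inflate the reported scores.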

Author information

Authors and Affiliations

Techno India University, EM-4, Salt Lake City, Sector V, Kolkata, 700091, West Bengal, India
Deb Prakash Chatterjee
BIMS Kolkata, FA Block, Sector III, Bidhannagar, Kolkata, 700097, West Bengal, India
Sabyasachi Mukhopadhyay
University of Calcutta, Technology Campus, JD-2, JD Block, Sector III, Bidhannagar, Kolkata, 700106, West Bengal, India
Saptarsi Goswami
IISER Kolkata, Mohanpur, 741246, West Bengal, India
Prasanta K. Panigrahi

Corresponding author

Correspondence to Deb Prakash Chatterjee.

Editor information

Editors and Affiliations

Society for Data Science, Pune, Maharashtra, India
Neha Sharma
A.K. Choudhury School of Information Technology, University of Calcutta, Kolkata, West Bengal, India
Amlan Chakrabarti
Department of Automatics and Applied Software, Faculty of Engineering, University of Arad, Arad, Romania
Valentina Emilia Balas
IT4Innovations, VSB-Technical University of Ostrava, Ostrava, Czech Republic
Jan Martinovic

About this paper

Cite this paper

Chatterjee, D.P., Mukhopadhyay, S., Goswami, S., Panigrahi, P.K. (2021). Efficacy of Oversampling Over Machine Learning Algorithms in Case of Sentiment Analysis. In: Sharma, N., Chakrabarti, A., Balas, V.E., Martinovic, J. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1175. Springer, Singapore. https://doi.org/10.1007/978-981-15-5619-7_17

DOI: https://doi.org/10.1007/978-981-15-5619-7_17
Published: 19 September 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5618-0
Online ISBN: 978-981-15-5619-7
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
