Abstract
The application of Artificial Intelligence (AI) is growing in areas such as natural language processing (NLP) and sentiment analysis. Automatic sentiment analysis captures users' emotions and classifies reviews as positive or negative. One challenge of general-purpose lexicon-based analysis is that it is insensitive to domain-specific usage, so it does not transfer well to every domain. There is also a need to interpret the predictions produced by AI sentiment analysis models. This paper develops a model based on Shapley Additive Explanations (SHAP) for text classification that labels user opinion texts as negative or positive. Our sentiment analysis model is evaluated on the Internet Movie Database (IMDB) dataset, whose texts have a rich vocabulary and are coherent. The results show that the model correctly classified 89% of the user reviews. The model is flexible enough to be extended to unlabeled data.
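The abstract does not describe the implementation, so the following is only a minimal sketch of the idea behind SHAP for text, assuming a toy word-weight sentiment classifier. The names `WEIGHTS`, `predict_positive` and `shapley_values` are hypothetical and are not taken from the paper or from the shap library. For a text of n tokens, the exact Shapley value of token i is phi_i = sum over subsets S of the other tokens of |S|!(n-|S|-1)!/n! * [f(S with i) - f(S)], and the values add up to f(full text) - f(empty text).

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical word weights for a toy sentiment model (illustrative only;
# not the paper's model, lexicon, or IMDB data).
WEIGHTS = {"great": 2.0, "boring": -1.5, "plot": 0.2, "acting": 0.3}
BIAS = 0.0


def predict_positive(tokens):
    """Toy classifier: probability of 'positive' as a logistic function
    of the summed word weights (unknown words contribute 0)."""
    score = BIAS + sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + exp(-score))


def shapley_values(tokens, model):
    """Exact Shapley value of each token position.

    A token outside a coalition is simply dropped from the text before
    calling the model (one simple masking choice among several).
    This needs O(2^n) model calls, so it only suits very short texts.
    """
    n = len(tokens)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                without_i = [tokens[j] for j in subset]
                phi += weight * (model(with_i) - model(without_i))
        values.append(phi)
    return values


if __name__ == "__main__":
    review = ["great", "acting", "boring", "plot"]
    phis = shapley_values(review, predict_positive)
    for tok, phi in zip(review, phis):
        print(f"{tok:>8s} {phi:+.3f}")
    # Efficiency: contributions sum to f(review) - f(empty text)
    print(sum(phis), predict_positive(review) - predict_positive([]))
```

Positive values push the prediction toward "positive" and negative values toward "negative". Because exact enumeration grows exponentially with text length, practical SHAP tooling approximates these values for real models such as those trained on full IMDB reviews.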
Acknowledgment
This paper is supported by the Ministry of Science and Technology, Taiwan, under grant Nos. MOST-107-2221-E-324-018-MY2 and MOST-109-2622-E-324-004.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Dewi, C., Tsai, BJ., Chen, RC. (2022). Shapley Additive Explanations for Text Classification and Sentiment Analysis of Internet Movie Database. In: Szczerbicki, E., Wojtkiewicz, K., Nguyen, S.V., Pietranik, M., Krótkiewicz, M. (eds) Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2022. Communications in Computer and Information Science, vol 1716. Springer, Singapore. https://doi.org/10.1007/978-981-19-8234-7_6
DOI: https://doi.org/10.1007/978-981-19-8234-7_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8233-0
Online ISBN: 978-981-19-8234-7
eBook Packages: Computer Science, Computer Science (R0)