Shapley Additive Explanations for Text Classification and Sentiment Analysis of Internet Movie Database

Dewi, Christine; Tsai, Bing-Jun; Chen, Rung-Ching

doi:10.1007/978-981-19-8234-7_6

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1716)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems

Abstract

Artificial Intelligence (AI) is increasingly applied in areas such as sentiment analysis and natural language processing (NLP). Automatic sentiment analysis captures user emotions and classifies reviews as positive or negative. One challenge of general lexicon-based analysis is its insensitivity to differences between domains. There is also a need to interpret the output predicted by AI sentiment analysis models. This paper develops a model based on Shapley Additive Explanations (SHAP) for text classification, which assigns negative or positive labels to user opinion texts. The sentiment analysis model is evaluated on the Internet Movie Database (IMDB) dataset, which has a rich vocabulary and coherent text. Results show that the model predicted 89% of the user reviews correctly. The model can also be flexibly extended to unlabeled data.
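
As an illustration of the attribution idea behind SHAP, the following minimal sketch computes exact Shapley values for the tokens of a short review. It is not the model described in the abstract: the lexicon, the `score` function, and the example review are all hypothetical, and the exact enumeration over token orderings is only feasible for very short inputs (practical SHAP implementations use approximations).

```python
from itertools import permutations
from math import factorial

# Hypothetical toy lexicon; the weights are illustrative assumptions only.
LEXICON = {"great": 2.0, "good": 1.0, "boring": -1.5, "not": 0.0}


def score(tokens):
    """Toy sentiment score: sum of lexicon weights over the tokens present.

    As a simple interaction, 'not' flips the contribution of 'good'.
    """
    present = set(tokens)
    total = sum(LEXICON.get(t, 0.0) for t in present)
    if "not" in present and "good" in present:
        total -= 2.0 * LEXICON["good"]
    return total


def shapley_values(tokens, value_fn):
    """Exact Shapley values: average marginal contribution of each token
    over every ordering in which tokens can be added to the input."""
    players = list(dict.fromkeys(tokens))  # unique tokens, order preserved
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = []
        prev = value_fn(coalition)
        for p in order:
            coalition.append(p)
            cur = value_fn(coalition)
            phi[p] += cur - prev  # marginal contribution of p in this order
            prev = cur
    n_orders = factorial(len(players))
    return {p: v / n_orders for p, v in phi.items()}


review = ["not", "good", "boring"]
phi = shapley_values(review, score)
label = "positive" if score(review) > 0 else "negative"

for token, value in phi.items():
    print(f"{token:>7}: {value:+.2f}")
print("predicted label:", label)
```

The per-token values sum to the difference between the score of the full review and the score of the empty input, which is the additivity property that gives SHAP its name.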

Acknowledgment

This paper is supported by the Ministry of Science and Technology, Taiwan, under grant numbers MOST-107-2221-E-324-018-MY2 and MOST-109-2622-E-324-004.

Author information

Authors and Affiliations

Department of Information Management, Chaoyang University of Technology, Taichung, Taiwan, Republic of China
Christine Dewi, Bing-Jun Tsai & Rung-Ching Chen
Faculty of Information Technology, Satya Wacana Christian University, Salatiga, Indonesia
Christine Dewi

Corresponding author

Correspondence to Rung-Ching Chen.

Editor information

Editors and Affiliations

University of Newcastle Australia, Newcastle, NSW, Australia
Edward Szczerbicki
Wrocław University of Science and Technology, Wrocław, Poland
Krystian Wojtkiewicz
International University - VNU-HCM, Ho Chi Minh City, Vietnam
Sinh Van Nguyen
Wrocław University of Science and Technology, Wrocław, Poland
Marcin Pietranik
Wrocław University of Science and Technology, Wrocław, Poland
Marek Krótkiewicz

About this paper

Cite this paper

Dewi, C., Tsai, BJ., Chen, RC. (2022). Shapley Additive Explanations for Text Classification and Sentiment Analysis of Internet Movie Database. In: Szczerbicki, E., Wojtkiewicz, K., Nguyen, S.V., Pietranik, M., Krótkiewicz, M. (eds) Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2022. Communications in Computer and Information Science, vol 1716. Springer, Singapore. https://doi.org/10.1007/978-981-19-8234-7_6

DOI: https://doi.org/10.1007/978-981-19-8234-7_6
Published: 24 November 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8233-0
Online ISBN: 978-981-19-8234-7
eBook Packages: Computer Science, Computer Science (R0)
