Abstract
The main problems in building sentiment analysis models for Indonesian YouTube comments are unstructured data and low classification accuracy. Because Indonesian differs from English, sentiment analysis for Indonesian requires suitable preprocessing and classification methods. Previous research has mostly used Linear Support Vector Machine (SVM), Naïve Bayes, and Decision Tree classifiers; although SVM achieves better accuracy than the other algorithms, its accuracy still needs improvement. This study compares the performance of tree-based ensemble methods and feature selection for improving sentiment analysis models of Indonesian YouTube comments. We crawled Indonesian YouTube comments from different domains and produced ten datasets. Preprocessing consisted of stopword removal, slang-word conversion, and stemming. For feature selection, we tested two vectorizers: Term Frequency (TF) and Term Frequency–Inverse Document Frequency (TF-IDF). Models were built with six machine learning methods: four tree-based ensembles, intended to raise accuracy, plus Linear SVM and Decision Tree as baselines for comparison. Random Forest and Extra Trees represent bagging ensembles, while AdaBoost and Gradient Boosting represent boosting ensembles. Experiments combining the two vectorizers with these classifiers show that the choice between TF and TF-IDF has little effect on classification accuracy. Across all experiments, the best methods were Extra Trees, with an accuracy of 93.39%, and AdaBoost, with an accuracy of 92.53%.
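The preprocessing and vectorization steps described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The stopword list, the slang dictionary `SLANG`, and the suffix-stripping rule that stands in for a real Indonesian stemmer are hypothetical toy resources. The IDF formula is the smoothed variant used by default in scikit-learn's `TfidfVectorizer`; the abstract does not specify which formula the study used.

```python
import math
from collections import Counter

# Hypothetical toy resources for illustration only. A real pipeline would use a
# full Indonesian stopword list, a curated slang dictionary, and a proper stemmer.
STOPWORDS = {"yang", "dan", "di", "ini", "itu", "sangat"}
SLANG = {"gk": "tidak", "bgt": "banget", "bgs": "bagus"}
SUFFIXES = ("nya", "lah", "kah")  # crude suffix stripping, NOT a real stemmer


def preprocess(comment):
    """Lowercase, convert slang, drop stopwords, then strip at most one suffix."""
    tokens = comment.lower().split()
    tokens = [SLANG.get(t, t) for t in tokens]          # slang-word conversion
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    stemmed = []
    for t in tokens:
        for suffix in SUFFIXES:
            # Only strip when a reasonably long stem remains.
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        stemmed.append(t)
    return stemmed


def tf_vectors(docs):
    """Raw term-frequency (count) vectors over a shared, sorted vocabulary."""
    vocab = sorted({t for doc in docs for t in doc})
    counts = [Counter(doc) for doc in docs]
    return vocab, [[c[t] for t in vocab] for c in counts]


def tfidf_vectors(docs):
    """TF weighted by smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1."""
    vocab, tf = tf_vectors(docs)
    n = len(docs)
    df = [sum(1 for doc in docs if t in doc) for t in vocab]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    return vocab, [[f * w for f, w in zip(row, idf)] for row in tf]


# Usage with three made-up comments.
comments = ["Videonya bagus bgt", "gk suka video ini", "bagus dan lucu"]
docs = [preprocess(c) for c in comments]
vocab, tf = tf_vectors(docs)
_, tfidf = tfidf_vectors(docs)
print(vocab)  # ['bagus', 'banget', 'lucu', 'suka', 'tidak', 'video']
print(tf)     # [[1, 1, 0, 0, 0, 1], [0, 0, 0, 1, 1, 1], [1, 0, 1, 0, 0, 0]]
```

In this toy corpus, *video* occurs in two of the three comments and *banget* occurs in only one. Plain TF weights them equally in the first comment, while TF-IDF gives *video* the lower weight. scikit-learn's `TfidfVectorizer` also L2-normalizes each row by default, which this sketch omits. The resulting matrices would then be passed to the six classifiers. In scikit-learn, for example, these would be `RandomForestClassifier`, `ExtraTreesClassifier`, `AdaBoostClassifier`, `GradientBoostingClassifier`, `LinearSVC`, and `DecisionTreeClassifier`, although the abstract does not state which library the study used.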
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Khomsah, S., Hidayatullah, A.F., Aribowo, A.S. (2021). Comparison of the Effects of Feature Selection and Tree-Based Ensemble Machine Learning for Sentiment Analysis on Indonesian YouTube Comments. In: Triwiyanto, Nugroho, H.A., Rizal, A., Caesarendra, W. (eds) Proceedings of the 1st International Conference on Electronics, Biomedical Engineering, and Health Informatics. Lecture Notes in Electrical Engineering, vol 746. Springer, Singapore. https://doi.org/10.1007/978-981-33-6926-9_15
DOI: https://doi.org/10.1007/978-981-33-6926-9_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-6925-2
Online ISBN: 978-981-33-6926-9
eBook Packages: Engineering, Engineering (R0)