Comparison of the Effects of Feature Selection and Tree-Based Ensemble Machine Learning for Sentiment Analysis on Indonesian YouTube Comments

  • Conference paper

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 746)

Abstract

The main problems in sentiment analysis models for Indonesian YouTube comments are unstructured data and low classification accuracy. Sentiment analysis for Indonesian, which differs from English, requires suitable preprocessing and classification methods. Previous research has usually used Linear Support Vector Machine (SVM), Naïve Bayes, and Decision Tree. Although SVM is more accurate than the other algorithms, its accuracy still needs to be improved. This study compares the performance of tree-based ensemble methods and feature selection in order to improve the sentiment analysis model for Indonesian YouTube comments. We crawled Indonesian YouTube comments from different domains and produced ten datasets. Preprocessing consisted of removing stopwords, converting slang words, and stemming. For feature selection, we tested two vectorizer methods: Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF). The models were built using six machine learning methods: four tree-based ensemble methods, intended to raise accuracy, plus Linear SVM and Decision Tree as a comparison. Random Forest and Extra Tree represent the bagging ensembles; AdaBoost and Gradient Boosting represent the boosting ensembles. Across all experiments, the best methods were Extra Tree, with an accuracy of 93.39%, and AdaBoost, with an accuracy of 92.53%. Based on the experiments combining feature selection and ensemble machine learning, the type of vectorizer (TF or TF-IDF) has little effect on classification accuracy.
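The preprocessing and vectorizer steps named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the stopword and slang dictionaries are tiny hypothetical stand-ins, stemming is left out because it needs a real Indonesian stemmer, and the smoothed IDF formula is one common variant that the abstract does not specify.

```python
import math
from collections import Counter

# Hypothetical toy resources; a real system would use full Indonesian lists.
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}
SLANG = {"gk": "tidak", "bgt": "banget", "bgs": "bagus"}


def preprocess(comment):
    """Lowercase, split on whitespace, map slang to standard words, drop stopwords."""
    tokens = comment.lower().split()
    tokens = [SLANG.get(tok, tok) for tok in tokens]
    return [tok for tok in tokens if tok not in STOPWORDS]


def tf_vectors(docs):
    """Raw term counts per document (the TF representation)."""
    return [Counter(doc) for doc in docs]


def tfidf_vectors(docs):
    """Term counts weighted by a smoothed inverse document frequency (assumed variant)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: count * idf[t] for t, count in Counter(doc).items()} for doc in docs]


# Made-up example comments, not taken from the paper's datasets.
comments = [
    "Video ini bgs bgt",
    "Video itu gk bagus",
    "Lagu yang bagus dan enak",
]
docs = [preprocess(c) for c in comments]
tf = tf_vectors(docs)
tfidf = tfidf_vectors(docs)
```

Under TF, every term in the first comment has the same weight. Under TF-IDF, a term that appears in every comment ("bagus") is down-weighted relative to rarer terms such as "banget". Either representation could then be passed to the classifiers listed in the abstract.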

Author information

Corresponding author

Correspondence to Siti Khomsah.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Khomsah, S., Hidayatullah, A.F., Aribowo, A.S. (2021). Comparison of the Effects of Feature Selection and Tree-Based Ensemble Machine Learning for Sentiment Analysis on Indonesian YouTube Comments. In: Triwiyanto, Nugroho, H.A., Rizal, A., Caesarendra, W. (eds) Proceedings of the 1st International Conference on Electronics, Biomedical Engineering, and Health Informatics. Lecture Notes in Electrical Engineering, vol 746. Springer, Singapore. https://doi.org/10.1007/978-981-33-6926-9_15

  • DOI: https://doi.org/10.1007/978-981-33-6926-9_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-33-6925-2

  • Online ISBN: 978-981-33-6926-9

  • eBook Packages: Engineering, Engineering (R0)
