Abstract
The growth of technology has changed the way people communicate opinions about services and products. In the consumer domain, the real challenge is to identify the latest trends and to summarize the general opinion about products, given the diversity and volume of social media data from sources such as Twitter, Facebook and online forums. This paper discusses sentiment analysis of Malay documents from three perspectives. First, several alternative text representations were investigated. Second, the effects of pre-processing strategies such as normalization and stemming, using two types of Malay stemming algorithm, were highlighted. Lastly, the performance of Naïve Bayes (NB), Support Vector Machine (SVM) and k-nearest neighbour (kNN) classifiers in classifying positive and negative reviews was compared. The results show that our selection of pre-processing strategies slightly improves the performance of the classifiers.
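The pipeline described above (normalization, stemming, then classification of positive and negative reviews) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the normalization table `NORMALIZE`, the root-word list `ROOT_WORDS` and the affix sets are small hypothetical examples; the `stem` function is a simple dictionary-checked affix stripper, not either of the two Malay stemming algorithms the paper evaluates; only a from-scratch multinomial Naïve Bayes classifier is shown (SVM and kNN are omitted); and the training sentences are made up rather than drawn from the paper's dataset.

```python
import math
import re
from collections import Counter

# Hypothetical normalization table mapping noisy, SMS-style spellings to
# standard Malay (illustrative entries only, not the paper's dictionary).
NORMALIZE = {"x": "tidak", "tak": "tidak", "sgt": "sangat", "bgs": "bagus"}

# Hypothetical root-word list and affix sets. Checking candidates against a
# root dictionary stops the stemmer from stripping "ke" off a root like "kecewa".
ROOT_WORDS = {"bagus", "suka", "puas", "kecewa", "rosak", "lambat", "khidmat"}
PREFIXES = ("mem", "men", "ber", "ter", "per", "di", "ke")
SUFFIXES = ("kan", "an", "nya")


def normalize(text):
    """Lowercase, split into alphabetic tokens, and replace known noisy terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [NORMALIZE.get(t, t) for t in tokens]


def stem(word):
    """Strip at most one prefix and one suffix, accepting only a known root."""
    if word in ROOT_WORDS:
        return word
    for p in ("",) + PREFIXES:
        for s in ("",) + SUFFIXES:
            if word.startswith(p) and word.endswith(s) and len(word) > len(p) + len(s):
                candidate = word[len(p):len(word) - len(s)]
                if candidate in ROOT_WORDS:
                    return candidate
    return word  # no affix combination produced a known root: keep the word


def preprocess(text, use_stemming=True):
    """Normalize a review and optionally stem every token."""
    tokens = normalize(text)
    return [stem(t) for t in tokens] if use_stemming else tokens


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.priors = Counter(labels)
        self.n_docs = len(labels)
        self.counts = {c: Counter() for c in self.priors}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        return self

    def log_posterior(self, tokens, c):
        score = math.log(self.priors[c] / self.n_docs)
        for t in tokens:
            if t in self.vocab:  # words never seen in training are skipped
                score += math.log((self.counts[c][t] + 1) / (self.totals[c] + len(self.vocab)))
        return score

    def predict(self, tokens):
        return max(self.counts, key=lambda c: self.log_posterior(tokens, c))


# Made-up training reviews (not the paper's data).
train = [
    ("Produk ini sgt bgs", "positive"),
    ("Saya suka, sangat puas", "positive"),
    ("Barang rosak, saya kecewa", "negative"),
    ("Penghantaran lambat, x bgs", "negative"),
]
model = MultinomialNB().fit([preprocess(t) for t, _ in train], [y for _, y in train])

print(preprocess("Saya dikecewakan, barang rosak"))  # ['saya', 'kecewa', 'barang', 'rosak']
print(model.predict(preprocess("Saya dikecewakan, barang rosak")))  # negative
print(model.predict(preprocess("Kepuasan sgt tinggi")))  # positive
```

Comparing `preprocess(text, use_stemming=False)` with the default shows how normalization and stemming map variants such as "x", "dikecewakan" and "kepuasan" onto words the classifier has already seen; on a toy training set like this one, the predicted labels are illustrative only.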
Acknowledgment
This study is supported by the UKM Grant ICONIC-2013-007 and FRGS-2-2013-ICT02-UKM-02-2.
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
M. Arif, S., Mustapha, M. (2017). The Effect of Noise Elimination and Stemming in Sentiment Analysis for Malay Documents. In: Ahmad, AR., Kor, L., Ahmad, I., Idrus, Z. (eds) Proceedings of the International Conference on Computing, Mathematics and Statistics (iCMS 2015). Springer, Singapore. https://doi.org/10.1007/978-981-10-2772-7_10
DOI: https://doi.org/10.1007/978-981-10-2772-7_10
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2770-3
Online ISBN: 978-981-10-2772-7
eBook Packages: Education, Education (R0)