The Effect of Noise Elimination and Stemming in Sentiment Analysis for Malay Documents

  • Conference paper
Proceedings of the International Conference on Computing, Mathematics and Statistics (iCMS 2015)

Abstract

The growth of technology has changed the way people communicate opinions on services and products. In consumerism, the real challenge is to understand the latest trends and summarize the general state of opinion about products, given the diversity and size of social media data from sources such as Twitter, Facebook and online forums. This paper discusses sentiment analysis of Malay documents from three perspectives. First, several alternative text representations were investigated. Second, the effects of pre-processing strategies such as normalization and stemming, using two types of Malay stemming algorithm, were highlighted. Finally, the performance of Naïve Bayes (NB), Support Vector Machine (SVM) and k-nearest neighbour (kNN) classifiers in classifying positive and negative reviews was compared. The results show that the selected pre-processing strategies slightly increase the performance of the classifiers.
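The pipeline outlined in the abstract (normalize noisy terms, stem, then classify reviews as positive or negative) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the normalization table `NOISY_TERMS`, the suffix list `SUFFIXES`, the toy reviews, and all function names are assumptions made for this example. The stemmer is a crude suffix stripper that will over-stem some words and is not either of the two Malay stemmers evaluated in the paper. Only Naïve Bayes is shown; SVM and kNN are omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical normalization table for informal spellings (illustrative only).
NOISY_TERMS = {"x": "tidak", "tak": "tidak", "sgt": "sangat", "yg": "yang", "dgn": "dengan"}

# Hypothetical suffix list for a crude stemmer (not the stemmers used in the paper).
SUFFIXES = ("nya", "kan")


def normalize(text):
    """Lowercase, tokenize on letters, and replace known noisy terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [NOISY_TERMS.get(t, t) for t in tokens]


def stem(token):
    """Strip one listed suffix if at least four characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Normalization followed by stemming, returning a bag-of-words token list."""
    return [stem(t) for t in normalize(text)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over word counts."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training reviews, invented for this sketch.
train = [
    ("produk ini sgt bagus", "positive"),
    ("servis bagus dan cepat", "positive"),
    ("produk ini x bagus", "negative"),
    ("servis lambat dan teruk", "negative"),
]
model = NaiveBayes().fit(
    [preprocess(text) for text, _ in train],
    [label for _, label in train],
)
print(model.predict(preprocess("servis ini x bagus")))
```

In this toy setup, normalization matters because the informal negation "x" only becomes a shared feature once it is mapped to "tidak"; without that step, "x" and "tidak" would be counted as unrelated tokens.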

Acknowledgment

This study is supported by the UKM Grant ICONIC-2013-007 and FRGS-2-2013-ICT02-UKM-02-2.

Author information

Corresponding author

Correspondence to Shereena M. Arif.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

M. Arif, S., Mustapha, M. (2017). The Effect of Noise Elimination and Stemming in Sentiment Analysis for Malay Documents. In: Ahmad, AR., Kor, L., Ahmad, I., Idrus, Z. (eds) Proceedings of the International Conference on Computing, Mathematics and Statistics (iCMS 2015). Springer, Singapore. https://doi.org/10.1007/978-981-10-2772-7_10

  • DOI: https://doi.org/10.1007/978-981-10-2772-7_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2770-3

  • Online ISBN: 978-981-10-2772-7

  • eBook Packages: Education (R0)
