Abstract
A text stemmer is a useful language preprocessing tool in information retrieval, text mining, and natural language processing. It maps morphological variants of words to their base forms. Most current text stemmers for the Malay language focus on removing affixes, clitics, and particles from affixed words. However, these stemmers still suffer from stemming errors because they do not sufficiently address the root causes of those errors. This paper investigates the root causes of stemming errors and proposes a stemming technique to address them. The proposed text stemmer combines an affix removal method with multiple dictionary lookups to address the various root causes of stemming errors. The experimental results show promising stemming accuracy and a reduction in the various possible stemming errors.
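To make the general idea concrete, the following is a minimal sketch of how affix removal can be combined with more than one dictionary lookup. It is not the stemmer proposed in the paper: the word lists, the affix lists, the lookup order, and the names `ROOT_WORDS`, `DERIVATIVES`, and `stem` are all assumptions chosen for illustration, and a real Malay stemmer would need far larger dictionaries and spelling-variation rules.

```python
# Illustrative sketch only. The dictionaries and affix lists below are tiny,
# hypothetical samples, not the rules or data used in the paper.

# Dictionary 1: known root words. Checked first so that roots that merely
# look affixed (e.g. "makan" ends in "-kan") are never stripped.
ROOT_WORDS = {"makan", "main", "ajar", "baca", "buku"}

# Dictionary 2: derived words whose root cannot be recovered by plain
# affix stripping (e.g. "pelajar" -> "ajar").
DERIVATIVES = {"pelajar": "ajar", "belajar": "ajar"}

# The empty string means "strip nothing on this side".
PREFIXES = ["", "mem", "me", "pe", "ber", "di"]
SUFFIXES = ["", "kan", "an", "i", "lah", "nya"]


def stem(word: str) -> str:
    """Return the root of `word`, or `word` unchanged if no root is found."""
    word = word.lower()

    # Lookup 1: the word is already a root, so leave it alone.
    if word in ROOT_WORDS:
        return word

    # Lookup 2: the word is a listed irregular derivative.
    if word in DERIVATIVES:
        return DERIVATIVES[word]

    # Affix removal: try each prefix/suffix pair and accept a candidate
    # only if the root dictionary confirms it.
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        for suffix in SUFFIXES:
            if not word.endswith(suffix):
                continue
            candidate = word[len(prefix):len(word) - len(suffix)]
            if candidate in ROOT_WORDS:
                return candidate

    # No validated root was found, so return the word unstemmed
    # rather than risk an over-stemmed result.
    return word


if __name__ == "__main__":
    for w in ["makanan", "pemain", "membaca", "makan", "pelajar", "bacalah"]:
        print(w, "->", stem(w))
```

In this sketch, checking the root dictionary before stripping guards against over-stemming, and validating every stripped candidate against the same dictionary prevents the stemmer from returning a string that is not a word.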
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kassim, M.N., Jali, S.H.M., Maarof, M.A., Zainal, A. (2019). Towards Stemming Error Reduction for Malay Texts. In: Alfred, R., Lim, Y., Ibrahim, A., Anthony, P. (eds) Computational Science and Technology. Lecture Notes in Electrical Engineering, vol 481. Springer, Singapore. https://doi.org/10.1007/978-981-13-2622-6_2
DOI: https://doi.org/10.1007/978-981-13-2622-6_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2621-9
Online ISBN: 978-981-13-2622-6
eBook Packages: Engineering, Engineering (R0)