Abstract
A word stemmer (or text stemmer) removes bound morphemes from derived words so that morphological variants are mapped to a common base form. It is commonly used as a preprocessing tool in text classification, text mining, and information retrieval tasks. The design of an effective text stemmer is therefore crucial to ensuring that the stemming process maps morphological variants to the correct base forms. This paper investigates the design considerations for an effective text stemmer from the perspective of the Malay language. These considerations are based on the challenges that previous researchers have faced when stemming Malay texts. A text stemmer that adopts these considerations is expected to address common stemming errors and to achieve promising stemming accuracy.
Acknowledgements
The authors would like to thank the Editor in Chief and the anonymous reviewers of the manuscript for their valuable comments and suggestions. This research was funded by Universiti Teknologi Malaysia’s Research University Grant (VUP) PY/2017/01736.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kassim, M.N., Jali, S.H.M., Maarof, M.A., Zainal, A., Wahab, A.A. (2020). Design Consideration of Malay Text Stemmer Using Structured Approach. In: Zhang, YD., Mandal, J., So-In, C., Thakur, N. (eds) Smart Trends in Computing and Communications. Smart Innovation, Systems and Technologies, vol 165. Springer, Singapore. https://doi.org/10.1007/978-981-15-0077-0_43
DOI: https://doi.org/10.1007/978-981-15-0077-0_43
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0076-3
Online ISBN: 978-981-15-0077-0
eBook Packages: Intelligent Technologies and Robotics (R0)