Abstract
Spell checking is the process of detecting misspelled words in a written text and recommending alternative spellings. The first stage consists of detecting real-word and non-word errors in a given text; the second stage consists of correcting them. In this paper, we propose a novel method for spell checking Arabic text. Our system is a sequential combination of lexicon-based, rule-based, and statistical approaches. The experimental results show that the proposed method achieves good recall and precision in error detection and correction compared to other systems.
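The two stages described above (detection, then correction) can be illustrated with a minimal sketch. This is not the system proposed in the paper: it is a generic lexicon lookup followed by edit-distance candidate generation and unigram ranking, it covers non-word errors only, and it omits the rule-based stage. The lexicon, its counts, and the function names `edits1`, `detect`, and `correct` are hypothetical, and a Latin-alphabet toy vocabulary is used purely for readability.

```python
from collections import Counter

# Hypothetical toy lexicon with made-up unigram counts (not data from the paper).
LEXICON = Counter({"spell": 50, "spelling": 40, "checker": 30, "text": 80, "test": 60})

# Assumption: a Latin alphabet for readability; an Arabic system would use Arabic letters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return all strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def detect(tokens):
    """Stage 1: flag non-word errors, i.e. tokens that are absent from the lexicon."""
    return [t for t in tokens if t not in LEXICON]


def correct(word):
    """Stage 2: rank in-lexicon candidates one edit away by descending unigram count."""
    candidates = [c for c in edits1(word) if c in LEXICON]
    return sorted(candidates, key=lambda c: (-LEXICON[c], c))


if __name__ == "__main__":
    for error in detect(["spel", "checker", "tezt"]):
        print(error, correct(error))
```

Real-word errors (valid words used in the wrong context) cannot be caught by lexicon lookup alone; they would need contextual evidence such as an n-gram language model, which this sketch leaves out.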
Notes
- 1. MADAMIRA is a system developed for morphological analysis and disambiguation of Arabic text.
- 2. MultiUN.ar: a corpus available at http://www.euromatrixplus.net/multi-un/.
- 3.
- 4. The list is freely available at http://sourceforge.net/projects/arabic-wordlist/.
- 5. The Ghalatawi autocorrect program is available as an open-source program at http://ghalatawi.sourceforge.net.
- 6. NE list: available at www.github.com/mouradmars/Named_Entities_Project.
- 7. SRILM: language model toolkit, available at http://www.speech.sri.com/projects/srilm/.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Mars, M. (2016). Toward a Robust Spell Checker for Arabic Text. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2016. ICCSA 2016. Lecture Notes in Computer Science, vol 9790. Springer, Cham. https://doi.org/10.1007/978-3-319-42092-9_24
DOI: https://doi.org/10.1007/978-3-319-42092-9_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42091-2
Online ISBN: 978-3-319-42092-9
eBook Packages: Computer Science (R0)