
Toward a Robust Spell Checker for Arabic Text

  • Conference paper

Computational Science and Its Applications – ICCSA 2016

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9790)

Abstract

Spell checking is the process of detecting misspelled words in a written text and suggesting alternative spellings. It proceeds in two stages. The first stage detects both real-word errors (valid words used in the wrong context) and non-word errors (strings that are not valid words) in a given text. The second stage corrects the detected errors. In this paper we propose a novel method for spell checking Arabic text. Our system is a sequential combination of lexicon-based, rule-based, and statistical approaches. Experimental results show that the proposed method performs well in terms of recall and precision, for both error detection and error correction, compared to other systems.
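The abstract describes the system only as a sequential combination of lexicon-based, rule-based and statistical components. The Python sketch below shows one way such a cascade could handle non-word errors; it is not the author's implementation. The toy lexicon, its made-up frequency counts, the confusable-letter rules, the edit-distance fallback and all function names are assumptions chosen for illustration, and the "statistical" step here is plain unigram frequency rather than whatever model the paper uses.

```python
from collections import Counter

# ASSUMPTION: toy lexicon with made-up frequencies; a real system would load
# a large Arabic word list and corpus counts.
FREQ = Counter({
    "ذهب": 50, "الولد": 30, "إلى": 120, "المدينة": 40,
    "مدرسة": 25, "كبيرة": 20, "في": 200,
})
LEXICON = set(FREQ)

# ASSUMPTION: hypothetical rules, i.e. letter groups commonly confused in Arabic
# (hamza forms on alif, taa marbuta vs. haa, alif maqsura vs. yaa).
CONFUSABLE = [("ا", "أ", "إ", "آ"), ("ة", "ه"), ("ى", "ي")]

# Basic Arabic letters U+0621..U+063A and U+0641..U+064A (tatweel excluded).
ALPHABET = "".join(chr(c) for c in range(0x0621, 0x063B)) + "".join(
    chr(c) for c in range(0x0641, 0x064B)
)


def rule_candidates(word):
    """Rule-based stage: swap one confusable letter at a time."""
    out = set()
    for i, ch in enumerate(word):
        for group in CONFUSABLE:
            if ch in group:
                out.update(word[:i] + alt + word[i + 1:] for alt in group if alt != ch)
    return out


def edit1_candidates(word):
    """Fallback: every string one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + c + r[1:] for l, r in splits if r for c in ALPHABET}
    inserts = {l + c + r for l, r in splits for c in ALPHABET}
    return deletes | transposes | replaces | inserts


def correct_non_words(tokens):
    corrected = []
    for tok in tokens:
        if tok in LEXICON:  # lexicon stage: known word, keep it unchanged
            corrected.append(tok)
            continue
        cands = rule_candidates(tok) & LEXICON  # rule stage first (cheap, targeted)
        if not cands:
            cands = edit1_candidates(tok) & LEXICON  # generic edit-distance fallback
        # "statistical" stage (stand-in): most frequent candidate, ties broken alphabetically
        corrected.append(max(cands, key=lambda c: (FREQ[c], c)) if cands else tok)
    return corrected


if __name__ == "__main__":
    print(correct_non_words(["ذهب", "الولد", "الى", "المدينة"]))
```

Trying the targeted letter-confusion rules before the generic edit-distance search keeps the candidate set small for the most frequent Arabic error types; unknown tokens with no candidate are left untouched rather than guessed.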

Notes

  1. MADAMIRA is a system developed for morphological analysis and disambiguation of Arabic text.

  2. MultiUN.ar: a corpus available at http://www.euromatrixplus.net/multi-un/.

  3. http://aracomlex.sourceforge.net/.

  4. The list is freely available at http://sourceforge.net/projects/arabic-wordlist/.

  5. Ghalatawi, an open-source autocorrect program, is available at http://ghalatawi.sourceforge.net.

  6. NE list: available at www.github.com/mouradmars/Named_Entities_Project.

  7. SRILM: Language Model Toolkit, http://www.speech.sri.com/projects/srilm/.
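Note 7 lists the SRILM language-modeling toolkit. Detecting real-word errors requires context, which is what an n-gram language model supplies: a valid word is replaced only when a confusable alternative makes the sentence more probable. The sketch below is a self-contained stand-in, not SRILM and not the author's method. It uses a hand-rolled bigram model with add-one smoothing, a hypothetical four-sentence corpus, and a hypothetical confusion set pairing "على" (on) with "علي" (the name Ali), two valid words that differ only in the final letter.

```python
import math
from collections import Counter

# ASSUMPTION: hypothetical toy corpus, one whitespace-tokenized sentence per entry.
CORPUS = [
    "الكتاب على الطاولة",
    "القلم على الطاولة",
    "ذهب علي إلى المدرسة",
    "جلس علي على الكرسي",
]

# ASSUMPTION: hypothetical confusion set of valid words that are easily swapped.
CONFUSION_SETS = [{"على", "علي"}]


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.bigrams = Counter()
        self.history = Counter()
        vocab = set()
        for sent in sentences:
            toks = ["<s>"] + sent.split() + ["</s>"]
            vocab.update(toks[1:])  # every token that can be predicted
            for prev, cur in zip(toks, toks[1:]):
                self.bigrams[(prev, cur)] += 1
                self.history[prev] += 1
        self.V = len(vocab)

    def logprob(self, tokens):
        """Log-probability of a tokenized sentence, including boundary markers."""
        toks = ["<s>"] + list(tokens) + ["</s>"]
        return sum(
            math.log((self.bigrams[(p, c)] + 1) / (self.history[p] + self.V))
            for p, c in zip(toks, toks[1:])
        )


def correct_real_words(tokens, lm):
    """Greedy left-to-right pass: swap a token for a member of its confusion set
    only if that strictly raises the sentence log-probability."""
    tokens = list(tokens)
    for i, tok in enumerate(tokens):
        for cset in CONFUSION_SETS:
            if tok not in cset:
                continue
            best, best_score = tok, lm.logprob(tokens)
            for alt in sorted(cset - {tok}):
                score = lm.logprob(tokens[:i] + [alt] + tokens[i + 1:])
                if score > best_score:
                    best, best_score = alt, score
            tokens[i] = best
            break
    return tokens


if __name__ == "__main__":
    lm = BigramLM(CORPUS)
    print(correct_real_words("الكتاب علي الطاولة".split(), lm))  # ['الكتاب', 'على', 'الطاولة']
```

Requiring a strict improvement keeps correct sentences such as "ذهب علي إلى المدرسة" unchanged; a production system would use a much larger corpus and a better-smoothed higher-order model.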

Author information

Correspondence to Mourad Mars.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Mars, M. (2016). Toward a Robust Spell Checker for Arabic Text. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2016. ICCSA 2016. Lecture Notes in Computer Science, vol 9790. Springer, Cham. https://doi.org/10.1007/978-3-319-42092-9_24

  • DOI: https://doi.org/10.1007/978-3-319-42092-9_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42091-2

  • Online ISBN: 978-3-319-42092-9

  • eBook Packages: Computer Science, Computer Science (R0)