
Combining Methods for Detecting and Correcting Semantic Hidden Errors in Arabic Texts

  • Chiraz Ben Othmane Zribi
  • Hanene Mejri
  • Mohamed Ben Ahmed
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4394)

Abstract

In this paper, we address the problem of semantic hidden errors in Arabic texts. These are spelling errors that result in valid words and therefore cause semantic irregularities. We first describe the different types of these errors. Then, we present and justify the adopted approach, which is based on the combination of several methods. Next, we describe the context of our work and show the multi-agent architecture of our system. Finally, we present the testing framework used to evaluate the implemented system.
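The abstract does not detail the individual detection methods. The machine-assigned keywords for this chapter mention Latent Semantic Analysis and angular distance, so the following is a minimal, hypothetical sketch of how those two ideas could flag a valid word that does not fit its context. It compares each word's vector with the centroid of the other words in the sentence and flags words whose angle exceeds a threshold. It assumes word vectors have already been derived (for instance by LSA over a textual corpus). The toy 2-D vectors, the English example words, the function names and the threshold value are all invented for illustration and are not the authors' system.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def angular_distance(u: Vector, v: Vector) -> float:
    """Angle in radians between two vectors (0 = same direction, pi = opposite)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    cos = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, cos)))


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def flag_suspicious(sentence: List[str],
                    vectors: Dict[str, Vector],
                    threshold: float) -> List[str]:
    """Hypothetical detector: return the words whose angle to the centroid of
    the other known words in the sentence exceeds `threshold` (radians).
    Words without a vector are ignored."""
    known = [w for w in sentence if w in vectors]
    suspects = []
    for i, word in enumerate(known):
        context = [vectors[w] for j, w in enumerate(known) if j != i]
        if not context:
            continue
        c = centroid(context)
        if all(x == 0 for x in c):
            continue  # context has no direction; cannot judge this word
        if angular_distance(vectors[word], c) > threshold:
            suspects.append(word)
    return suspects


# Toy vectors (invented): three "medical" words and one unrelated word.
toy_vectors = {
    "doctor": [1.0, 0.1],
    "patient": [1.0, 0.0],
    "hospital": [0.9, 0.2],
    "camel": [0.0, 1.0],
}
print(flag_suspicious(["doctor", "patient", "hospital", "camel"], toy_vectors, 0.8))
```

With these toy vectors only "camel" is flagged. In a system like the one described in the abstract, a semantic check of this kind would presumably be one of several combined methods, and a flagged word would still need candidate corrections to be generated and ranked.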

Keywords

Latent Semantic Analysis, Angular Distance, Textual Corpus, Semantic Group, Semantic Error



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Chiraz Ben Othmane Zribi (1)
  • Hanene Mejri (1)
  • Mohamed Ben Ahmed (1)

  1. RIADI laboratory, National School of Computer Sciences, 2010, University of La Manouba, Tunisia
