Linguistic Rules Based Approach for Automatic Restoration of Accents on French Texts

  • Paul Brillant Feuto Njonko
  • Sylviane Cardey-Greenfield
  • Peter Greenfield
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7614)

Abstract

Nowadays, in email and many other domains, a growing number of French texts are wrongly accented or completely unaccented. In French, the accent has a linguistic value and function: it expresses the language's subtleties and, above all, helps avoid ambiguity and misinterpretation. Although in most cases the loss of information caused by missing accents is not a major issue for human readers, it is very problematic for the automatic processing of text and increases the ambiguity involved in Natural Language Processing. Restoring accents manually is tedious, hence the importance of automatic accent restoration systems. This paper presents a novel system for the automatic restoration of accents in French texts. Unlike a few existing approaches that use statistical methods, our approach is essentially based on linguistic rules, which we consider more reliable.
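To make the ambiguity problem concrete, the following is a minimal sketch of rule-based accent restoration, not the system described in the paper. It assumes a tiny hypothetical lexicon (`LEXICON`), restores words that have a single accented candidate by direct lookup, and applies one illustrative contextual rule to the ambiguous pair "a" / "à". The names `strip_accents`, `CANDIDATES`, `SUBJECT_PRONOUNS` and `restore` are introduced here for illustration only.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Remove diacritics from a word, e.g. 'café' -> 'cafe'."""
    decomposed = unicodedata.normalize("NFD", word)
    # Combining marks (accents, cedilla) have Unicode category 'Mn'.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Hypothetical toy lexicon of correctly accented forms (an assumption for
# this sketch; a real system would need a full lexicon).
LEXICON = ["il", "a", "à", "bu", "un", "café", "la", "maison", "ou", "où"]

# Map each unaccented form to every accented form that collapses onto it.
CANDIDATES: dict[str, list[str]] = {}
for form in LEXICON:
    CANDIDATES.setdefault(strip_accents(form), []).append(form)

# Illustrative rule context: after a subject pronoun, 'a' is taken to be
# the verb form of 'avoir' rather than the preposition 'à'.
SUBJECT_PRONOUNS = {"il", "elle", "on"}


def restore(tokens: list[str]) -> list[str]:
    """Restore accents on a list of lowercase, unaccented tokens."""
    restored = []
    for i, token in enumerate(tokens):
        options = CANDIDATES.get(token, [token])
        if len(options) == 1:
            # Unambiguous word: direct lexicon lookup is enough.
            restored.append(options[0])
        elif token == "a":
            # Ambiguous word handled by the single contextual rule above.
            previous = tokens[i - 1] if i > 0 else ""
            restored.append("a" if previous in SUBJECT_PRONOUNS else "à")
        else:
            # No rule available in this sketch: leave the token unchanged.
            restored.append(token)
    return restored


print(" ".join(restore("il a bu un cafe a la maison".split())))
```

In this sketch "cafe" is restored by lookup alone, while the two occurrences of "a" are resolved differently depending on the preceding word. Pairs such as "ou" / "où" are left untouched because no rule is defined for them here.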

Keywords

Natural Language Processing, Automatic Restoration of Accents, Linguistic Rules

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Paul Brillant Feuto Njonko (1)
  • Sylviane Cardey-Greenfield (1)
  • Peter Greenfield (1)
  1. Centre Tesnière - Équipe d’Accueil EA 2283, Université de Franche-Comté - UFR SLHS, Besançon Cedex, France
