A Lexical Database for Modern Standard Arabic Interoperable with a Finite State Morphological Transducer

  • Mohammed Attia
  • Pavel Pecina
  • Antonio Toral
  • Lamia Tounsi
  • Josef van Genabith
Conference paper

DOI: 10.1007/978-3-642-23138-4_7

Part of the Communications in Computer and Information Science book series (CCIS, volume 100)
Cite this paper as:
Attia M., Pecina P., Toral A., Tounsi L., van Genabith J. (2011) A Lexical Database for Modern Standard Arabic Interoperable with a Finite State Morphological Transducer. In: Mahlow C., Piotrowski M. (eds) Systems and Frameworks for Computational Morphology. SFCM 2011. Communications in Computer and Information Science, vol 100. Springer, Berlin, Heidelberg

Abstract

Current Arabic lexicons, whether computational or otherwise, make no distinction between entries from Modern Standard Arabic (MSA) and Classical Arabic (CA), and tend to include obsolete words that are not attested in current usage. We address this problem by building a large-scale, corpus-based lexical database that is representative of MSA. We use an MSA corpus of 1,089,111,204 words, a pre-annotation tool, machine learning techniques, and knowledge-based templatic matching to automatically acquire and filter lexical knowledge about morpho-syntactic attributes and inflection paradigms. Our lexical database is scalable, interoperable and suitable for constructing a morphological analyser, regardless of the design approach and programming language used. The database is formatted according to the Lexical Markup Framework (LMF), the ISO standard for lexical resource representation. This lexical database is used in developing an open-source finite-state morphological processing toolkit. We also build a web application, AraComLex (Arabic Computer Lexicon), for managing and curating the lexical database.
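
As a rough illustration of what an LMF-style entry might look like, the following Python sketch serialises one lexical entry to XML using the `LexicalEntry`, `Lemma` and `feat` elements common in LMF XML serialisations. The helper name `make_lexical_entry` and the feature names (`partOfSpeech`, `gender`, `number`) are assumptions chosen for illustration; the abstract does not specify the schema actually used in the database.

```python
import xml.etree.ElementTree as ET


def make_lexical_entry(lemma, pos, features):
    """Build one LMF-style LexicalEntry element.

    Hypothetical helper: the attribute names are illustrative and are
    not taken from the paper's actual schema.
    """
    entry = ET.Element("LexicalEntry")
    # Part of speech is stored as an attribute/value feature on the entry.
    ET.SubElement(entry, "feat", att="partOfSpeech", val=pos)
    # The lemma carries its written form as a nested feature.
    lemma_el = ET.SubElement(entry, "Lemma")
    ET.SubElement(lemma_el, "feat", att="writtenForm", val=lemma)
    # Remaining morpho-syntactic attributes become further features.
    for att, val in features.items():
        ET.SubElement(entry, "feat", att=att, val=val)
    return entry


# Example entry: the noun "كتاب" (book), masculine singular.
entry = make_lexical_entry(
    "كتاب", "noun", {"gender": "masculine", "number": "singular"}
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Because every attribute is a plain `att`/`val` pair, a consumer written in any language can read such entries without knowing how the morphological analyser itself is implemented, which is the kind of interoperability the abstract describes.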

Keywords

Arabic Lexical Database; Modern Standard Arabic; Arabic morphology; Arabic Morphological Transducer

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Mohammed Attia (1)
  • Pavel Pecina (1)
  • Antonio Toral (1)
  • Lamia Tounsi (1)
  • Josef van Genabith (1)
  1. School of Computing, Dublin City University, Dublin, Ireland