A Property Grammar-Based Method to Enrich the Arabic Treebank ATB

  • Raja Bensalem Bahloul
  • Kais Haddar
  • Philippe Blache
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 631)

Abstract

We present a method, based on the Property Grammar formalism, for enriching the Arabic Treebank (ATB) with syntactic constraints, called properties. Property Grammars are a genuinely constraint-based formalism: they specify constraints directly on categories, which simplifies the enrichment process. The process has three phases: formalizing the problem, inducing a Property Grammar from the ATB, and regenerating the treebank with a new property-based syntactic representation. Enriching the ATB makes it more useful for NLP applications such as ambiguity resolution. It also supports the acquisition of new linguistic resources and eases probabilistic parsing. The enrichment process is fully automatic and independent of both the language and the formalism of the source corpus, which makes it reusable. Our experiments gave encouraging results and yielded properties of several different types.
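
The induction phase can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the treebank has already been flattened into (parent, children) rules, uses toy category labels rather than the actual ATB tagset, and derives only four property types commonly described for Property Grammars (uniqueness, linearity, requirement, exclusion). The function name `induce_properties` and the data layout are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def induce_properties(rules):
    """Derive simple properties per parent category.

    rules: iterable of (lhs, rhs) pairs, where rhs is a sequence of child
    categories read off one treebank subtree (hypothetical input layout).
    """
    by_lhs = defaultdict(list)
    for lhs, rhs in rules:
        by_lhs[lhs].append(tuple(rhs))

    grammar = {}
    for lhs, rhss in by_lhs.items():
        cats = sorted({c for rhs in rhss for c in rhs})
        # Uniqueness: the category never occurs twice under this parent.
        uniqueness = [c for c in cats if all(r.count(c) <= 1 for r in rhss)]
        linearity, requirement, exclusion = [], [], []
        for a, b in permutations(cats, 2):
            together = [r for r in rhss if a in r and b in r]
            with_a = [r for r in rhss if a in r]
            # Linearity: whenever both occur, every a precedes every b.
            if together and all(
                max(i for i, c in enumerate(r) if c == a)
                < min(i for i, c in enumerate(r) if c == b)
                for r in together
            ):
                linearity.append((a, b))
            # Requirement: every rule containing a also contains b.
            if all(b in r for r in with_a):
                requirement.append((a, b))
            # Exclusion: a and b never co-occur (reported once per pair).
            if not together and a < b:
                exclusion.append((a, b))
        grammar[lhs] = {
            "constituency": cats,
            "uniqueness": uniqueness,
            "linearity": linearity,
            "requirement": requirement,
            "exclusion": exclusion,
        }
    return grammar


# Toy rules with illustrative labels (not taken from the ATB).
toy_rules = [
    ("NP", ("NOUN", "ADJ")),
    ("NP", ("NOUN", "NP")),
    ("NP", ("PRON",)),
    ("NP", ("NOUN", "ADJ", "ADJ")),
]
grammar = induce_properties(toy_rules)
```

On this toy input, `grammar["NP"]["linearity"]` records that NOUN precedes ADJ and NP, and `grammar["NP"]["requirement"]` records that ADJ and NP each require NOUN. Properties induced this way only reflect what the observed rules happen to show, so a real corpus would need frequency thresholds or manual review to filter accidental regularities.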

Keywords

Arabic language; Property grammar; Treebank enrichment

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Raja Bensalem Bahloul (1)
  • Kais Haddar (1)
  • Philippe Blache (2)
  1. Multimedia InfoRmation Systems and Advanced Computing Laboratory, Higher Institute of Computer Science and Multimedia, Sfax, Tunisia
  2. Laboratoire Parole et Langage, CNRS, Université de Provence, Marseille, France
