Parsing Arabic using induced probabilistic context free grammar

International Journal of Speech Technology

Abstract

The importance of parsing for NLP applications is well understood. However, developing parsers for Arabic remains difficult because of the complexity of the language. Most parsers are based on syntactic grammars that describe the syntactic structures of a language, and developing these grammars is laborious and time-consuming. In this paper, we present a method for building an Arabic parser based on an induced probabilistic context-free grammar (PCFG). We first induce the PCFG from an Arabic treebank. We then implement a parser that assigns a syntactic structure to each input sentence. The parser is tested on 1650 sentences extracted from the treebank, and we compute precision, recall, and F-measure. Our experimental results show that the proposed parser is effective for parsing Modern Standard Arabic sentences (precision: 83.59%, recall: 82.98%, F-measure: 83.23%).
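As a rough illustration of the two steps the abstract describes, the following Python sketch induces a PCFG from bracketed trees by relative-frequency estimation and scores a predicted tree against a gold tree with labeled-bracket precision, recall, and F-measure. The toy trees, the placeholder tokens (w1, w2, ...), and all function names are assumptions made for this example. The abstract does not specify the authors' estimation or scoring procedure, so this should not be read as their implementation.

```python
from collections import Counter

# Toy trees as nested tuples: (label, child, ...); leaves are plain strings.
# Both trees and their placeholder tokens are invented for illustration.
TOY_TREEBANK = [
    ("S", ("VP", ("VBD", "w1"), ("NP", ("NN", "w2")))),
    ("S", ("VP", ("VBD", "w3"), ("NP", ("DT", "w4"), ("NN", "w5")))),
]


def productions(tree):
    """Yield one (lhs, rhs) pair per internal node of the tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


def induce_pcfg(trees):
    """Relative-frequency estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = Counter(p for tree in trees for p in productions(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


def labeled_spans(tree, start=0):
    """Return ([(label, start, end), ...], end) covering every node in the tree."""
    label, *children = tree
    spans, pos = [], start
    for child in children:
        if isinstance(child, str):
            pos += 1  # a word advances the position by one token
        else:
            child_spans, pos = labeled_spans(child, pos)
            spans.extend(child_spans)
    spans.append((label, start, pos))
    return spans, pos


def precision_recall_f(gold, predicted):
    """Labeled-bracket precision, recall, and F-measure for one sentence."""
    gold_spans = Counter(labeled_spans(gold)[0])
    pred_spans = Counter(labeled_spans(predicted)[0])
    matched = sum((gold_spans & pred_spans).values())
    precision = matched / sum(pred_spans.values())
    recall = matched / sum(gold_spans.values())
    f = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f


grammar = induce_pcfg(TOY_TREEBANK)
for (lhs, rhs), prob in sorted(grammar.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.2f}]")
```

In this sketch, part-of-speech nodes are counted as brackets, which is a simplification; the abstract does not say which nodes the reported scores include. A parser such as a probabilistic CKY or Viterbi chart parser would then use the induced rule probabilities to select the most probable tree for each input sentence.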

Author information

Corresponding author

Correspondence to Nabil Khoufi.

About this article

Cite this article

Khoufi, N., Aloulou, C. & Belguith, L.H. Parsing Arabic using induced probabilistic context free grammar. Int J Speech Technol 19, 313–323 (2016). https://doi.org/10.1007/s10772-015-9300-x

  • DOI: https://doi.org/10.1007/s10772-015-9300-x
