Transducer Cascade to Parse Arabic Corpora

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10859)

Abstract

Arabic parsing is an important task in several NLP applications. To obtain a robust, efficient and extensible parser that handles a range of phenomena, several issues (notably ambiguity and embedded structures) must be resolved. In this context, we build an Arabic parser based on two components: a deep linguistic study, carried out with a new approach that divides the problem into sub-problems, and a transducer cascade implemented in the NooJ linguistic platform. The parser relies on dictionaries, morphological grammars and transducers that we designed to recognize different sentence forms. We apply the parser to two test corpora containing more than 5900 sentences with different structures. The parser outputs XML-annotated sentences. To evaluate the results, we calculated precision, recall and F-measure, and compared them with those obtained by a recursive transducer parser. The measured values are encouraging.
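To make the two technical ideas in the abstract concrete, the following is a minimal sketch, not the authors' NooJ implementation. It assumes a toy tag inventory (DET, ADJ, NOUN, VERB) and two hypothetical regular-expression stages standing in for transducers: each stage rewrites the output of the previous one, so a later stage can wrap constituents built earlier, which is how a cascade handles embedded structures and yields XML-style annotations. The evaluation function uses the standard definitions of precision, recall and F-measure; the counts passed to it are invented for illustration and are not results from the paper.

```python
import re

# Hypothetical cascade: each stage is (label, pattern). Stages run in order,
# and each one sees the annotated output of the stage before it.
STAGES = [
    # Stage 1: noun phrases over raw tags.
    ("NP", re.compile(r"(?:DET )?(?:ADJ )*NOUN")),
    # Stage 2: verb phrases, which may embed an NP produced by stage 1.
    ("VP", re.compile(r"VERB(?: <NP>.*?</NP>)?")),
]


def run_cascade(tag_sequence):
    """Apply every stage in order, wrapping each match in an XML-style tag."""
    text = tag_sequence
    for label, pattern in STAGES:
        text = pattern.sub(
            lambda m, label=label: f"<{label}>{m.group(0)}</{label}>", text
        )
    return text


def precision_recall_f(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F-measure) computed from raw counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy input: a sequence of part-of-speech tags (assumed, not from the paper).
annotated = run_cascade("DET NOUN VERB DET ADJ NOUN")
print(annotated)

# Invented counts, used only to show the arithmetic.
p, r, f = precision_recall_f(true_positives=90, false_positives=10, false_negatives=30)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

Ordering the stages from the smallest constituents to the largest keeps each pattern simple: the VP stage does not need to know how an NP is built, only what its annotation looks like.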

Author information

Corresponding author

Correspondence to Nadia Ghezaiel Hammouda.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Ghezaiel Hammouda, N., Torjmen, R., Haddar, K. (2018). Transducer Cascade to Parse Arabic Corpora. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_22

  • DOI: https://doi.org/10.1007/978-3-319-91947-8_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91946-1

  • Online ISBN: 978-3-319-91947-8

  • eBook Packages: Computer Science, Computer Science (R0)
