Amazigh PoS Tagging Using Machine Learning Techniques

  • Amri Samir
  • Zenkouar Lahbib
  • Outahajala Mohamed
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 37)

Abstract

Amazigh is a morphologically rich language, which makes Part of Speech (POS) tagging challenging. POS tagging is an important component of almost all Natural Language Processing (NLP) applications.

Applying machine-learning techniques to less computerized languages requires the development of an appropriately tagged corpus. In this paper, we develop POS taggers for Amazigh, a less privileged language, using Conditional Random Fields (CRF), Support Vector Machines (SVM) and the TreeTagger system. We manually annotated approximately 75,000 tokens, collected from written texts, with a POS tagset of 28 tags defined for the Amazigh language. The POS taggers use a range of contextual and orthographic word-level features. These features are language independent and applicable to other languages. All three taggers were trained and tested on the same corpora. Evaluation results showed accuracies of 89.18%, 88.02% and 90.86% for the CRF, SVM and TreeTagger, respectively.
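As an illustration of what contextual and orthographic word-level features can look like, here is a minimal Python sketch. The abstract does not list the actual feature set, so every feature below (affix length, hyphen and digit flags, a two-token context window) and the names `word_features` and `sentence_features` are assumptions chosen for illustration, not the authors' configuration. The tokens in the usage line are placeholders, not Amazigh data.

```python
from typing import Dict, List


def word_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, object]:
    """Build a feature dict for tokens[i] (hypothetical feature set)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        # Orthographic features: computed from the surface form only,
        # so they do not depend on any language-specific resource.
        "word.lower": word.lower(),
        "prefix3": word[:3],
        "suffix3": word[-3:],
        "is_digit": word.isdigit(),
        "has_hyphen": "-" in word,
        "is_punct": not any(ch.isalnum() for ch in word),
        "length": len(word),
    }
    # Contextual features: neighbouring tokens inside a fixed window,
    # with boundary markers when the window runs off the sentence.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if j < 0:
            value = "<BOS>"
        elif j >= len(tokens):
            value = "<EOS>"
        else:
            value = tokens[j].lower()
        feats[f"word[{offset:+d}]"] = value
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token of one sentence."""
    return [word_features(tokens, i) for i in range(len(tokens))]


# Placeholder tokens only; a real run would use the annotated corpus.
example = sentence_features(["ab", "cd-ef", "12", "."])
```

A list of per-token dicts like `example` is a common input shape for sequence labellers, and because nothing in it is tied to a lexicon, the same extractor could be reused for another language.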

Keywords

Amazigh, Corpus, CRF, NLP, Machine learning, POS tagging

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Amri Samir
    • 1
  • Zenkouar Lahbib
    • 1
  • Outahajala Mohamed
    • 2
  1. LEC Laboratory, EMI School, University Med V, Rabat, Morocco
  2. CESIC Laboratory, IRCAM Institute, Rabat, Morocco