Persian in MULTEXT-East Framework

  • Behrang QasemiZadeh
  • Saeed Rahimi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4139)


Farsi, also known as Persian, is the official language of Iran and Tajikistan and one of the two main languages spoken in Afghanistan. It is an Indo-European agglutinating language written in Arabic script. This paper presents the first step in creating a basic language resources kit for Farsi. This step comprises the specifications for morphosyntactic encoding, which is based on the EAGLES/MULTEXT model and the specific resources of MULTEXT-East. The paper introduces the language, with an emphasis on its writing system and morphological properties, and then its morphosyntactic specifications. It also introduces two other important issues: a novel part-of-speech (PoS) categorization and a unified orthography of Farsi for the digital environment. A lexicon and an annotated corpus are under preparation.


Keywords (added by machine, not by the authors): Machine Translation; Writing System; Parallel Corpus; Auxiliary Verb; Morphosyntactic Feature





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Behrang QasemiZadeh, Computer Department, Iran University of Science and Technology, Narmak, Tehran, Iran
  • Saeed Rahimi, Faculty of Literature and Humanities, Tehran University, Enqelab, Tehran, Iran
