Morphological Processing of Semitic Languages

Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

This chapter addresses morphological processing of Semitic languages. Given the complex morphology and problematic orthography of many Semitic languages, the chapter begins by recapitulating the challenges these phenomena pose for computational applications. It then discusses the approaches proposed in the past to cope with these challenges. The bulk of the chapter discusses available solutions for morphological processing, including analysis, generation, and disambiguation, in a variety of Semitic languages. The concluding section discusses future research directions.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. University of Haifa, Haifa, Israel
