Language Resources and Evaluation, Volume 42, Issue 1, pp 75–98

Language resources for Hebrew

Abstract

We describe a suite of standards, resources and tools for computational encoding and processing of Modern Hebrew texts. These include an array of XML schemas for representing linguistic resources; a variety of text corpora, raw, automatically processed and manually annotated; lexical databases, including a broad-coverage monolingual lexicon, a bilingual dictionary and a WordNet; and morphological processors which can analyze, generate and disambiguate Hebrew word forms. The resources are developed under centralized supervision, so that they are compatible with each other. They are freely available and many of them have already been used for several applications, both academic and industrial.

Keywords

Language resources, Hebrew, Corpora, Lexicon, Morphological processing, WordNet

Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  1. Department of Computer Science, Technion, Israel Institute of Technology, Haifa, Israel
  2. Department of Computer Science, University of Haifa, Haifa, Israel
