Automatic Morphological Annotation in a Text-to-Speech System for Hebrew
The paper presents a module for automatic morphological annotation within a text-to-speech synthesizer for Hebrew, based on an efficient combination of two approaches. The first selects lexemes from appropriate lexica; the second performs automatic morphological analysis of the input text using a complex expert algorithm that relies on a set of transformational rules and uses 6 types of scoring procedures. The module operates on a set of 30 part-of-speech tags with more than 3000 corresponding morphological categories. The paper discusses the advantages of the proposed method for an extremely morphologically complex language such as Hebrew, with particular emphasis on the relative importance of the individual scoring procedures. When all 6 scoring procedures are applied, an accuracy of 99.6% is achieved on a corpus of 3093 sentences (55046 words).
Keywords: part-of-speech tagging, speech synthesis, Hebrew
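The combination described in the abstract (lexicon selection first, rule-based analysis as a fallback, then scoring of the candidate analyses) might be sketched roughly as follows. This is only an illustrative toy under stated assumptions, not the paper's algorithm: the lexicon entries, the two suffix rules, the two scoring functions, and all names (`Analysis`, `LEXICON`, `SUFFIX_RULES`, `SCORERS`, `annotate`) are hypothetical, and the real module uses 30 part-of-speech tags and 6 scoring procedures that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Analysis:
    lemma: str
    pos: str
    features: str


# Hypothetical toy lexicon with transliterated entries (assumption, not from the paper).
LEXICON = {
    "sefer": [Analysis("sefer", "NOUN", "masc.sg")],
    "katav": [Analysis("katav", "VERB", "past.3.masc.sg")],
}

# Hypothetical transformational rules: (suffix to strip, POS, features).
SUFFIX_RULES = [
    ("im", "NOUN", "masc.pl"),
    ("ot", "NOUN", "fem.pl"),
]


def lexicon_lookup(token: str) -> List[Analysis]:
    """Approach 1: select candidate analyses directly from the lexicon."""
    return list(LEXICON.get(token, []))


def rule_analyses(token: str) -> List[Analysis]:
    """Approach 2: propose analyses by applying suffix-stripping rules."""
    candidates = []
    for suffix, pos, features in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) > len(suffix):
            candidates.append(Analysis(token[: -len(suffix)], pos, features))
    return candidates


# Two placeholder scoring procedures (the paper uses 6, unspecified here).
def score_known_lemma(a: Analysis, prev: Optional[Analysis]) -> float:
    return 1.0 if a.lemma in LEXICON else 0.0


def score_noun_after_verb(a: Analysis, prev: Optional[Analysis]) -> float:
    return 0.5 if prev is not None and prev.pos == "VERB" and a.pos == "NOUN" else 0.0


SCORERS = [score_known_lemma, score_noun_after_verb]


def annotate(tokens: List[str]) -> List[Analysis]:
    """Pick, for each token, the candidate analysis with the highest total score."""
    result: List[Analysis] = []
    prev: Optional[Analysis] = None
    for token in tokens:
        # Lexicon first; fall back to rules; otherwise mark the token as unknown.
        candidates = lexicon_lookup(token) or rule_analyses(token)
        if not candidates:
            candidates = [Analysis(token, "UNK", "")]
        best = max(candidates, key=lambda a: sum(s(a, prev) for s in SCORERS))
        result.append(best)
        prev = best
    return result


if __name__ == "__main__":
    for analysis in annotate(["katav", "sefer", "yeladim"]):
        print(analysis)
```

Summing the scores is one simple way to combine scoring procedures; the abstract's emphasis on their relative importance suggests that a weighted combination could be closer to what the authors evaluate, but that is a guess.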