Classifying Lexical Variation
Since our goal in this book is to generate a wide range of lexical variants from the same underlying representation, we first need to examine the notion of ‘paraphrase’ more closely and then delimit the range of variation that our system is to produce. A central idea of our project is to see multilingual generation (MLG) as a mere extension of the monolingual paraphrase task and to devise the system architecture accordingly: rephrasing an English sentence with different English words should in principle be no different from rephrasing that same sentence with German words. Obviously, this stance is more difficult to defend for less closely related languages, but for our purposes here we stick to English and German, occasionally looking at other languages for interesting examples. We therefore develop the following overview of lexical variation by looking both at paraphrases within a single language (Section 3.1) and at differences between languages, the so-called divergences (Section 3.2). Section 3.3 then points out the commonalities between the two and argues that, from the viewpoint of MLG, there should be no fundamental difference between them.