Language-Independent Hybrid MT: Comparative Evaluation of Translation Quality

  • George Tambouratzis
  • Marina Vassiliou
  • Sokratis Sofianopoulos
Part of the Theory and Applications of Natural Language Processing book series (NLP)


The present chapter reviews the development of a hybrid Machine Translation (MT) methodology, which is readily portable to new language pairs. This MT methodology (which has been developed within the PRESEMT project) is based on sampling mainly monolingual corpora, with very limited use of parallel corpora, thus supporting portability to new language pairs. In designing this methodology, no assumptions are made regarding the availability of extensive and expensive-to-create linguistic resources. In addition, the general-purpose NLP tools used can be chosen interchangeably. Thus PRESEMT circumvents the requirement for specialised resources and tools so as to further support the creation of MT systems for diverse language pairs.

In the current chapter, the proposed hybrid MT methodology is compared to established MT systems, both in terms of design concept and in terms of output quality. More specifically, the translation performance of the proposed methodology is evaluated against that of existing MT systems. The chapter summarises implementation decisions, using the Greek-to-English language pair as a test case. In addition, the detailed comparison of PRESEMT to other established MT systems provides insight into their relative advantages and disadvantages, focusing on specific translation tasks and addressing translation quality as well as translation consistency and stability. Finally, directions are discussed for improving the performance of PRESEMT. This will allow PRESEMT to move beyond the original requirement of an MT system for gisting, towards a high-performing general-purpose MT system.


Keywords: Machine Translation · Target Language · Conditional Random Field · Source Language · Parallel Corpus

1 Introduction

Rule-based machine translation (RBMT) is one of the oldest MT paradigms and still has a substantial influence on modern MT systems. An RBMT system relies on creating a comprehensive set of rules at various levels (e.g. syntax, semantics) for translating between two languages.

RBMT systems have been developed for over 50 years and remain one of the most popular paradigms because of their superior translation quality. However, the main disadvantage of this paradigm is that in most cases the rules and expertise of an existing RBMT system cannot be reused directly to create a translation system for new language pairs. In addition, progress in RBMT is hindered mainly by inadequate grammar resources for most languages and by the absence of appropriate lexical resources and methods that would enable correct disambiguation and lexical choice.

The alternative approach is Corpus-Based MT (CBMT). The advantage of corpus-based approaches lies in the hypothesis that language-specific information can be induced rather than hand-written explicitly, as in RBMT. In CBMT, linguistic rules denoting the syntactic and semantic preferences of words, as well as word order, constitute a large part of the implicit information provided by the corpus. As a result, much of the linguistic knowledge is retrieved directly from the corpora, while hand-crafted rules are minimised. The two major approaches within CBMT are Example-Based MT (EBMT) and Statistical MT (SMT).

In terms of research activity, the most important representative of CBMT is the SMT paradigm. SMT was introduced by Brown et al. (1993), while the most recent developments are summarised in Koehn (2010). A main benefit of SMT is that it is directly amenable to new language pairs using the same set of algorithms. However, an SMT system requires appropriate training data in the form of parallel corpora for extracting the relevant translation models. Thus, to develop an SMT system from a source language (SL) to a target language (TL), SL-TL parallel corpora of the order of millions of tokens are required to allow the extraction of meaningful translation models. Such corpora are hard to obtain, particularly for less-resourced languages, and are frequently restricted to a specific domain (or a narrow range of domains), and thus are not suitable for creating general-purpose MT systems that focus on other domains. For this reason, SMT researchers are increasingly using syntax-based models as well as investigating the extraction of information from monolingual corpora, including lexical translation probabilities (Klementiev et al. 2012) and topic-specific information (Su et al. 2012).

The third MT paradigm is EBMT (Gough and Way 2004; Hutchins 2005), which relies on a set of known pairs of input sentences (in SL) and their corresponding translations (in TL); translations are generated by analogy, by appropriately utilising the information within this set.

In a bid to achieve higher translation quality, researchers have studied the combination of principles from more than one MT paradigm, leading to what is termed as Hybrid MT (HMT). Examples of HMT include the systems by Eisele et al. (2008) and Quirk and Menezes (2006). The general convergence of MT systems towards the combination of the most promising characteristics of each paradigm has been documented by Wu (2005, 2009), having started from pure MT systems belonging to one of the main paradigms (RBMT, SMT, EBMT) and increasingly progressing towards systems that combine characteristics from multiple paradigms. A comprehensive survey of the latest HMT activity is provided by Costa-Jussa et al. (2013).

Alternative techniques have been studied for creating MT systems requiring resources which may be less informative but are also less expensive to collect or to create from scratch. The approach adopted has been to eliminate the parallel corpus needed in SMT (or drastically reduce its size), employing instead monolingual corpora. Monolingual resources can be readily assembled for any language, for instance by harvesting the web with relatively low effort. Methods following this approach have been proposed by Carbonell et al. (2006), Dologlou et al. (2003), Carl et al. (2008) and Markantonatou et al. (2009). Though these methods do not provide a translation quality as high as SMT, their ability to support the development of MT systems with a very limited amount of specialised resources represents an important starting point.

It is on the basis of the aforementioned works that the PRESEMT methodology (Tambouratzis et al. 2013) has been established. In PRESEMT, the design decision is to use a large monolingual corpus, supplemented by a small parallel corpus (whose size is only a few hundred sentences) to provide information on the mapping of sentence structures from SL to TL. The design brief for PRESEMT has been to create a language-independent methodology that, with limited resources, can translate unconstrained texts at a quality suitable for gisting purposes. According to the preceding review of MT systems, PRESEMT can be classified within Hybrid MT, based on the argumentation of Quirk and Menezes (2006) and Wu (2005) for cross-fertilisation between SMT and EBMT.

The reader may visit the project’s website1 to either download the PRESEMT package and some limited resources for the German-to-English and Greek-to-English language pairs or run the fully functional online system that currently supports 13 language pairs. The website also provides detailed technical documentation and links to the standalone versions of the major PRESEMT modules hosted at Google Code.

2 Description of the PRESEMT Methodology

The MT methodology has been developed within the PRESEMT (Pattern REcognition-based Statistically Enhanced MT) project, funded by the European Commission. The MT methodology encompasses three stages:

Stage 1: Pre-processing of the input sentence. This involves tagging, lemmatising and grouping the tokens into phrases, in preparation for the actual transformation from SL to TL.

Stage 2: Main translation, where the actual translation output is generated. The main translation process can in turn be divided into two phases, namely:
  • Phase A: the establishment of the translation structure in terms of phrase order

  • Phase B: the definition of word order and resolution of lexical ambiguities at an intra-phrase level

Stage 3: Post-processing. The tokens in TL are generated from lemmas.

In terms of resources, PRESEMT employs the following:
  • A bilingual lemma dictionary providing SL—TL lexical correspondences

  • An extensive TL monolingual corpus, compiled via web crawling, to generate a language model

  • A very small bilingual corpus

The bilingual corpus numbers only a few hundred sentences, which provide samples of the structural transformation when moving from SL to TL. The use of such a small corpus substantially reduces the need for locating parallel corpora, whose procurement or development can be extremely expensive. Instead, due to its small size, the PRESEMT parallel corpus can be assembled with limited recourse to costly human resources. More specifically, in the present chapter such corpora are assembled from available parallel corpora extracted from multilingual websites. These corpora are only processed by replacing free translations with more literal ones, to allow the accurate extraction of structural modifications. According to the specifications of the methodology, the parallel corpus coverage is not studied prior to integration in PRESEMT.

3 Processing the Parallel Corpus

The present section describes how the parallel corpus is analysed, to extract information supporting the MT process. Initially, the bilingual corpus is annotated with lemma and Part-of-Speech (PoS) information and other language-specific morphological features (e.g. case, number, tense etc.). Furthermore, the TL side is chunked into phrases. As the PRESEMT methodology has been developed to maximise the use of publicly-available software, the user is free to select any desired parser for the TL language. For the implementation reported here, TreeTagger (Schmid 1994) has been used for the English (TL) text processing and the FBT PoS tagger (Prokopidis et al. 2011) has been employed for the processing of the Greek (SL) text.

3.1 Aligning the SL and TL Tokens

To determine the optimal transfer of phrases from SL to TL, it is essential to have the sentences of the parallel corpus split into corresponding phrases in both SL and TL. Development work in earlier systems revealed that the establishment of equivalent phrasing schemes for SL and TL is very time-consuming, and thus cannot form the basis for an MT methodology which is readily portable to new language pairs with minimal effort. To avoid either (a) having to locate an additional SL side parser or (b) resolving (most likely by hand-written rules) the inconsistencies of two separate parsers in different languages, in PRESEMT the Phrase aligner module (PAM) (Tambouratzis et al. 2011) is implemented. This module is dedicated to transferring to the SL side the TL side parsing scheme, which encompasses lemma, tag and parsing information. The information being transferred encompasses phrase boundaries and phrase type, where the TL-phrase type is used to characterise the SL phrase.

PAM establishes the SL-side phrasing automatically, based on the TL phrasing, by (a) identifying SL-to-TL token alignments and (b) extracting probabilistic information of SL-tag to TL-tag correspondences. In this process the types of allowed alignment from SL-to-TL are n-to-m (where n ≥ 1 and m ≥ 1).

The information used to perform the alignments includes lexical information as well as statistical data on PoS tag correspondences extracted from the lexicon. More specifically, PAM follows a 3-step process, where each subsequent step processes the tokens that remain unaligned. Each step has a lower likelihood of producing the correct alignment, as it uses more general information to achieve alignment. In the first step, alignments are performed on the basis of the bilingual lexicon entries. In the second step, alignments are established based on the similarity of grammatical features between adjoining tokens and on PoS tag correspondences. Finally, in the third step, tokens are aligned based on the established alignments of their neighbouring words. This process is described in more detail in Tambouratzis et al. (2012).
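As an illustration, the cascade of the three alignment steps can be sketched as follows. This is a simplified sketch with hypothetical data structures (a toy lexicon, a toy tag-correspondence table, tokens as (lemma, tag) pairs); the actual PAM uses richer lexical and statistical information.

```python
# Illustrative 3-step alignment cascade: lexicon lookup first, then PoS-tag
# correspondences, then the alignments of neighbouring tokens as a fallback.

def align_tokens(sl_tokens, tl_tokens, lexicon, tag_corr):
    """Return a dict mapping SL token indices to TL token indices."""
    alignments = {}
    unaligned = set(range(len(sl_tokens)))

    # Step 1: lexicon-based alignment (highest confidence).
    for i in sorted(unaligned):
        lemma, _tag = sl_tokens[i]
        for j, (tl_lemma, _t) in enumerate(tl_tokens):
            if j not in alignments.values() and tl_lemma in lexicon.get(lemma, ()):
                alignments[i] = j
                break
    unaligned -= alignments.keys()

    # Step 2: PoS-tag correspondence (more general, lower confidence).
    for i in sorted(unaligned):
        _lemma, tag = sl_tokens[i]
        for j, (_l, tl_tag) in enumerate(tl_tokens):
            if j not in alignments.values() and tl_tag in tag_corr.get(tag, ()):
                alignments[i] = j
                break
    unaligned -= alignments.keys()

    # Step 3: fall back on the alignment of a neighbouring token,
    # permitting the n-to-m alignments mentioned above.
    for i in sorted(unaligned):
        for n in (i - 1, i + 1):
            if n in alignments:
                alignments[i] = alignments[n]
                break
    return alignments
```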

Following this step, phrases in SL are established by grouping together all SL tokens that correspond to the same TL phrase. In the case of split phrases, the constituents are merged into a single phrase.

3.2 Phrasing the Input Text

Following the transfer of the phrasing scheme to SL, training examples have been prepared which describe the appropriate phrasing of SL texts according to the TL parser. Now, the aim is to construct a linguistic tool (termed Phrasing model generator) that can accurately segment arbitrary input text into phrases which are compatible with the TL phrasing scheme. If this is achieved, the aligned parallel corpus can be used to transform the structure from SL to TL.

When initiating the work on the Phrasing model generator (PMG), a survey of the literature was undertaken for appropriate methods. It was established that, among the statistical models in use, Conditional Random Fields (CRF) provide the most promising avenue, due to the considerable representation capabilities of this model (Lafferty et al. 2001). CRF is a statistical modelling method that takes context into account to predict labels for sequences of input samples. Within PRESEMT, an open-source CRF implementation has been employed. In addition, comparative experiments have shown that CRF provides a performance superior to that of other approaches, both statistical (based on Hidden Markov Models) and rule-based ones.

A recent development has involved the implementation of an alternative phrasing methodology (termed PMG-simple) based on template-matching principles. PMG-simple is trained on the parallel corpus, similarly to CRF; its learning method, however, is different. The wide acceptance of CRF rests on its complex mathematical models, which in turn require a wealth of training data. Since the small PRESEMT parallel corpus is the sole source of training data, the data available for CRF to learn the phrasing scheme is likely to be limited.

PMG-simple locates phrases that match exactly what it has seen before, based on a simple template-matching algorithm (Duda et al. 2001). In contrast to CRF, which constructs an elaborate mathematical model, PMG-simple implements a greedy search (Black 2005) without backtracking. In PMG-simple, initially all phrases from the SL side of the parallel corpus are recorded and are then inserted into a list, ordered according to their likelihood of being accurately detected. The aim is then to determine the most likely phrases to which the sentence can be split, starting with no phrases being defined in the sentence to be segmented. At each turn, the phrase with the highest likelihood is chosen, and for this phrase PMG-simple examines if the corresponding sequence of tokens occurs at any point in the input sentence (taking into account word tag and case information). If it does and none of the constituent words in the sentence form part of an already established phrase, the constituent words are marked as parts of this phrase and are no longer considered in the phrase-matching process. On the other hand, if the phrase sequence does not exist, or if at least one of the required constituent tokens is already allocated to another phrase, no match is attained. In this case, the next phrase from the ordered list is considered, until either the ordered phrase list is exhausted, or all sentence tokens are assigned into phrases. In order to improve the performance of PMG-simple, a generalisation step is added to the PMG-simple mechanism (for more details cf. Tambouratzis 2014), which provides equivalence information between PoS types, to enrich the variety of phrasal templates that may be established from the parallel corpus. Comparative results for CRF and PMG-simple are reported in the evaluation section (Sect. 7.2).
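The greedy matching loop described above can be sketched as follows. This is a minimal illustration under the assumption that a phrase template is simply a tuple of PoS tags ranked by an estimated detection likelihood; the generalisation step and the tag/case details are omitted.

```python
# Greedy, no-backtracking phrase matching in the spirit of PMG-simple:
# templates are tried in decreasing order of likelihood, and a match is
# accepted only if none of its tokens is already claimed by another phrase.

def segment(sentence_tags, templates):
    """`templates` is a list of (tag_tuple, likelihood) pairs; returns a
    sorted list of (start, end) spans covering the matched phrases."""
    assigned = [False] * len(sentence_tags)
    spans = []
    for tags, _likelihood in sorted(templates, key=lambda t: -t[1]):
        n = len(tags)
        for start in range(len(sentence_tags) - n + 1):
            window = sentence_tags[start:start + n]
            # Accept only if the tag sequence occurs here and no constituent
            # token already belongs to a higher-ranked phrase.
            if tuple(window) == tags and not any(assigned[start:start + n]):
                for k in range(start, start + n):
                    assigned[k] = True
                spans.append((start, start + n))
    return sorted(spans)
```

The loop terminates when the template list is exhausted; any tokens left unassigned would, in the full system, be handled by the generalisation step.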

4 Main Translation Engine

Local and long distance reordering is one of the most challenging aspects of any machine translation system. In phrase-based SMT, numerous approaches have used pre-processing techniques that perform word reordering in the source side based on the syntactic properties of the target side (Rottmann and Vogel 2007; Popovic and Ney 2006; Collins et al. 2005) in order to overcome the long distance word reordering problem. Short range reorderings are captured by the phrase table and the target side language model. Of course, in order for the statistical approaches to be effective, a sizeable amount of parallel training data needs to be available.

In PRESEMT, translation is performed in two steps, each addressing different aspects of the translation process while making use of syntactic knowledge. The first step performs a structural transformation of the source side in accordance with the syntactic phrases of the target side, trying to capture long-range reordering, while the second step makes lexical choices and performs local word reordering within each phrase. Owing to the modular nature of PRESEMT, each of the steps is performed by a separate module: structural transformations are executed by the structure selection module (SSM), while local word reordering and disambiguation are handled by the translation equivalent selection module (TES).

5 Structure Selection Module (SSM)

The objective of the Structure selection module is to transform the structure of the input text using the limited bilingual corpus as a structural knowledge base, closely resembling the “translation by analogy” aspect of EBMT systems. Using available structural information, namely the type of syntactic phrases, the part-of-speech tag of the head token of each phrase and the case of the head token (if available) we retrieve the most similar source side sentence from the parallel corpus. Using the stored alignment information from the corpus between the source and target side, we then perform all necessary actions in order to transform the structure of the input sentence to the structure of the target side of the corpus sentence pair.

Figure 1 depicts the functionality of the structure selection module. The input is a source sentence that has been annotated with PoS-tag and lemma information and segmented in clauses and chunks by the Phrasing model generator; the output is the same sentence with a target language structure.
Fig. 1

Data flow in the structure selection module

For the retrieval of the most similar source side sentence, we selected an algorithm from the dynamic programming paradigm, treating the structure selection process as a sequence alignment, aligning the input sentence to a source side sentence from the aligned parallel corpus and assigning a similarity score. The implemented algorithm is based on the Smith-Waterman algorithm (Smith and Waterman 1981), initially proposed for performing local sequence alignment for determining similar regions between two protein or DNA sequences, structural alignment and RNA structure prediction. The algorithm is guaranteed to find the optimal local alignment between the two input sequences at clause level.

The similarity of two clauses is calculated using intra-clause information, by taking into account the edit operations (replacement, insertion or removal) that need to be performed on the input sentence in order to transform it into a source-side sentence from the corpus. Each of these operations has an associated cost, considered as a system parameter. The aligned corpus sentence that achieves the highest similarity score is the most similar one to the input source sentence.

5.1 Calculating Similarity Using a Dynamic Programming Algorithm

The source sentence is parsed in accordance with the phrasing model extracted from the Phrasing model generator (PMG). The first step of the algorithm is to compare each input source sentence (ISS) of the SL text to all the source-side sentences of the parallel corpus in terms of structure. A two-dimensional table is built, with each of the ISS phrases occupying one column (the corresponding phrases being shown at the top of the table) and the candidate corpus sentence (CCS) phrases each occupying one row (the corresponding CCS phrases being shown along the left side of the table). A cell (i, j) represents the similarity of the subsequence of elements up to the mapping of elements Ei of the CCS and E'j of the ISS. Elements refer to syntactic phrases, represented by their type and by the Part-of-Speech (PoS) tag and case (where available) of each phrase head word.

The value of cell (i, j) is filled by taking into account the cells directly to the left (i, j − 1), directly above (i − 1, j) and directly above-left (i − 1, j − 1), these containing values V1, V2 and V3 respectively, and is calculated as the maximum of the three numbers {V1, V2, V3 + ElementSimilarity(Ei, E’j)}. While calculating the value of each cell, the algorithm also keeps tracking information so as to allow the construction of the actual alignment vector.
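A minimal sketch of this table-filling recurrence is given below, with `element_similarity` standing in for the weighted phrase similarity defined in the following equations and phrases represented abstractly; the traceback needed to recover the actual alignment vector is omitted.

```python
# Smith-Waterman-style dynamic-programming fill: each cell takes the maximum
# of the cell to the left (V1), the cell above (V2), and the diagonal cell
# plus the element similarity (V3 + ElementSimilarity).

def best_alignment_score(ccs, iss, element_similarity):
    """Fill the DP table for a candidate corpus sentence (rows) against the
    input source sentence (columns) and return the score of the
    maximum-scoring cell in the last column, as described in the text."""
    rows, cols = len(ccs) + 1, len(iss) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = max(
                table[i][j - 1],                # V1: cell directly to the left
                table[i - 1][j],                # V2: cell directly above
                table[i - 1][j - 1]             # V3: diagonal cell
                + element_similarity(ccs[i - 1], iss[j - 1]),
            )
    # Clause similarity: the maximum-scoring cell of the final column.
    return max(table[i][cols - 1] for i in range(rows))
```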

The similarity of two phrases (PhrSim) is calculated as the weighted sum of the phrase type similarity (PhrTypSim), the phrase head PoS tag similarity (denoted as PhrHPosSim), the phrase head case similarity (PhrHCasSim) and the functional phrase head PoS tag similarity (PhrfHPosSim):
$$ \begin{aligned}\mathrm{PhrSim}\left(E_i, E'_j\right) &= W_{\mathrm{phraseType}} \cdot \mathrm{PhrTypSim}\left(E_i, E'_j\right) + W_{\mathrm{headPoS}} \cdot \mathrm{PhrHPosSim}\left(E_i, E'_j\right)\\ &\quad + W_{\mathrm{headCase}} \cdot \mathrm{PhrHCasSim}\left(E_i, E'_j\right) + W_{\mathrm{fheadPoS}} \cdot \mathrm{PhrfHPosSim}\left(E_i, E'_j\right)\end{aligned} $$
In the current implementation of the algorithm, the weights have been given the following initial values, yet the optimal values are to be determined during an optimisation phase:
  • WphraseType = 0.6

  • WheadPoS = 0.1

  • WfheadPoS = 0.1

  • WheadCase = 0.2

For normalisation purposes, the sum of the four aforementioned weights is equal to 1.

The similarity score range is from 100 to 0, denoting exact match and total dissimilarity between two elements Ei and E’j respectively. In case of a zero similarity score, a penalty weight (−50) is employed, to further discourage selection of such correspondences.
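The weighted similarity and the zero-score penalty can be illustrated as follows. This sketch assumes, for simplicity, binary component similarities (100 for a match, 0 otherwise); the dictionary-based element representation is hypothetical.

```python
# Weighted phrase similarity with the initial weight values quoted above.
# The four weights sum to 1 for normalisation, so the score lies in [0, 100];
# a zero score is replaced by the -50 penalty described in the text.

WEIGHTS = {"phraseType": 0.6, "headPoS": 0.1, "fheadPoS": 0.1, "headCase": 0.2}

def phrase_similarity(e_i, e_j):
    """Each element is a dict with keys phraseType, headPoS, fheadPoS, headCase."""
    score = sum(
        w * (100 if e_i[feature] == e_j[feature] else 0)
        for feature, w in WEIGHTS.items()
    )
    # Penalise total dissimilarity to discourage such correspondences.
    return score if score > 0 else -50
```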

When the algorithm has reached the jth element of the ISS, the similarity score between the two SL clauses is calculated as the value of the maximum-scoring jth cell. The CCS that achieves the highest similarity score is the one closest to the input SL clause in terms of phrase structure information.

Apart from the final similarity score, the comparison table of the algorithm is used for finding the actual alignment of phrases between the two SL clauses. By combining the SL clause alignment from the algorithm with the alignment information between the CCS and the attached TL sentence, the ISS phrases are reordered according to the TL structure. The algorithm has been extended to tackle the subject pronoun drop phenomenon in languages like Greek, using the same alignment information. In the parallel corpus when a subject is dropped from either side of a given corpus sentence, then the phrase containing the subject on the other side will be mapped to an “empty phrase”. This allows the algorithm to exploit this information during translation in order to add or remove the subject phrase accordingly in TL.

If more than one CCS achieves the same similarity score, and they lead to different structural transformations, then the module returns both results as equivalent TL structures. Moreover, if the highest similarity score is lower than a threshold, the input sentence structure is maintained, to prevent transformation towards an ill-fitting prototype. For most of our experiments an indicative threshold value lies between 85 and 90 %. For a better understanding of this approach, an example is provided next with Greek as the source language and English as the target.

5.2 Structural Similarity Example

The input source sentence is the following:
  • Mε τον όρο Mηχανική Mετάφραση αναφερόμαστε σε μια αυτοματοποιημένη διαδικασία.

  • Exact translation: with the term Machine Translation refer (1st pl) to an automated procedure

  • Correct translation: The term Machine Translation denotes an automated procedure

The ISS phrase structure representation, after applying the parsing scheme using the Phrase aligner module, is the following:
One of the retrieved candidate sentence pairs from the aligned bilingual corpus is the following:
  • (Greek) Οι ιστορικές ρίζες της Ευρωπαϊκής Ένωσης ανάγονται στο Δεύτερο Παγκόσμιο Πόλεμο.

  • (Lexicon-based translation) the historical roots the gen European gen Union gen lie (3rd pl ) in-the Second World War

  • (English) “The historical roots of the European Union lie in the Second World War”
    Fig. 2

    Example of a dynamic programming table

Its structural information being:
  • pp(no_nm) pp(no_ge) vg(vb) pp(no_ac)

After calculating the similarity scores for each phrase pair of the above sentences (the input sentence ISS and the SL-side sentence from the bilingual corpus, hereafter denoted as Aligned Corpus Sentence—ACS) the dynamic programming table (depicted in Fig. 2) is filled out (the arrows denoting the longest aligned subsequence):

When an arrow moves diagonally from cell A to cell B, this denotes that the phrases mapped at cell A are aligned. When an arrow moves horizontally, the ISS phrase is aligned with a space, and when an arrow moves vertically the ACS phrase is aligned with a space.

Figure 2 then forms the basis for calculating the transformation cost (340 in this case), on the basis of which the ISS is modified in accordance with the attached TL structure.

6 Translation Equivalent Selection Module (TES)

The second step of the PRESEMT translation process is the translation equivalent selection module, which performs word translation disambiguation, local word reordering within each syntactic phrase as well as addition and/or deletion of auxiliary verbs, articles and prepositions. In the default settings of PRESEMT, all of the above are performed by only using a syntactic phrase model extracted from a large TL monolingual corpus. The final translation is produced by the token generation component, since all processing during the translation process is lemma-based.

The module input is the output of the structure selection module, augmented with the TL lemmata of the source words provided by the bilingual lexicon. Each sentence contained within the text to be translated is processed separately, so there is no exploitation of inter-sentential information. The first task is to select the correct TL translation of each word. In the PRESEMT methodology, alternative methods for the word translation disambiguation have also been integrated and can be used instead of the default. These include Self-Organising Maps, n-gram vector space models or SRI n-gram models extracted from the TL monolingual corpus, though none of these is used in the PRESEMT configuration reported here. The second task involves establishing the correct word order within each phrase. With the default settings of the PRESEMT system this step is performed simultaneously with the translation disambiguation step, using the same TL phrase model. In the case of selecting one of the alternative methods for disambiguation, the phrase model is used only for local word reordering, within the boundaries of the phrases. During word reordering the algorithm also resolves issues regarding the insertion or deletion of words such as articles and other auxiliary tokens. Finally, token generation is applied to the lemmas of the translated sentence together with their morphological features. In that way, the final tokens are generated.

The token generator used constitutes a simple mapping from lemmas and morphological features to tokens. This mapping has been extracted from morphological and lemma information contained in the monolingual corpus. Due to data sparseness, such a mapping will always contain gaps particularly in the case of rather infrequent words. A more sophisticated approach would therefore try to close the gaps in the inflectional paradigms of the lemmas. This could for instance be done by inferring inflectional paradigms of infrequent words from those of more frequent words.
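A toy sketch of such a generation mapping is given below; the table entries are illustrative, not the actual PRESEMT resource, and the feature labels are hypothetical.

```python
# Token generation as a plain mapping from (lemma, morphological features)
# to surface tokens, extracted from the monolingual corpus. Gaps caused by
# data sparseness are handled here by falling back to the bare lemma.

GENERATION_TABLE = {
    ("be", "3sg-pres"): "is",
    ("be", "past-pl"): "were",
    ("electrode", "plural"): "electrodes",
}

def generate_token(lemma, features):
    # Infrequent words unseen in the corpus leave gaps in the table;
    # a more sophisticated approach would infer their inflectional
    # paradigms from those of more frequent words.
    return GENERATION_TABLE.get((lemma, features), lemma)
```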

Figure 3 provides an overview of the translation equivalent selection module, which receives as input the output of the first phase of the main translation engine, i.e. a source sentence with its constituent phrases reordered in accordance with the target language. The output is the final translation generated by the system.
Fig. 3

Data flow in the translation equivalent selection module

It should be noted that the two instances of the TL monolingual corpus depicted in the diagram are actually two different TL models: the first instance refers to the indexed model of TL phrases (see Sect. 6.1), whereas the second one refers to a table of lemma-token pairs, which is also extracted from the TL corpus.

6.1 Description of the Phrase Model

The phrase model used in the translation equivalent selection module is in essence a language model; however, instead of the word n-grams used in most SMT systems, the words here are grouped together based on the syntactic phrases extracted from the chunked TL monolingual corpus. The extracted phrases are then organised in a hash map, using as a key the following three criteria: (1) the type of the syntactic phrase (i.e. whether it is a noun phrase or a verb phrase), (2) the lemma of the phrase head word and (3) the PoS tag of the phrase head word. For each TL phrase, its frequency of occurrence in the corpus is stored. However, it is likely that a slightly different modelling scheme may prove more effective. For instance, the environment of the phrase may also need to be used (i.e. the types of the previous and next phrases within the sentence may be of use in translation equivalent selection, in which case the phrase organisation may be modified), either in the current model or in a complementary model for the structural context.

Finally, each map is serialized and stored in a separate file in the file system, with an appropriate name for easy retrieval. For example, for the English monolingual corpus, all verb phrases with the lemma of the head token being “read” (verb) and the PoS tag “VV”, are stored in a file named “read_VV”.
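The grouping scheme can be sketched as follows. This is a simplified sketch: the phrase representation and function names are hypothetical, but the 3-part key and the "read_VV" file-naming convention follow the description above.

```python
# TL phrase model: phrases grouped under the key (phrase type, head lemma,
# head PoS tag), with corpus frequencies, and a per-group file name derived
# from the head, as in the "read_VV" example.

from collections import defaultdict

def build_phrase_model(phrases):
    """`phrases` is an iterable of (phrase_type, head_lemma, head_pos, tokens)
    tuples; returns {key: {token_sequence: frequency}}."""
    model = defaultdict(lambda: defaultdict(int))
    for phrase_type, head_lemma, head_pos, tokens in phrases:
        key = (phrase_type, head_lemma, head_pos)
        model[key][tuple(tokens)] += 1  # frequency of this exact phrase
    return model

def file_name(head_lemma, head_pos):
    # e.g. all verb phrases headed by the lemma "read" (tag VV) are
    # serialised into a file named "read_VV".
    return f"{head_lemma}_{head_pos}"
```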

For example, let us assume a very small TL-side monolingual corpus consisting only of the following sentence: “A typical scheme would have eight electrodes penetrating human brain tissue; wireless electrodes would be much more practical and could be conformal to several different areas of the brain.” The syntactic phrases extracted from this small corpus are shown in Table 1, while the files created for the model are shown in Fig. 4. Because all phrases only appear once in the corpus, the frequencies are omitted in the specific example.
Table 1

Syntactic phrases extracted from the TL monolingual corpus

Phrase id. | Phrase type | Phrase content | Phrase head lemma/PoS
1 | | A typical scheme |
2 | | would have |
3 | | eight electrodes |
4 | | |
5 | | human brain tissue |
6 | | wireless electrodes |
7 | | would be |
8 | | much more practical |
9 | | could be |
10 | | |
11 | | to several different areas |
12 | | of the brain |

Fig. 4

Example of monolingual corpus phrases split into files

It should be noted that, for large corpora, in order to reduce the number of files created, if a sub-group file remains very small (i.e. below a small threshold value), it is not stored independently but is grouped together with all other phrases from very small files. This (1) reduces the number of files, preventing the creation of an excessive number of groups, and (2) allows the system to process phrases whose heads have no dedicated groups created from the monolingual corpus. A further step towards reducing the number of files is to skip altogether the creation of files for phrases containing a single word, as these are of no use for word reordering.
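The grouping rule can be sketched roughly as follows (the threshold value and all names are illustrative assumptions, not the actual PRESEMT settings):

```python
# Rough sketch of the file-grouping rule: single-word phrases are skipped,
# and groups below a small threshold are merged into one shared file.
THRESHOLD = 5  # assumed minimum number of phrases for a dedicated file

def target_file(head_lemma, head_pos, phrases):
    """Return the file name for a phrase group, or None if it is skipped."""
    multiword = [p for p in phrases if len(p) > 1]  # drop single-word phrases
    if not multiword:
        return None                   # nothing useful for word reordering
    if len(multiword) < THRESHOLD:
        return "_small_groups"        # merged with other very small groups
    return f"{head_lemma}_{head_pos}"  # e.g. "read_VV"

print(target_file("read", "VV", [("would", "read")] * 5))  # read_VV
print(target_file("rare", "NN", [("a", "rare", "case")]))  # _small_groups
```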

One issue that has been studied during the implementation of the translation equivalent selection is the sheer size of the monolingual corpus, which necessitates special techniques to organise and process it, so that during run-time the required intermediate results are readily available, to minimise the computational load. To obtain a more precise understanding of the task, it is essential to have a quantitative view of the corpora involved. The monolingual corpora for two of the PRESEMT target languages, namely English and German, are summarised in Table 2.
Table 2

Characteristics of monolingual corpora. The sizes in tokens, the numbers of raw text files (each file containing a block of ca. 1 Mbyte) and the numbers of extracted phrase files were lost in extraction; the surviving figures are:

Number of sentences (a): 1.0 × 10^8 and 9.5 × 10^7 for English and German respectively

Number of phrases (a): 8.0 × 10^8 and 6.0 × 10^8 for English and German respectively

(a) Indicates the inclusion of an estimate rather than an exact value

6.2 Applying the Phrase Model to the Tasks of the Translation Equivalent Selection Module

When the translation equivalent selection module is initiated, a matching algorithm accesses the TL phrase model to retrieve similar phrases and selects the most similar one through a comparison process, with the aim of performing word sense disambiguation and establishing the correct word order within each phrase. The comparison process is viewed as an assignment problem, which can be solved using either exact algorithms that guarantee the identification of the optimal solution or algorithms that yield sub-optimal solutions. In the current implementation we have opted for the Gale-Shapley algorithm (Gale and Shapley 1962; Mairson 1992), a non-exact algorithm, over the Kuhn-Munkres algorithm (Kuhn 1955; Munkres 1957), which computes an exact solution of the assignment problem. This decision followed experimentation with the Kuhn-Munkres algorithm in the METIS-II project (Markantonatou et al. 2009), where it was found that the exact solution of the assignment problem was responsible for a large fraction of the required computational effort.

In contrast, the Gale-Shapley algorithm solves the assignment problem by separating the items into two distinct sets with different properties, termed (1) suitors and (2) reviewers. In the present MT application, the aim is to create assignments between tokens of the SL (which are assigned the role of suitors) and tokens of the TL (which undertake the role of reviewers). In the Gale-Shapley algorithm, the two groups play different roles. More specifically, each suitor defines an ordered list of preferences over the reviewers to which it may be assigned. Based on these lists, the reviewers each select one of the suitors, evaluating them against their own preference lists and, in subsequent steps, revising their selection so that the resulting assignment is optimised. As a consequence, this process provides a solution that is suitor-optimal but potentially non-optimal from the reviewers’ viewpoint. However, the complexity of the algorithm is substantially lower than that of Kuhn-Munkres, and thus it is the algorithm of choice in the translation equivalent selection process, reducing the computation time required. Any errors due to this sub-optimal approach are limited to the reordering of phrases on the TL side, with no lexical selection changes (since these are decided upon by sampling the files of phrases).
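A compact implementation of the Gale-Shapley procedure is sketched below (a generic version of the stable-matching algorithm, not the actual PRESEMT code; in practice the preference lists would be derived from token-similarity scores, whereas here they are given directly):

```python
def gale_shapley(suitor_prefs, reviewer_prefs):
    """Suitor-optimal stable matching (Gale and Shapley 1962).

    suitor_prefs / reviewer_prefs: dict mapping each member of one set to its
    ordered preference list over the other set. Returns {suitor: reviewer}.
    """
    # rank[r][s]: position of suitor s in reviewer r's list (lower = preferred)
    rank = {r: {s: i for i, s in enumerate(ps)}
            for r, ps in reviewer_prefs.items()}
    free = list(suitor_prefs)                # suitors still unassigned
    next_idx = {s: 0 for s in suitor_prefs}  # next reviewer to propose to
    engaged = {}                             # reviewer -> current suitor
    while free:
        s = free.pop()
        r = suitor_prefs[s][next_idx[s]]
        next_idx[s] += 1
        if r not in engaged:
            engaged[r] = s                   # r accepts the first proposal
        elif rank[r][s] < rank[r][engaged[r]]:
            free.append(engaged[r])          # r trades up; old suitor freed
            engaged[r] = s
        else:
            free.append(s)                   # r rejects s; s tries next choice
    return {s: r for r, s in engaged.items()}

# SL tokens as suitors, TL tokens as reviewers (illustrative preferences)
match = gale_shapley({"sl1": ["tl1", "tl2"], "sl2": ["tl1", "tl2"]},
                     {"tl1": ["sl2", "sl1"], "tl2": ["sl1", "sl2"]})
print(match)  # {'sl2': 'tl1', 'sl1': 'tl2'}
```

The loop runs in O(n²) time for n suitors and reviewers, which is what makes it cheaper than the exact Kuhn-Munkres solution in practice.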

The main issue at this stage is to reorder appropriately the items within each phrase, while at the same time selecting the most appropriate translation for each word from those the bilingual lexicon has provided. This requires that tokens from (a) a given phrase of the input sentence, termed ISP (Input Sentence Phrase), and (b) a TL phrase extracted from the TL phrase model, denoted MCP (Monolingual Corpus Phrase), be close to each other in terms of the number and type of tokens. More specifically, the number of items in an MCP used as a template should be at least equal to (or larger than) the number of elements in the ISP (since the MCP must be able to accommodate all tokens of the ISP, it is safer to delete existing MCP elements from their locations than to introduce new ones). In principle, the number of ISP tokens should be equal or very close to that of the MCP. This means that a search needs to be performed, described algorithmically by the following steps:

Step 1: For each phrase (ISP) a decoder creates a vector containing all translation equivalents using the bilingual lexicon. The number of vectors created is the same as the number of translation equivalents of the phrase head words. The word order does not change.

Step 2: Process each vector iteratively, retrieving for each one the corresponding set of MCPs from the phrase model, based on the phrase type and on the lemma and PoS tag of the phrase head token.

Step 3: For each ISP in the vector apply the Gale-Shapley algorithm for aligning the tokens of the ISP to those of the retrieved MCPs. The word alignment provides a guideline for reordering the ISP according to the MCP word order and also provides a similarity score through a comparison formula applied to each one of the aligned word pairs (see the equation below). The similarity score is calculated as the weighted sum of the comparison scores of four types of information, namely (a) phrase types (PTypeCmp), (b) phrase head word lemma (LemCmp), (c) phrase head word PoS tag (TgCmp) and (d) phrase case (CsCmp), if this latter information is available.
$$ Score = w_{ptype}\times PTypeCmp + w_{lem}\times LemCmp + w_{tag}\times TgCmp + w_{case}\times CsCmp $$
where all weights are real positive-valued parameters that sum up to one.
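In code form, the weighted score might look like this (the weight values shown are arbitrary placeholders for illustration, not the tuned PRESEMT parameters):

```python
def similarity_score(ptype_cmp, lem_cmp, tag_cmp, cs_cmp,
                     w_ptype=0.25, w_lem=0.35, w_tag=0.25, w_case=0.15):
    """Weighted sum of the four comparison scores; weights must sum to one."""
    assert abs((w_ptype + w_lem + w_tag + w_case) - 1.0) < 1e-9
    return (w_ptype * ptype_cmp + w_lem * lem_cmp
            + w_tag * tag_cmp + w_case * cs_cmp)

# A word pair matching on everything except the head lemma
print(round(similarity_score(1.0, 0.0, 1.0, 1.0), 2))  # 0.65
```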

Step 4: After performing all comparisons, the best-matching ISP-MCP pair is selected, taking into account both the similarity score and the MCP frequency of occurrence in the TL corpus. Similarity scores are not compared as absolute values but as a ratio, so as to allow the insertion and/or deletion of words such as articles and other functional words. If the similarity scores of two or more MCPs are close according to this ratio, their frequencies are compared in order to select one; only if the frequencies are also close are the absolute comparison values used to select the most appropriate ISP-MCP pair.
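This selection rule can be sketched as follows (the ratio threshold and the tie-breaking details are illustrative assumptions, not the documented PRESEMT values):

```python
RATIO = 0.95  # assumed threshold below which scores count as "close"

def select_best(candidates):
    """candidates: list of (similarity, frequency, phrase); pick one phrase.

    Prefer the highest similarity; if several candidates are close to the
    best as a ratio, prefer the most frequent, then the raw score.
    """
    top = max(c[0] for c in candidates)
    close = [c for c in candidates if c[0] / top >= RATIO]
    if len(close) > 1:
        # scores are comparable: use corpus frequency, then absolute score
        return max(close, key=lambda c: (c[1], c[0]))[2]
    return max(candidates, key=lambda c: c[0])[2]

print(select_best([(0.925, 2, "to an automate procedure"),
                   (0.925, 7, "in an ongoing process")]))  # in an ongoing process
```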

Step 5: By selecting the most appropriate ISP-MCP pair, the algorithm performs lexical disambiguation by rejecting all other equivalent ISPs in the vector. Words are also reordered based on the MCP using the word alignment produced by the Gale-Shapley algorithm. After applying the previous steps for all phrases in the input sentence, the final sentence translation is produced. It should be noted that the order of phrases has already been established in the structure selection module.

Step 6: A token generator component is applied to the lemmas of the TL sentence together with their morphological features. In that way, the final tokens are generated and the final translation is produced.

6.3 Example of Translation Equivalent Selection

To illustrate translation equivalent selection, the handling of the final phrase from the example sentence of Sect. 5.2 is discussed here. This phrase comprises four tokens, as shown in Table 3, where for simplicity abbreviated SL tags (PoS tag and case only) are used:
Table 3

Phrase tokens and their tags in SL and TL respectively. The token ids and the SL/TL PoS-tag columns were lost in extraction; the surviving SL tokens (lemmas) and the candidate TL lemmas from the bilingual lexicon are:

1. (SL token lost): [at, in, into, on, to, upon]
2. μια (ένας): [1, a, an, one]
3. αυτοματοποιημένη (αυτοματοποιημένος): [automate, automated]
4. διαδικασία (διαδικασία): [procedure, process]
In this phrase, the fourth token (“διαδικασία”) is the phrase head. Thus, when searching for the best phrase translation, the indexed files for each of the two candidate translations (“procedure” and “process”) are searched, using as additional constraints the phrase type (in this case “PC”) and the head PoS tag (here “NN”). Hence, following the annotation introduced in Sect. 6.1, files “PC/procedure_NN” and “PC/process_NN”, which contain 21,939 and 35,402 distinct entries respectively, are searched for matching phrase occurrences. Since all four tokens have multiple translations suggested by the lexicon, a number of possible combinations (6 × 4 × 2 × 2 = 96) of lemma sequences need to be matched to the phrase instances contained in the two indexed files. The best-matching phrase instances retrieved from the indexed files are shown in Table 4 in order of retrieval.
Table 4

Candidates of phrase translation retrieved from TL model. The originating-file, corpus-frequency and matching-score columns were lost in extraction; the lemmatised token sequences, in order of retrieval, are:

1. To a store procedure
2. To an automate procedure
3. In an ongoing process

As can be seen, the first two entries are retrieved from the file containing PC-type phrases with “procedure” as their head, while the third comes from the file containing PC phrases with head “process”. An exhaustive search of the two indexed files has shown that no exact matches to the input phrase exist. The highest matching score is 92.5 %, as in all three examined phrases the lemma of the third token is not matched. Still, the 92.5 % score is sufficiently high to form a sound basis for the translation (conversely, if it were below a user-defined threshold, typically chosen from the range of 75 to 90 %, this translation would be rejected and the SL order of tokens in the phrase would be adopted). In addition, the frequencies of candidates 2 and 3 are comparable, differing by less than an order of magnitude. As all retrieved phrases have equal matching scores, the winning phrase is selected as the one with the highest frequency of occurrence in the TL monolingual corpus. In this specific example, based on the contents of the fourth column of Table 4, the chosen phrase is the third phrase. This phrase is then used as the basis for translating the respective SL-side phrase, by replacing the token “ongoing” (which is not an appropriate translation, according to the bilingual lexicon) with the token “automated” suggested by the lexicon. The sequence obtained with this replacement (“in an automated process”) represents the translation of this phrase, which forms part of the final sentence translation.
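The 96 candidate lemma sequences for the example phrase can be enumerated as the Cartesian product of the per-token translation lists:

```python
from itertools import product

# Translation options per token, as provided by the bilingual lexicon
options = [
    ["at", "in", "into", "on", "to", "upon"],  # first token (preposition)
    ["1", "a", "an", "one"],                   # μια (ένας)
    ["automate", "automated"],                 # αυτοματοποιημένη
    ["procedure", "process"],                  # διαδικασία
]

candidates = [" ".join(seq) for seq in product(*options)]
print(len(candidates))                          # 96 (= 6 x 4 x 2 x 2)
print("in an automated process" in candidates)  # True
```

Each of these sequences is then matched against the phrase instances held in the two indexed files.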

7 Evaluation of the PRESEMT MT System

The current section provides an account of the evaluation conducted in order to assess the performance of PRESEMT both individually and against other MT systems with respect to the Greek-to-English language pair.

MT systems are normally evaluated via automatic metrics, which compare the MT system output to one or more human-produced reference translations and calculate their similarity. Such metrics, well-established and widely used in the field, include BLEU, NIST, Meteor and TER.

BLEU (Papineni et al. 2002) and NIST (NIST 2002) measure the n-grams shared between the system output and the reference translation. The BLEU score ranges within [0, 1], with 1 denoting a perfect match, i.e. a perfect translation, while the NIST score range is [0, ∞), where a higher score signifies better translation quality. Meteor (Denkowski and Lavie 2011) calculates similarity against each reference translation and returns the highest resulting score. Its score range is [0, 1], with 1 signifying a perfect translation. Finally, TER (Snover et al. 2006) follows the philosophy of the Levenshtein distance (Levenshtein 1966), in that it calculates the minimum number of edits needed to change a candidate translation so that it exactly matches one of the reference translations, normalised by the average length of the references (Snover et al. 2006, p. 3).
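For reference, BLEU as defined by Papineni et al. (2002) combines modified n-gram precisions $p_n$ (typically up to $N = 4$, with uniform weights $w_n = 1/N$) with a brevity penalty $BP$ computed from the candidate length $c$ and the effective reference length $r$:

$$ BLEU = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right), \qquad BP = \begin{cases} 1 & \text{if } c > r \\ e^{\,1-r/c} & \text{if } c \le r \end{cases} $$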

The translation output of MT systems can also be evaluated by humans, usually in terms of adequacy, referring to how much information of the source language text has been retained in the translation, and fluency, which measures the degree to which the translation is grammatically well-formed according to the grammar of the target language.

For PRESEMT both types of evaluation were employed. The present section describes the results of the automatic evaluation only, for the Greek-to-English language pair. The interested reader is referred to the project deliverable D9.2, which additionally reports on the human evaluation process and contains the evaluation results for the other language pairs that PRESEMT handles.

7.1 Dataset

The test dataset used for the evaluation was collected over the web in accordance with appropriately defined specifications. More specifically, the web was crawled to retrieve a corpus of 1,000 sentences, whose lengths ranged between 7 and 40 tokens. Subsequently, 200 sentences were randomly chosen from this corpus to constitute the test dataset. Since the specific dataset was intended to be used for development, its size was purposely kept small.

These sentences were then manually translated from Greek into English by native speakers of Greek. The correctness of the translations, which would serve as references, was next checked by native speakers of the target language, who were independent of those who originally created the data. Table 5 illustrates the profile of the test dataset.
Table 5

Description of the test dataset. The number of tokens and the number of reference translations were lost in extraction; the surviving entries are:

Source language: Greek

Target language: English

Sentence size: 7–40 tokens

Test dataset size for automatic evaluation: 200 sentences
7.2 Evaluation Results

Our goal during evaluation was not only to evaluate the translation output of PRESEMT but also to examine how it performs in comparison to other MT systems. Therefore, the test dataset was translated by three other MT systems as well: Google Translate, Bing Translator and WorldLingo.

Our experiments (cf. results in Table 6) span two different time periods, namely mid-2012 (as reported in Sofianopoulos et al. (2012)) and the beginning of 2014, thus allowing the reader to form a view of how these MT systems evolve over time. As far as PRESEMT is concerned, two sets of results are provided, reflecting the use of different modules for segmenting the SL input into phrases (based on CRF and on PMG-simple respectively, as discussed in Sect. 3.2). For each system, the scores obtained in 2012 and 2014 are provided, together with a measure of the improvement in each metric, expressed as a percentage. Since PMG-simple is a recent enhancement to the PRESEMT system, its improvement in performance is reported over the 2012 PRESEMT system employing CRF.
Table 6

Evaluation results for PRESEMT and other MT systems for Greek-to-English. The metric scores were lost in extraction; only the percentage changes between the 2012 and 2014 runs survive, one row per system across the four metrics:

+74 %, +97 %, −5 %, +8 %

+16 %, +20 %, −3 %, +4 %

+12 %, +17 %, −1 %, +6 %

−12 %, −18 %, +42 %, −9 %

WorldLingo translations are not included for 2014, since they could not be obtained via the corresponding website.

From the evaluation results it is evident that PRESEMT has exhibited a remarkable improvement, which is the outcome of various factors such as the modifications in the Structure selection algorithm, the fine-tuning of the Phrase aligner and the PMG, the enhancement of the token generation module through the expansion of the TL monolingual corpus, lexicon corrections, and the improved handling of syntactic phenomena.

In the first period, PRESEMT is outperformed by Google and Bing by 215 % and 162 % respectively (for the BLEU metric), indicating a very large difference in translation quality; this difference is markedly reduced (to 51 % and 43 % respectively) in the second period, which indicates a substantial improvement in the quality of the translations generated by PRESEMT. In addition, the improvement in PRESEMT is quite high (always exceeding 15 % in absolute terms), while the corresponding improvement in Bing is much smaller (typically less than 10 %). Finally, though Google Translate still remains the system with the highest scores, its performance over time seems to be deteriorating, as measured by the metrics. Therefore, as PRESEMT matures, its performance can be seen to be improving substantially in terms of translation accuracy. On the other hand, PRESEMT expectedly sacrifices the top end of translation quality due to a number of design factors, including (1) its easy portability to new language pairs and (2) its use of publicly available linguistic tools, without enhancements or adaptation to the specific MT methodology.

One final point concerns the margin over which Google Translate and Bing Translator exceed the PRESEMT performance. This is quite sizeable, as evidenced by Table 6. However, it is widely accepted that automatic metrics such as BLEU and NIST tend to favour statistical MT approaches. Thus, it is plausible that the performance advantages of Google and Bing are not as sizeable as suggested by Table 6. The human evaluation did confirm the higher performance of Bing/Google over the PRESEMT version which was current in December 2012, and thus it would be useful to perform a new human evaluation comparing the more recent versions of the MT systems.

As a further indication of the characteristics of the proposed methodology, results of building an MT system for the Greek-to-German language pair are summarised in Table 7, for lemmatised output. For comparison purposes, the metric values obtained for Bing Translator and Google Translate are also included in this table. PRESEMT is clearly outperformed by both Bing and Google. The reason is that the Greek-to-German system has been much less extensively developed, so that substantial improvements in the metric scores can still be achieved with a limited number of actions. Several areas for improving the translations remain. For instance, the bilingual dictionary, adopted for reasons of availability, is quite fragmentary, having been developed by scanning a printed version without any editing of the entries by a native speaker. In addition, chunking of the TL-side (German) corpora has been performed using the TreeTagger package for German, which has proved less accurate than the English-language version of TreeTagger and which does not currently generate adverbial or adjectival chunks (ADVC and ADJC respectively). This points to one of the weaknesses of the proposed method, namely that adopting resources and tools from third parties carries the risk that these are not fully compatible or contain inconsistencies that can affect the translation accuracy. At present, efforts are continuing towards improving this language pair. In addition, due to the highly inflectional nature of German and its more complex syntax, further work is needed to bring the performance to levels comparable to those of Greek-to-English. A final note is that for this language pair the CRF-based variant is more effective than PMG-simple. However, as more work is needed for this specific language pair, these results need to be revisited before more reliable conclusions can be drawn. These developments are to be reported in future publications.
Table 7

Evaluation results for PRESEMT and other MT systems for Greek-to-German (all metric values were lost in extraction)

WorldLingo translations are not available for 2014, since they could not be obtained via the corresponding website.

7.3 Comparison to Other MT Systems

Leaving aside the numerical results, it is worth looking at individual cases to gain insight into the behaviour of the different MT systems. The following examples present the translation output provided by PRESEMT, Bing and Google at the two evaluation periods, and show that in certain cases PRESEMT, although still outperformed, behaves more consistently than the other systems. These examples are not intended to belittle the good translation quality achieved by either Bing or Google; rather, they aim to highlight the fact that the mere alignment of SL-TL segments, without any information about the syntactic structure, fails to handle certain grammatical phenomena successfully. In each example the SL sentence is provided first, together with its translation into English (placed in brackets); the translations of the three systems are then listed. Missing tokens in the translation are indicated by the symbol ‘∅’.

Examples 1 and 2 illustrate that Bing or Google sometimes omit the translations of some source words. In Example 1, this concerns the adverb ‘mainly’, while in Example 2, it is the possessive clitic ‘her’ that is missing. Furthermore, Example 3 shows that the handling of gender is not always successful.

Of course, PRESEMT does not produce perfect translations (cf. the placement of the possessive clitic in the second example or the choice of the adjective ‘mysterious’ for translating ‘σκοτεινό’ in the third example). The argument made here is that, since PRESEMT is aware of grammatical features, it can be expected to behave consistently when translating.

Example 1

SL sentence: Πρώτον, αυτή είναι μια κυρίως πολιτική πρόκληση. [= Firstly, this is a mainly political challenge]

System outputs (the system labels of the original table were lost in extraction; each line pairs the two outputs of one system):

- First, she are an especially civil challenge / First, this is an especially political challenge
- Firstly, this is a political challenge / Firstly, this is primarily a political challenge
- First, this is primarily a political challenge / First, this is primarily a political challenge

Example 2

SL sentence: Ο πατέρας της προσπαθεί μάταια να τη μεταπείσει. [= Her father tries in vain to dissuade her]

System outputs (system labels lost in extraction; each line pairs the two outputs of one system):

- Her father tries vain to her coax / Her father tries in vain to her dissuade
- Her father tries in vain to dissuade / Her father tries in vain to dissuade
- Her father tries in vain to convince the ∅ / Her father tries in vain to persuade her

Example 3

SL sentence: Έχω ζήσει μ' αυτήν σ' ένα σκοτεινό, κρύο και υγρό δωμάτιο. [= I have lived with her in a dark, cold and damp room]

System outputs (system labels lost in extraction; each line pairs the two outputs of one system):

- Have lived in she in mysterious, wet and room cold / Have lived to her in a dark, cold and wet room
- I have experienced it in a dark, cold and wet room. / I've lived with it in a dark, cold and wet room.
- I've lived with it in a dark, cold and damp room. / I've lived with it in a dark, cold and wet room.

Other grammatical features such as subject-verb agreement (example 4) or verb (non-)finiteness (example 5) can also be mishandled by Bing and Google.

Example 4

SL sentence: Αν διαβάσουμε ιστορία θα καταλάβουμε γιατί δεν έχει γίνει Εθνικό Κτηματολόγιο. [= If we read history we will understand why there has been no National Cadastre.]

System outputs (system labels lost in extraction; each line pairs the two outputs of one system):

- If the book read history because not will understand has become the ethnic register land / If we read history we will understand because has not been national land register
- If you read history you will understand why there has been no National Register / If you read history you will understand why he has not become a national cadastre
- If we read history will understand why there has been no National / If you read history you will understand why he has become National Cadastre

Example 5

SL sentence: Την εκπαίδευση έχουν αναλάβει οι επιχειρήσεις. [= Companies have undertaken the training]

System outputs (system labels lost in extraction; each line pairs the two outputs of one system):

- Operations have undertaken education / The teaching have undertaken the companies
- Education responsible businesses / The training undertaken by businesses
- The training undertaken by companies / The training undertaken by businesses

8 Future Extensions and Potential Improvements on PRESEMT

Within the work reviewed in the present chapter, a number of potential directions for further improving the PRESEMT system have been identified. These have been based on the experiments performed and the corresponding observations, as summarised above.

An obvious avenue for improvements concerns revising the two translation phases. Experimentation with the PRESEMT prototype has indicated scope for improving in particular the structure selection algorithm. More elaborate metrics and methods for measuring the matching of sentence structures may be introduced. In addition, a more elaborate process for combining sub-sentential parts from different clauses can be employed to define the structure of the entire sentence.

Regarding the Translation equivalent selection step, improvements in the target language model may also be achieved. To that end, the indexing scheme employed in PRESEMT for phrases will be expanded to include—apart from frequency of occurrence—information regarding the context in which the phrase appears. Such information will support the search for a more appropriate match, conforming to the environment of each phrase.

In a similar vein, it is possible to augment the language model to include a combination of models. Currently, the PRESEMT TL model relies on phrases indexed on the basis of head lemma and phrase type information. Experiments have shown that errors in the resulting translations can in some cases be corrected by resorting to simple n-gram information. Even a sequential application of the n-gram information for correction purposes leads to improvements in translation accuracy. The issue then becomes how best to combine the two language models by applying them concurrently, so as to achieve the best possible translation performance. This would allow the more appropriate treatment of cases where less than accurate matches in terms of sentence structure are achieved (for instance, when very long sentences need to be translated). This approach can notably address errors at phrasal boundaries (i.e. between the last tokens of a phrase and the first tokens of the following phrase).

All the aforementioned improvements relate to the existing resources and thus should not affect the portability of PRESEMT to new language pairs. One of the shortcomings of the system has been the lack of an effective dedicated disambiguation module. At present, both the disambiguation and the intra-phrase token sequencing tasks are handled by a single search in the language model of indexed phrases. However, this arrangement potentially results in interference between the two tasks, as a single solution is chosen in one step. If a reliable disambiguation module can be established that samples the same corpus as the indexed phrases, this may lead to more consistent translations, by decoupling the disambiguation between multiple candidate translations from the token order within each phrase.

It should be noted that all disambiguation modules could be constructed with monolingual data (using solely TL-side corpora), or with bilingual data. In the latter case, the requirement for more corpora to develop a new language pair becomes evident. On the other hand, the benefits of combining SL and TL corpora can be much greater in terms of translation quality.

A further addition could be the introduction of linguistic knowledge in the translation process. For instance, PRESEMT is agnostic regarding the role of subject and object, leading to less than optimal translations. The introduction of such knowledge is expected to improve the structure selection performance, though in part this contradicts the requirement for minimal specialised linguistic tools. Nonetheless, it appears that, short of providing a much larger parallel corpus, this is the main way to a much more natural translation. This is one of the main issues being researched by the PRESEMT research group, in pursuit of a breakthrough in translation quality.

Closing Note

Please visit the project's website to download and experiment with the PRESEMT package, or simply try out the fully functional online system. Detailed technical documentation is provided, covering even the creation of a new language pair.



  1. Black, P.E. 2005. Dictionary of algorithms and data structures. U.S. National Institute of Standards and Technology (NIST).
  2. Brown, P.F., S.A. Della Pietra, V.J. Della Pietra, and R.L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics 19(2): 263–311.
  3. Carbonell, J., S. Klein, D. Miller, M. Steinbaum, T. Grassiany, and J. Frey. 2006. Context-based machine translation. In Proceedings of the 7th AMTA Conference, 19–28. Cambridge, MA.
  4. Carl, M., M. Melero, T. Badia, V. Vandeghinste, P. Dirix, I. Schuurman, S. Markantonatou, S. Sofianopoulos, M. Vassiliou, and O. Yannoutsou. 2008. METIS-II: Low resources machine translation: Background, implementation, results and potentials. Machine Translation 22(1–2): 67–99.
  5. Collins, M., P. Koehn, and I. Kucerova. 2005. Clause restructuring for statistical machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, vol. 43, 531.
  6. Costa-jussà, M.R., R. Banchs, R. Rapp, P. Lambert, K. Eberle, and B. Babych. 2013. Workshop on hybrid approaches to translation: Overview and developments. In Proceedings of the 2nd HYTRA Workshop, held within ACL-2013, 1–6. Sofia.
  7. Denkowski, M., and A. Lavie. 2011. Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems. In Proceedings of the EMNLP 2011 Workshop on Statistical Machine Translation, 85–91. Edinburgh.
  8. Dologlou, I., S. Markantonatou, G. Tambouratzis, O. Yannoutsou, A. Fourla, and N. Ioannou. 2003. Using monolingual corpora for statistical machine translation: The METIS system. In Proceedings of the EAMT-CLAW 2003 Workshop, 61–68. Dublin.
  9. Duda, R.O., P.E. Hart, and D.G. Stork. 2001. Pattern classification, 2nd ed. New York: Wiley.
  10. Eisele, A., C. Federmann, H. Uszkoreit, H. Saint-Amand, M. Kay, M. Jellinghaus, S. Hunsicker, T. Herrmann, and Y. Chen. 2008. Hybrid machine translation architectures within and beyond the EuroMatrix project. In Proceedings of the European Machine Translation Conference. Hamburg.
  11. Gale, D., and L.S. Shapley. 1962. College admissions and the stability of marriage. American Mathematical Monthly 69: 9–14.
  12. Gough, N., and A. Way. 2004. Robust large-scale EBMT with marker-based segmentation. In Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation (TMI-04), 95–104. Baltimore, MD.
  13. Hutchins, J. 2005. Example-based machine translation: A review and commentary. Machine Translation 19: 197–211.
  14. Klementiev, A., A. Irvine, C. Callison-Burch, and D. Yarowsky. 2012. Toward statistical machine translation without parallel corpora. In Proceedings of EACL 2012, 130–140. Avignon.
  15. Koehn, P. 2010. Statistical machine translation. Cambridge: Cambridge University Press.
  16. Kuhn, H.W. 1955. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2: 83–97.
  17. Lafferty, J., A. McCallum, and F.C.N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML '01), 282–289. San Francisco: Morgan Kaufmann.
  18. Levenshtein, V.I. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10: 707–710.
  19. Mairson, H. 1992. The stable marriage problem. The Brandeis Review 12(1).
  20. Markantonatou, S., S. Sofianopoulos, O. Giannoutsou, and M. Vassiliou. 2009. Hybrid machine translation for low- and middle-density languages. In Language engineering for lesser-studied languages, ed. S. Nirenburg, 243–274. Amsterdam: IOS Press.
  21. Munkres, J. 1957. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics 5: 32–38.
  22. NIST. 2002. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. Report.
  23. Papineni, K., S. Roukos, T. Ward, and W.J. Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311–318. Philadelphia.
  24. Popovic, M., and H. Ney. 2006. POS-based word reorderings for statistical machine translation. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006), 1278–1283. Genoa.
  25. Prokopidis, P., B. Georgantopoulos, and H. Papageorgiou. 2011. A suite of NLP tools for Greek. In Proceedings of the 10th ICGL Conference, 373–383. Komotini.
  26. Quirk, C., and A. Menezes. 2006. Dependency Treelet translation: The convergence of statistical and example-based machine translation? Machine Translation 20: 43–65.CrossRefGoogle Scholar
  27. Rottmann, K., and S. Vogel. 2007. Word reordering in statistical machine translation with a POS-based distortion model. In Proceedings of the 11th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI 2007), 171–180. Skövde.Google Scholar
  28. Schmid, H. 1994. Probabilistic part-of-speech tagging using decision trees. In Proceedings of International Conference on New Methods in Language Processing, 44–49. Manchester.Google Scholar
  29. Smith, T.F., and M.S. Waterman. 1981. Identification of common molecular subsequences. Journal of Molecular Biology 147: 195–197.CrossRefGoogle Scholar
  30. Snover, M., B. Dorr, R. Schwartz, L. Micciulla, and J. Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of the 7th AMTA Conference, 223–231. Cambridge, MA.Google Scholar
  31. Sofianopoulos, S., M. Vassiliou, and G. Tambouratzis. 2012. Implementing a language-independent MT methodology. In Proceedings of the 1st Workshop on Multilingual Modeling (held within ACL-2012), 1–10. JejuGoogle Scholar
  32. Su, J., H. Wu, H. Wang, Y. Chen, X. Shi, H. Dong, and Q. Liu. 2012. Translation model adaptation for statistical machine translation with monolingual topic information. In Proceedings of the ACL2012, 459–468. Jeju.Google Scholar
  33. Tambouratzis, G., F. Simistira, S. Sofianopoulos, N. Tsimboukakis, and M. Vassiliou. 2011. A resource-light phrase scheme for language-portable MT. In Proceedings of the 15th International Conference of the European Association For Machine Translation, eds. M. L. Forcada, H. Depraetere, and V. Vandeghinste, 185–192. Leuven.Google Scholar
  34. Tambouratzis, G., M. Troullinos, S. Sofianopoulos, and M. Vassiliou. 2012. Accurate phrase alignment in a bilingual corpus for EBMT systems. In Proceedings of the 5th BUCC Workshop, held within the LREC2012 Conference, 104–111. Istanbul.Google Scholar
  35. Tambouratzis, G., S. Sofianopoulos, and M. Vassiliou. 2013. Language-independent hybrid MT with PRESEMT. In Proceedings of HYTRA-2013 Workshop, held within the ACL-2013 Conference, 123–130. Sofia (ISBN 978-1-937284-53-4).Google Scholar
  36. Tambouratzis, G. 2014. Comparing CRF and template-matching in phrasing tasks within a Hybrid MT system. In Proceedings of the 3rd Workshop on Hybrid Approaches to Translation (held within the EACL-2014 Conference), 7–14. Gothenburg.Google Scholar
  37. Wu, D. 2005. MT model space: statistical versus compositional versus example-based machine translation. Machine Translation 19: 213–227.CrossRefGoogle Scholar
  38. Wu, D. 2009. Toward machine translation with statistics and syntax and semantics. In Proceedings of the IEEE Workshop On Automatic Speech Recognition & Understanding, 12–21. Merano.Google Scholar

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • George Tambouratzis
  • Marina Vassiliou
  • Sokratis Sofianopoulos

  ILSP, Athena R.C., Marousi, Greece