Translation memories (TMs) play an important role in the translation workflow of many professional translators. The underlying idea of TMs is that a translator should benefit as much as possible from previous translations. To achieve this, whenever a segment is to be translated, identical or similar segments are first sought in a database of previous translations. If a match is found, the translator is asked to validate or post-edit the retrieved translation. The use of TMs generally leads to an increase in translators’ productivity and to more consistent translations. A recent survey of translators’ needs regarding electronic tools revealed that 90% of them are familiar with TMs and that over 75% use TMs to translate documents (Zaretskaya et al. 2018).

Surprisingly, most existing TM tools hardly rely on natural language processing (NLP) at all to help translators. Companies that develop TM tools usually focus their efforts on improving the user experience, allowing the processing of a variety of document formats and developing intuitive user interfaces. Unfortunately, they pay little attention to how these tools could be enhanced with methods from NLP. Most TM tools still rely on simple forms of edit distance to match a segment to be translated against segments already in the database. This approach has the advantage of being computationally efficient, but it fails to identify two segments that have the same meaning despite using different words. A semantically enhanced edit-distance method that can identify such situations with the help of a paraphrase database was proposed by Gupta et al. (2016), but it has not been integrated into a professional translation tool.
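To make the conventional approach concrete, the following is a minimal sketch of edit-distance-based fuzzy matching over a toy TM. The example segments and the 0.7 similarity threshold are illustrative assumptions, not the settings of any particular tool.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def fuzzy_match(segment: str, tm: dict, threshold: float = 0.7):
    """Return the TM entry most similar to the segment, if any clears the threshold."""
    best, best_score = None, threshold
    for source, target in tm.items():
        dist = levenshtein(segment, source)
        score = 1.0 - dist / max(len(segment), len(source))
        if score >= best_score:
            best, best_score = (source, target), score
    return best, best_score

# Toy TM: source segments mapped to their stored translations (illustrative).
tm = {"Press the red button.": "Appuyez sur le bouton rouge."}
print(fuzzy_match("Press the green button.", tm))
```

As the sketch makes plain, the similarity score is purely character-based: a segment with identical meaning but a different surface form would score poorly, which is exactly the weakness noted above.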

This double special issue on Natural Language Processing for TM is partially the result of two workshops dedicated to this topic, organised in 2015 and 2016. It also reflects the increasing interest of NLP researchers in developing methods that are directly applicable to TMs.

The core idea of TMs is to compare segments to be translated with segments of previous translations. As most existing implementations hardly use any NLP for this purpose, we expected to receive a number of submissions attempting to enhance the existing matching and retrieval methods with methods from language processing. Interestingly enough, the topic addressed by most of the submitted papers is the cleaning of TMs, which reflects the need for clean resources in the translation process. These papers are a direct result of the 1st Automatic Translation Memories Cleaning Shared Task, which was organised in conjunction with the 2nd Workshop on Natural Language Processing for Translation Memories (NLP4TM 2016). The first part of the special issue presented an overview of the shared task and the lessons learnt (Barbu et al. 2016), together with one of the participating systems (Wolff 2016), which combined existing NLP components to assess the quality of TMs.

The first paper in this issue, Automatic translation memory cleaning by Negri et al., proposes supervised and unsupervised machine-learning approaches to cleaning TMs. The supervised method obtains very good results on the data set released by the 1st Automatic Translation Memories Cleaning Shared Task, whilst the evaluation of the unsupervised method shows its feasibility in cases where no training data is available.
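For readers unfamiliar with the task, the supervised setting can be pictured as binary classification over translation units. The sketch below is purely illustrative: the two features (length ratio and a cross-lingual similarity score) and the toy data are assumptions for the example, not the feature set or classifier used by Negri et al.

```python
# Minimal sketch of supervised TM cleaning as binary classification.
# Features and training data below are illustrative assumptions only.
from sklearn.linear_model import LogisticRegression

# Each row: [target/source length ratio, cross-lingual similarity in [0, 1]]
X_train = [[1.0, 0.9], [1.1, 0.8], [3.2, 0.1], [0.3, 0.2]]
y_train = [1, 1, 0, 0]   # 1 = keep translation unit, 0 = discard

clf = LogisticRegression().fit(X_train, y_train)
print(clf.predict([[1.05, 0.85]]))  # -> [1], i.e. keep this unit
```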

The paper Improving retrieval performance of translation memories using morphosyntactic analyses and generalized suffix arrays by Weitz addresses the problem of improving retrieval from TMs by analysing the (morpho)syntactic structure of segments and identifying the longest common substring between two segments by means of generalised suffix arrays. Evaluation on German–English datasets reveals an increase in retrieval precision, but a small reduction in recall. However, this reduction can always be addressed by requesting more segments from the database. The proposed method is very fast, which means it can be used in real translation scenarios without any negative impact on the productivity of translators.
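To illustrate the quantity at the heart of the method, the sketch below computes the longest common token-level substring of two segments with the simple quadratic dynamic programme; the paper computes the same quantity with generalised suffix arrays, which is far more efficient and is what makes the approach fast enough for interactive use. The example sentences are invented for illustration.

```python
def longest_common_substring(a: list, b: list) -> list:
    """Longest contiguous run of tokens shared by two segments (O(n*m) DP).
    Weitz obtains the same result with generalised suffix arrays, much faster."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]

print(longest_common_substring(
    "the file could not be opened".split(),
    "the report could not be opened".split()))
# -> ['could', 'not', 'be', 'opened']
```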

The final paper, A system for terminology extraction and translation equivalent detection in real time: Efficient use of statistical machine translation phrase tables by Oliver, presents a system for automatic terminology extraction and automatic detection of equivalent terms in the target language. The method relies on the output of Moses, a well-known statistical machine translation tool, and was evaluated using OmegaT, an open-source translation memory tool.
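Although the paper's system is considerably more sophisticated, the basic idea of mining translation equivalents from a Moses phrase table can be sketched as below. The helper function, the toy phrase-table line and its scores are illustrative assumptions, not part of the paper; the field layout follows the standard Moses format, in which, in a typical configuration, the third score is the direct probability p(target|source).

```python
import io

def term_equivalents(phrase_table, terms, min_prob=0.1):
    """Collect translation candidates for the given terms from a Moses
    phrase table (fields separated by ' ||| ')."""
    terms = set(terms)
    equivalents = {}
    for line in phrase_table:
        source, target, scores = line.rstrip("\n").split(" ||| ")[:3]
        prob = float(scores.split()[2])  # direct probability p(target|source)
        if source in terms and prob >= min_prob:
            equivalents.setdefault(source, []).append((target, prob))
    return equivalents

# Toy phrase-table line in Moses format (scores are made up for illustration).
sample = io.StringIO(
    "translation memory ||| mémoire de traduction ||| 0.6 0.4 0.8 0.5 "
    "||| 0-0 1-1 ||| 5 4 4\n")
print(term_equivalents(sample, ["translation memory"]))
# -> {'translation memory': [('mémoire de traduction', 0.8)]}
```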

This double special issue would not have been possible without the help of the members of the guest editorial board. During the reviewing process, they provided very valuable feedback to the authors and helped us select the most appropriate papers for the issue. The board members, listed in alphabetical order, are: Wilker Aziz, Bogdan Babych, Eduard Barbu, Maud Ehrmann, Kevin Flanagan, Mikel Forcada, Gabriela Gonzalez, Meritxell Gonzalez, Rohit Gupta, Maxim Khalilov, László Laki, Qun Liu, Yanjun Ma, Matteo Negri, Carla Parra Escartín, Michael Paul, Uwe Reinke, Germán Sanchis-Trilles, Christophe Servan, Liling Tan, Marco Turchi, Mihaela Vela, Friedel Wolff, Marcos Zampieri and Jian Zhang.

Last but not least, we would like to warmly thank all the authors who submitted papers to our special issue. Without them, this special issue would not have been possible.