Extensions to the PRESEMT Methodology

  • George Tambouratzis
  • Marina Vassiliou
  • Sokratis Sofianopoulos
Part of the SpringerBriefs in Statistics book series (BRIEFSSTATIST)


This chapter describes a number of improvements made to the basic PRESEMT system. Each improvement targets a specific module of the PRESEMT architecture, for which an alternative implementation is proposed, with the aim of increasing translation accuracy. The first extension covers the pre-processing stage, where an improved phrasing model for the source language (SL) side is proposed. The second extension involves the use of supplementary language models (LMs) in the target language (TL), to improve translation accuracy both at the phrasal level and in the post-editing and token generation steps.



Copyright information

© The Author(s) 2017

Authors and Affiliations

  • George Tambouratzis (1)
  • Marina Vassiliou (1)
  • Sokratis Sofianopoulos (1)
  1. Institute for Language and Speech Processing, Athens, Greece
