Linking Localisation and Language Resources

  • David Lewis
  • Alexander O’Connor
  • Sebastien Molines
  • Leroy Finn
  • Dominic Jones
  • Stephen Curran
  • Séamus Lawless

Abstract

Industrial localisation is changing from the periodic translation of large bodies of content to a long tail of small, heterogeneous translations processed in an agile and demand-driven manner. Software localisation and crowd-sourced translation already practise continuous, fine-grained distribution of translation work. This requires close integration and round-trip interoperability between content creation and localisation processes, while at the same time recording the provenance of translated content to maximise its reuse in future translation tasks and, increasingly, in training Statistical Machine Translation (SMT) engines. This work adopts a Linked Data approach to integrating the content translation round-trip process with the logging of process quality assurance provenance. This integration enables a pull-based interoperability model that supports continuous synchronisation of content and process metadata between the generating organisation and any number of language service providers or translators. We present a platform architecture for sharing, searching and interlinking Linked Localisation and Language Data (termed L3Data) on the web. This is accomplished using a semantic schema for L3Data that is compatible with existing localisation data exchange standards and can be used to support the round-trip sharing of language resources. The paper describes our approach to developing the L3Data schema and the data management processes, web-based tools and data sharing infrastructure that use it. An initial proof-of-concept prototype is presented that implements a web application which segments and machine-translates content for crowd-sourced post-editing and rating.

Keywords

Link Data · Statistical Machine Translation · Link Localisation · Language Resource · Triple Store
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • David Lewis (1)
  • Alexander O’Connor
  • Sebastien Molines
  • Leroy Finn
  • Dominic Jones
  • Stephen Curran
  • Séamus Lawless
  1. Centre for Next Generation Localisation, Knowledge and Data Engineering Group, Trinity College, Dublin, Ireland