Recycling Annotated Parallel Corpora for Bilingual Document Composition
Parallel corpora enriched with descriptive annotations facilitate the development of multilingual authoring tools. Starting from an annotated bitext, we show how SGML markup can be recycled to produce complementary language resources. On the one hand, several translation memory databases, together with glossaries of proper nouns, have been produced. On the other hand, DTDs for source and target documents have been derived and put into correspondence. This paper discusses how these resources have been automatically generated and applied to an interactive bilingual authoring system. This tool is capable of handling a substantial proportion of the text in both the composition and the translation of structured documents.
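As a rough illustration of what recycling markup can mean in practice, the following minimal Python sketch derives a toy translation memory and a proper-noun glossary from a pair of aligned, annotated segments. It is not the system described in the paper. The tag names (`seg`, `rs`), the `id` and `type` attributes, the sample sentences, and the function names are all assumptions made for the example, and regular expressions stand in for a real SGML parser.

```python
import re

# Hypothetical markup: <seg id="..."> wraps an aligned segment and
# <rs type="..."> wraps a proper noun. These names are illustrative only.
SEG = re.compile(r'<seg id="(\w+)">(.*?)</seg>', re.S)
NAME = re.compile(r'<rs type="(\w+)">(.*?)</rs>', re.S)
TAG = re.compile(r'<[^>]+>')

SOURCE = '<seg id="s1">El <rs type="org">Ayuntamiento de Bilbao</rs> aprueba la orden.</seg>'
TARGET = '<seg id="s1"><rs type="org">Bilbao City Council</rs> approves the order.</seg>'


def segments(text):
    """Map each segment id to its annotated content."""
    return {sid: body.strip() for sid, body in SEG.findall(text)}


def recycle(source, target):
    """Build (translation memory, glossary) from two annotated texts."""
    src, tgt = segments(source), segments(target)
    memory, glossary = [], {}
    for sid, s_body in src.items():
        if sid not in tgt:
            continue  # no counterpart with the same id, so skip it
        t_body = tgt[sid]
        # Translation unit: both sides with the markup stripped.
        memory.append((TAG.sub('', s_body), TAG.sub('', t_body)))
        s_names, t_names = NAME.findall(s_body), NAME.findall(t_body)
        # Pair names only when both sides carry the same sequence of types.
        if [t for t, _ in s_names] == [t for t, _ in t_names]:
            for (_, s_name), (_, t_name) in zip(s_names, t_names):
                glossary[s_name] = t_name
    return memory, glossary


memory, glossary = recycle(SOURCE, TARGET)
print(memory)
print(glossary)
```

Pairing names by matching type sequences is a deliberately simple heuristic chosen for the sketch. A real pipeline would rely on a proper alignment step and a validating SGML parser.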