Abstract
This paper reports on a method for exploiting a bitext as the primary source of linguistic information for designing a generation environment for specialized bilingual documentation. It discusses Text Encoding Initiative (TEI) proposals for tagging specialized corpora, text segmentation, the alignment of translation units and their allocation into translation memories, the abstraction of Document Type Definitions (DTDs) from tagged texts, and the deployment of DTDs for bilingual text generation. The parallel corpus used for experimentation has two main features:
Casillas, A., Martínez, R. Bitext Generation Through Rich Markup. Computers and the Humanities 38, 223–251 (2004). https://doi.org/10.1007/s10579-004-0233-2