Computers and the Humanities, Volume 38, Issue 3, pp 223–251

Bitext Generation Through Rich Markup

  • Arantza Casillas
  • Raquel Martínez


This paper reports on a method for exploiting a bitext as the primary linguistic information source for the design of a generation environment for specialized bilingual documentation. The paper discusses Text Encoding Initiative (TEI) proposals for specialized corpus tagging, text segmentation and alignment of translation units and their allocation into translation memories, Document Type Definition (DTD) abstraction from tagged texts, and DTD deployment for bilingual text generation. The parallel corpus used for experimentation has two main features:

Keywords: alignment, bilingual document generation, bitext, parallel corpus, segmentation, SGML, TEI, translation memories





Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Arantza Casillas (1)
  • Raquel Martínez (1)

  1. Departamento Electricidad y Electrónica, Facultad de Ciencia y Tecnología, UPV-EHU, Spain
