Abstract
The most promising approach to word alignment is to combine statistical methods with non-statistical information sources. Some of the proposed non-statistical sources, including bilingual dictionaries, POS taggers and lemmatizers, rely on considerable linguistic knowledge, while other knowledge-lite sources such as cognate heuristics and word order heuristics can be implemented relatively easily. While knowledge-heavy sources might be expected to give better performance, knowledge-lite systems are easier to port to new language pairs and text types, and they can give sufficiently good results for many purposes, e.g. if the output is to be used by a human user for the creation of a complete word-aligned bitext. In this paper we describe the current status of the Linköping Word Aligner (LWA), which combines the use of statistical measures of co-occurrence with four knowledge-lite modules for (i) word categorization, (ii) morphological variation, (iii) word order, and (iv) phrase recognition. We demonstrate the portability of the system (from English-Swedish texts to French-English texts) and present results for these two language pairs. Finally, we report observations from an error analysis of system output and identify the major strengths and weaknesses of the system.
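To make the two kinds of evidence named in the abstract more concrete, here is a minimal sketch of what a statistical co-occurrence measure and a knowledge-lite cognate heuristic might look like. The abstract does not say which measures LWA uses, so the Dice coefficient and the longest-common-subsequence ratio below are assumptions chosen for illustration, and the function names `dice_scores`, `lcs_length` and `cognate_ratio` are hypothetical rather than part of the system.

```python
from collections import Counter
from itertools import product


def dice_scores(bitext):
    """Dice co-occurrence score for every word pair seen in aligned sentences.

    bitext: list of (source_tokens, target_tokens) sentence pairs.
    Illustrative assumption: the abstract does not name LWA's actual measure.
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        # Count each word type once per sentence pair.
        src_types, tgt_types = set(src), set(tgt)
        src_freq.update(src_types)
        tgt_freq.update(tgt_types)
        pair_freq.update(product(src_types, tgt_types))
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, 1):
            if ch_a == ch_b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def cognate_ratio(a, b):
    """LCS length divided by the longer word's length (0.0 for empty input)."""
    if not a or not b:
        return 0.0
    return lcs_length(a.lower(), b.lower()) / max(len(a), len(b))
```

A hybrid aligner in this spirit could, for example, rank candidate pairs by the co-occurrence score and use the cognate ratio as additional evidence for rare words; how LWA actually combines its modules is described in the chapter itself, not here.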
Copyright information
© 2000 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Ahrenberg, L., Andersson, M., Merkel, M. (2000). A knowledge-lite approach to word alignment. In: Véronis, J. (eds) Parallel Text Processing. Text, Speech and Language Technology, vol 13. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2535-4_5
DOI: https://doi.org/10.1007/978-94-017-2535-4_5
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5555-2
Online ISBN: 978-94-017-2535-4
eBook Packages: Springer Book Archive