A knowledge-lite approach to word alignment

Ahrenberg, Lars; Andersson, Mikael; Merkel, Magnus

doi:10.1007/978-94-017-2535-4_5

Part of the book series: Text, Speech and Language Technology (TLTB, volume 13)

Abstract

The most promising approach to word alignment is to combine statistical methods with non-statistical information sources. Some of the proposed non-statistical sources, including bilingual dictionaries, POS-taggers and lemmatizers, rely on considerable linguistic knowledge, while other knowledge-lite sources such as cognate heuristics and word order heuristics are relatively easy to implement. While knowledge-heavy sources might be expected to give better performance, knowledge-lite systems are easier to port to new language pairs and text types, and they can give sufficiently good results for many purposes, e.g. if the output is to be used by a human user to create a complete word-aligned bitext. In this paper we describe the current status of the Linköping Word Aligner (LWA), which combines statistical measures of co-occurrence with four knowledge-lite modules for (i) word categorization, (ii) morphological variation, (iii) word order, and (iv) phrase recognition. We demonstrate the portability of the system (from English-Swedish texts to French-English texts) and present results for these two language pairs. Finally, we report observations from an error analysis of system output and identify the major strengths and weaknesses of the system.
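To illustrate the general idea of combining a statistical co-occurrence measure with a knowledge-lite cognate heuristic, here is a minimal sketch. It is not the LWA implementation. It assumes the Dice coefficient as the co-occurrence measure, the longest common subsequence ratio (LCSR) as the cognate test, and a greedy one-to-one linking step similar in spirit to competitive linking. The names `dice_scores`, `lcsr`, `rank_candidates` and `competitive_link`, as well as the threshold and bonus values, are hypothetical. The sketch does not model the word categorization, morphology, word order or phrase modules described above.

```python
from collections import Counter
from itertools import product


def dice_scores(bitext):
    """Dice = 2*C(s,t) / (C(s) + C(t)); each word counted once per sentence pair."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_sent, tgt_sent in bitext:
        src_types, tgt_types = set(src_sent), set(tgt_sent)
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        pair_count.update(product(src_types, tgt_types))  # co-occurring type pairs
    return {(s, t): 2 * c / (src_count[s] + tgt_count[t])
            for (s, t), c in pair_count.items()}


def lcs_length(a, b):
    """Length of the longest common subsequence (standard dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a, b):
    """Cognate score: LCS length divided by the length of the longer word."""
    if not a or not b:
        return 0.0
    return lcs_length(a.lower(), b.lower()) / max(len(a), len(b))


def rank_candidates(bitext, cognate_threshold=0.6, cognate_bonus=0.2):
    """Dice score plus a (hypothetical) fixed bonus for likely cognates, best first."""
    ranked = []
    for (s, t), dice in dice_scores(bitext).items():
        bonus = cognate_bonus if lcsr(s, t) >= cognate_threshold else 0.0
        ranked.append((dice + bonus, s, t))
    ranked.sort(key=lambda item: (-item[0], item[1], item[2]))  # ties broken alphabetically
    return ranked


def competitive_link(ranked, min_score=0.5):
    """Greedy one-to-one linking: accept the best pair whose words are both still free."""
    links, used_targets = {}, set()
    for score, s, t in ranked:
        if score < min_score:
            break
        if s in links or t in used_targets:
            continue
        links[s] = t
        used_targets.add(t)
    return links


if __name__ == "__main__":
    bitext = [(["a", "system"], ["systemet"])]
    print(competitive_link(rank_candidates(bitext)))  # {'system': 'systemet'}
```

In this toy bitext, both candidate pairs have a Dice score of 1.0. Without the cognate bonus (`cognate_bonus=0.0`), the alphabetical tie-break links "a" to "systemet". With the bonus, "system" and "systemet" (LCSR 0.75) win instead. This shows how a cheap, language-independent heuristic can resolve cases that co-occurrence statistics alone leave ambiguous.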

Author information

Authors and Affiliations

Linköping University, Sweden
Lars Ahrenberg, Mikael Andersson & Magnus Merkel

Editor information

Editors and Affiliations

Université de Provence and CNRS, 29, Avenue Robert Schuman, 13100, Aix-en-Provence, France
Jean Véronis

About this chapter

Cite this chapter

Ahrenberg, L., Andersson, M., Merkel, M. (2000). A knowledge-lite approach to word alignment. In: Véronis, J. (eds) Parallel Text Processing. Text, Speech and Language Technology, vol 13. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2535-4_5

DOI: https://doi.org/10.1007/978-94-017-2535-4_5
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5555-2
Online ISBN: 978-94-017-2535-4
eBook Packages: Springer Book Archive