A knowledge-lite approach to word alignment

  • Chapter
Parallel Text Processing

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 13))

Abstract

The most promising approach to word alignment is to combine statistical methods with non-statistical information sources. Some of the proposed non-statistical sources, including bilingual dictionaries, POS-taggers and lemmatizers, rely on considerable linguistic knowledge, while knowledge-lite sources such as cognate heuristics and word order heuristics can be implemented relatively easily. While knowledge-heavy sources might be expected to give better performance, knowledge-lite systems are easier to port to new language pairs and text types, and they can give sufficiently good results for many purposes, e.g. if the output is to be used by a human user for the creation of a complete word-aligned bitext. In this paper we describe the current status of the Linköping Word Aligner (LWA), which combines statistical measures of co-occurrence with four knowledge-lite modules for (i) word categorization, (ii) morphological variation, (iii) word order, and (iv) phrase recognition. We demonstrate the portability of the system (from English-Swedish texts to French-English texts) and present results for these two language pairs. Finally, we report observations from an error analysis of system output and identify the major strengths and weaknesses of the system.
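To make the general idea concrete, the following is a minimal sketch of how a statistical co-occurrence score might be combined with a knowledge-lite cognate heuristic. It is not the LWA implementation: the abstract does not say which measures LWA uses, so the Dice coefficient, the longest-common-subsequence ratio, the equal weighting, and all function names here are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import product


def dice_scores(bitext):
    """Dice co-occurrence score for every word pair seen in aligned sentences.

    bitext is a list of (source_tokens, target_tokens) sentence pairs.
    Dice is an assumed measure, not necessarily the one LWA uses.
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        src_types, tgt_types = set(src), set(tgt)
        src_freq.update(src_types)
        tgt_freq.update(tgt_types)
        pair_freq.update(product(src_types, tgt_types))
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def cognate_score(a, b):
    """String similarity in [0, 1]: LCS length over the longer word's length."""
    if not a or not b:
        return 0.0
    return lcs_length(a.lower(), b.lower()) / max(len(a), len(b))


def combined_score(s, t, dice, weight=0.5):
    """Weighted mix of the two scores; the weight is a hypothetical parameter."""
    return (1 - weight) * dice.get((s, t), 0.0) + weight * cognate_score(s, t)


# Toy English-Swedish bitext, invented for this example.
bitext = [
    (["the", "system"], ["systemet"]),
    (["the", "text"], ["texten"]),
]
dice = dice_scores(bitext)
best = max(["the", "system"], key=lambda s: combined_score(s, "systemet", dice))
print(best)
```

In this toy setting both heuristics favour linking "system" to "systemet" over "the", which illustrates why a cheap string-similarity cue can reinforce sparse co-occurrence counts without any dictionary or tagger.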

Copyright information

© 2000 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Ahrenberg, L., Andersson, M., Merkel, M. (2000). A knowledge-lite approach to word alignment. In: Véronis, J. (eds) Parallel Text Processing. Text, Speech and Language Technology, vol 13. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2535-4_5

  • DOI: https://doi.org/10.1007/978-94-017-2535-4_5

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-5555-2

  • Online ISBN: 978-94-017-2535-4

  • eBook Packages: Springer Book Archive
