A Chunk-Driven Bootstrapping Approach to Extracting Translation Patterns

Macken, Lieve; Daelemans, Walter

doi:10.1007/978-3-642-12116-6_33

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6008)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

We present a linguistically-motivated sub-sentential alignment system that extends the intersected IBM Model 4 word alignments. The alignment system is chunk-driven and requires only shallow linguistic processing tools for the source and the target languages, i.e. part-of-speech taggers and chunkers.
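
As an illustration of the starting point, the sketch below shows how two directional word alignments are commonly combined by intersection (high precision) and union (high recall). The function name `symmetrize` and the toy link sets are assumptions made for this example; they are not taken from the paper.

```python
def symmetrize(src2tgt, tgt2src):
    """Combine two directional word alignments.

    src2tgt: set of (source_index, target_index) links from the
             source-to-target model.
    tgt2src: set of (target_index, source_index) links from the
             target-to-source model.
    Returns (intersection, union), both as (source, target) pairs.
    """
    # Flip the reverse direction so both sets use (source, target) order.
    flipped = {(s, t) for (t, s) in tgt2src}
    return src2tgt & flipped, src2tgt | flipped


# Toy example (hypothetical links, not data from the paper).
forward = {(0, 0), (1, 1), (2, 3)}
backward = {(0, 0), (1, 1), (2, 2)}  # (target, source) pairs
intersection, union = symmetrize(forward, backward)
```

The intersection keeps only links that both directions agree on, which is why it serves as a reliable but sparse basis that a later step can extend.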

We conceive the sub-sentential aligner as a cascaded model consisting of two phases. In the first phase, anchor chunks are linked based on the intersected word alignments and syntactic similarity. In the second phase, we use a bootstrapping approach to extract more complex translation patterns.
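
The first phase can be pictured with a minimal sketch. The abstract does not state the exact linking criteria, so everything here is an assumption: chunk-type equality stands in for syntactic similarity, `min_coverage` is a hypothetical threshold, and the function `link_anchor_chunks` is an illustrative name. The bootstrapping phase is not sketched.

```python
def link_anchor_chunks(src_chunks, tgt_chunks, word_links, min_coverage=0.5):
    """Link source and target chunks that the word alignment supports.

    src_chunks, tgt_chunks: lists of (chunk_type, [token indices]).
    word_links: set of (source_index, target_index) pairs, e.g. the
                intersected word alignment.
    Returns a list of (source_chunk_no, target_chunk_no) anchor pairs.
    """
    anchors = []
    for i, (s_type, s_tokens) in enumerate(src_chunks):
        s_set = set(s_tokens)
        for j, (t_type, t_tokens) in enumerate(tgt_chunks):
            # Assumed stand-in for syntactic similarity: same chunk type.
            if s_type != t_type:
                continue
            t_set = set(t_tokens)
            # Links with both ends inside the candidate chunk pair.
            inside = [(s, t) for (s, t) in word_links
                      if s in s_set and t in t_set]
            # Links with exactly one end inside the pair contradict it.
            crossing = [(s, t) for (s, t) in word_links
                        if (s in s_set) != (t in t_set)]
            coverage = len(inside) / max(len(s_set), len(t_set))
            if not crossing and coverage >= min_coverage:
                anchors.append((i, j))
    return anchors


# Toy example: "the red car | drives" vs. "de rode auto | rijdt".
src = [("NP", [0, 1, 2]), ("VP", [3])]
tgt = [("NP", [0, 1, 2]), ("VP", [3])]
links = {(0, 0), (2, 2), (3, 3)}
anchors = link_anchor_chunks(src, tgt, links)
```

Chunks left unlinked by such a step are the natural candidates for a second pass that looks for more complex patterns.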

The results show an overall reduction in alignment error rate (AER) and competitive F-measures compared with the commonly used symmetrized IBM Model 4 predictions (intersection, union and grow-diag-final) on six different text types for English-Dutch. In particular, compared with the intersected word alignments, the proposed method improves recall without sacrificing precision. Moreover, the system is able to align discontiguous chunks, which occur frequently in Dutch.
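
For readers unfamiliar with the metrics, the sketch below computes precision, recall, a balanced F-measure and AER, assuming the widely used definitions of Och and Ney (2003) with sure and possible reference links. The abstract does not say which variant of these metrics was used, so the function `alignment_scores`, the equal weighting in the F-measure, and the toy data are assumptions.

```python
def alignment_scores(predicted, sure, possible):
    """Score a predicted alignment against a reference.

    predicted: set of (source, target) links proposed by the system.
    sure:      set of unambiguous reference links.
    possible:  set of acceptable reference links; sure links are
               treated as possible as well.
    All sets are assumed to be non-empty.
    """
    possible = possible | sure
    hits_sure = len(predicted & sure)
    hits_possible = len(predicted & possible)
    precision = hits_possible / len(predicted)
    recall = hits_sure / len(sure)
    # Balanced harmonic mean (assumed weighting).
    f_measure = 2 * precision * recall / (precision + recall)
    aer = 1 - (hits_sure + hits_possible) / (len(predicted) + len(sure))
    return precision, recall, f_measure, aer


# Toy example (hypothetical links, not results from the paper).
sure = {(0, 0), (1, 1), (2, 2)}
possible = {(3, 3)}
predicted = {(0, 0), (1, 1), (3, 3), (4, 5)}
precision, recall, f_measure, aer = alignment_scores(predicted, sure, possible)
```

Under these definitions, adding correct links raises recall and lowers AER, while adding links outside the possible set lowers precision and raises AER.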

Author information

Authors and Affiliations

LT3, University College Ghent, Groot-Brittanniëlaan 45, Ghent, Belgium
Lieve Macken
Dept. of Applied Mathematics and Computer Science, Ghent University, Krijgslaan 281(S9), Ghent, Belgium
Lieve Macken
CLiPS Computational Linguistics Group, University of Antwerp, Prinsstraat 13, 2000, Antwerpen, Belgium
Walter Daelemans

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, 07738, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Macken, L., Daelemans, W. (2010). A Chunk-Driven Bootstrapping Approach to Extracting Translation Patterns. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_33

DOI: https://doi.org/10.1007/978-3-642-12116-6_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12115-9
Online ISBN: 978-3-642-12116-6
eBook Packages: Computer Science, Computer Science (R0)