Abstract
In the context of research on parallel corpus alignment for language pairs that differ in important ways (e.g., structurally, lexically, and morphosyntactically), this paper presents an approach to multiword term alignment. Our system, ALINTEC, implements a hybrid strategy that adds several kinds of linguistic knowledge (a sentence-aligned corpus, POS tagging, grammatical patterns, and a bilingual glossary) to quantitative criteria such as the frequency and distribution of terms in the corpus. The experiments were carried out on a parallel corpus consisting of administrative and legal documents in Spanish and Basque, a language pair representative of the context in which our work is framed. The results indicate that our approach aligns terms reasonably well between languages of different typology, such as Spanish and Basque.
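The abstract does not say exactly how ALINTEC turns frequency and distribution into an alignment score. One common measure in the bilingual terminology literature is the Dice coefficient computed over aligned sentence pairs, Dice(s, t) = 2·f(s, t) / (f(s) + f(t)), where f counts the sentence pairs in which a term (or both terms) occurs. The sketch below illustrates that idea under stated assumptions and is not the paper's method: candidate multiword terms are assumed to be already extracted per sentence (for instance with POS patterns), and the input format, the `min_freq` threshold, the greedy one-to-one selection, the rule that glossary pairs are trusted outright, and the function names are all hypothetical. The toy Spanish-Basque pairs are illustrative only, not drawn from the paper's corpus.

```python
from collections import Counter
from itertools import product


def dice_alignments(aligned_pairs, min_freq=2):
    """Rank candidate (source, target) term pairs by the Dice coefficient.

    aligned_pairs: list of (source_terms, target_terms), one entry per aligned
    sentence pair. Each side is an iterable of candidate multiword terms
    (hypothetical input format; term extraction is assumed done beforehand).
    """
    src_freq, tgt_freq, joint_freq = Counter(), Counter(), Counter()
    for src_terms, tgt_terms in aligned_pairs:
        src_set, tgt_set = set(src_terms), set(tgt_terms)
        src_freq.update(src_set)  # sentence pairs containing each source term
        tgt_freq.update(tgt_set)  # sentence pairs containing each target term
        joint_freq.update(product(src_set, tgt_set))  # co-occurrence counts
    scores = {}
    for (s, t), joint in joint_freq.items():
        if joint < min_freq:  # hypothetical frequency filter
            continue
        scores[(s, t)] = 2 * joint / (src_freq[s] + tgt_freq[t])
    # Highest Dice score first; ties keep insertion order (sort is stable).
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def greedy_one_to_one(ranked, glossary=None):
    """Select a one-to-one alignment, taking glossary pairs first.

    Assumption for illustration: glossary entries are trusted outright and
    block their source and target terms from other alignments.
    """
    chosen = dict(glossary or {})
    used_targets = set(chosen.values())
    for (s, t), _score in ranked:
        if s not in chosen and t not in used_targets:
            chosen[s] = t
            used_targets.add(t)
    return chosen


# Toy sentence-aligned corpus (illustrative Spanish-Basque term pairs).
corpus = [
    (["impuesto sobre la renta", "boletín oficial"],
     ["errentaren gaineko zerga", "aldizkari ofiziala"]),
    (["impuesto sobre la renta"], ["errentaren gaineko zerga"]),
    (["boletín oficial", "consejo de gobierno"],
     ["aldizkari ofiziala", "gobernu kontseilua"]),
    (["consejo de gobierno"], ["gobernu kontseilua"]),
]

ranked = dice_alignments(corpus)
alignment = greedy_one_to_one(ranked)
for source, target in alignment.items():
    print(f"{source} -> {target}")
```

Counting sentence pairs rather than raw token occurrences is a deliberate simplification here: it captures the distribution of a term across the corpus, which is what lets a term that co-occurs in many sentence pairs outrank one that merely shares a single sentence with it.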
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Casillas, A., Martínez, R. (2002). Aligning Multiword Terms Using a Hybrid Approach. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2002. Lecture Notes in Computer Science, vol 2276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45715-1_28
DOI: https://doi.org/10.1007/3-540-45715-1_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43219-7
Online ISBN: 978-3-540-45715-2
eBook Packages: Springer Book Archive