Aligning Multiword Terms Using a Hybrid Approach

  • Arantza Casillas
  • Raquel Martínez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2276)

Abstract

In the context of research on aligning parallel corpora for language pairs that differ substantially (e.g., structurally, lexically, morphosyntactically), this paper presents an approach to multiword term alignment. Our system, ALINTEC, implements a hybrid strategy that combines several kinds of linguistic knowledge (a sentence-aligned corpus, POS tagging, grammatical patterns, and a bilingual glossary) with quantitative criteria such as the frequency and distribution of terms in the corpus. The experiments were carried out on a parallel corpus consisting of a collection of administrative and legal documents in Spanish and Basque, a language pair representative of the context in which our work is framed. The results show that the approach performs reasonably well when aligning terms between typologically different languages such as Spanish and Basque.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Arantza Casillas (1)
  • Raquel Martínez (2)
  1. Facultad de Ciencias, Universidad del País Vasco, Spain
  2. Escuela Superior de CC. Experimentales y Tecnología, Universidad Rey Juan Carlos, Spain