
Aligning Multiword Terms Using a Hybrid Approach

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2276)

Abstract

In the context of research on parallel corpus alignment for language pairs that differ in several important respects (e.g., structurally, lexically, and morphosyntactically), this paper presents an approach to aligning multiword terms. Our system, ALINTEC, implements a hybrid strategy that combines several kinds of linguistic knowledge (a corpus aligned at the sentence level, POS tagging, grammatical patterns, and a bilingual glossary) with quantitative criteria such as the frequency and distribution of terms in the corpus. The experiments were carried out on a parallel corpus consisting of a collection of administrative and legal documents in Spanish and Basque, a language pair representative of the context in which our work is framed. The results show that our approach performs reasonably well when aligning terms between languages as typologically different as Spanish and Basque.
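The abstract does not spell out ALINTEC's algorithm, so the Python below is only a minimal sketch of how a hybrid strategy of this kind could combine the listed resources. Everything specific in it is an assumption for illustration: the POS patterns (written with Universal Dependencies tags), the use of the Dice coefficient as the association score over sentence-aligned pairs, the `min_freq` and `threshold` values, the `GLOSSARY_BONUS` weight, and the toy sentences are hypothetical choices, not the paper's.

```python
from collections import Counter
from itertools import product

# Hypothetical POS patterns for candidate terms (illustrative only; the
# grammatical patterns ALINTEC actually uses are not given in the abstract).
ES_PATTERNS = [("NOUN", "ADJ"), ("NOUN", "ADP", "NOUN")]
EU_PATTERNS = [("NOUN", "ADJ"), ("NOUN", "NOUN")]
GLOSSARY_BONUS = 0.1  # hypothetical weight for bilingual-glossary evidence


def candidates(tagged, patterns):
    """Return multiword candidates (in order of appearance) whose POS tags match a pattern."""
    found = []
    for pattern in patterns:
        n = len(pattern)
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            if tuple(tag for _, tag in window) == pattern:
                found.append(" ".join(word for word, _ in window))
    return list(dict.fromkeys(found))  # drop duplicates, keep order


def align_terms(sentence_pairs, es_patterns, eu_patterns,
                glossary=None, min_freq=2, threshold=0.5):
    """Map each Spanish candidate to its best-scoring Basque candidate."""
    glossary = glossary or {}
    freq_es, freq_eu, joint = Counter(), Counter(), Counter()
    # Count sentence pairs in which candidates occur, alone and together.
    for es_sent, eu_sent in sentence_pairs:
        es_terms = candidates(es_sent, es_patterns)
        eu_terms = candidates(eu_sent, eu_patterns)
        freq_es.update(es_terms)
        freq_eu.update(eu_terms)
        joint.update(product(es_terms, eu_terms))

    best = {}
    for (es, eu), both in joint.items():
        # Frequency criterion: ignore rare candidates.
        if freq_es[es] < min_freq or freq_eu[eu] < min_freq:
            continue
        # Distribution criterion: Dice coefficient over aligned sentence pairs.
        score = 2 * both / (freq_es[es] + freq_eu[eu])
        # Linguistic criterion: reward pairs that share a glossary translation.
        eu_words = eu.split()
        if any(glossary.get(w) in eu_words for w in es.split()):
            score += GLOSSARY_BONUS
        if score >= threshold and score > best.get(es, (None, 0.0))[1]:
            best[es] = (eu, score)
    return {es: eu for es, (eu, _) in best.items()}


# Toy POS-tagged sentence pairs (made-up data, not from the paper's corpus).
SAMPLE_PAIRS = [
    ([("el", "DET"), ("boletín", "NOUN"), ("oficial", "ADJ")],
     [("aldizkari", "NOUN"), ("ofiziala", "ADJ")]),
    ([("boletín", "NOUN"), ("oficial", "ADJ"), ("y", "CCONJ"),
      ("administración", "NOUN"), ("pública", "ADJ")],
     [("aldizkari", "NOUN"), ("ofiziala", "ADJ"), ("eta", "CCONJ"),
      ("administrazio", "NOUN"), ("publikoa", "ADJ")]),
    ([("administración", "NOUN"), ("pública", "ADJ")],
     [("administrazio", "NOUN"), ("publikoa", "ADJ")]),
]

if __name__ == "__main__":
    print(align_terms(SAMPLE_PAIRS, ES_PATTERNS, EU_PATTERNS))
```

Run as a script, this sketch pairs "boletín oficial" with "aldizkari ofiziala" and "administración pública" with "administrazio publikoa". Filtering on frequency before scoring keeps the Dice coefficient from rewarding candidates seen only once, which is one simple way to realize the "frequency and distribution" criteria; the glossary term only nudges scores, so statistical evidence still dominates.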

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Casillas, A., Martínez, R. (2002). Aligning Multiword Terms Using a Hybrid Approach. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2002. Lecture Notes in Computer Science, vol 2276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45715-1_28

  • DOI: https://doi.org/10.1007/3-540-45715-1_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43219-7

  • Online ISBN: 978-3-540-45715-2

  • eBook Packages: Springer Book Archive
