Mining Bilingual Lexical Equivalences Out of Parallel Corpora

  • Stelios Piperidis
  • Ioannis Harlas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3955)


The role and importance of methods for eliciting lexical knowledge in multilingual information processing, including machine translation, computer-aided translation and cross-lingual information retrieval, are indisputable. The usefulness of such methods becomes even more apparent for language pairs for which no appropriate digital language resources exist. This paper presents encouraging experimental results in automatically eliciting bilingual lexica from Greek-Turkish parallel corpora, which consist of documents from international organizations available in English, Greek and Turkish, in an attempt to aid multilingual document processing involving these languages.
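The abstract does not spell out the scoring procedure. For illustration only, one common way to elicit candidate word translations from a sentence-aligned parallel corpus is to score every co-occurring source/target word pair with the Dice coefficient, 2·f(s,t) / (f(s) + f(t)), where f counts aligned sentence pairs containing the word(s). The sketch below is not the authors' method. The function name `dice_lexicon`, the whitespace tokenization, the `min_score` threshold and the toy Greek-Turkish sentences are all hypothetical assumptions.

```python
from collections import Counter
from itertools import product


def dice_lexicon(aligned_pairs, min_score=0.5):
    """Hypothetical sketch: pick the best Dice-scored target word per source word.

    aligned_pairs: list of (source_sentence, target_sentence) strings,
    assumed to be already sentence-aligned; tokenization is plain whitespace.
    Returns {source_word: (target_word, score)} for scores >= min_score.
    """
    src_freq = Counter()  # aligned pairs containing each source word
    tgt_freq = Counter()  # aligned pairs containing each target word
    joint = Counter()     # aligned pairs containing both words of a candidate
    for src, tgt in aligned_pairs:
        # Note: str.lower() is not Turkish-locale aware ("I" -> "i", not dotless "ı").
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_freq.update(s_words)
        tgt_freq.update(t_words)
        joint.update(product(s_words, t_words))

    best = {}
    for (s, t), both in joint.items():
        score = 2 * both / (src_freq[s] + tgt_freq[t])
        # Keep only the highest-scoring candidate above the threshold.
        if score >= min_score and score > best.get(s, (None, 0.0))[1]:
            best[s] = (t, score)
    return best


# Toy, hand-made Greek-Turkish sentence pairs (illustrative, not corpus data).
pairs = [
    ("νερό και ψωμί", "su ve ekmek"),
    ("νερό", "su"),
    ("ψωμί", "ekmek"),
]
lexicon = dice_lexicon(pairs)
```

On these three pairs each Greek word reaches a score of 1.0 only with its Turkish counterpart (νερό/su, και/ve, ψωμί/ekmek), while cross pairings score 0.5 or about 0.67. A practical system would likely add stemming or lemmatization (Turkish is highly agglutinative), minimum-frequency filters and one-to-one constraints, all omitted here.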


Keywords (machine-generated, not supplied by the authors): Target Word; Machine Translation; Dice Coefficient; Parallel Corpus; Language Pair





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Stelios Piperidis (1, 2)
  • Ioannis Harlas (3)
  1. Institute for Language and Speech Processing, Marousi, Greece
  2. National Technical University of Athens, Greece
  3. Athens University of Economics & Business, Greece
