Phrase Translation Extraction from Aligned Parallel Corpora Using Suffix Arrays and Related Structures

  • José Aires
  • Gabriel Pereira Lopes
  • Luis Gomes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5816)


In this paper, we address term translation extraction from indexed aligned parallel corpora. We use a couple of association measures, combined by a voting scheme, to scale down translation pairs according to their degree of internal cohesiveness, and we evaluate the results obtained. The precision obtained is clearly much better than that reported in related work for the very low range of occurrences we dealt with, and is comparable to the best results obtained in word translation.


Translation Equivalents Extraction Suffix Arrays Parallel Corpus Alignment Language Independent Large Corpus 





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • José Aires (1)
  • Gabriel Pereira Lopes (1)
  • Luis Gomes (1)
  1. CITI, Departamento de Informática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal
