Classification and Selection of Translation Candidates for Parallel Corpora Alignment

  • K. M. Kavitha
  • Luís Gomes
  • José Aires
  • José Gabriel P. Lopes
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9273)

Abstract

Incorporating human feedback into parallel corpora alignment and term translation extraction, and using all human-validated term translation pairs that have been marked as correct, improves alignment precision, term translation extraction quality and a number of closely correlated tasks. Moreover, such a labelled lexicon, with entries tagged for correctness, enables bilingual learning. From this perspective, we present experiments on the automatic classification of translation candidates extracted from aligned parallel corpora. For this purpose, we train SVM-based classifiers for three language pairs: English-Portuguese (EN-PT), English-French (EN-FR) and French-Portuguese (FR-PT). The approach achieved micro-averaged F-measure scores of 95.96%, 75.04% and 65.87% for the EN-PT, EN-FR and FR-PT language pairs, respectively.
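As a reading aid for the scores quoted above, the following is a minimal sketch of how a micro-averaged F-measure can be computed by pooling true positives, false positives and false negatives over all classes. The label names `accept` and `reject`, the function name and the toy data are assumptions made for illustration; the abstract does not specify the paper's label set or evaluation protocol.

```python
def micro_f_measure(gold, predicted, labels):
    """Micro-averaged F-measure: pool TP/FP/FN over all labels, then
    compute precision, recall and their harmonic mean once."""
    tp = fp = fn = 0
    for label in labels:
        for g, p in zip(gold, predicted):
            if p == label and g == label:
                tp += 1          # predicted this label, and it was right
            elif p == label and g != label:
                fp += 1          # predicted this label, but it was wrong
            elif p != label and g == label:
                fn += 1          # missed an item that carries this label
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: five translation candidates with gold labels
# and classifier decisions (not taken from the paper).
gold = ["accept", "accept", "reject", "reject", "accept"]
predicted = ["accept", "reject", "reject", "accept", "accept"]
score = micro_f_measure(gold, predicted, ["accept", "reject"])
print(f"micro F-measure: {score:.2f}")
```

When every item receives exactly one label and all labels are pooled, as in this sketch, the micro-averaged F-measure coincides with plain accuracy (here 3 correct decisions out of 5).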

Keywords

Support Vector Machine; Statistical Machine Translation; Parallel Corpus; Language Pair; Translation Quality

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • K. M. Kavitha (1, 3)
  • Luís Gomes (1, 2)
  • José Aires (1, 2)
  • José Gabriel P. Lopes (1, 2)

  1. NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal
  2. ISTRION BOX-Translation & Revision, Lda., Parkurbis, Covilhã, Portugal
  3. Department of Computer Applications, St. Joseph Engineering College, Mangaluru, India
