EPIA 2015: Progress in Artificial Intelligence, pp. 723-734
Classification and Selection of Translation Candidates for Parallel Corpora Alignment
Abstract
Incorporating human feedback into parallel corpora alignment and term translation extraction, and using all human-validated term translation pairs that have been marked as correct, improves alignment precision, term translation extraction quality, and several closely related tasks. Moreover, such a labelled lexicon, with entries tagged for correctness, enables bilingual learning. From this perspective, we present experiments on the automatic classification of translation candidates extracted from aligned parallel corpora. For this purpose, we train SVM-based classifiers for three language pairs: English-Portuguese (EN-PT), English-French (EN-FR) and French-Portuguese (FR-PT). The approach achieved micro-averaged F-measures of 95.96%, 75.04% and 65.87% for the EN-PT, EN-FR and FR-PT language pairs, respectively.
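As a reading aid for the micro F-measure figures above, the following is a minimal sketch of how a micro-averaged F-measure can be computed for a two-class candidate classifier. It is not taken from the paper: the label names `accepted` and `rejected`, the function name `micro_f_measure`, and the toy data are all hypothetical, and the authors' exact evaluation protocol may differ.

```python
def micro_f_measure(gold, predicted, labels):
    """Micro-averaged precision, recall and F-measure.

    True positives, false positives and false negatives are pooled
    over all labels before the ratios are computed.
    """
    tp = fp = fn = 0
    for label in labels:
        for g, p in zip(gold, predicted):
            if p == label and g == label:
                tp += 1
            elif p == label and g != label:
                fp += 1
            elif p != label and g == label:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical human judgements and classifier outputs for five candidates.
gold = ["accepted", "accepted", "rejected", "rejected", "accepted"]
predicted = ["accepted", "rejected", "rejected", "accepted", "accepted"]

precision, recall, f_measure = micro_f_measure(
    gold, predicted, labels=["accepted", "rejected"]
)
print(f"P={precision:.2f} R={recall:.2f} F={f_measure:.2f}")
```

When every candidate receives exactly one label and all labels are pooled, as here, the micro-averaged precision, recall and F-measure all coincide with plain accuracy (3 of 5 candidates correct in the toy data).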
Keywords
Support Vector Machine, Statistical Machine Translation, Parallel Corpus, Language Pair, Translation Quality