Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation

  • Aaron Li-Feng Han
  • Derek F. Wong
  • Lidia S. Chao
  • Liangye He
  • Shuo Li
  • Ling Zhu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8105)

Abstract

Many treebanks have been developed in recent years for different languages, but they usually employ different syntactic tag sets. This makes it difficult for other researchers to take full advantage of them, especially in multilingual research. To address this problem and to facilitate future research in unsupervised induction of syntactic structures, some researchers have developed a universal POS tag set. However, the inconsistency among phrase tag sets remains unresolved. To bridge the phrase-level tag sets of multilingual treebanks, this paper designs a phrase tag mapping between the French Treebank and the English Penn Treebank. One potential application of this mapping is then explored in the machine translation evaluation task. The resulting evaluation model, which does not use reference translations, yields promising results compared with state-of-the-art evaluation metrics.
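
The idea of mapping two treebank phrase tag sets onto a shared inventory, and then comparing a French source with an English translation without a reference, can be sketched roughly as follows. This is only an illustration: the tag lists, the coarse category names (such as AJP, AVP and X), the function names, and the overlap score are all assumptions made for this sketch, not the mapping table or the evaluation model from the paper.

```python
from collections import Counter

# Assumed, simplified mapping tables; NOT the paper's actual mapping.
FRENCH_TO_COMMON = {
    "NP": "NP", "VN": "VP", "VPinf": "VP", "VPpart": "VP",
    "AP": "AJP", "AdP": "AVP", "PP": "PP",
    "SENT": "S", "Sint": "S", "Srel": "S", "Ssub": "S",
    "COORD": "CONJP",
}
ENGLISH_TO_COMMON = {
    "NP": "NP", "VP": "VP", "ADJP": "AJP", "ADVP": "AVP", "PP": "PP",
    "S": "S", "SBAR": "S", "SINV": "S", "SQ": "S", "SBARQ": "S",
    "CONJP": "CONJP",
}
UNKNOWN = "X"  # hypothetical catch-all for tags missing from a table


def to_common(tags, table):
    """Map treebank-specific phrase tags to the shared inventory."""
    return [table.get(tag, UNKNOWN) for tag in tags]


def overlap_score(source_common, output_common):
    """Dice overlap of two tag multisets (a stand-in similarity measure)."""
    if not source_common and not output_common:
        return 1.0
    shared = sum((Counter(source_common) & Counter(output_common)).values())
    return 2 * shared / (len(source_common) + len(output_common))


if __name__ == "__main__":
    # Phrase tags as a parser might emit them for a source sentence
    # and for its machine translation (made-up inputs).
    french = to_common(["NP", "VN", "PP"], FRENCH_TO_COMMON)
    english = to_common(["NP", "VP", "PP"], ENGLISH_TO_COMMON)
    print(french, english, overlap_score(french, english))
```

Because both sides are reduced to the same small inventory before comparison, the score depends only on the source sentence and the system output, which is what makes a reference-free setup possible in principle.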

Keywords

Natural language processing; Phrase tagset mapping; Multilingual treebanks; Machine translation evaluation

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Aaron Li-Feng Han (1)
  • Derek F. Wong (1)
  • Lidia S. Chao (1)
  • Liangye He (1)
  • Shuo Li (1)
  • Ling Zhu (1)

  1. Department of Computer and Information Science, University of Macau, Macau, China