Incident-Driven Machine Translation and Name Tagging for Low-resource Languages

  • Ulf Hermjakob
  • Qiang Li
  • Daniel Marcu
  • Jonathan May
  • Sebastian J. Mielke
  • Nima Pourdamghani
  • Michael Pust
  • Xing Shi
  • Kevin Knight
  • Tomer Levinboim
  • Kenton Murray
  • David Chiang
  • Boliang Zhang
  • Xiaoman Pan
  • Di Lu
  • Ying Lin
  • Heng Ji

Abstract

We describe novel approaches to tackling the problem of natural language processing for low-resource languages. The approaches are embodied in systems for name tagging and machine translation (MT) that we constructed to participate in the NIST LoReHLT evaluation in 2016. Our methods include universal tools, rapid resource and knowledge acquisition, rapid language projection, and joint methods for MT and name tagging.

Keywords

  • Low-resource languages
  • Incident-driven
  • Name tagging
  • Machine translation

Notes

Acknowledgements

We would like to thank the other ELISA team members who contributed to resource construction and system preparation before the evaluation: Chris Callison-Burch (UPenn), Aliya Deri (USC), and Ashish Vaswani (Google). We thank Billy Wagner from Next Century for running the LTDE to produce name tagging runs. This work was supported by the U.S. Defense Advanced Research Projects Agency (DARPA) LORELEI Program No. HR0011-15-C-0115 and ARL/ARO MURI W911NF-10-1-0533. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

Copyright information

© Springer Science+Business Media B.V. 2017

Authors and Affiliations

  • Ulf Hermjakob (1)
  • Qiang Li (1)
  • Daniel Marcu (1)
  • Jonathan May (1)
  • Sebastian J. Mielke (1)
  • Nima Pourdamghani (1)
  • Michael Pust (1)
  • Xing Shi (1)
  • Kevin Knight (1)
  • Tomer Levinboim (2)
  • Kenton Murray (2)
  • David Chiang (2)
  • Boliang Zhang (3)
  • Xiaoman Pan (3)
  • Di Lu (3)
  • Ying Lin (3)
  • Heng Ji (3)

  1. Information Sciences Institute, University of Southern California, Marina del Rey, USA
  2. Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, USA
  3. Computer Science Department, Rensselaer Polytechnic Institute, Troy, USA
