Machine Translation, Volume 32, Issue 1–2, pp 105–126

The ARIEL-CMU situation frame detection pipeline for LoReHLT16: a model translation approach

  • Patrick Littell
  • Tian Tian
  • Ruochen Xu
  • Zaid Sheikh
  • David Mortensen
  • Lori Levin
  • Francis Tyers
  • Hiroaki Hayashi
  • Graham Horwood
  • Steve Sloto
  • Emily Tagtow
  • Alan Black
  • Yiming Yang
  • Teruko Mitamura
  • Eduard Hovy
Article

Abstract

The LoReHLT16 evaluation challenged participants to extract Situation Frames (SFs), structured descriptions of humanitarian need situations, from monolingual Uyghur text. The ARIEL-CMU SF detector combines two classification paradigms: a manually curated keyword-spotting system and a machine learning classifier. Both were applied to Uyghur by translating the models on a per-feature basis rather than translating the input text. The resulting combined model offers the accuracy of human insight together with the generality of machine learning, and it remains relatively tractable to human analysis and error correction. Other factors contributing to its success were automatic dictionary creation, phonetic transcription, detailed hand-written morphological analysis, and naturalistic glossing to support error analysis by humans. The ARIEL-CMU SF pipeline produced the top-scoring LoReHLT16 situation frame detection systems for the metrics SFType, SFType+Place+Need, SFType+Place+Relief, and SFType+Place+Urgency at each of the three checkpoints.
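The central idea of translating the model rather than the input can be illustrated with a minimal sketch. It assumes a bag-of-words linear classifier trained on English, a toy target-to-English dictionary whose `tgt_*` tokens are placeholders (not real Uyghur), averaging of weights when a target word has several English translations, and a simple OR rule for combining the keyword spotter with the classifier. All function names, weights, and the combination rule are hypothetical illustrations, not the ARIEL-CMU implementation.

```python
from typing import Dict, Iterable, List, Set


def translate_model(src_weights: Dict[str, float],
                    tgt_to_src: Dict[str, List[str]]) -> Dict[str, float]:
    """Give each target-language word the mean weight of its known source-language translations.

    Target words whose translations are all absent from the source model get no
    feature, so they contribute nothing at classification time.
    """
    tgt_weights: Dict[str, float] = {}
    for tgt_word, src_words in tgt_to_src.items():
        known = [src_weights[s] for s in src_words if s in src_weights]
        if known:  # assumption: average over multiple translations
            tgt_weights[tgt_word] = sum(known) / len(known)
    return tgt_weights


def classifier_score(tokens: Iterable[str], tgt_weights: Dict[str, float], bias: float) -> float:
    """Linear bag-of-words score computed directly on target-language tokens."""
    return bias + sum(tgt_weights.get(t, 0.0) for t in tokens)


def keyword_hit(tokens: Iterable[str], keywords: Set[str]) -> bool:
    """Keyword spotting: True if any token is in a curated target-language list."""
    return any(t in keywords for t in tokens)


def detect(tokens: List[str], tgt_weights: Dict[str, float], bias: float,
           keywords: Set[str], threshold: float = 0.0) -> bool:
    """Hypothetical OR combination: fire if either component fires."""
    return keyword_hit(tokens, keywords) or classifier_score(tokens, tgt_weights, bias) > threshold


# Toy English model for one SF type (e.g. a food need); weights are invented for illustration.
english_weights = {"food": 1.5, "hunger": 1.0, "starving": 0.5, "election": -1.0}
# Toy target-to-English dictionary; "tgt_*" tokens are placeholders, not real Uyghur words.
dictionary = {
    "tgt_food": ["food", "meal"],
    "tgt_hunger": ["hunger", "starving"],
    "tgt_vote": ["election", "vote"],
    "tgt_rare": ["xylophone"],  # no translation known to the model, so it is dropped
}
target_weights = translate_model(english_weights, dictionary)
print(target_weights)  # {'tgt_food': 1.5, 'tgt_hunger': 0.75, 'tgt_vote': -1.0}
print(detect(["tgt_food"], target_weights, bias=-0.5, keywords=set()))  # True
```

Because the translated model is just a table of target-language words and weights, an analyst can inspect or correct individual entries directly, which is one way to read the abstract's claim that the approach is tractable to human analysis and error correction.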

Keywords

LoReHLT, LORELEI, Situation frames, Information extraction

Notes

Acknowledgements

This project was sponsored by the Defense Advanced Research Projects Agency (DARPA) Information Innovation Office (I2O), program: Low Resource Languages for Emergent Incidents (LORELEI), issued by DARPA/I2O under Contract No. HR0011-15-C-0114.

Copyright information

© Springer Science+Business Media B.V. 2017

Authors and Affiliations

  • Patrick Littell (1; corresponding author)
  • Tian Tian (1)
  • Ruochen Xu (1)
  • Zaid Sheikh (1)
  • David Mortensen (1)
  • Lori Levin (1)
  • Francis Tyers (2)
  • Hiroaki Hayashi (1)
  • Graham Horwood (3)
  • Steve Sloto (1)
  • Emily Tagtow (1)
  • Alan Black (1)
  • Yiming Yang (1)
  • Teruko Mitamura (1)
  • Eduard Hovy (1)

  1. Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA
  2. School of Linguistics, National Research University "Higher School of Economics", Moscow, Russia
  3. Leidos, Inc., Reston, USA