The ARIEL-CMU situation frame detection pipeline for LoReHLT16: a model translation approach

Abstract

The LoReHLT16 evaluation challenged participants to extract Situation Frames (SFs), structured descriptions of humanitarian need situations, from monolingual Uyghur text. The ARIEL-CMU SF detector combines two classification paradigms: a manually curated keyword-spotting system and a machine learning classifier. Both were applied to Uyghur by translating the models feature by feature, rather than by translating the input text. The resulting combined model pairs the accuracy of human insight with the generality of machine learning, and remains relatively tractable to human analysis and error correction. Other factors contributing to the system's success were automatic dictionary creation, the use of phonetic transcription, detailed hand-written morphological analysis, and naturalistic glossing to support human error analysis. The ARIEL-CMU SF pipeline produced the top-scoring LoReHLT16 situation frame detection systems for the metrics SFType, SFType+Place+Need, SFType+Place+Relief, and SFType+Place+Urgency at each of the three checkpoints.
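To make "translating the models on a per-feature basis" concrete, here is a minimal sketch under stated assumptions, not the ARIEL-CMU implementation. It assumes the classifier is a linear bag-of-words model (English word to weight), that a probabilistic bilingual dictionary is available, and that each English weight is split across its translations in proportion to the dictionary probability. The function names, the example data, and the romanized Uyghur tokens are all hypothetical placeholders. A keyword list is translated the same way, by replacing each English keyword with its dictionary translations.

```python
from collections import defaultdict

# Hypothetical inputs: an English linear classifier (word -> weight) and a
# probabilistic bilingual dictionary (English word -> [(target word, p), ...]).
# The romanized Uyghur tokens are illustrative placeholders only.
EXAMPLE_WEIGHTS = {"food": 2.0, "water": 1.5, "the": 0.0}
EXAMPLE_DICT = {
    "food": [("tamaq", 0.75), ("ozuq", 0.25)],
    "water": [("su", 1.0)],
}


def translate_model(english_weights, bilingual_dict):
    """Map each English feature weight onto its target-language translations.

    A weight is split across translations in proportion to the dictionary
    probability; a target word reached from several English words sums them.
    English features with no dictionary entry are dropped.
    """
    target_weights = defaultdict(float)
    for en_word, weight in english_weights.items():
        for tgt_word, prob in bilingual_dict.get(en_word, []):
            target_weights[tgt_word] += weight * prob
    return dict(target_weights)


def translate_keywords(english_keywords, bilingual_dict):
    """Translate a keyword list: every dictionary translation becomes a keyword."""
    return {tgt for en in english_keywords for tgt, _ in bilingual_dict.get(en, [])}


def score(doc_tokens, target_weights, bias=0.0):
    """Linear bag-of-words score of a tokenized target-language document."""
    return bias + sum(target_weights.get(tok, 0.0) for tok in doc_tokens)


if __name__ == "__main__":
    uig_weights = translate_model(EXAMPLE_WEIGHTS, EXAMPLE_DICT)
    print(uig_weights)  # {'tamaq': 1.5, 'ozuq': 0.5, 'su': 1.5}
    print(score(["su", "yoq", "tamaq"], uig_weights))  # 3.0
    print(translate_keywords(["water", "flood"], EXAMPLE_DICT))  # {'su'}
```

Splitting each weight by dictionary probability is only one possible design choice; copying the full weight to every translation is another, but it inflates the influence of English words that have many dictionary entries.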

Notes

  1. www.nist.gov/multimodal-information-group/lorehlt-2016-evaluations.

  2. It should be noted, however, that it does not vary monotonically with precision (or recall or F1), as the SFE and precision values in Tables 5 and 6 will show.

  3. http://cc-cedict.org.

  4. http://reliefweb.int.

  5. http://crisis.net.

  6. www.ushahidi.com.

  7. www.opensource.gov.

  8. www.crowdflower.com.

  9. http://github.com/dmort27/epitran.

  10. http://svn.code.sf.net/p/apertium/svn/incubator/apertium-uig.

  11. This is thus not lemmatization per se—the lemma of all of these is qatar, with -liq being a suffix—but rather an attempt to find the most appropriate corresponding word in the lexicons, whether it is a lemma or not.

  12. http://cldr.unicode.org.

  13. http://code.google.com/archive/p/word2vec/.

  14. Compared to our SFType detection systems, the features in our English Status-detection decision trees relied more on function words (e.g., words that more often indicate tense, aspect, or modality) than on content words. We did not expect these words to translate well under a lexical feature-translation approach, so we did not submit any of these results as part of a primary submission.

  15. Error correction was performed on both models, but errors in the keyword model were more straightforward to fix (i.e., by simply removing the keyword), and it was easier to confirm that a fix had worked.
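Note 11 describes mapping an inflected Uyghur word onto the most appropriate lexicon entry rather than strictly onto its lemma. The sketch below shows one way such a lookup could work, assuming a set-based lexicon, a flat list of suffix strings, and a hypothetical "prefer the longest listed form" rule. The function name, the data, and the longest-match rule are illustrative assumptions, not the pipeline's actual procedure, which the abstract attributes to detailed hand-written morphological analysis.

```python
def best_lexicon_match(word, lexicon, suffixes):
    """Find the longest form of `word` present in `lexicon`.

    Hypothetical heuristic: starting from the surface form, repeatedly strip
    any known suffix from the right edge, collect every intermediate form
    that the lexicon contains, and return the longest one (or None).
    """
    seen = set()
    stack = [word]
    found = []
    while stack:
        form = stack.pop()
        if form in seen:
            continue
        seen.add(form)
        if form in lexicon:
            found.append(form)
        for suffix in suffixes:
            # Keep at least one character of stem after stripping.
            if form.endswith(suffix) and len(form) > len(suffix):
                stack.append(form[: -len(suffix)])
    return max(found, key=len) if found else None


# Simplified, illustrative data: real Uyghur suffixes have vowel-harmony
# variants (e.g. -liq/-lik/-luq/-lük) that a real analyzer would cover.
LEXICON = {"qatar", "qatarliq"}
SUFFIXES = ["lar", "ni", "liq"]

if __name__ == "__main__":
    print(best_lexicon_match("qatarliqlarni", LEXICON, SUFFIXES))  # qatarliq
    print(best_lexicon_match("qatarlar", LEXICON, SUFFIXES))  # qatar
```

Preferring the longest listed form keeps a derived word such as qatarliq from collapsing to qatar whenever the lexicon has its own entry for the derived form, which is the distinction note 11 draws between this lookup and lemmatization.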

Acknowledgements

This project was sponsored by the Defense Advanced Research Projects Agency (DARPA) Information Innovation Office (I2O), program: Low Resource Languages for Emergent Incidents (LORELEI), issued by DARPA/I2O under Contract No. HR0011-15-C-0114.

Author information

Corresponding author

Correspondence to Patrick Littell.

About this article

Cite this article

Littell, P., Tian, T., Xu, R. et al. The ARIEL-CMU situation frame detection pipeline for LoReHLT16: a model translation approach. Machine Translation 32, 105–126 (2018). https://doi.org/10.1007/s10590-017-9205-3

Keywords

  • LoReHLT
  • LORELEI
  • Situation frames
  • Information extraction