Information Systems Frontiers, Volume 20, Issue 5, pp 909–923

Emergency Vocabulary

  • Dávid Márk Nemeskey
  • András Kornai


For disaster preparedness, a key aspect of the work is the identification, ahead of time, of the vocabulary of emergency messages. Here we describe how static repositories of traditional news reports can be rapidly exploited to yield disaster- or accident-implicated words and named entities.


Keywords: Information retrieval, Emergency, Vocabulary



We thank Stephanie Strassel (LDC) for her support and encouragement, Graham Horwood (Leidos) for preparing some of the data used in the evaluation, and the anonymous referees for valuable suggestions that led to major improvements. Special thanks to Judit Ács, who produced the original BV list that was the starting point of the entire work.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. HAS Institute of Computer Science, Budapest, Hungary
