Cross-Lingual Classification of Crisis Data

  • Prashant Khare
  • Grégoire Burel
  • Diana Maynard
  • Harith Alani
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11136)


Many citizens nowadays flock to social media during crises to share or acquire the latest information about the event. Due to the sheer volume of data typically circulated during such events, it is necessary to be able to efficiently filter out irrelevant posts, thus focusing attention on the posts that are truly relevant to the crisis. Current methods for classifying the relevance of posts to a crisis or set of crises typically struggle to deal with posts in different languages, and it is not viable during rapidly evolving crisis situations to train new models for each language. In this paper we test statistical and semantic classification approaches on cross-lingual datasets from 30 crisis events, consisting of posts written mainly in English, Spanish, and Italian. We experiment with scenarios where the model is trained on one language and tested on another, and where the data is translated to a single language. We show that the addition of semantic features extracted from external knowledge bases improves accuracy over a purely statistical model.


Semantics, Cross-lingual, Multilingual, Crisis informatics, Tweet classification



This work has received support from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 687847 (COMRADES).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Knowledge Media Institute, The Open University, Milton Keynes, UK
  2. Department of Computer Science, University of Sheffield, Sheffield, UK
