Disaster damage assessment from the tweets using the combination of statistical features and informative words

  • Original Article
Social Network Analysis and Mining

Abstract

Twitter has become a popular channel for communicating information, especially during disasters. Identifying tweets related to the target event during a disaster is a challenging task. Many prior studies have discussed situational and non-situational information related to disasters. Detecting tweets related to damage assessment is particularly difficult because such tweets form only a subset of situational information. Existing damage assessment works each suffer from one of the following drawbacks: (1) they focus only on infrastructure damage and do not include human damage in the assessment, (2) they focus only on social media image data, or (3) they focus only on regional-language tweets. To overcome these issues, a Stacking-based Ensemble using Statistical features and Informative Words (SESIW) is proposed for detecting tweets related to damage assessment. It uses the proposed features, namely the frequency of hashtags, user mentions, wh-words and URLs, the count of numerals, and informative words. Informative words are mined using the term frequency-inverse document frequency (TF-IDF) technique. The SESIW method is tested on different Twitter disaster datasets, and it outperforms the baseline SVM with Bag-of-Words model.
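
The feature set named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the wh-word list, the regular expressions, the summed TF-IDF ranking, and the function names (`statistical_features`, `informative_words`, `feature_vector`) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
NUM_RE = re.compile(r"\b\d+(?:[.,]\d+)*\b")
# Assumed wh-word list; the abstract does not enumerate it.
WH_WORDS = {"what", "when", "where", "which", "who", "whom", "whose", "why", "how"}


def tokenize(tweet):
    """Lowercase alphabetic tokens, with URLs removed first."""
    return re.findall(r"[a-z]+", URL_RE.sub(" ", tweet.lower()))


def statistical_features(tweet):
    """Count the surface features named in the abstract for one tweet."""
    tokens = tweet.split()
    text = URL_RE.sub(" ", tweet)  # drop URLs so their digits are not counted
    return {
        "hashtags": sum(t.startswith("#") for t in tokens),
        "mentions": sum(t.startswith("@") for t in tokens),
        "urls": len(URL_RE.findall(tweet)),
        "wh_words": sum(w in WH_WORDS for w in tokenize(tweet)),
        "numerals": len(NUM_RE.findall(text)),
    }


def informative_words(tweets, top_k=5):
    """Rank words by TF-IDF summed over all tweets (an assumed ranking rule)."""
    docs = [tokenize(t) for t in tweets]
    n_docs = len(docs)
    doc_freq = Counter(w for d in docs for w in set(d))
    scores = Counter()
    for d in docs:
        for word, count in Counter(d).items():
            tf = count / len(d)
            idf = math.log(n_docs / doc_freq[word])
            scores[word] += tf * idf
    return [w for w, _ in scores.most_common(top_k)]


def feature_vector(tweet, vocab):
    """Statistical counts followed by 0/1 indicators for each informative word."""
    stats = statistical_features(tweet)
    present = set(tokenize(tweet))
    return list(stats.values()) + [int(w in present) for w in vocab]
```

Vectors built this way could then be passed to any stacking ensemble, for example scikit-learn's `StackingClassifier`; which base and meta learners SESIW actually stacks is not stated in the abstract.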

Author information

Corresponding author

Correspondence to Sreenivasulu Madichetty.

About this article

Cite this article

Madichetty, S., Sridevi, M. Disaster damage assessment from the tweets using the combination of statistical features and informative words. Soc. Netw. Anal. Min. 9, 42 (2019). https://doi.org/10.1007/s13278-019-0579-5

  • DOI: https://doi.org/10.1007/s13278-019-0579-5
