Abstract
Twitter has become a popular channel for communicating information, especially during disasters. Identifying the tweets related to a target event during a disaster is a challenging task. Many prior studies have addressed situational and non-situational information related to disasters. Detecting tweets related to damage assessment is particularly difficult because such tweets form only a subset of situational information. Existing damage assessment work exhibits one of the following drawbacks: (1) it considers only infrastructure damage and omits human damage, (2) it relies only on social media image data, or (3) it considers only regional-language tweets. To overcome these issues, a Stacking-based Ensemble using Statistical features and Informative Words (SESIW) is proposed for detecting tweets related to damage assessment. It uses the proposed features, namely the frequency of hashtags, user mentions, wh-words and URLs, the count of numerals, and informative words. Informative words are mined using the term frequency-inverse document frequency (TF-IDF) technique. SESIW is tested on several Twitter disaster datasets, and it outperforms the baseline SVM with a Bag-of-Words model.
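The feature set named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything below is an assumption: the regular expressions, the wh-word list, the TF-IDF weighting (summed over the corpus, with a smoothed IDF) and the helper names `statistical_features` and `informative_words` are hypothetical, and the stacking ensemble itself is not shown.

```python
import math
import re
from collections import Counter

# Assumed wh-word list; the abstract does not enumerate it.
WH_WORDS = {"what", "when", "where", "which", "who", "whom", "whose", "why", "how"}
ENTITY_RE = re.compile(r"https?://\S+|[#@]\w+")  # URLs, hashtags, mentions


def tokenize(tweet):
    """Lowercase alphabetic tokens, with URLs, hashtags and mentions removed."""
    return re.findall(r"[a-z]+", ENTITY_RE.sub(" ", tweet.lower()))


def statistical_features(tweet):
    """Count the surface features listed in the abstract for one tweet."""
    plain = ENTITY_RE.sub(" ", tweet)  # avoid counting digits inside URLs/tags
    return {
        "hashtags": len(re.findall(r"#\w+", tweet)),
        "mentions": len(re.findall(r"@\w+", tweet)),
        "urls": len(re.findall(r"https?://\S+", tweet)),
        "numerals": len(re.findall(r"\b\d+(?:[.,]\d+)*\b", plain)),
        "wh_words": sum(1 for tok in tokenize(tweet) if tok in WH_WORDS),
    }


def informative_words(tweets, top_k=5):
    """Rank words by TF-IDF summed over all tweets (one plausible reading)."""
    docs = [tokenize(t) for t in tweets]
    n_docs = len(docs)
    # Document frequency: number of tweets containing each word.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    scores = Counter()
    for doc in docs:
        for tok, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[tok]) + 1.0  # smoothed IDF
            scores[tok] += tf * idf
    return [word for word, _ in scores.most_common(top_k)]


if __name__ == "__main__":
    sample = "Where is help? 7.8 quake, 120 dead in #Nepal @redcross http://t.co/abc1"
    print(statistical_features(sample))
    print(informative_words(["bridge collapsed bridge", "people injured", "people trapped"], top_k=2))
```

In a pipeline of this kind, the per-tweet counts and indicators for the mined words would be concatenated into one feature vector and passed to the base classifiers of the ensemble.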
Cite this article
Madichetty, S., Sridevi, M. Disaster damage assessment from the tweets using the combination of statistical features and informative words. Soc. Netw. Anal. Min. 9, 42 (2019). https://doi.org/10.1007/s13278-019-0579-5
DOI: https://doi.org/10.1007/s13278-019-0579-5