Abstract
The rapid growth of Internet-based communication technologies, in the form of Social Media (SM) and associated mobile applications, has enabled people to share information about disaster events in real time as they unfold. People increasingly use SM platforms to report situational information during disasters, such as critical needs, dead or injured people, and property damage. Despite its usefulness, most of this data is not available to humanitarian organisations during emergencies, mainly because of data processing and data quality issues. The proliferation of online news media has also led to the exchange of a massive amount of information during disasters, most of it validated by official sources. Integrating SM data with online news reports can provide filtered information while adding more detail on the progress of an event than online news alone has published. This research project introduces the Disaster Event Extraction System (DEES), a real-time system for extracting disaster events from both online news and tweets. DEES is evaluated on a dataset collected during the 2015 Nepal earthquake. Our results suggest that integrating SM and news text data improves the event extraction system’s performance compared with using SM data alone. A demonstration of DEES is available at https://mu-clab.github.io/.
Data availability
Not applicable.
Notes
RNZ news, https://www.rnz.co.nz/.
NZ Herald news, https://www.nzherald.co.nz/.
Stuff news, https://www.stuff.co.nz/.
Python tweepy library version 3.9.0, https://pypi.org/project/tweepy/.
Python spaCy version 2.3.2, https://spacy.io/.
Python newspaper3k, https://newspaper.readthedocs.io/en/latest/.
NER implementation in spaCy, https://spacy.io/api/annotation#named-entities.
Nominatim version 3.5.1, https://github.com/osm-search/Nominatim.
OpenStreetMap, https://www.openstreetmap.org/#map=2/-41.2/-6.6.
geopy 2.2.0, https://pypi.org/project/geopy/.
CrisisNLP dataset, https://crisisnlp.qcri.org/.
Funding
Not applicable.
Author information
Contributions
The authors confirm their contribution to the paper as follows: study conception and design, data collection, analysis and interpretation of results, and draft manuscript preparation: Nilani Algiriyage. All authors reviewed the results and approved the final version of the manuscript.
Ethics declarations
Conflict of interest
Not applicable.
Ethical Approval
Not applicable.
About this article
Cite this article
Algiriyage, N., Prasanna, R., Stock, K. et al. DEES: a real-time system for event extraction from disaster-related web text. Soc. Netw. Anal. Min. 13, 6 (2023). https://doi.org/10.1007/s13278-022-01007-2
DOI: https://doi.org/10.1007/s13278-022-01007-2