Abstract
Automatically extracting as many relevant posts as possible from social media in a timely manner is key in many activities, for example to provide information for rapidly creating crisis maps during emergency events. While most social media support keyword-based searches, the number and accuracy of the retrieved posts depend largely on the keywords employed. The goal of the proposed methodology is to dynamically extract relevant keywords for searching social media during an emergency event, following the event’s evolution. Starting from a set of seed keywords designed for the type of event being considered (floods and earthquakes, in particular), the set is automatically adjusted taking into account the spatio-temporal features of the monitored event. The aim is to retrieve posts that follow the event’s evolution and to benefit from cross-social crawling, exploiting the specific strengths of one social medium over others. In the case considered in this paper, we exploit the precision of the geolocation of images posted on Flickr to extract keywords for searching YouTube posts about the same event, since YouTube does not allow spatial crawling yet provides a richer source of information. The methodology was evaluated on three recent major emergency events, demonstrating a large increase in the number of retrieved posts compared with the use of generic seed keywords. This is a significant improvement both in the information provided on emergency events and in the ability to follow the event’s development.
Notes
Relevant here means that the title contains at least one seed keyword. Groups and albums are retrieved only for media obtained through seed keywords and are limited to media posted on the same day.
In the experiments reported in this paper, we consider timeframes of 24 h, and iterations are repeated 3 times in each timeframe.
In this context, the past is the period of time before the beginning of the event, which models the normal situation. Three months are considered in the current implementation.
The higher γ is, the less a higher coverage is penalized, so tags that were also frequent in the past are penalized less. γ is set to 1 in the current implementation.
The total number of tags available in OSM (https://taginfo.openstreetmap.org) is huge, and their types are not fixed. Tags referring to locations typically affected by emergency events were selected; they are not listed here for the sake of brevity.
In the current implementation, the maximum has been fixed at 2.
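The settings stated in the notes above (the relevance criterion, the 24 h timeframes with 3 iterations each, and the three-month past period) can be collected in a short Python sketch. This is only an illustration under stated assumptions, not the authors' implementation: the names is_relevant, past_window and iteration_schedule are hypothetical, keyword matching is assumed to be a case-insensitive substring test, three months is approximated as 90 days, and how the 3 iterations are spaced within a timeframe is not specified in the notes.

```python
from datetime import datetime, timedelta

# Values taken from the notes; the constant names are hypothetical.
TIMEFRAME = timedelta(hours=24)    # length of one crawling timeframe
ITERATIONS_PER_TIMEFRAME = 3       # iterations repeated in each timeframe
PAST_WINDOW = timedelta(days=90)   # "three months", approximated as 90 days (assumption)


def is_relevant(title, seed_keywords):
    """Return True if the title contains at least one seed keyword.

    Case-insensitive substring matching is an assumption of this sketch.
    """
    lowered = title.lower()
    return any(keyword.lower() in lowered for keyword in seed_keywords)


def past_window(event_start):
    """Return (start, end) of the period before the event that models the normal situation."""
    return event_start - PAST_WINDOW, event_start


def iteration_schedule(event_start, n_timeframes):
    """Yield (timeframe_start, timeframe_end, iteration_index) for each iteration.

    The spacing of iterations inside a timeframe is left open here,
    since the notes only state how many there are.
    """
    for t in range(n_timeframes):
        start = event_start + t * TIMEFRAME
        end = start + TIMEFRAME
        for i in range(ITERATIONS_PER_TIMEFRAME):
            yield start, end, i
```

Calling iteration_schedule with an event start time and a number of timeframes yields three entries per 24 h window, each of which would correspond to one round of keyword extraction and crawling.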
Acknowledgements
This work was funded by the European Commission H2020 project E2mC “Evolution of Emergency Copernicus services” under project No. 730082. This work expresses the opinions of the authors and not necessarily those of the European Commission. The European Commission is not liable for any use that may be made of the information contained in this work. The authors thank Chiara Francalanci and Paolo Ravanelli for their support throughout this work and Nicole Gervasoni for her support in ground truth analysis and annotations.
Cite this article
Autelitano, A., Pernici, B. & Scalia, G. Spatio-temporal mining of keywords for social media cross-social crawling of emergency events. Geoinformatica 23, 425–447 (2019). https://doi.org/10.1007/s10707-019-00354-1