Spatio-temporal mining of keywords for social media cross-social crawling of emergency events

GeoInformatica

Abstract

Being able to automatically extract as many relevant posts as possible from social media in a timely manner is key in many activities, for example to provide useful information for rapidly creating crisis maps during emergency events. While most social media support keyword-based searches, the number and the accuracy of retrieved posts depend largely on the keywords employed. The goal of the proposed methodology is to dynamically extract relevant keywords for searching social media during an emergency event, following the event’s evolution. Starting from a set of seed keywords designed for the type of event being considered (floods and earthquakes, in particular), the set of keywords is automatically adjusted taking into account the spatio-temporal features of the monitored event. The aim is to retrieve posts that follow the event’s evolution and to benefit from cross-social crawling, which exploits the specific strengths of one social medium over others. In the case considered in this paper, we exploit the precision of the geolocation of images posted on Flickr to extract keywords for searching YouTube posts about the same event, since YouTube does not allow spatial crawling yet provides a richer source of information. The methodology was evaluated on three recent major emergency events, demonstrating a large increase in the number of retrieved posts compared with the use of generic seed keywords. This is a relevant improvement both for providing information on emergency events and for following the event’s development.

Notes

  1. https://www.e2mc-project.eu

  2. https://twitter.com

  3. https://www.flickr.com

  4. https://www.youtube.com

  5. https://www.openstreetmap.org

  6. https://www.geonames.org

  7. https://www.facebook.com

  8. https://www.instagram.com

  9. Relevant here means that the title contains at least one seed keyword. Groups and albums are retrieved only for media obtained through seed keywords and are limited to media posted on the same day.

  10. https://en.wikipedia.org/wiki/Haversine_formula

  11. In the experiments in this paper, we consider timeframes of 24 h, and iterations are repeated 3 times in each timeframe.

  12. In this context, the past is the period of time before the beginning of the event, which models the normal situation. Three months are considered in the current implementation.

  13. The higher γ is, the less a high coverage is penalized, and therefore tags that were also frequent in the past are penalized less. γ is set to 1 in the current implementation.

  14. https://wiki.openstreetmap.org/wiki/Overpass_API

  15. The total number of tags available in OSM (https://taginfo.openstreetmap.org) is huge and their type is not fixed. Tags referring to locations typically affected by emergency events were selected; they are not listed here for the sake of brevity.

  16. In the current implementation, the maximum has been fixed at 2.

  17. https://en.wikipedia.org/wiki/Hurricane_Harvey

  18. https://en.wikipedia.org/wiki/2013%E2%80%9314_United_Kingdom_winter_floods

  19. https://emergency.copernicus.eu/mapping/list-of-activations-rapid
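
Note 10 points to the haversine formula for the great-circle distance between two points given as latitude and longitude. A minimal Python sketch of that formula follows. The function name, the mean Earth radius constant, and the sample coordinates are illustrative assumptions and are not taken from the paper's implementation.

```python
import math

# Mean Earth radius in kilometres (assumed constant, not from the paper).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)

    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


if __name__ == "__main__":
    # Hypothetical example: distance between two geotagged posts.
    post_a = (29.7604, -95.3698)
    post_b = (29.4241, -98.4936)
    print(f"{haversine_km(*post_a, *post_b):.1f} km")
```

A distance of this kind could be used, for instance, to decide whether a geotagged post lies within a given radius of a location of interest; how the paper applies it is not stated in this excerpt.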

Acknowledgements

This work was funded by the European Commission H2020 project E2mC “Evolution of Emergency Copernicus services” under project No. 730082. This work expresses the opinions of the authors and not necessarily those of the European Commission. The European Commission is not liable for any use that may be made of the information contained in this work. The authors thank Chiara Francalanci and Paolo Ravanelli for their support throughout this work and Nicole Gervasoni for her support in ground truth analysis and annotations.

Author information

Corresponding author

Correspondence to Barbara Pernici.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Autelitano, A., Pernici, B. & Scalia, G. Spatio-temporal mining of keywords for social media cross-social crawling of emergency events. Geoinformatica 23, 425–447 (2019). https://doi.org/10.1007/s10707-019-00354-1

  • DOI: https://doi.org/10.1007/s10707-019-00354-1
