Social media for crisis management: clustering approaches for sub-event detection

Multimedia Tools and Applications

Abstract

Social media is becoming increasingly important for crisis management, as it enables the public to provide information in different forms (text, image and video) that can be valuable to crisis managers. Such information is usually tied to a location and a time, and it is useful for understanding emergency needs, supporting decision making and supporting learning and training after the emergency. Due to the huge amount of data gathered during a crisis, automatic processing of the data is needed to support crisis management. One way of automating the process is to uncover sub-events (i.e., special hotspots) in the data collected from social media to enable a better understanding of the crisis. In this paper, we propose clustering approaches for sub-event detection that operate on Flickr and YouTube data, since multimedia data is of particular importance for understanding the situation. Different clustering algorithms are assessed using the textual annotations (i.e., title, tags and description) and additional metadata, such as time and location. The empirical study shows in particular that social multimedia combined with clustering is worth using for detecting sub-events in the context of crisis management. It serves to integrate social media into crisis management without cumbersome manual monitoring.

Notes

  1. See http://flickrj.sourceforge.net/, https://developers.google.com/gdata/

  2. The precision of the geo-coordinates depends on the tagging behavior of the user and on the platform offering the tagging mechanism; e.g., on Flickr, placing an item on a map yields coordinates precise to the sixth decimal place.

  3. http://www.geonames.org/

References

  1. Avila LA, Cangialosi J (2012) National Hurricane Center. http://www.nhc.noaa.gov/data/tcr/AL092011_Irene.pdf

  2. BBC News Europe (2012) England Riots: Maps and Timeline. http://www.bbc.co.uk/news/uk-14436499

  3. Becker H, Naaman M, Gravano L (2010) Learning similarity metrics for event identification in social media. In: Proceedings of the 3rd ACM international conference on web search and data mining, WSDM ’10. ACM, New York, pp 291–300

  4. Bergstrand F, Landgren J (2009) Information sharing using live video in emergency response work. In: Proceedings of the 6th international conference on information systems for crisis response and management

  5. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  6. Bouchachia A, Pedrycz W (2006) Enhancement of fuzzy clustering by mechanisms of partial supervision. Fuzzy Sets Syst 157(13):1733–1759. doi:10.1016/j.fss.2006.02.015. http://www.sciencedirect.com/science/article/pii/S0165011406000960

  7. Choudhary A, Hendrix W, Lee K, Palsetia D, Liao WK (2012) Social media evolution of the Egyptian revolution. Commun ACM 55(5):74–80

  8. Cover TM, Thomas JA (2006) Entropy, relative entropy and mutual information. In: Elements of information theory. Wiley, New Jersey

  9. Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell PAMI-1(2):224–227. doi:10.1109/TPAMI.1979.4766909

  10. Duda RO, Hart PE, Stork DG (2001) Pattern classification. Wiley, New York

  11. Fellbaum C (1998) WordNet: an electronic lexical database. MIT Press, Cambridge

  12. Fontugne R, Cho K, Won Y, Fukuda K (2011) Disasters seen through Flickr Cameras. In: Proceedings of the special workshop on internet and disasters, SWID ’11. ACM, New York, pp 5:1–5:10

  13. Han D, Li W, Li Z (2008) Semantic image classification using statistical local spatial relations model. Multimed Tools Appl 39(2):169–188

  14. Ireson N (2009) Local community situational awareness during an emergency. In: 3rd IEEE international conference on digital ecosystems and technologies (DEST ’09), pp 49–54

  15. Jaffe A, Naaman M, Tassa T, Davis M (2006) Generating summaries and visualization for large collections of geo-referenced photographs. In: Proceedings of the 8th ACM int’l workshop on multimedia information retrieval. ACM, New York, pp 89–98

  16. Kohonen T (1990) The self-organizing map. Proc IEEE 78(9):1464–1480

  17. Larose DT (2005) Discovering knowledge in data: an introduction to data mining. Wiley, Hoboken

  18. Liu S, Palen L, Sutton J, Hughes A, Vieweg S (2008) In search of the bigger picture: the emergent role of on-line photo-sharing in times of disaster. In: Proceedings of the information systems for crisis response and management conf

  19. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge

  20. Marcus A, Bernstein MS, Badar O, Karger DR, Madden S, Miller RC (2011) Twitinfo: aggregating and visualizing microblogs for event exploration. In: Proceedings of annual conference on human factors in computing systems. ACM, New York, pp 227–236

  21. Mathioudakis M, Koudas N (2010) TwitterMonitor: trend detection over the Twitter stream. In: Proceedings of the international conference on management of data, SIGMOD ’10. ACM, New York, pp 1155–1158

  22. Petkos G, Papadopoulos S, Kompatsiaris Y (2012) Social event detection using multimodal clustering and integrating supervisory signals. In: Proceedings of the 2nd ACM international conference on multimedia retrieval, ICMR ’12. ACM, New York, pp 23:1–23:8

  23. Petrović S, Osborne M, Lavrenko V (2010) Streaming first story detection with application to Twitter. In: The 2010 annual conference of the North American chapter of the association for computational linguistics, HLT ’10. Association for Computational Linguistics, Stroudsburg, pp 181–189

  24. Pohl D, Bouchachia A, Hellwagner H (2012) Automatic identification of crisis-related sub-events using clustering. In: International conference on machine learning and applications (ICMLA). Florida

  25. Pohl D, Bouchachia A, Hellwagner H (2012) Automatic sub-event detection in emergency management using social media. In: First international workshop on social web for disaster management (SWDM), in conjunction with WWW’12. Lyon

  26. Pohl D, Bouchachia A, Hellwagner H (2012) Supporting crisis management via sub-event detection in social networks. In: International conference on collaboration technologies and infrastructures. Toulouse

  27. Public Health Emergency: Hurricane Irene 2011 (2012) http://www.phe.gov/emergency/news/sitreps/Pages/irene-2011.aspx

  28. Rattenbury T, Good N, Naaman M (2007) Towards automatic extraction of event and place semantics from Flickr tags. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR ’07. ACM, New York, pp 103–110

  29. Rogstadius J, Kostakos V (2011) Towards real-time emergency response using crowd supported analysis of social media. In: Proceedings CHI workshop on crowdsourcing and human computation: systems, studies and platforms

  30. Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65

  31. Sagun A (2010) Advanced ICTs for disaster management and threat detection: collaborative and distributed frameworks, chap. Efficient deployment of ICT tools in disaster management process. Premier Reference Source, pp 95–107

  32. Salton G, Wong A, Yang CS (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620

  33. Slaney M (2011) Web-scale multimedia analysis: does content matter? IEEE Multimed 18(2):12–15. doi:10.1109/MMUL.2011.34

  34. Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617

  35. Terpstra T, de Vries A, Stronkman R, Paradies GL (2012) Towards a realtime Twitter analysis during crises for operational crisis management. In: Proceedings of the 9th international ISCRAM conference Vancouver

  36. Theodoridis S, Koutroumbas K (2006) Pattern recognition. Elsevier/Academic Press, San Diego

  37. Tucker S, Lanfranchi V, Ireson N, Sosa A, Burel G, Ciravegna F (2012) “Straight to the information I need”: assessing collational interfaces for emergency response. In: Proceedings of the 9th international ISCRAM conference. Vancouver

  38. Vesanto J (2000) Neural network tool for data mining: SOM toolbox. In: TOOLMET 2000 symposium: tool environments and development methods for intelligent systems. Oulu

  39. Vieweg S, Hughes AL, Starbird K, Palen L (2010) Microblogging during two natural hazards events: what Twitter may contribute to situational awareness. In: Proceedings of the 28th international conference on human factors in computing systems, CHI ’10. ACM, New York, pp 1079–1088

  40. Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58:236–244

  41. Yang Y, Carbonell JG, Brown RD, Pierce T, Archibald BT, Liu X (1999) Learning approaches for detecting and tracking news events. IEEE Intell Syst 14(4):32–43

  42. Yates D, Paquette S (2011) Emergency knowledge management and social media technologies: a case study of the 2010 Haitian earthquake. Int J Inf Manag 31(1):6–13

  43. Yin J, Lampert A, Cameron M, Robinson B, Power R (2012) Using social media to enhance emergency situation awareness. IEEE Intell Syst PP(99):1

  44. Zhou C, Frankowski D, Ludford P, Shekhar S, Terveen L (2007) Discovering personally meaningful places: an interactive clustering approach. ACM Trans Inf Syst 25(3):65–95

Acknowledgments

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n°261817 and was partly performed in the Lakeside Labs research cluster at Alpen-Adria-Universität Klagenfurt. Thanks are due to Alexander Pask-Hughes for helping in labeling the data set.

Author information

Corresponding author

Correspondence to Daniela Pohl.

Appendices

Appendix A: Algorithm descriptions

In the following, it is assumed that there are n input vectors with m features each ($x_i = (x_{i,1}, x_{i,2}, \dots, x_{i,m})$, $i = 1, \dots, n$).

A.1 Self-Organizing Maps (SOM)

SOM can be seen as a neural network consisting of an input and an output layer without any hidden layers [17]. It maps inputs from the input layer to a lower-dimensional output layer (i.e., the map) and clusters them accordingly. The output layer consists of so-called map units. Closely related inputs are also closely related in the output layer; that is, they are mapped to the same map unit. This mapping yields the clustering. The number of map units N is given by the user. A map unit $u_j$ is described by m weights (i.e., a vector $u_j = (u_{j,1}, u_{j,2}, \dots, u_{j,m})$, $j = 1, \dots, N$), which are adapted based on incoming inputs. Algorithm 1 shows the general processing steps, where α is the learning rate and γ is the neighborhood function. The neighborhood function ensures that the units neighboring the best-matching unit are updated as well. We used SOM in [25] and [26].

Algorithm 1 Pseudocode of the SOM Algorithm (adapted from [17])

Randomly initialize the N map units with the corresponding m weights for each unit

for i = 1, ..., n do

    Given the input x_i, compute the best-matching unit (BMU), i.e., the map unit with minimum distance to x_i under the measure dist

        $$ BMU = \arg\min\limits_{j} \, dist(x_{i}, u_{j}) $$
        (7)

    Update the weights of the BMU and of its topological neighbors based on γ

        $$ u_{j,t+1} = u_{j,t} + \alpha \, \gamma_{BMU,j} \, (x_{i} - u_{j,t}) $$
        (8)

end for
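The steps of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the study: it assumes a one-dimensional map, a single pass over the inputs, the Euclidean distance for dist, and a Gaussian neighborhood function on the unit index for γ. The names `train_som`, `assign`, `alpha` and `sigma` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance, standing in for the generic measure dist."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def train_som(inputs, n_units, alpha=0.5, sigma=1.0, seed=0):
    """One pass over the inputs for a 1-D map with n_units map units."""
    rng = random.Random(seed)
    m = len(inputs[0])
    # Randomly initialize the N map units with m weights each.
    units = [[rng.random() for _ in range(m)] for _ in range(n_units)]
    for x in inputs:
        # Eq. (7): the BMU is the unit with minimum distance to x.
        bmu = min(range(n_units), key=lambda j: euclidean(x, units[j]))
        for j in range(n_units):
            # Assumed Gaussian neighborhood on the unit index (gamma_{BMU,j}).
            gamma = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
            # Eq. (8): move unit j toward x, scaled by alpha * gamma.
            units[j] = [u + alpha * gamma * (xi - u)
                        for u, xi in zip(units[j], x)]
    return units


def assign(inputs, units):
    """Cluster label of each input: the index of its best-matching unit."""
    return [min(range(len(units)), key=lambda j: euclidean(x, units[j]))
            for x in inputs]
```

In practice the learning rate and the neighborhood width are usually decreased over several passes; the sketch keeps both fixed for brevity.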

A.2 Agglomerative clustering (AC)

Agglomerative clustering is a hierarchical clustering method in which the most similar clusters are merged in each step [10]. At the beginning, each input (i.e., vector) forms an individual cluster c. The clustering ends when only one cluster remains or another stopping criterion is met (e.g., a specific number of clusters N is reached). The merge is based on a distance measure (e.g., average, center, complete-linkage, etc.) that identifies the most similar pair of clusters. Algorithm 2 shows the processing steps for agglomerative clustering. For our settings, we used the Ward distance measure [26].

Algorithm 2 Pseudocode of the AC Algorithm (adapted from [10])

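Initialize the individual clusters with the inputs: $c_{i} = \{x_{i}\}$, $i = 1, \dots, n$, and $\omega_{0} = \{c_{1}, \dots, c_{n}\}$

t = 1

while $\omega_{t-1}$ contains more than N clusters do

    Identify the closest pair of clusters (c_1, c_2) in $\omega_{t-1}$

        $$ (c_{1}, c_{2}) = \arg\min\limits_{c_{i}, c_{j} \in \omega_{t-1},\ i \neq j} dist(c_{i}, c_{j}) $$
        (9)

    Define a new cluster $c_{new} = c_{1} \cup c_{2}$

        $$ \omega_{t} = (\omega_{t-1} \setminus \{c_{1}, c_{2}\}) \cup \{c_{new}\} $$
        (10)

    t = t + 1

end while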
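The merge loop of Algorithm 2 can be sketched as follows. This is a naive illustration under stated assumptions, not the implementation used in the study: the Ward criterion is written here as the increase in the within-cluster sum of squares caused by a merge, and the function names `centroid`, `ward_cost` and `agglomerative` are hypothetical.

```python
from itertools import combinations


def centroid(cluster):
    """Component-wise mean of the vectors in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq = sum((p - q) ** 2 for p, q in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq


def agglomerative(inputs, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    # Initialization: every input is its own cluster (omega_0).
    clusters = [[list(x)] for x in inputs]
    while len(clusters) > n_clusters:
        # Eq. (9): closest pair among the current clusters, i != j.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]))
        # Eq. (10): replace the two clusters by their union.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

The sketch recomputes all pairwise costs in every step, which is simple to read but slow; it is meant for small examples only.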

Appendix B: Indices

Equation (11) from [36] describes the Dunn index for m clusters, where d is a dissimilarity measure and diam is the diameter of a cluster.

$$\begin{array}{@{}rcl@{}} d(C_{i},C_{j}) &=& \min\limits_{x \in C_{i},\, y \in C_{j}} d(x,y) \\ diam(C) &=& \max\limits_{x,y \in C} d(x,y) \\ Dunn &=& \min\limits_{i=1,\dots,m} \left\{ \min\limits_{j=i+1,\dots,m} \left(\frac{d(C_{i},C_{j})}{\max\limits_{k=1,\dots,m} diam(C_{k})}\right) \right\} \end{array} $$
(11)
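As a small worked illustration of Eq. (11), the following sketch computes the Dunn index for clusters given as lists of points. The Euclidean distance is assumed for the dissimilarity d, and the function names are hypothetical. Because the largest diameter is the same for every cluster pair, the double minimum reduces to the smallest between-cluster distance divided by the largest diameter.

```python
import math


def dist(x, y):
    """Euclidean distance, assumed here as the dissimilarity d(x, y)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def dunn_index(clusters):
    """Dunn index of a list of clusters (each a list of points)."""
    # d(C_i, C_j): smallest distance between members of different clusters.
    min_sep = min(dist(x, y)
                  for i, ci in enumerate(clusters)
                  for cj in clusters[i + 1:]
                  for x in ci for y in cj)
    # diam(C): largest distance between two members of the same cluster.
    max_diam = max(dist(x, y) for c in clusters for x in c for y in c)
    return min_sep / max_diam
```

Larger values indicate compact, well-separated clusters. The sketch needs at least two clusters and at least one cluster with two distinct points; otherwise the ratio is undefined.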

Equation (12) from [36] describes the Davies-Bouldin (DB) index, where s_i is the dispersion of cluster C_i and d_ij is the dissimilarity between the clusters C_i and C_j.

$$\begin{array}{@{}rcl@{}} R_{ij} &=& \frac{s_{i} + s_{j}}{d_{ij}} \\ R_{i} &=& \max\limits_{j=1,\dots,m;\, j\neq i} R_{ij}, \quad i = 1,\dots,m \\ DB &=& \frac{1}{m} \sum\limits_{i=1}^{m} R_{i} \end{array} $$
(12)
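Eq. (12) can be illustrated with the short sketch below. The text does not fix how the dispersion and the cluster dissimilarity are measured, so the sketch assumes one common choice: s_i is the mean Euclidean distance of the members of C_i to the cluster centroid, and d_ij is the Euclidean distance between the centroids. The function names are hypothetical.

```python
import math


def dist(x, y):
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def davies_bouldin(clusters):
    """DB index of a list of clusters (each a list of points)."""
    cents = [centroid(c) for c in clusters]
    # Assumed dispersion s_i: mean distance of the members to the centroid.
    s = [sum(dist(x, mu) for x in c) / len(c)
         for c, mu in zip(clusters, cents)]
    m = len(clusters)
    # R_i: worst-case ratio (s_i + s_j) / d_ij over all other clusters j.
    r = [max((s[i] + s[j]) / dist(cents[i], cents[j])
             for j in range(m) if j != i)
         for i in range(m)]
    return sum(r) / m
```

Smaller values indicate better clusterings, in contrast to the Dunn index.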

Equation (13) from [30] shows the Silhouette value of an item i of the cluster A. Here, a(i) is the average dissimilarity of the item i to all other items in A, and d(i, C) is the average dissimilarity of i to all objects of a cluster C other than A. The Silhouette value of a cluster is the average of s(i) over all items i in the cluster.

$$\begin{array}{@{}rcl@{}} b(i) &=& \min\limits_{C \neq A} d(i,C) \\ s(i) &=& \frac{b(i)-a(i)}{\max\{a(i),b(i)\}} \end{array} $$
(13)
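The Silhouette computation of Eq. (13) can be sketched as follows, again assuming the Euclidean distance as dissimilarity. Two choices are assumptions of this sketch rather than part of the text above: items in single-element clusters receive the value 0, and s(i) is set to 0 when both a(i) and b(i) are 0. The function names are hypothetical.

```python
import math


def dist(x, y):
    """Euclidean distance, assumed here as the dissimilarity."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def silhouette_values(clusters):
    """Return s(i) for every item, grouped by cluster."""
    values = []
    for a_idx, own in enumerate(clusters):
        row = []
        for i_idx, item in enumerate(own):
            if len(own) == 1:
                row.append(0.0)  # assumed convention for singletons
                continue
            # a(i): average dissimilarity to all other items in A.
            a = sum(dist(item, o) for k, o in enumerate(own)
                    if k != i_idx) / (len(own) - 1)
            # b(i): smallest average dissimilarity d(i, C) over C != A.
            b = min(sum(dist(item, o) for o in other) / len(other)
                    for c_idx, other in enumerate(clusters)
                    if c_idx != a_idx)
            denom = max(a, b)
            row.append((b - a) / denom if denom > 0 else 0.0)
        values.append(row)
    return values


def cluster_silhouette(values_of_cluster):
    """Silhouette value of a cluster: the average s(i) of its items."""
    return sum(values_of_cluster) / len(values_of_cluster)
```

Values close to 1 indicate that an item lies well inside its cluster, while negative values suggest that it would fit better in a neighboring cluster.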

Cite this article

Pohl, D., Bouchachia, A. & Hellwagner, H. Social media for crisis management: clustering approaches for sub-event detection. Multimed Tools Appl 74, 3901–3932 (2015). https://doi.org/10.1007/s11042-013-1804-2
