Abstract
Social media is becoming increasingly important for crisis management because it enables the public to share potentially valuable information in different forms: text, images and video. Such information usually carries spatial and temporal context and is useful for understanding emergency needs, supporting decision making, and supporting learning and training after the emergency. Because of the huge amount of data gathered during a crisis, automatic processing is needed to support crisis management. One way of automating the process is to uncover sub-events (i.e., special hotspots) in the data collected from social media, enabling a better understanding of the crisis. In this paper we propose clustering approaches for sub-event detection that operate on Flickr and YouTube data, since multimedia data is of particular importance for understanding the situation. Different clustering algorithms are assessed using the textual annotations (i.e., title, tags and description) and additional metadata such as time and location. The empirical study shows in particular that combining social multimedia with clustering is worthwhile for detecting sub-events in crisis management. It allows social media to be integrated into crisis management without cumbersome manual monitoring.
Notes
The precision of the geo-coordinates depends on the tagging behavior of the user and on the platform offering the tagging mechanism; e.g., when an item is placed on a map in Flickr, the coordinates are given to the sixth decimal place.
References
Avila LA, Cangialosi J (2012) National Hurricane Center. http://www.nhc.noaa.gov/data/tcr/AL092011_Irene.pdf
BBC News Europe (2012) England Riots: Maps and Timeline. http://www.bbc.co.uk/news/uk-14436499
Becker H, Naaman M, Gravano L (2010) Learning similarity metrics for event identification in social media. In: Proceedings of the 3rd ACM international conference on web search and data mining, WSDM ’10. ACM, New York, pp 291–300
Bergstrand F, Landgren J (2009) Information sharing using live video in emergency response work. In: Proceedings of the 6th international conference on information systems for crisis response and management
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Bouchachia A, Pedrycz W (2006) Enhancement of fuzzy clustering by mechanisms of partial supervision. Fuzzy Sets Syst 157(13):1733–1759. doi:10.1016/j.fss.2006.02.015. http://www.sciencedirect.com/science/article/pii/S0165011406000960
Choudhary A, Hendrix W, Lee K, Palsetia D, Liao WK (2012) Social media evolution of the Egyptian revolution. Commun ACM 55(5):74–80
Cover TM, Thomas JA (2006) Entropy, relative entropy and mutual information. In: Elements of information theory. Wiley, New Jersey
Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell PAMI-1(2):224–227. doi:10.1109/TPAMI.1979.4766909
Duda RO, Hart PE, Stork DG (2001) Pattern classification. Wiley, New York
Fellbaum C (1998) WordNet: an electronic lexical database. MIT Press, Cambridge
Fontugne R, Cho K, Won Y, Fukuda K (2011) Disasters seen through Flickr Cameras. In: Proceedings of the special workshop on internet and disasters, SWID ’11. ACM, New York, pp 5:1–5:10
Han D, Li W, Li Z (2008) Semantic image classification using statistical local spatial relations model. Multimed Tools Appl 39(2):169–188
Ireson N (2009) Local community situational awareness during an emergency. In: 3rd IEEE international conference on digital ecosystems and technologies (DEST ’09), pp 49–54
Jaffe A, Naaman M, Tassa T, Davis M (2006) Generating summaries and visualization for large collections of geo-referenced photographs. In: Proceedings of the 8th ACM int’l workshop on multimedia information retrieval. ACM, New York, pp 89–98
Kohonen T (1990) The self-organizing map. Proc IEEE 78(9):1464–1480
Larose DT (2005) Discovering knowledge in data: an introduction to data mining. Wiley, Hoboken
Liu S, Palen L, Sutton J, Hughes A, Vieweg S (2008) In search of the bigger picture: the emergent role of on-line photo-sharing in times of disaster. In: Proceedings of the information systems for crisis response and management conf
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge
Marcus A, Bernstein MS, Badar O, Karger DR, Madden S, Miller RC (2011) Twitinfo: aggregating and visualizing microblogs for event exploration. In: Proceedings of annual conference on human factors in computing systems. ACM, New York, pp 227–236
Mathioudakis M, Koudas N (2010) TwitterMonitor: trend detection over the Twitter stream. In: Proceedings of the international conference on management of data, SIGMOD ’10. ACM, New York, pp 1155–1158
Petkos G, Papadopoulos S, Kompatsiaris Y (2012) Social event detection using multimodal clustering and integrating supervisory signals. In: Proceedings of the 2nd ACM international conference on multimedia retrieval, ICMR ’12. ACM, New York, pp 23:1–23:8
Petrović S, Osborne M, Lavrenko V (2010) Streaming first story detection with application to Twitter. In: The 2010 annual conference of the North American chapter of the association for computational linguistics, HLT ’10. Association for Computational Linguistics, Stroudsburg, pp 181–189
Pohl D, Bouchachia A, Hellwagner H (2012) Automatic identification of crisis-related sub-events using clustering. In: International conference on machine learning and applications (ICMLA). Florida
Pohl D, Bouchachia A, Hellwagner H (2012) Automatic sub-event detection in emergency management using social media. In: First international workshop on social web for disaster management (SWDM), in conjunction with WWW’12. Lyon
Pohl D, Bouchachia A, Hellwagner H (2012) Supporting crisis management via sub-event detection in social networks. In: International conference on collaboration technologies and infrastructures. Toulouse
Public Health Emergency: Hurricane Irene 2011 (2012) http://www.phe.gov/emergency/news/sitreps/Pages/irene-2011.aspx
Rattenbury T, Good N, Naaman M (2007) Towards automatic extraction of event and place semantics from Flickr tags. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR ’07. ACM, New York, pp 103–110
Rogstadius J, Kostakos V (2011) Towards real-time emergency response using crowd supported analysis of social media. In: Proceedings CHI workshop on crowdsourcing and human computation: systems, studies and platforms
Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65
Sagun A (2010) Advanced ICTs for disaster management and threat detection: collaborative and distributed frameworks, chap. Efficient deployment of ICT tools in disaster management process. Premier Reference Source, pp 95–107
Salton G, Wong A, Yang CS (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620
Slaney M (2011) Web-scale multimedia analysis: does content matter? IEEE Multimed 18(2):12–15. doi:10.1109/MMUL.2011.34
Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
Terpstra T, de Vries A, Stronkman R, Paradies GL (2012) Towards a realtime Twitter analysis during crises for operational crisis management. In: Proceedings of the 9th international ISCRAM conference Vancouver
Theodoridis S, Koutroumbas K (2006) Pattern recognition. Elsevier/Academic Press, San Diego
Tucker S, Lanfranchi V, Ireson N, Sosa A, Burel G, Ciravegna F (2012) “Straight to the information I need”: assessing collational interfaces for emergency response. In: Proceedings of the 9th international ISCRAM conference. Vancouver
Vesanto J (2000) Neural network tool for data mining: SOM toolbox. In: TOOLMET 2000 symposium: tool environments and development methods for intelligent systems. Oulu
Vieweg S, Hughes AL, Starbird K, Palen L (2010) Microblogging during two natural hazards events: what Twitter may contribute to situational awareness. In: Proceedings of the 28th international conference on human factors in computing systems, CHI ’10. ACM, New York, pp 1079–1088
Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58:236–244
Yang Y, Carbonell JG, Brown RD, Pierce T, Archibald BT, Liu X (1999) Learning approaches for detecting and tracking news events. IEEE Intell Syst 14(4):32–43
Yates D, Paquette S (2011) Emergency knowledge management and social media technologies: a case study of the 2010 Haitian earthquake. Int J Inf Manag 31(1):6–13
Yin J, Lampert A, Cameron M, Robinson B, Power R (2012) Using social media to enhance emergency situation awareness. IEEE Intell Syst PP(99):1
Zhou C, Frankowski D, Ludford P, Shekhar S, Terveen L (2007) Discovering personally meaningful places: an interactive clustering approach. ACM Trans Inf Syst 25(3):65–95
Acknowledgments
The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n°261817 and was partly performed in the Lakeside Labs research cluster at Alpen-Adria-Universität Klagenfurt. Thanks are due to Alexander Pask-Hughes for helping in labeling the data set.
Appendices
Appendix A: Algorithm descriptions
In the following, it is assumed that there are n input vectors with m features: $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,m})$, $i = 1, \ldots, n$.
A.1 Self-Organizing Maps (SOM)
A SOM can be seen as a neural network consisting of an input layer and an output layer without any hidden layers [17]. It maps inputs from the input layer to a lower-dimensional output layer (i.e., the map) and clusters them accordingly. The output layer consists of so-called map units. Inputs that are closely related are also close in the output layer; that is, they are mapped to the same map unit. This mapping yields the corresponding clustering. The number of map units N is specified by the user. Each map unit $u_j$ is described by m weights, $u_j = (u_{j,1}, u_{j,2}, \ldots, u_{j,m})$, $j = 1, \ldots, N$, which are adapted based on incoming inputs. Algorithm 1 shows the general processing steps, where α denotes the learning rate and γ the neighborhood function. The neighborhood function ensures that the units neighboring the best-matching unit are updated as well. We used SOM in [25] and also in [26].
| Algorithm 1 Pseudocode of the SOM Algorithm (adapted from [17]) |
|---|
| 1: Randomly initialize the N map units with the corresponding m weights for each unit |
| 2: $i = 1$ |
| 3: for $i \le n$ do |
| 4: Given the input $x_i$, compute the best-matching unit (BMU) with minimum distance given by the measure dist: $\mathrm{BMU} = \arg\min_j \, \mathrm{dist}(x_i, u_j)$ (7) |
| 5: The weights of the BMU and the corresponding topological neighbors are updated based on γ: $u_{j,(t+1)} = u_{j,t} + \alpha \, \gamma_{\mathrm{BMU},j} \, (x_i - u_{j,t})$ (8) |
| 6: $i = i + 1$ |
| 7: end for |
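The steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the Euclidean distance as `dist`, the Gaussian neighborhood on a rectangular `rows × cols` grid, the fixed values of `alpha` and `sigma`, and the function names `train_som` and `assign` are all assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance, used here as the measure `dist` (an assumption)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def train_som(inputs, rows, cols, alpha=0.5, sigma=1.0, seed=0):
    """Run one pass of Algorithm 1 over the inputs and return the unit weights."""
    rng = random.Random(seed)
    m = len(inputs[0])
    # Step 1: randomly initialize the N = rows * cols map units.
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    units = [[rng.random() for _ in range(m)] for _ in grid]
    for x in inputs:  # Steps 2-3 and 6-7: loop over the n inputs.
        # Step 4 (Eq. 7): index of the best-matching unit.
        bmu = min(range(len(units)), key=lambda j: euclidean(x, units[j]))
        # Step 5 (Eq. 8): move the BMU and its grid neighbors toward x.
        for j, u in enumerate(units):
            dr = grid[j][0] - grid[bmu][0]
            dc = grid[j][1] - grid[bmu][1]
            # Assumed Gaussian neighborhood function gamma_{BMU,j}.
            gamma = math.exp(-(dr * dr + dc * dc) / (2.0 * sigma ** 2))
            units[j] = [w + alpha * gamma * (xi - w) for w, xi in zip(u, x)]
    return units


def assign(inputs, units):
    """Cluster label of each input: the index of its best-matching unit."""
    return [
        min(range(len(units)), key=lambda j: euclidean(x, units[j]))
        for x in inputs
    ]
```

In practice the learning rate and the neighborhood width are usually decreased over several passes through the data; the sketch keeps both fixed so that it stays close to the pseudocode.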
A.2 Agglomerative clustering (AC)
Agglomerative clustering is a hierarchical clustering method in which the most similar clusters are merged in each step [10]. At the beginning, each input (i.e., vector) forms an individual cluster c. The clustering ends when only one cluster remains or another stopping criterion is met (e.g., a specific number of clusters N is reached). The merge is based on a distance measure (e.g., average, centroid or complete linkage) that identifies the most similar pair of clusters. Algorithm 2 shows the processing steps for agglomerative clustering. For our settings, we used the Ward distance measure [26].
| Algorithm 2 Pseudocode of the AC Algorithm (adapted from [10]) |
|---|
| 1: Initialize the individual clusters with the inputs ($c_i = x_i$); $\forall c_i \in \omega_0$ |
| 2: $t = 1$ |
| 3: for $t < N$ do |
| 4: Identify the closest-related pair of clusters $(c_1, c_2)$ in $\omega_{t-1}$: $(c_1, c_2) = \min_{i,j} \, (c_i, c_j), \; i \neq j$ (9) |
| 5: Define a new cluster $c_{new} = c_1 \cup c_2$: $\omega_t = (\omega_{t-1} - \{c_1, c_2\}) \cup c_{new}$ (10) |
| 6: $t = t + 1$ |
| 7: end for |
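A compact Python sketch of Algorithm 2 with a Ward-style merge criterion is given below. It is an illustration under stated assumptions rather than the code used in the experiments: the merge cost is taken to be the increase in the within-cluster sum of squares, the loop stops once a requested number of clusters remains, and the names `ward_cost` and `agglomerative` are hypothetical.

```python
def centroid(cluster):
    """Component-wise mean of the vectors in a cluster."""
    m = len(cluster[0])
    return [sum(p[k] for p in cluster) / len(cluster) for k in range(m)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq = sum((p - q) ** 2 for p, q in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq


def agglomerative(inputs, n_clusters):
    """Merge clusters bottom-up until n_clusters remain."""
    # Step 1: every input starts as its own cluster.
    clusters = [[list(x)] for x in inputs]
    # Assumed stopping criterion: a target number of clusters.
    while len(clusters) > n_clusters:
        # Step 4 (Eq. 9): find the closest pair under the chosen measure.
        pairs = (
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]))
        # Step 5 (Eq. 10): replace the pair by its union.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

The exhaustive pair search makes every merge quadratic in the number of clusters, which is acceptable for a small example; library implementations cache pairwise costs and update them incrementally.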
Appendix B: Indices
Equation (11) from [36] describes the Dunn Index for m clusters where d describes a dissimilarity measure and diam the diameter of a cluster.
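For orientation, a common textbook form of the Dunn index is sketched below using the symbols named above. That it agrees term by term with Eq. (11) is an assumption; in particular, taking the single-linkage distance for $d(C_i, C_j)$ and the maximum pairwise distance for the diameter are the usual choices, not ones confirmed by the text.

```latex
D_m = \min_{i=1,\ldots,m} \left\{ \min_{j=i+1,\ldots,m}
      \left( \frac{d(C_i, C_j)}{\max_{k=1,\ldots,m} \operatorname{diam}(C_k)} \right) \right\},
\qquad
d(C_i, C_j) = \min_{x \in C_i,\, y \in C_j} d(x, y),
\qquad
\operatorname{diam}(C) = \max_{x, y \in C} d(x, y)
```

Under this form, larger values indicate compact and well-separated clusters.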
Equation (12) from [36] describes the Davies-Bouldin (DB) index, which is based on the dispersion $s_i$ of a cluster $C_i$; d is again the dissimilarity measure between two clusters.
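A common textbook form of the DB index, written with the dispersion $s_i$ and the dissimilarity $d_{ij} = d(C_i, C_j)$, is sketched below. Its agreement with Eq. (12) is assumed rather than confirmed by the text, and the helper quantity $R_{ij}$ is introduced here only for the illustration.

```latex
R_{ij} = \frac{s_i + s_j}{d_{ij}},
\qquad
DB_m = \frac{1}{m} \sum_{i=1}^{m} \max_{j \neq i} R_{ij}
```

Under this form, smaller values indicate better clustering, because each cluster is compared with its most similar neighbor.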
Equation (13) from [30] shows the Silhouette value of an item i of the cluster A. a(i) is the average dissimilarity of the item i to all other items in A. d(i, C) is the average dissimilarity of all objects of C to i. The Silhouette value of a cluster is the average s(i) from each item i in the cluster.
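In Rousseeuw's formulation, $s(i) = (b(i) - a(i)) / \max\{a(i), b(i)\}$ with $b(i) = \min_{C \neq A} d(i, C)$. The following Python sketch computes this value with the Euclidean distance as the dissimilarity, which is an assumption of the example; the function names `silhouette` and `cluster_silhouette` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance, assumed here as the dissimilarity measure."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def silhouette(i, points, labels):
    """Silhouette value s(i) of item i; needs at least two clusters."""
    own = labels[i]
    same = [j for j, lab in enumerate(labels) if lab == own and j != i]
    if not same:
        return 0.0  # convention for a cluster with a single item
    # a(i): average dissimilarity to the other items of the same cluster A.
    a = sum(dist(points[i], points[j]) for j in same) / len(same)
    # b(i): smallest average dissimilarity d(i, C) over the other clusters C.
    b = min(
        sum(dist(points[i], points[j]) for j, lab in enumerate(labels) if lab == other)
        / labels.count(other)
        for other in set(labels)
        if other != own
    )
    denom = max(a, b)
    return 0.0 if denom == 0 else (b - a) / denom


def cluster_silhouette(label, points, labels):
    """Average s(i) over all items of one cluster."""
    members = [i for i, lab in enumerate(labels) if lab == label]
    return sum(silhouette(i, points, labels) for i in members) / len(members)
```

Values close to 1 indicate that an item lies well inside its cluster, values near 0 that it lies between two clusters, and negative values that it is probably assigned to the wrong cluster.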
Cite this article
Pohl, D., Bouchachia, A. & Hellwagner, H. Social media for crisis management: clustering approaches for sub-event detection. Multimed Tools Appl 74, 3901–3932 (2015). https://doi.org/10.1007/s11042-013-1804-2
DOI: https://doi.org/10.1007/s11042-013-1804-2