Abstract
This paper proposes three methods of association analysis that address two challenges of Big Data: capturing relatedness among real-world events in high data volumes, and modeling similar events that are described disparately under high data variability. The proposed methods take as input a set of geotemporally encoded text streams about violent events, called “storylines”. These storylines are associated for two purposes: to investigate whether an event could occur again, and to measure influence, i.e., how one event could help explain the occurrence of another. The first proposed method, Distance-based Bayesian Inference, uses spatial distance to relate similar events that are described differently, addressing the challenge of high variability. The second and third methods, Spatial Association Index and Spatio-logical Inference, measure the influence of storylines in different locations, addressing the challenge of high volume. Extensive experiments on social unrest in Mexico and wars in the Middle East showed that these methods can achieve precision and recall as high as 80% in retrieval tasks that use both keywords and geospatial information as search criteria. In addition, the experiments demonstrated high effectiveness in uncovering real-world storylines for exploratory analysis.
Notes
Civil unrest denotes an event of social impact, such as a strike or a protest.
An actor can be a political organization, the military, a militia, a terrorist organization, or an individual, among others.
Cite this article
Dos Santos, R.F., Boedihardjo, A., Shah, S. et al. The big data of violent events: algorithms for association analysis using spatio-temporal storytelling. Geoinformatica 20, 879–921 (2016). https://doi.org/10.1007/s10707-016-0247-0
DOI: https://doi.org/10.1007/s10707-016-0247-0