Clustering Heterogeneous Semi-structured Social Science Datasets for Security Applications
Social scientists have begun to collect large datasets that are heterogeneous and semi-structured, but the ability to analyze such data has lagged behind its collection. We design a process to map such datasets to a numerical form, apply singular value decomposition clustering, and explore the impact of individual attributes or fields by overlaying visualizations of the clusters. This provides a new path for understanding such datasets, which we illustrate with three real-world examples: the Global Terrorism Database, which records details of every terrorist attack since 1970; a Chicago police dataset, which records details of every drug-related incident over a period of approximately a month; and a dataset describing members of a Hezbollah crime/terror network in the U.S.
Keywords: Clustering, Hashing, Terrorism, Crime, Global terrorism database, Chicago policing, Hezbollah
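The pipeline outlined in the abstract (map records to numbers, apply singular value decomposition, inspect the resulting clusters) can be illustrated with a minimal sketch. The abstract does not specify the mapping, so everything here is an assumption made for illustration: the helper names `hash_to_unit`, `to_matrix`, and `svd_embed`, the use of an MD5-based hash for non-numeric fields, the z-score normalization, and the toy records, which are invented and not drawn from any of the three datasets.

```python
import hashlib
import numpy as np

def hash_to_unit(value):
    """Map any field value to a repeatable number in [0, 1) (assumed encoding)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the result is strictly below 1.
    return int.from_bytes(digest[:6], "big") / 2**48

def to_matrix(records, fields):
    """Build a records-by-fields numeric matrix from semi-structured dicts."""
    rows = []
    for rec in records:
        row = []
        for f in fields:
            v = rec.get(f)  # a missing field yields None, which is hashed like a value
            if isinstance(v, (int, float)) and not isinstance(v, bool):
                row.append(float(v))
            else:
                row.append(hash_to_unit(v))
        rows.append(row)
    return np.array(rows, dtype=float)

def svd_embed(matrix, k=2):
    """Z-score each column, then project rows onto the top-k singular directions."""
    centered = matrix - matrix.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    z = centered / std
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k], s

# Hypothetical toy records; field names and values are invented.
records = [
    {"year": 1975, "attack_type": "bombing", "region": "A"},
    {"year": 1982, "attack_type": "bombing", "region": "B"},
    {"year": 2001, "attack_type": "hijacking"},  # 'region' missing
    {"year": 2004, "attack_type": "armed assault", "region": "A"},
]
fields = ["year", "attack_type", "region"]

coords, s = svd_embed(to_matrix(records, fields), k=2)
print(coords)  # one 2-D point per record
```

Each row of `coords` gives one record's position in the low-dimensional space, and records that lie close together are candidates for the same cluster. Coloring these points by the value of a single field is one simple way to approximate the attribute overlays mentioned in the abstract. Note that hashing assigns arbitrary numeric positions to categories, so distances between hashed values carry no meaning beyond equal versus not equal.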