Abstract
The Hawkes process and its extensions effectively model self-excitatory phenomena including earthquakes, viral pandemics, financial transactions, neural spike trains and the spread of memes through social networks. The usefulness of these stochastic process models within a host of economic sectors and scientific disciplines is undercut by the processes’ computational burden: the complexity of likelihood evaluations grows quadratically in the number of observations for both the temporal and spatiotemporal Hawkes processes. We show that, with care, one may parallelize these calculations using both central and graphics processing unit implementations to achieve over 100-fold speedups over single-core processing. Using a simple adaptive Metropolis–Hastings scheme, we apply our high-performance computing framework to a Bayesian analysis of big gunshot data generated in Washington, D.C. between 2006 and 2019, thereby extending a past analysis of the same data from under 10,000 to over 85,000 observations. To encourage widespread use, we provide hpHawkes, an open-source R package, and discuss high-level implementation and program design for leveraging aspects of computational hardware that become necessary in a big data setting.
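To illustrate where the quadratic cost comes from, the following is a minimal Python sketch of the log-likelihood of a purely temporal Hawkes process. It is not the hpHawkes implementation: the exponential triggering kernel, the function name `hawkes_loglik` and the parameter names `mu`, `theta`, `omega` and `horizon` are all assumptions chosen for illustration.

```python
import math


def hawkes_loglik(times, mu, theta, omega, horizon):
    """Log-likelihood of a temporal Hawkes process on [0, horizon].

    Assumed (illustrative) conditional intensity:
        lambda(t) = mu + theta * omega * sum_{t_j < t} exp(-omega * (t - t_j))
    `times` must be sorted in increasing order and lie in [0, horizon].
    """
    loglik = 0.0
    for i, t_i in enumerate(times):
        # Inner sum over all earlier events: N outer terms, each with up to
        # N inner terms, gives the O(N^2) cost described in the abstract.
        excitation = sum(math.exp(-omega * (t_i - t_j)) for t_j in times[:i])
        loglik += math.log(mu + theta * omega * excitation)

    # Compensator: the integral of the intensity over [0, horizon].
    compensator = mu * horizon + theta * sum(
        1.0 - math.exp(-omega * (horizon - t_i)) for t_i in times
    )
    return loglik - compensator


if __name__ == "__main__":
    events = [0.5, 1.1, 1.3, 2.7, 3.0]
    print(hawkes_loglik(events, mu=1.0, theta=0.5, omega=2.0, horizon=4.0))
```

Each term of the outer loop depends only on the data and the parameters, not on the other terms, so in this sketch the terms could be computed independently and then summed. That independence is the kind of structure that makes the calculation a natural candidate for multi-core and GPU parallelization.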
Acknowledgements
The research leading to these results has received funding through National Institutes of Health Grant U19 AI135995 and National Science Foundation Grant DMS1264153. AJH is supported by NIH Grant K25AI153816. We gratefully acknowledge support from Nvidia Corporation with the donation of parallel computing resources used for this research.
Holbrook, A.J., Loeffler, C.E., Flaxman, S.R. et al. Scalable Bayesian inference for self-excitatory stochastic processes applied to big American gunfire data. Stat Comput 31, 4 (2021). https://doi.org/10.1007/s11222-020-09980-4
DOI: https://doi.org/10.1007/s11222-020-09980-4