AStA Advances in Statistical Analysis, Volume 102, Issue 4, pp 537–563

A modified generalized lasso algorithm to detect local spatial clusters for count data

  • Hosik Choi
  • Eunjung Song
  • Seung-sik Hwang
  • Woojoo Lee
Original Paper


Detecting local spatial clusters for count data is an important task in spatial epidemiology. Two broad approaches, moving window methods and disease mapping methods, have been proposed in the literature to find clusters. However, the existing methods rely on somewhat arbitrarily chosen tuning parameters, and the local clustering results are sensitive to these choices. In this paper, we propose a penalized likelihood method to overcome the limitations of existing local spatial clustering approaches for count data. We start with a Poisson regression model to accommodate any type of covariates, and formulate the clustering problem as a penalized likelihood estimation problem to find change points of intercepts in two-dimensional space. The cost of developing a new algorithm is minimized by modifying an existing least absolute shrinkage and selection operator (lasso) algorithm. The computational details of the modifications are shown, and the proposed method is illustrated with Seoul tuberculosis data.
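
As an illustration only, and not the authors' algorithm, the kind of objective the abstract describes can be sketched in a few lines. The sketch assumes that regions lie on a rectangular grid, that each region has its own intercept, and that the penalty is an L1 norm on intercept differences between adjacent regions (a generalized lasso penalty). All names here (`grid_difference_matrix`, `penalized_neg_loglik`, `log_expected`, `lam`) are hypothetical and do not come from the paper.

```python
import numpy as np


def grid_difference_matrix(n_rows, n_cols):
    """Build D with one row per pair of adjacent cells on an n_rows x n_cols grid.

    Each row holds +1 and -1 at two neighbouring cells, so D @ alpha lists
    the intercept differences across every shared border (assumed layout).
    """
    idx = np.arange(n_rows * n_cols).reshape(n_rows, n_cols)
    # Left-right neighbours, then top-bottom neighbours.
    pairs = list(zip(idx[:, :-1].ravel(), idx[:, 1:].ravel()))
    pairs += list(zip(idx[:-1, :].ravel(), idx[1:, :].ravel()))
    D = np.zeros((len(pairs), n_rows * n_cols))
    for row, (i, j) in enumerate(pairs):
        D[row, i] = 1.0
        D[row, j] = -1.0
    return D


def penalized_neg_loglik(alpha, beta, y, X, log_expected, D, lam):
    """Negative Poisson log-likelihood (up to a constant) plus lam * ||D alpha||_1."""
    eta = log_expected + alpha + X @ beta       # log of the Poisson mean per region
    neg_loglik = np.sum(np.exp(eta) - y * eta)  # the log(y!) term is dropped
    return neg_loglik + lam * np.sum(np.abs(D @ alpha))


# Toy usage on a 2 x 3 grid with one covariate (all values are made up).
D = grid_difference_matrix(2, 3)
y = np.ones(6)
X = np.zeros((6, 1))
alpha = np.zeros(6)
alpha[0] = 1.0  # one region with an elevated intercept
value = penalized_neg_loglik(alpha, np.zeros(1), y, X, np.zeros(6), D, lam=0.5)
```

Under these assumptions, a constant intercept vector incurs no penalty, and each border between regions with different intercepts adds to the objective. Neighbouring regions whose estimated intercepts coincide can then be read as one cluster, and the borders where the intercept changes play the role of change points in two-dimensional space.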


Spatial clustering, Penalized likelihood, Generalized LASSO, Poisson regression



This research was supported by INHA UNIVERSITY Research Grant.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Hosik Choi (1)
  • Eunjung Song (2)
  • Seung-sik Hwang (3)
  • Woojoo Lee (2)
  1. Department of Applied Statistics, Kyonggi University, Suwon, Korea
  2. Department of Statistics, Inha University, Incheon, Korea
  3. Department of Public Health Science, Graduate School of Public Health, Seoul National University, Seoul, Republic of Korea
