Semi-parametric Cluster Detection

  • Benjamin Kedem
  • Shihua Wen


A semi-parametric density ratio testing method, which borrows strength from two or more samples, is applied to moving windows of variable size for cluster detection. The method requires prior knowledge of neither the underlying distribution nor the number of cases before scanning. A Monte Carlo power study shows that, given a cluster candidate and under certain conditions, the semi-parametric density ratio method achieves higher power in testing the hypothesis of no cluster than both Kulldorff's celebrated scan statistic method and a certain focused test. The potential of the semi-parametric method in cluster detection is illustrated using both simulated and real spatial data.
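To make the idea of a density ratio test on a single window concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-sample exponential tilt model g1(x) = exp(alpha + beta*x) g0(x) with the tilt function h(x) = x, and it relies on the known equivalence between this two-sample model and logistic regression of the sample label on x. The hypothesis of no cluster corresponds to beta = 0 and is tested with a likelihood ratio statistic referred to a chi-square distribution with one degree of freedom, the usual large-sample approximation. The function name `density_ratio_lr_test` and its arguments are hypothetical.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _log_lik(a, b, labels, xs):
    # Logistic log-likelihood: sum of y*eta - log(1 + exp(eta)).
    total = 0.0
    for y, x in zip(labels, xs):
        eta = a + b * x
        total += y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


def density_ratio_lr_test(window, reference, iters=50, tol=1e-10):
    """Likelihood ratio test of beta = 0 in the tilt model (assumed h(x) = x).

    window    : observations inside the candidate window (label 1)
    reference : observations outside it (label 0)
    Returns (beta_hat, lr_statistic, approximate_p_value).
    Assumes the observations are not all equal and the samples overlap.
    """
    xs = list(reference) + list(window)
    labels = [0] * len(reference) + [1] * len(window)
    # Under beta = 0 the fitted intercept is log(n1 / n0).
    a0 = math.log(len(window) / len(reference))
    a, b = a0, 0.0
    for _ in range(iters):  # plain Newton-Raphson, no step halving
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for y, x in zip(labels, xs):
            p = _sigmoid(a + b * x)
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    stat = max(0.0, 2.0 * (_log_lik(a, b, labels, xs)
                           - _log_lik(a0, 0.0, labels, xs)))
    # Upper tail of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return b, stat, p_value


# Illustrative use on simulated data (values are made up for the sketch).
rng = random.Random(1)
outside = [rng.gauss(0.0, 1.0) for _ in range(200)]
inside = [rng.gauss(1.0, 1.0) for _ in range(60)]
print(density_ratio_lr_test(inside, outside))
```

In a scanning context, a test of this kind would be repeated for each candidate window, with the observations outside the window serving as the reference sample; the sketch covers only the single-window step and does not adjust for multiple windows.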

AMS Subject Classification



Keywords

Combined data, exponential tilt, likelihood, moving window, power, scan statistics





Copyright information

© Grace Scientific Publishing 2007

Authors and Affiliations

  1. Department of Mathematics, University of Maryland, College Park, USA
