Clusters Identification in Binary Genomic Data: The Alternative Offered by Scan Statistics Approach

  • Danilo Pellin
  • Clelia Di Serio
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8452)


In many research areas, identifying clusters, that is, regions showing an increased event rate within a given study area, is an important and interesting problem. The literature on scan statistics is now quite broad, and methods can be classified according to the dimensionality of the study area, the distribution assumed to generate the data under the null hypothesis, and the shape and dimension of the scanning window. The aim of this study is to adapt and apply this methodology to the field of genomics, taking into account some peculiarities of these data, and to compare its performance with that of an existing method based on the DBSCAN algorithm.
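To make the idea of a scanning window concrete, the following is a minimal sketch of a one-dimensional scan statistic on a binary sequence. It is not the method developed in the paper: the fixed window length, the permutation null (which keeps the total number of events fixed and assumes they are uniformly spread over positions), and the function names `scan_statistic` and `monte_carlo_p_value` are all illustrative assumptions.

```python
import random


def scan_statistic(events, window):
    """Return (max_count, start_index) over all windows of a fixed length.

    `events` is a sequence of 0/1 values, one per position; `window` is the
    number of consecutive positions covered by the scanning window.
    """
    if window <= 0 or window > len(events):
        raise ValueError("window must be between 1 and len(events)")
    count = sum(events[:window])
    best, best_start = count, 0
    # Slide the window one position at a time, updating the count in O(1).
    for start in range(1, len(events) - window + 1):
        count += events[start + window - 1] - events[start - 1]
        if count > best:
            best, best_start = count, start
    return best, best_start


def monte_carlo_p_value(events, window, n_sim=999, seed=0):
    """Monte Carlo p-value for the observed scan statistic.

    Assumed null hypothesis (illustrative): the observed number of events is
    placed uniformly at random over the positions, simulated by shuffling.
    """
    rng = random.Random(seed)
    observed, _ = scan_statistic(events, window)
    shuffled = list(events)
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        simulated, _ = scan_statistic(shuffled, window)
        if simulated >= observed:
            exceed += 1
    # The observed data set counts as one of the n_sim + 1 replicates.
    return (exceed + 1) / (n_sim + 1)
```

For instance, `scan_statistic([0, 0, 1, 1, 1, 0, 0, 1], 3)` returns `(3, 2)`: the densest window of three positions starts at index 2 and contains three events. A small p-value from `monte_carlo_p_value` would suggest that such a concentration is unlikely under the assumed uniform null.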


Hotspot, Scan statistics, Binary genomic event



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. University Center of Statistics for the Biomedical Sciences, Vita-Salute San Raffaele University, Milan, Italy
