Data Mining and Knowledge Discovery, Volume 28, Issue 5–6, pp 1480–1502

Detecting localized homogeneous anomalies over spatio-temporal data

  • Aditya Telang
  • P. Deepak
  • Salil Joshi
  • Prasad Deshpande
  • Ranjana Rajendran


The last decade has witnessed unprecedented growth in the availability of data with spatio-temporal characteristics. Given the scale and richness of such data, finding spatio-temporal patterns that behave significantly differently from their neighbors is of interest in application scenarios such as weather modeling, analyzing the spread of disease outbreaks, and monitoring traffic congestion. In this paper, we propose an automated approach to exploring and discovering such anomalous patterns, irrespective of the underlying domain from which the data is drawn. Our approach differs significantly from traditional methods of spatial outlier detection and employs two phases: (i) discovering homogeneous regions, and (ii) evaluating these regions as anomalies based on their statistical difference from a generalized neighborhood. We evaluate the quality of our approach and distinguish it from existing techniques via an extensive experimental evaluation.
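The two phases named above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a single time slice stored as a 2D grid of numbers, uses plain seeded region growing with a fixed tolerance for phase (i), and uses the ring of cells immediately bordering a region as a stand-in for the "generalized neighborhood" in phase (ii). The function names and the parameters `tol`, `threshold`, and `max_cells` are hypothetical and chosen only for illustration.

```python
from collections import deque
from statistics import mean, pstdev

# 4-connected neighborhood on a 2D grid (an assumption of this sketch).
NEIGHBOR_OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def grow_regions(grid, tol):
    """Phase (i): group 4-connected cells whose values lie within tol of the seed cell."""
    rows, cols = len(grid), len(grid[0])
    labels = [[-1] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue  # cell already belongs to a region
            seed, label = grid[r][c], len(regions)
            labels[r][c] = label
            cells, queue = [(r, c)], deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBOR_OFFSETS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == -1
                            and abs(grid[ny][nx] - seed) <= tol):
                        labels[ny][nx] = label
                        cells.append((ny, nx))
                        queue.append((ny, nx))
            regions.append(cells)
    return labels, regions


def anomaly_score(grid, labels, cells):
    """Phase (ii): gap between the region mean and the mean of its bordering cells,
    scaled by the spread of those bordering cells (1.0 if they are all equal)."""
    rows, cols = len(grid), len(grid[0])
    label = labels[cells[0][0]][cells[0][1]]
    border = set()
    for y, x in cells:
        for dy, dx in NEIGHBOR_OFFSETS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols and labels[ny][nx] != label:
                border.add((ny, nx))
    if not border:
        return 0.0  # the region covers the whole grid; nothing to compare against
    outside = [grid[y][x] for y, x in border]
    inside_mean = mean(grid[y][x] for y, x in cells)
    spread = pstdev(outside) or 1.0
    return abs(inside_mean - mean(outside)) / spread


def find_anomalies(grid, tol, threshold, max_cells):
    """Return small ("localized") homogeneous regions that differ from their surroundings."""
    labels, regions = grow_regions(grid, tol)
    return [cells for cells in regions
            if len(cells) <= max_cells
            and anomaly_score(grid, labels, cells) >= threshold]


# Toy input: a flat background with one 2x2 block of elevated values.
grid = [[1.0] * 5 for _ in range(5)]
for y, x in ((1, 1), (1, 2), (2, 1), (2, 2)):
    grid[y][x] = 9.0
hits = find_anomalies(grid, tol=0.5, threshold=3.0, max_cells=6)
```

The `max_cells` cap is a design shortcut: the score is symmetric between a region and its surroundings, so without a size limit the large background would be flagged alongside the small block.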


Keywords: Outlier Detection · Homogeneous Region · Anomaly Detection · Gini Index · Homogeneous Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© The Author(s) 2014

Authors and Affiliations

  • Aditya Telang (1), email author
  • P. Deepak (1)
  • Salil Joshi (1)
  • Prasad Deshpande (1)
  • Ranjana Rajendran (2)

  1. IBM Research, Bangalore, India
  2. University of California, Santa Cruz, USA
