Detecting localized homogeneous anomalies over spatio-temporal data

Data Mining and Knowledge Discovery

Abstract

The last decade has witnessed unprecedented growth in the availability of data with spatio-temporal characteristics. Given the scale and richness of such data, finding spatio-temporal patterns that behave significantly differently from their neighbors could be of interest in application scenarios such as weather modeling, analyzing the spread of disease outbreaks, and monitoring traffic congestion. In this paper, we propose an automated approach for exploring and discovering such anomalous patterns irrespective of the underlying domain from which the data is obtained. Our approach differs significantly from traditional methods of spatial outlier detection and employs two phases: (i) discovering homogeneous regions, and (ii) evaluating these regions as anomalies based on their statistical difference from a generalized neighborhood. We evaluate the quality of our approach and distinguish it from existing techniques via an extensive experimental evaluation.
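
As an illustration only, the two phases named in the abstract can be sketched on a small spatial grid. The abstract does not specify the homogeneity criterion, the neighborhood definition, or the statistical test, so everything below is an assumption of this sketch and not the method proposed in the paper: regions are grown by flood fill over 8-connected cells whose values stay within a tolerance `tol` of the seed cell, the neighborhood is the ring of cells within `radius` grid steps of the region, the difference is scored with a Welch-style t statistic, and a region counts as localized only if it covers at most `max_frac` of the grid. All function and parameter names are hypothetical.

```python
from collections import deque
from math import sqrt
from statistics import mean, variance


def grow_regions(grid, tol):
    """Phase (i), assumed form: flood-fill 8-connected cells within tol of the seed value."""
    rows, cols = len(grid), len(grid[0])
    label = [[-1] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if label[r][c] != -1:
                continue  # cell already belongs to a region
            seed, rid = grid[r][c], len(regions)
            label[r][c] = rid
            cells, queue = [], deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and label[ny][nx] == -1
                                and abs(grid[ny][nx] - seed) <= tol):
                            label[ny][nx] = rid
                            queue.append((ny, nx))
            regions.append(cells)
    return regions


def welch_t(a, b):
    """Absolute Welch-style t statistic between two samples (assumed test)."""
    va = variance(a) if len(a) > 1 else 0.0
    vb = variance(b) if len(b) > 1 else 0.0
    diff = abs(mean(a) - mean(b))
    se = sqrt(va / len(a) + vb / len(b))
    if se == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / se


def anomalous_regions(grid, tol, radius=1, min_size=2, max_frac=0.25, t_cut=3.0):
    """Phase (ii), assumed form: flag small regions that differ from their surrounding ring."""
    rows, cols = len(grid), len(grid[0])
    flagged = []
    for cells in grow_regions(grid, tol):
        # Assumption: "localized" means small relative to the whole grid.
        if len(cells) < min_size or len(cells) > max_frac * rows * cols:
            continue
        inside = set(cells)
        ring = set()
        for y, x in cells:
            for ny in range(max(0, y - radius), min(rows, y + radius + 1)):
                for nx in range(max(0, x - radius), min(cols, x + radius + 1)):
                    if (ny, nx) not in inside:
                        ring.add((ny, nx))
        if not ring:
            continue
        t = welch_t([grid[y][x] for y, x in cells],
                    [grid[y][x] for y, x in ring])
        if t > t_cut:
            flagged.append((sorted(cells), t))
    return flagged


# Made-up toy data: a 2x2 warm block inside a cooler background.
demo = [
    [10, 11, 10, 11, 10],
    [11, 30, 31, 10, 11],
    [10, 31, 30, 11, 10],
    [11, 10, 11, 10, 11],
    [10, 11, 10, 11, 10],
]
print(anomalous_regions(demo, tol=2))
```

The point of the sketch is the ordering of the two phases: candidate regions are formed first from internal homogeneity alone, and only then is each region scored as a group against its surroundings, in contrast to scoring every cell individually.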

Notes

  1. http://climate.geog.udel.edu/~climate/html_pages/download.html#ghcn_T_P2.

  2. In this paper, we extensively use color-based figures to illustrate the concepts of anomalies. Hence, we request the reader to refer to the electronic version or a colored printout of the paper for better readability.

  3. http://en.wikipedia.org/wiki/Taklamakan_Desert.

  4. http://www.guardian.co.uk/news/datablog/2012/jul/04/us-fourth-july-twitter-beer-church.

  5. http://en.wikipedia.org/wiki/Bible_Belt.

  6. For the sake of clarity, we illustrate a spatial grid; however, the formulation extends to the temporal dimension.

  7. http://en.wikipedia.org/wiki/Chebyshev_distance.

  8. http://en.wikipedia.org/wiki/Statistical_model#Model_comparison.

  9. http://climate.geog.udel.edu/~climate/html_pages/download.html#ghcn_T_P2.

  10. http://www.nio.org/index/option/com_subcategory/task/show/title/Sea-floorData/tid/2/sid/18/thid/113.

  11. We do not include outlier detection techniques in our comparative analysis, since it is not clear how techniques that estimate divergent behavior at the level of individual data objects can be fairly compared with techniques that discover groups of objects exhibiting divergent behavior.

  12. Conducting user surveys is a difficult task. Hence, we conducted the user survey on \(Dataset_1\) only and not on \(Dataset_2\).

  13. http://en.wikipedia.org/wiki/Students_t-test.

  14. http://docs.oracle.com/javase/6/docs/api/java/util/Collections.html#shuffle(java.util.List).

Author information

Corresponding author

Correspondence to Aditya Telang.

Additional information

Responsible editor: Toon Calders, Floriana Esposito, Eyke Hüllermeier, Rosa Meo.

About this article

Cite this article

Telang, A., Deepak, P., Joshi, S. et al. Detecting localized homogeneous anomalies over spatio-temporal data. Data Min Knowl Disc 28, 1480–1502 (2014). https://doi.org/10.1007/s10618-014-0366-x

  • DOI: https://doi.org/10.1007/s10618-014-0366-x
