Abstract
The problem of detecting spatially coherent groups of data that exhibit anomalous behavior has started to attract attention due to applications in areas such as epidemic analysis and weather forecasting. Earlier efforts from the data mining community have largely focused on finding outliers, that is, individual data objects that display deviant behavior. Such point-based methods are not easy to extend to finding groups of data that exhibit anomalous behavior. Scan statistics are methods from the statistics community that address the problem of identifying regions where data objects exhibit behavior that is atypical of the general dataset. The spatial scan statistic and the methods that build upon it mostly adopt a common framework: define a character for regions of objects (e.g., circular or elliptical), repeatedly sample regions of that character, and apply a statistical test for anomaly detection to each. In the past decade, there have been efforts from the statistics community to improve the efficiency of scan statistics as well as to enable the discovery of arbitrarily shaped anomalous regions. The data mining community, on the other hand, has started to look at determining anomalous regions whose behavior diverges from that of their neighborhood. In this chapter, we survey the space of techniques for detecting anomalous regions in spatial data from across the data mining and statistics communities, while outlining connections to well-studied problems in clustering and image segmentation. We analyze the techniques systematically by categorizing them to provide a structured bird's-eye view of the work on anomalous region detection; we hope that this will encourage better cross-pollination of ideas across communities and help advance the frontier in anomaly detection.
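The scan-statistic framework summarized in the abstract (fix a region shape, enumerate candidate regions of that shape, score each with a statistical test) can be illustrated with a minimal sketch. It assumes circular regions and a Poisson log-likelihood ratio in the style of Kulldorff's spatial scan statistic; the function names, the `(x, y, cases, population)` tuple layout, and the `max_fraction` cap are hypothetical choices made for this example and are not taken from the chapter. The Monte Carlo significance test that normally follows is omitted.

```python
import math


def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for one candidate zone.

    c: observed cases inside the zone, e: expected cases inside the zone,
    total: observed cases in the whole dataset.
    """
    if c <= e or e <= 0 or e >= total:
        return 0.0  # only zones with an excess of cases are scored
    llr = c * math.log(c / e)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def circular_scan(points, max_fraction=0.5):
    """Scan circular zones centred on each data point.

    points: list of (x, y, cases, population) tuples (assumed layout).
    Returns (best_llr, sorted indices of the points in the best zone).
    """
    total_cases = sum(p[2] for p in points)
    total_pop = sum(p[3] for p in points)
    best = (0.0, [])
    for cx, cy, _, _ in points:
        # Grow a circle around this centre by adding points nearest-first.
        order = sorted(
            range(len(points)),
            key=lambda i: math.hypot(points[i][0] - cx, points[i][1] - cy),
        )
        c = pop = 0
        zone = []
        for i in order:
            c += points[i][2]
            pop += points[i][3]
            zone.append(i)
            if pop > max_fraction * total_pop:
                break  # stop once the zone covers too much of the population
            e = total_cases * pop / total_pop
            score = poisson_llr(c, e, total_cases)
            if score > best[0]:
                best = (score, sorted(zone))
    return best


# Toy data: two neighbouring locations with many cases, four with few.
points = [
    (0, 0, 9, 100), (0, 1, 8, 100), (5, 5, 1, 100),
    (5, 6, 1, 100), (9, 0, 1, 100), (9, 9, 0, 100),
]
best_llr, best_zone = circular_scan(points)
print(best_llr, best_zone)
```

On this toy data the highest-scoring zone consists of the two high-count locations (indices 0 and 1). A fuller implementation would compare the best score against scores obtained from datasets simulated under the null hypothesis.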
Notes
- 1. We use the prefix general to differentiate these from spatial outlier detection methods, which we will see shortly.
- 2. An intuitive likelihood estimates the chances of generating the data points against the expected probability, and aggregates these across the objects. When the expected probability is the same for objects within and outside Z, any choice of Z yields the same likelihood.
- 3.
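The observation in Note 2 can be checked numerically. The sketch below assumes a Bernoulli model in which each object contributes `cases` successes out of `population` trials, with one probability inside the zone Z and another outside; the function name and argument layout are hypothetical and chosen only for illustration.

```python
import math


def bernoulli_log_likelihood(cases, populations, zone, p_in, p_out):
    """Aggregate the log-likelihood over all objects.

    Objects whose index is in `zone` use p_in; all others use p_out.
    Both probabilities must lie strictly between 0 and 1.
    """
    ll = 0.0
    for i, (c, n) in enumerate(zip(cases, populations)):
        p = p_in if i in zone else p_out
        ll += c * math.log(p) + (n - c) * math.log(1.0 - p)
    return ll


cases = [9, 8, 1, 1]
populations = [100, 100, 100, 100]
p = sum(cases) / sum(populations)  # one shared probability for all objects

# With p_in == p_out, the choice of zone does not change the likelihood.
same_a = bernoulli_log_likelihood(cases, populations, {0, 1}, p, p)
same_b = bernoulli_log_likelihood(cases, populations, {2}, p, p)

# With different probabilities, the likelihood depends on the zone.
diff_a = bernoulli_log_likelihood(cases, populations, {0, 1}, 0.085, 0.01)
diff_b = bernoulli_log_likelihood(cases, populations, {2}, 0.085, 0.01)
print(same_a, same_b, diff_a, diff_b)
```

This is why the likelihood under a single shared probability serves as a zone-independent baseline: only when the probabilities inside and outside Z differ does the choice of Z matter.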
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Deepak, P. (2016). Anomaly Detection for Data with Spatial Attributes. In: Celebi, M., Aydin, K. (eds) Unsupervised Learning Algorithms. Springer, Cham. https://doi.org/10.1007/978-3-319-24211-8_1
DOI: https://doi.org/10.1007/978-3-319-24211-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24209-5
Online ISBN: 978-3-319-24211-8
eBook Packages: Engineering, Engineering (R0)