GeoInformatica, Volume 18, Issue 3, pp 501–536

On detecting spatial categorical outliers

  • Xutong Liu
  • Feng Chen
  • Chang-Tien Lu
Article

Abstract

Spatial outlier detection is an important research problem that has received much attention in recent years. Most existing approaches are designed for numerical attributes and are not applicable to categorical ones (e.g., binary, ordinal, and nominal), which are common in many applications. The main challenges are modeling spatial categorical dependency and achieving computational efficiency. This paper presents the first outlier detection framework for spatial categorical data. Specifically, a new metric, named Pair Correlation Ratio (PCR), is computed for each pair of category sets based on their co-occurrence frequencies within specific spatial distance ranges. The relevance between spatial objects is then calculated from PCR values with regard to their spatial distances. The outlierness of each object is defined as the inverse of the average relevance between the object and its spatial neighbors, and the objects with the highest outlier scores are returned as spatial categorical outliers. A set of algorithms is further designed for single-attribute and multi-attribute spatial categorical datasets. Extensive experimental evaluations on both simulated and real datasets demonstrate the effectiveness and efficiency of our proposed approaches.
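The abstract does not state the exact PCR formula, so the following is only a minimal, brute-force sketch of the pipeline it describes, not the authors' algorithm. It assumes a single categorical attribute, hand-picked distance bins (`edges`), a PCR defined as the observed share of a category pair among ordered object pairs in a distance bin divided by the product of the two categories' global frequencies, and "spatial neighbors" meaning every object closer than the last bin edge. All function names are hypothetical, and the quadratic pair enumeration ignores the efficiency techniques the paper targets.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def distance_bin(d, edges):
    """Return k such that edges[k] <= d < edges[k + 1], or None if d is out of range."""
    for k in range(len(edges) - 1):
        if edges[k] <= d < edges[k + 1]:
            return k
    return None


def pcr_table(points, edges):
    """Hypothetical PCR: share of ordered pairs (a, b) in distance bin k,
    divided by p_a * p_b, the share expected if categories ignored location."""
    n = len(points)
    freq = Counter(c for _, _, c in points)
    p = {c: m / n for c, m in freq.items()}
    pair_counts = defaultdict(Counter)  # bin index -> Counter[(a, b)]
    for (x1, y1, a), (x2, y2, b) in combinations(points, 2):
        k = distance_bin(math.hypot(x1 - x2, y1 - y2), edges)
        if k is None:
            continue
        # Count each unordered pair in both orders so the table is symmetric.
        pair_counts[k][(a, b)] += 1
        pair_counts[k][(b, a)] += 1
    pcr = {}
    for k, counts in pair_counts.items():
        total = sum(counts.values())
        for (a, b), m in counts.items():
            pcr[(a, b, k)] = (m / total) / (p[a] * p[b])
    return pcr


def outlier_scores(points, edges):
    """Score = 1 / mean PCR-based relevance to all objects within edges[-1].
    Objects with no neighbours get an infinite score."""
    pcr = pcr_table(points, edges)
    scores = []
    for i, (x1, y1, a) in enumerate(points):
        relevance = []
        for j, (x2, y2, b) in enumerate(points):
            if i == j:
                continue
            k = distance_bin(math.hypot(x1 - x2, y1 - y2), edges)
            if k is not None:
                # Every neighbour pair was counted in pcr_table, so the key exists.
                relevance.append(pcr[(a, b, k)])
        mean_rel = sum(relevance) / len(relevance) if relevance else 0.0
        scores.append(1.0 / mean_rel if mean_rel > 0 else math.inf)
    return scores


# Usage: ten objects on a line, "A" on the left and "B" on the right,
# with one "B" planted inside the "A" region at x = 2.
pts = [(x, 0, "A") for x in range(5)] + [(x, 0, "B") for x in range(5, 10)]
pts[2] = (2, 0, "B")
scores = outlier_scores(pts, edges=[0, 1.5])
top = max(range(len(pts)), key=scores.__getitem__)
print(top, round(scores[top], 2))  # 2 1.44
```

In this toy layout, A and B are rarely adjacent, so the (A, B) ratio at distance 1 falls below 1 while same-category ratios exceed 1; the planted B, whose only neighbours are A objects, therefore has the lowest average relevance and the highest score. Note that under this simple global-frequency normalisation an isolated rare category surrounded by a single common one is not necessarily flagged, which is one reason the choice of PCR definition matters.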

Keywords

Spatial categorical data; Spatial dependency; Pair correlation; Outlier detection

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Traffic Science, eBay Inc., Bellevue, USA
  2. Inter-disciplinary Research Center (iLab), Carnegie Mellon University, Pittsburgh, USA
  3. Department of Computer Science, Virginia Polytechnic Institute and State University, Falls Church, USA
