
Mathematical Geosciences, Volume 46, Issue 1, pp 1–31

Multivariate Spatial Outlier Detection Using Robust Geographically Weighted Methods

  • Paul Harris
  • Chris Brunsdon
  • Martin Charlton
  • Steve Juggins
  • Annemarie Clarke

Abstract

Outlier detection is often a key task in statistical analysis and helps guard against poor decision-making based on results that have been influenced by anomalous observations. For multivariate data sets, large Mahalanobis distances in raw data space, or large Mahalanobis distances in principal components analysis (PCA)-transformed data space, are routinely used to detect outliers. Detection in PCA space can also use goodness-of-fit distances. For spatial applications, however, these global forms can only detect outliers in a non-spatial manner. This can produce false positives, such as when an observation is extreme relative to the whole data set but similar to its spatial neighbours, or false negatives, such as when an observation is unremarkable globally but dissimilar to its spatial neighbours. To avoid such misclassifications, we demonstrate that local adaptations of various global methods can be used to detect multivariate spatial outliers. In particular, we account for local spatial effects by using geographically weighted data with either Mahalanobis distances or PCA. Detection performance is assessed using simulated data as well as freshwater chemistry data collected across Great Britain. The results clearly show the value of both geographically weighted methods for outlier detection.
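The contrast between a global and a geographically weighted Mahalanobis distance can be illustrated with a short sketch. It is not the authors' implementation: the article's title indicates robust estimators, whereas this sketch uses ordinary kernel-weighted means and covariances. To stop a focal observation from masking itself, the sketch excludes that observation from its own local estimate (leave-one-out). The bisquare kernel, the fixed bandwidth of 3 grid units, the chi-square 97.5% cutoff, the function names and the synthetic grid data are all assumptions made for illustration.

```python
import numpy as np


def bisquare_weights(dists, bandwidth):
    """Bisquare kernel (1 - (d/b)^2)^2 for d < b, and 0 otherwise (assumed kernel)."""
    u = dists / bandwidth
    return np.where(u < 1.0, (1.0 - u**2) ** 2, 0.0)


def weighted_mean_cov(X, w):
    """Weighted mean vector and (biased) weighted covariance matrix of the rows of X."""
    w = w / w.sum()
    mu = w @ X
    Xc = X - mu
    cov = (Xc * w[:, None]).T @ Xc
    return mu, cov


def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance of x from a centre mu with covariance cov."""
    d = x - mu
    return float(d @ np.linalg.solve(cov, d))


def global_mahalanobis_sq(X):
    """Non-spatial baseline: each observation is compared with the whole data set."""
    mu, cov = weighted_mean_cov(X, np.ones(len(X)))
    return np.array([mahalanobis_sq(x, mu, cov) for x in X])


def gw_mahalanobis_sq(X, coords, bandwidth, leave_one_out=True):
    """Hypothetical local variant: observation i is compared with a kernel-weighted
    mean and covariance of its spatial neighbours (non-robust estimates)."""
    out = np.empty(len(X))
    for i in range(len(X)):
        dists = np.linalg.norm(coords - coords[i], axis=1)
        w = bisquare_weights(dists, bandwidth)
        if leave_one_out:
            w[i] = 0.0  # exclude the focal point so it cannot mask itself
        mu, cov = weighted_mean_cov(X, w)
        out[i] = mahalanobis_sq(X[i], mu, cov)
    return out


# Hypothetical demo data: a 10 x 10 grid whose west half has values near (0, 0)
# and whose east half has values near (10, 10).
rng = np.random.default_rng(0)
xs, ys = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
east = coords[:, 0] >= 5
X = np.where(east[:, None], 10.0, 0.0) + rng.normal(size=(100, 2))

# Plant a spatial outlier: a west-half site carrying typical east-half values.
idx = int(np.flatnonzero((coords[:, 0] == 1) & (coords[:, 1] == 1))[0])
X[idx] = [10.0, 10.0]

# The 97.5% chi-square quantile for p = 2 variables has the closed form -2 ln(0.025).
threshold = -2.0 * np.log(0.025)

global_d2 = global_mahalanobis_sq(X)
gw_d2 = gw_mahalanobis_sq(X, coords, bandwidth=3.0)
print(f"global: {global_d2[idx]:.2f}  local: {gw_d2[idx]:.2f}  cutoff: {threshold:.2f}")
```

In this toy set-up the planted site's values are common in the data set as a whole, so its global distance stays below the cutoff (a false negative). Against its west-half neighbours, however, its local distance is far above the cutoff. For more than two variables, the closed-form cutoff would be replaced by a general chi-square quantile, for example from `scipy.stats.chi2.ppf`.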

Keywords

Non-stationarity · Mahalanobis distance · Principal components analysis · Co-kriging cross-validation · Freshwater acidification · Anomaly detection

Notes

Acknowledgements

The research presented in this paper was funded by a Strategic Research Cluster grant (07/SRC/I1168) from Science Foundation Ireland under the National Development Plan. The authors gratefully acknowledge this support. We would also like to thank the anonymous reviewers, whose comments helped to significantly improve this paper.


Copyright information

© International Association for Mathematical Geosciences 2013

Authors and Affiliations

  • Paul Harris (1)
  • Chris Brunsdon (2)
  • Martin Charlton (1)
  • Steve Juggins (3)
  • Annemarie Clarke (4)

  1. National Centre for Geocomputation, National University of Ireland Maynooth, Maynooth, Co. Ireland
  2. Geography and Planning, University of Liverpool, Liverpool, UK
  3. School of Geography, Politics and Sociology, University of Newcastle, Newcastle upon Tyne, UK
  4. APEM Ltd, Llantrisant, Wales, UK
