
QualESTIM: Interactive Quality Assessment of Socioeconomic Data Using Outlier Detection

  • Christine Plumejeaud
  • Marlène Villanova-Oliver
Chapter
Part of the Lecture Notes in Geoinformation and Cartography book series (LNGC)

Abstract

This paper presents a platform, called QualESTIM, for exploring socioeconomic statistical data (also called indicators). QualESTIM integrates various outlier detection methods that make it possible to evaluate the logical consistency of a dataset and, ultimately, its quality. Without recourse to any kind of ‘ground truth’, data values are compared to various spatiotemporal distributions given by statistical models. However, an outlier is not necessarily an error: an expert should always interpret an outlying value. That is why we claim that such a quality assessment process has to be interactive, and that the metadata associated with the data should be made available in order to refine the analysis. Dedicated to detecting outliers and to their visualization by an expert, the platform is connected to a database that contains both the data and their metadata, structured according to an ISO 19115 profile. A case study illustrates the benefits of this approach.
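As a minimal illustration of comparing indicator values to a statistical distribution without any ground truth, the sketch below flags values that fall outside Tukey's fences computed over all regions. This rule is an assumption chosen for brevity, not one of the methods QualESTIM integrates; the function name `tukey_outliers`, the region codes and the values are hypothetical. In keeping with the point that an outlier is not necessarily an error, the function only returns candidates for expert review and leaves the data untouched.

```python
from statistics import quantiles


def tukey_outliers(indicator, k=1.5):
    """Return the regions whose value lies outside Tukey's fences.

    indicator: mapping from a (hypothetical) region code to an indicator value.
    k: fence multiplier; 1.5 is the conventional choice for "outside" values.
    The result is a list of candidates for expert review, not a list of errors.
    """
    values = list(indicator.values())
    # Quartiles of the whole distribution (inclusive method, needs >= 2 values).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Keep insertion order so the flags line up with the input regions.
    return [region for region, v in indicator.items() if v < low or v > high]


# Hypothetical unemployment rates (%) for eight regions.
rates = {"R1": 5.1, "R2": 6.3, "R3": 4.8, "R4": 7.0,
         "R5": 5.5, "R6": 6.1, "R7": 48.0, "R8": 5.9}

print(tukey_outliers(rates))  # ['R7']
```

Here only `R7` is flagged. An expert would then consult the metadata attached to that value (for instance its source or the definition used for that region) before deciding whether it is an error or a genuine, if unusual, observation.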

Keywords

Quality · Logical consistency · Outlier detection · Metadata · Socio-economic data · Interactive assessment


Notes

Acknowledgements

The research presented in this paper has been supported by the ESPON 2013 Database project of the European Spatial Planning and Observation Network for Territorial Cohesion. We would like to thank Claude Grasland for his advice, as well as Martin Charlton and Paul Harris, who provided the implementation of the outlier detection methods in R. The authors would also like to thank the reviewers, whose comments helped to improve the paper.


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Christine Plumejeaud (1)
  • Marlène Villanova-Oliver (1)
  1. Laboratoire d’Informatique de Grenoble, Steamer team, Saint Martin d’Hères, France
