Issues on Clustering and Data Gridding

  • Conference paper
Classification and Data Mining

Abstract

This contribution addresses clustering problems that arise in the presence of densely populated data points with a high degree of overlap. To avoid the disturbing effects of high-density areas, we suggest a technique that selects a point in each cell of a grid defined along the principal component axes of the data. The selected subsample removes the high-density areas while preserving the general structure of the data. Once a clustering of the gridded data has been produced, the rest of the data can be classified easily, with reliable and stable results. The good performance of the approach is shown on a complex dataset of international trade data.
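The gridding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes two-dimensional data (so the principal axes have a closed form and no external library is needed), an equal number of cells per axis, and the rule "keep the first point seen in each cell", none of which is specified in the abstract. The function name `pca_grid_subsample` and the parameter `cells_per_axis` are hypothetical.

```python
import math
import random


def pca_grid_subsample(points, cells_per_axis=10):
    """Return indices of one point per occupied cell of a grid on the PC axes.

    points: list of (x, y) pairs (2-D only, to keep the sketch dependency-free).
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n

    # Angle of the first principal axis (closed form for the 2x2 case).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)

    # Scores: coordinates of each point along the two principal axes.
    scores = [(c * x + s * y, -s * x + c * y) for x, y in centered]

    def cell_index(value, lo, hi):
        # Map a score to a cell number in 0 .. cells_per_axis - 1.
        if hi == lo:
            return 0
        k = int((value - lo) / (hi - lo) * cells_per_axis)
        return min(k, cells_per_axis - 1)

    lo1, hi1 = min(t[0] for t in scores), max(t[0] for t in scores)
    lo2, hi2 = min(t[1] for t in scores), max(t[1] for t in scores)

    chosen = {}
    for i, (t1, t2) in enumerate(scores):
        cell = (cell_index(t1, lo1, hi1), cell_index(t2, lo2, hi2))
        chosen.setdefault(cell, i)  # assumption: keep the first point seen
    return sorted(chosen.values())


if __name__ == "__main__":
    # Synthetic data (not from the paper): a dense blob plus a sparse background.
    random.seed(0)
    dense = [(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(5000)]
    sparse = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(200)]
    data = dense + sparse
    kept = pca_grid_subsample(data, cells_per_axis=10)
    print(len(data), "points reduced to", len(kept))
```

Because at most one point survives per cell, the dense blob contributes no more points than a sparse region of the same area, which is the thinning effect the abstract describes. A clustering method of choice could then be run on the retained points only.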

Author information

Corresponding author

Correspondence to Jukka Heikkonen.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Heikkonen, J., Perrotta, D., Riani, M., Torti, F. (2013). Issues on Clustering and Data Gridding. In: Giusti, A., Ritter, G., Vichi, M. (eds) Classification and Data Mining. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28894-4_5
