Issues on Clustering and Data Gridding

  • Jukka Heikkonen
  • Domenico Perrotta
  • Marco Riani
  • Francesca Torti
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


This contribution addresses clustering problems that arise when the data contain densely populated regions with a high degree of overlap. To avoid the disturbing effects of these high-density areas, we suggest a technique that selects one point in each cell of a grid defined along the principal component axes of the data. The selected sub-sample removes the high-density areas while preserving the general structure of the data. Once the gridded data have been clustered, the rest of the data can be classified easily, with reliable and stable results. The good performance of the approach is shown on a complex dataset of international trade data.
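The gridding step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `grid_subsample`, the parameter `n_bins`, the use of equal-width cells, and the random choice of the representative point within each cell are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def grid_subsample(X, n_bins=20, seed=0):
    """Return indices of one observation per occupied grid cell.

    The grid is laid out along the principal component axes of X.
    Assumptions (not from the paper): equal-width cells, n_bins cells
    per axis, and a randomly chosen representative in each cell.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Center the data and rotate it onto its principal axes (via SVD).
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T

    # Map every principal component score to an integer cell coordinate.
    lo = scores.min(axis=0)
    span = scores.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant directions
    cells = np.floor((scores - lo) / span * n_bins).astype(int)
    cells = np.clip(cells, 0, n_bins - 1)

    # Shuffle first so that the point kept in each cell is a random one,
    # then keep the first point met in every distinct cell.
    order = rng.permutation(len(X))
    _, first = np.unique(cells[order], axis=0, return_index=True)
    return np.sort(order[first])

# Hypothetical example: a very dense group next to a sparse one.
rng = np.random.default_rng(1)
dense = rng.normal(0.0, 0.1, size=(5000, 2))
sparse = rng.normal(3.0, 1.0, size=(200, 2))
X = np.vstack([dense, sparse])

idx = grid_subsample(X, n_bins=10)
X_gridded = X[idx]  # thinned sample on which a clustering could be run
```

Because at most one point per cell survives, the dense group can no longer dominate the sub-sample by sheer count. A model fitted to `X_gridded` could then be used to assign the observations that were left out.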


Keywords (added by machine, not by the authors): European Union; Gaussian Mixture Model; Gridded Data; Multivariate Gaussian Distribution; Disturbing Effect



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jukka Heikkonen (1)
  • Domenico Perrotta (2)
  • Marco Riani (3)
  • Francesca Torti (4)
  1. Department of Information Technology, University of Turku, Turku, Finland
  2. EC Joint Research Centre, Ispra site, Ispra, Italy
  3. University of Parma, Parma, Italy
  4. University of Milano Bicocca, Milan, Italy
