Abstract
This contribution addresses clustering issues in presence of densely populated data points with high degree of overlapping. In order to avoid the disturbing effects of high dense areas we suggest a technique that selects a point in each cell of a grid defined along the Principal Component axes of the data. The selected sub-sample removes the high density areas while preserving the general structure of the data. Once the clustering on the gridded data is produced, it is easy to classify the rest of the data with reliable and stable results. The good performance of the approach is shown on a complex dataset coming from international trade data.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Atkinson, A. C., & Riani, M. (2007). Exploratory tools for clustering multivariate data. Computational Statistics and Data Analysis,52, 272–285.
Atkinson, A. C., Riani, M., & Cerioli, A. (2004). Exploring multivariate data with the forward search. New York: Springer.
Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer.
Cochran, W. G. (1977). Robust sampling techniques (3rd ed.). New York: Wiley.
Fraley, C. (1998). Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing,20, 270–281.
Garcia-Escudero, L. A., Gordaliza, A., Matran, C., & Mayo-Iscar, A. (2008). A general trimming approach to robust cluster analysis. Annals of Statistics,36, 1324–1345.
Garcia-Escudero, L. A., Gordaliza, A., Matran, C., & Mayo-Iscar, A. (2010). A review of robust clustering methods. Advances in Data Analysis and Classification,4, 89–109.
Garcia-Escudero, L. A., Gordaliza, A., Matran, C., & Mayo-Iscar, A. (2011). Exploring the number of groups in robust model based clustering. Statistics and Computing,21(4), 585–599.
Perrotta, D., & Torti, F. (2009). Detecting price outliers in European trade data with the forward search. In N. C. Lauro, F. Palumbo, & M. Greenacre (Eds.), Data analysis and classification: From exploration to confirmation (Springer studies in classification, data analysis, and knowledge organization, pp. 415–423). Berlin: Springer.
Riani, M., Cerioli, A., Atkinson, A. C., Perrotta, D., & Torti, F. (2008). Fitting robust mixtures of regression lines to European trade data. In: F. Fogelman-Soulie, et al. (Eds.), Mining massive datasets for security applications. Amsterdam: IOS Press.
Riani, M., Atkinson, A. C., & Cerioli, A. (2009). Finding an unknown number of multivariate outliers. Journal of the Royal Statistical Society, Series B – Statistical Methodology,71, 447–466.
Riani, M., Perrotta, D. and Torti, F. (2012). FSDA: A MATLAB toolbox for robust analysis and interactive data exploration. In: Chemometrics and Intelligent Laboratory Systems, 116, 17–32.
Rissanen, J. (1986). Stochastic complexity and modeling. Annals of Statistics,14, 1080–1100.
Schwarz, G. E. (1978). Estimating the dimension of a model. Annals of Statistics,6(2), 461–464.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Heikkonen, J., Perrotta, D., Riani, M., Torti, F. (2013). Issues on Clustering and Data Gridding. In: Giusti, A., Ritter, G., Vichi, M. (eds) Classification and Data Mining. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28894-4_5
Download citation
DOI: https://doi.org/10.1007/978-3-642-28894-4_5
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28893-7
Online ISBN: 978-3-642-28894-4
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)