Issues on Clustering and Data Gridding

Heikkonen, Jukka; Perrotta, Domenico; Riani, Marco; Torti, Francesca

doi:10.1007/978-3-642-28894-4_5

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

This contribution addresses clustering issues in the presence of densely populated data points with a high degree of overlap. To avoid the disturbing effects of high-density areas, we suggest a technique that selects one point in each cell of a grid defined along the principal component axes of the data. The selected sub-sample removes the high-density areas while preserving the general structure of the data. Once the gridded data have been clustered, the rest of the data can easily be classified, with reliable and stable results. The good performance of the approach is shown on a complex dataset of international trade data.
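
The abstract describes the procedure only at a high level, so the following is a minimal NumPy sketch of the three steps (grid along the principal axes, keep one point per cell, cluster the sub-sample, then classify the rest) under assumptions of our own. The abstract does not say how the grid is sized, which point represents a cell, which clustering method is used, or how the remaining points are classified. Here we assume equal-width cells over the range of each principal-component score, keep the point nearest each cell's centre, swap in plain k-means for the clustering step, and assign every point to its nearest centroid. The names `grid_subsample`, `kmeans`, `assign` and the parameter `cells_per_axis` are hypothetical and illustrative only; this is not the authors' implementation.

```python
import numpy as np


def grid_subsample(X, cells_per_axis=10):
    """Return indices of one point per non-empty cell of a grid on the PC axes.

    Assumptions (not from the chapter): equal-width cells spanning the range of
    each principal-component score; a cell is represented by its point closest
    to the cell centre.
    """
    Xc = X - X.mean(axis=0)
    # Principal axes from the SVD of the centred data; scores are projections.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    width = (hi - lo) / cells_per_axis
    width[width == 0] = 1.0  # guard against a constant (degenerate) axis
    # Integer cell index per point and axis; clip so the maximum lands in the last cell.
    idx = np.clip(np.floor((scores - lo) / width).astype(int), 0, cells_per_axis - 1)
    centres = lo + (idx + 0.5) * width
    dist = np.linalg.norm(scores - centres, axis=1)
    chosen = {}
    for i, cell in enumerate(map(tuple, idx)):
        # Keep the point closest to the centre of its cell.
        if cell not in chosen or dist[i] < dist[chosen[cell]]:
            chosen[cell] = i
    return np.array(sorted(chosen.values()))


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd k-means, used here only as a stand-in clustering method."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    return centroids


def assign(X, centroids):
    """Classify every point to its nearest centroid (squared Euclidean distance)."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dense = rng.normal([0.0, 0.0], 0.3, size=(5000, 2))   # very dense group
    sparse = rng.normal([3.0, 3.0], 1.0, size=(200, 2))   # sparse, overlapping group
    X = np.vstack([dense, sparse])
    keep = grid_subsample(X, cells_per_axis=15)
    centroids = kmeans(X[keep], k=2)   # cluster the gridded sub-sample only
    labels = assign(X, centroids)      # then classify all the data
    print(len(keep), "of", len(X), "points kept; cluster sizes:",
          np.bincount(labels, minlength=2))
```

The design choice the sketch tries to reflect is that the grid caps every cell at one point, so a dense region contributes roughly as many points to the clustering step as a sparse region of the same extent; only the final classification step sees the full data. Note that a full grid over many principal components grows exponentially with dimension, so in practice one would likely grid only the leading components.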

Author information

Authors and Affiliations

Department of Information Technology, University of Turku, Turku, Finland
Jukka Heikkonen
EC Joint Research Centre, Ispra site, Ispra, Italy
Domenico Perrotta
University of Parma, Parma, Italy
Marco Riani
University of Milano Bicocca, Milan, Italy
Francesca Torti

Corresponding author

Correspondence to Jukka Heikkonen.

Editor information

Editors and Affiliations

Department of Statistics, Università degli Studi di Firenze, Viale G.B. Morgagni 59, Firenze, 50134, Italy
Antonio Giusti
Fakultät für Informatik und Mathematik, Universität Passau, Innstr. 33, Passau, 94030, Germany
Gunter Ritter
Department of Statistics, University of Rome "La Sapienza", Piazzale Aldo Moro 5, Rome, 00185, Italy
Maurizio Vichi

About this paper

Cite this paper

Heikkonen, J., Perrotta, D., Riani, M., Torti, F. (2013). Issues on Clustering and Data Gridding. In: Giusti, A., Ritter, G., Vichi, M. (eds) Classification and Data Mining. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28894-4_5

DOI: https://doi.org/10.1007/978-3-642-28894-4_5
Published: 06 September 2012
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28893-7
Online ISBN: 978-3-642-28894-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)