Abstract
The paper builds on the representation of units and clusters by a special type of symbolic object that consists of distributions of variables. Two compatible clustering methods are developed: the leaders method, which reduces a large dataset to a smaller set of symbolic objects (clusters), and a hierarchical clustering method, which is applied to these clusters to reveal their internal structure. The proposed approach is illustrated on the USDA Nutrient Database.
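The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the dissimilarity or the merge rule, so the squared Euclidean dissimilarity between distributions, the mean distribution as cluster leader, and the closest-pair merge are all assumptions made here for illustration. The function names (`dissimilarity`, `weighted_mean`, `leaders_method`, `agglomerate`) are hypothetical.

```python
import random

# A unit (or a cluster leader) is a list of distributions, one per variable;
# each distribution is a list of relative frequencies that sum to 1.

def dissimilarity(x, y):
    """Assumed measure: squared Euclidean distance summed over all variables."""
    return sum((a - b) ** 2 for px, py in zip(x, y) for a, b in zip(px, py))

def weighted_mean(items, weights):
    """Component-wise weighted mean; the result is again a list of distributions."""
    total = sum(weights)
    return [
        [sum(w * it[v][j] for it, w in zip(items, weights)) / total
         for j in range(len(items[0][v]))]
        for v in range(len(items[0]))
    ]

def leaders_method(units, k, max_iter=100, seed=0):
    """Reduce the units to k leaders by alternating assignment and update."""
    rng = random.Random(seed)
    leaders = rng.sample(units, k)      # initial leaders: k randomly chosen units
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: dissimilarity(u, leaders[c]))
            for u in units
        ]
        if new_assignment == assignment:  # stable partition: stop
            break
        assignment = new_assignment
        for c in range(k):
            members = [u for u, a in zip(units, assignment) if a == c]
            if members:                   # an emptied cluster keeps its old leader
                leaders[c] = weighted_mean(members, [1] * len(members))
    sizes = [assignment.count(c) for c in range(k)]
    return leaders, sizes, assignment

def agglomerate(leaders, sizes):
    """Repeatedly merge the two closest clusters; returns (id_a, id_b, height)."""
    clusters = {i: (leaders[i], sizes[i])
                for i in range(len(leaders)) if sizes[i] > 0}
    merges, next_id = [], len(leaders)
    while len(clusters) > 1:
        ids = sorted(clusters)
        a, b = min(
            ((i, j) for n, i in enumerate(ids) for j in ids[n + 1:]),
            key=lambda p: dissimilarity(clusters[p[0]][0], clusters[p[1]][0]),
        )
        (la, na), (lb, nb) = clusters.pop(a), clusters.pop(b)
        merges.append((a, b, dissimilarity(la, lb)))
        # The merged leader is the size-weighted mean of the two leaders.
        clusters[next_id] = (weighted_mean([la, lb], [na, nb]), na + nb)
        next_id += 1
    return merges

# Toy data (invented for illustration): four units, one variable, three categories.
units = [[[0.9, 0.1, 0.0]], [[0.8, 0.2, 0.0]],
         [[0.0, 0.1, 0.9]], [[0.1, 0.1, 0.8]]]
leaders, sizes, assignment = leaders_method(units, k=2)
merges = agglomerate(leaders, sizes)
```

Because each leader is a mean of distributions, it is itself a distribution, so the hierarchical step can work on the leaders with the same dissimilarity that the leaders method used on the units. This is one simple way to keep the two methods compatible in the sense the abstract describes.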
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Korenjak-Černe, S., Batagelj, V. (2002). Symbolic Data Analysis Approach to Clustering Large Datasets. In: Jajuga, K., Sokołowski, A., Bock, HH. (eds) Classification, Clustering, and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-56181-8_35
DOI: https://doi.org/10.1007/978-3-642-56181-8_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43691-1
Online ISBN: 978-3-642-56181-8
eBook Packages: Springer Book Archive