Search Computing

Volume 7538 of the Lecture Notes in Computer Science series, pp. 111-126

Clustering and Labeling of Multi-dimensional Mixed Structured Data

  • Marco Brambilla, Dipartimento di Elettronica e Informazione, Politecnico di Milano
  • Massimiliano Zanoni, Dipartimento di Elettronica e Informazione, Politecnico di Milano

Abstract

Cluster analysis consists of aggregating the data items of a given set into subsets based on some similarity properties. Clustering techniques have been applied in many fields that typically involve large amounts of complex data. This study focuses on what we call multi-domain clustering and labeling, i.e., a set of techniques for clustering multi-dimensional, structured, mixed data. The work studies which mix of clustering techniques best addresses the problem in this multi-domain setting. The data types considered are numerical, categorical, and textual, and all of them can appear together within the same clustering scenario. We focus on k-means and agglomerative hierarchical clustering methods based on a new distance function that we define for this specific setting. The proposed approach has been validated on real and realistic data sets from the college, automobile, and leisure domains. The experimental data allowed us to evaluate the effectiveness of the different solutions, for both clustering and labeling.
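The abstract does not spell out the authors' distance function. Purely as an illustration of the general idea (one distance over items that mix numerical, categorical, and textual attributes, fed into agglomerative hierarchical clustering), the following Python sketch assumes a Gower-style combination: range-normalized absolute differences for numbers, mismatch counting for categories, and Jaccard distance over word sets for text, averaged with equal weights. This is a stand-in, not the paper's definition. The names `Item`, `mixed_distance`, `agglomerative`, the equal weights, the average-linkage choice, and the toy automobile records are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass
class Item:
    """Hypothetical record mixing the three data types named in the abstract."""
    numeric: list[float]
    categorical: list[str]
    text: str


def tokens(text: str) -> set[str]:
    # Naive whitespace tokenization; a real system would also stem and drop stop words.
    return set(text.lower().split())


def numeric_ranges(items: list[Item]) -> list[float]:
    # Per-attribute range (max - min), used to scale numeric differences into [0, 1].
    return [max(col) - min(col) for col in zip(*(it.numeric for it in items))]


def mixed_distance(a: Item, b: Item, ranges: list[float],
                   weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Assumed Gower-style distance in [0, 1]; NOT the paper's own definition."""
    # Numeric part: mean range-normalized absolute difference.
    num = (sum(abs(x - y) / r if r > 0 else 0.0
               for x, y, r in zip(a.numeric, b.numeric, ranges)) / len(ranges)
           if ranges else 0.0)
    # Categorical part: fraction of attributes whose values differ.
    cat = (sum(x != y for x, y in zip(a.categorical, b.categorical)) / len(a.categorical)
           if a.categorical else 0.0)
    # Textual part: Jaccard distance between the two token sets.
    ta, tb = tokens(a.text), tokens(b.text)
    union = ta | tb
    txt = 1.0 - len(ta & tb) / len(union) if union else 0.0
    # Weighted average of the three per-type distances.
    w_num, w_cat, w_txt = weights
    return (w_num * num + w_cat * cat + w_txt * txt) / (w_num + w_cat + w_txt)


def agglomerative(items: list[Item], k: int) -> list[list[int]]:
    """Average-linkage agglomerative clustering down to k clusters of item indices."""
    ranges = numeric_ranges(items)
    # Precompute all pairwise distances once; keys are (i, j) with i < j.
    dist = {(i, j): mixed_distance(items[i], items[j], ranges)
            for i, j in combinations(range(len(items)), 2)}

    def linkage(c1: list[int], c2: list[int]) -> float:
        # Average of all cross-cluster pairwise distances.
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(items))]
    while len(clusters) > k:
        # Merge the closest pair; combinations yields p < q, so deleting index q is safe.
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Toy automobile-like records, invented for illustration only.
cars = [
    Item([1200.0, 5.0], ["petrol", "hatchback"], "small city car cheap"),
    Item([1300.0, 5.5], ["petrol", "hatchback"], "compact city car"),
    Item([2500.0, 9.0], ["diesel", "suv"], "large family suv offroad"),
    Item([2600.0, 9.5], ["diesel", "suv"], "spacious family suv"),
]
clusters = agglomerative(cars, k=2)
print(clusters)  # [[0, 1], [2, 3]]
```

The same pairwise distance could drive a k-means-style method, but that additionally needs a cluster prototype for each data type (for example, a mean for numeric attributes and the most frequent value for categorical ones), which this sketch does not attempt; cluster labeling is likewise out of its scope.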