High Dimensional Clustering Using Parallel Coordinates and the Grand Tour
In this paper, we present graphical techniques for cluster analysis of high-dimensional data. Parallel coordinate plots and parallel coordinate density plots map multivariate data into a two-dimensional display. The parallel coordinate representation has elegant duality properties with ordinary Cartesian plots, so that higher-dimensional mathematical structures can be analyzed. Our high-interaction software allows rapid editing of data to remove outliers and isolate clusters by brushing. Our brushing techniques allow adjustment not only of hue but also of saturation. Saturation adjustment makes it possible to handle comparatively massive data sets by using the α-channel of the Silicon Graphics workstation to compensate for heavy overplotting.
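The point-line duality mentioned above can be illustrated with a small sketch. It is not taken from the authors' software; it assumes two parallel axes placed at horizontal positions 0 and 1, and the function names (`to_polyline`, `segment_height`, `dual_point`) are hypothetical. Under that assumption, points lying on the Cartesian line x2 = m·x1 + b (with m ≠ 1) map to segments that all pass through the single point (1/(1−m), b/(1−m)).

```python
def to_polyline(point, spacing=1.0):
    """Map a d-dimensional point to the vertices of its parallel-coordinate
    polyline: axis i sits at horizontal position i * spacing."""
    return [(i * spacing, value) for i, value in enumerate(point)]


def segment_height(point, t):
    """Height at horizontal position t of the line through (0, x1) and (1, x2),
    i.e. the parallel-coordinate image of the 2-D point (x1, x2)."""
    x1, x2 = point
    return x1 + t * (x2 - x1)


def dual_point(m, b):
    """Common intersection of the images of all points on x2 = m*x1 + b.
    Assumes axes at horizontal positions 0 and 1; undefined for m == 1,
    where the images are parallel."""
    if m == 1:
        raise ValueError("segments are parallel when the slope is 1")
    return (1.0 / (1.0 - m), b / (1.0 - m))


# Illustrative values only: four points on the line x2 = -2*x1 + 3.
m, b = -2.0, 3.0
points = [(x1, m * x1 + b) for x1 in (-1.0, 0.0, 0.5, 2.0)]

t_star, y_star = dual_point(m, b)
heights = [segment_height(p, t_star) for p in points]

for p, h in zip(points, heights):
    print(p, "->", to_polyline(p), "height at dual point:", round(h, 9))
```

With a negative slope the dual point falls between the two axes, which is why negatively correlated variables show a characteristic crossing pattern in a parallel coordinate plot.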
The grand tour is a generalized rotation of coordinate axes in a high-dimensional space. Coupled with the full-dimensional plots allowed by the parallel coordinate display, it allows the data analyst to explore data that are both high-dimensional and massive in size. In this paper we describe both techniques and illustrate their use for inverse regression and clustering. We have used these techniques to analyze data sets on the order of 250,000 observations in 8 dimensions. Because the analysis requires the use of color graphics, in the present paper we illustrate the methods with a more modest data set of 3848 observations. Other illustrations are available on our web page.
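One simple way to produce a generalized rotation of this kind is to compose rotations in the coordinate planes, with each angle advancing at a different rate. The sketch below is written under that assumption and is not the authors' implementation; the helper names and the choice of rates (square roots of small primes) are hypothetical. Each rotated observation keeps all of its dimensions, so it could then be drawn as a parallel coordinate polyline.

```python
import math


def identity(d):
    """d x d identity matrix as nested lists."""
    return [[1.0 if a == b else 0.0 for b in range(d)] for a in range(d)]


def plane_rotation(d, i, j, theta):
    """d x d rotation by angle theta in the (i, j) coordinate plane."""
    r = identity(d)
    c, s = math.cos(theta), math.sin(theta)
    r[i][i], r[i][j] = c, -s
    r[j][i], r[j][j] = s, c
    return r


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def tour_rotation(d, t, rates):
    """Compose one rotation per coordinate plane; plane k turns by rates[k] * t."""
    planes = [(i, j) for i in range(d) for j in range(i + 1, d)]
    q = identity(d)
    for (i, j), rate in zip(planes, rates):
        q = matmul(q, plane_rotation(d, i, j, rate * t))
    return q


def rotate(point, q):
    """Apply rotation matrix q to a single observation."""
    return [sum(q[r][c] * point[c] for c in range(len(point)))
            for r in range(len(q))]


# Illustrative values only: 4 dimensions give 6 coordinate planes.
d = 4
rates = [math.sqrt(p) for p in (2, 3, 5, 7, 11, 13)]
observation = [1.0, 2.0, 3.0, 4.0]

# A few frames of the tour; each frame is a full-dimensional rotated view.
frames = [rotate(observation, tour_rotation(d, 0.05 * step, rates))
          for step in range(5)]
for frame in frames:
    print([round(v, 4) for v in frame])
```

Because every frame is an orthogonal transformation, distances between observations are preserved; only the viewing axes change from frame to frame.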
Keywords: Generalized Rotation; High Density Region; Inverse Regression; Explanatory Covariates; Brushing Technique