Structural Representation of Categorical Data and Cluster Analysis Through Filters
Representing categorical data by nominal measurement leaves the information in the data intact, which is not the case with widely used numerical or pseudo-numerical representations such as Likert-type scoring. We first explain this point and then turn to the analysis of nominally represented data. When a large number of variables is involved, one typically resorts to dimension reduction, and the need for it is often greater with categorical data than with continuous data. In spite of this, Nishisato and Clavel (Behaviormetrika 57:15–32, 2010) proposed an approach that is diametrically opposite to dimension reduction: they advocate the use of a doubled hyper-space to accommodate both the row variables and the column variables of two-way data in a common space. The rationale of doubled space can also be used to vindicate the validity of the Carroll-Green-Schaffer scaling (Carroll JD, Green PE, Schaffer CM (1986) J Mark Res 23(3):271–280). The current paper then introduces a simple procedure for analyzing a hyper-dimensional configuration of data, called cluster analysis through filters. A numerical example is presented to show the contrast between the dimension-reduction approach and total information analysis by cluster analysis. We argue that our approach is preferable to the dimension-reduction approach on two grounds: its results are a factual summary of a multidimensional data configuration, and the procedure is simple and practical.
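The opening claim about information loss under Likert-type scoring can be illustrated with a small sketch. It is not taken from the paper: the toy data, the function names, and the use of the correlation ratio as a summary are all assumptions made for illustration. Three response categories are scored 0, 1, 2 in the Likert manner, and the same responses are also coded as a nominal indicator matrix. The outcome depends on the categories in a U-shaped way, so the integer scores show no linear association, while the nominal coding recovers the relation completely.

```python
import numpy as np


def indicator_matrix(codes, n_categories):
    """Nominal coding: one 0/1 column per category (hypothetical helper)."""
    return np.eye(n_categories)[np.asarray(codes)]


def likert_correlation(codes, y):
    """Pearson correlation when category codes are treated as integer scores."""
    return float(np.corrcoef(codes, y)[0, 1])


def correlation_ratio(codes, y, n_categories):
    """Share of the variance of y explained by category membership.

    Assumes every category is observed at least once.
    """
    z = indicator_matrix(codes, n_categories)
    y = np.asarray(y, dtype=float)
    category_means = (z.T @ y) / z.sum(axis=0)
    fitted = z @ category_means
    ss_between = ((fitted - y.mean()) ** 2).sum()
    ss_total = ((y - y.mean()) ** 2).sum()
    return float(ss_between / ss_total)


# Toy data (assumed): the outcome is high for the two extreme categories.
x = [0, 0, 1, 1, 2, 2]
y = [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]

r = likert_correlation(x, y)       # linear association of the integer scores
eta2 = correlation_ratio(x, y, 3)  # association under nominal coding
print(round(r, 6), round(eta2, 6))
```

In this toy case the integer scores yield a correlation of zero while the correlation ratio equals one, which is the kind of nonlinear information that equal-interval scoring can discard.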
Thanks are due to José Garcia Clavel for the calculation of between-set distances of Heuer’s data.
- Heuer G (1979) Selbstmord bei Kindern und Jugendlichen: ein Beitrag zur Suizidprophylaxe aus pädagogischer Sicht. Klett-Cotta, Stuttgart
- Likert R (1932) A technique for the measurement of attitudes. Arch Psychol 22(140):44–53
- Nishisato S (1994) Elements of dual scaling: an introduction to practical data analysis. Lawrence Erlbaum Associates, Hillsdale
- Nishisato S (1999) Data types and information: beyond the current practice of data analysis. In: Decker R, Gaul W (eds) Classification and information processing at the turn of the Millennium. Springer, Berlin/Heidelberg, pp 40–51
- Nishisato S (2012a) Optimal quantities for analysis through regression of measurement on data. Bull Data Anal Jpn Classif Soc 1:1–10
- Nishisato S (2012b) Reminiscence and a step forward. In: Gaul W, Geyer-Schultz A, Schmidt-Thieme L, Kunze J (eds) Classification, data analysis, and knowledge organization. Springer, Heidelberg, pp 109–119
- Nishisato S, Clavel JG (2010) Total information analysis: comprehensive dual scaling. Behaviormetrika 57:15–32