Abstract
Categories can be counted, rated, or ranked, but they cannot be measured. Likewise, persons or individuals can be counted, rated, or ranked, but they cannot be measured either. Nevertheless, psychology realized early on that it can take an indirect road to measurement: what can be measured is the strength of association between categories in samples or populations, and what can be quantitatively compared are counts, ratings, or rankings made under different circumstances or originating from different persons. The strong demand for quantitative analysis of categorical data has thus created a variety of statistical methods, with substantial contributions from psychometrics and sociometrics. What is the common basis of these methods for dealing with categories? The basic element they share is that the sample space has a special geometry, in which categories (or persons) are point masses forming a simplex, while distributions of counts or profiles of ratings are centers of gravity, which are also point masses. Rankings form a discrete subset in the interior of the simplex, known as the permutation polytope, and paired comparisons form another subset on the edges of the simplex. Distances between point masses form the basic tool of analysis. The paper gives some history of major concepts, which naturally leads to a new concept: the shadow point. It is then shown how loglinear models, Luce and Rasch models, unfolding models, correspondence analysis and homogeneity analysis, forced classification and classification trees, as well as other models and methods, fit into this particular geometrical framework.
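The geometry described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the example counts, the scaling of rank vectors to sum to one, and the use of plain Euclidean distance are all assumptions made here for illustration, and the paper itself may use other normalizations and distance functions.

```python
from itertools import permutations
from math import dist


def vertices(k):
    """Unit vectors e_1..e_k: one point mass per category (simplex vertices)."""
    return [tuple(1.0 if i == j else 0.0 for j in range(k)) for i in range(k)]


def center_of_gravity(counts):
    """Center of gravity of the vertices weighted by the counts.

    With unit-vector vertices this is the vector of relative frequencies.
    """
    total = sum(counts)
    verts = vertices(len(counts))
    return tuple(
        sum(c * v[j] for c, v in zip(counts, verts)) / total
        for j in range(len(counts))
    )


def ranking_points(k):
    """All k! rankings as rank vectors, scaled (an assumption) to sum to 1."""
    rank_sum = k * (k + 1) / 2
    return [
        tuple(r / rank_sum for r in perm)
        for perm in permutations(range(1, k + 1))
    ]


def paired_comparison_point(k, i, j, share_i):
    """Point on the edge joining vertices i and j; share_i is the share choosing i."""
    point = [0.0] * k
    point[i] = share_i
    point[j] = 1.0 - share_i
    return tuple(point)


# Hypothetical counts over three categories from two groups.
profile_a = center_of_gravity([30, 50, 20])
profile_b = center_of_gravity([10, 40, 50])
distance_ab = dist(profile_a, profile_b)  # Euclidean distance, an assumption

rankings = ranking_points(3)
centroid = (1 / 3, 1 / 3, 1 / 3)
radii = [dist(r, centroid) for r in rankings]

edge_point = paired_comparison_point(3, 0, 2, 0.7)
```

In this sketch every scaled ranking has strictly positive coordinates, so it lies in the interior of the simplex, and all rankings lie at the same distance from the centroid. A paired comparison has only two nonzero coordinates, so it lies on an edge.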
Additional information
This paper is based on my Presidential Address delivered at the 69th Annual Meeting of the Psychometric Society, Pacific Grove, California, June 14–17, 2004. It was completed during a stay as Fellow of the Netherlands Institute for Advanced Study in the Humanities and Social Sciences (NIAS) in Wassenaar, The Netherlands.
I would like to thank Marike Polak, Frank Busing, Elise Dusseldorp, and Angela Jansen for their help in the data analyses and the preparation of the figures, and Laurence Frank for her assistance during the oral presentation. I am also very lucky to have a career-long personal coach, Jacqueline J. Meulman, with whom I share so many interests and perspectives.
About this article
Cite this article
Heiser, W.J. Geometric representation of association between categories. Psychometrika 69, 513–545 (2004). https://doi.org/10.1007/BF02289854
DOI: https://doi.org/10.1007/BF02289854