The location model for mixtures of categorical and continuous variables

Journal of Classification

Abstract

Recent research into graphical association models has focussed interest on the conditional Gaussian distribution for analyzing mixtures of categorical and continuous variables. A special case of such models, utilizing the homogeneous conditional Gaussian distribution, has in fact been known since 1961 as the location model, and for the past 30 years has provided a basis for the multivariate analysis of mixed categorical and continuous variables. Extensive development of this model took place throughout the 1970s and 1980s in the context of discrimination and classification, and comprehensive methodology is now available for such analysis of mixed variables. This paper surveys these developments and summarizes current capabilities in the area. Topics include distances between groups, discriminant analysis, error rates and their estimation, model and feature selection, and the handling of missing data.
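
For orientation, the location model named in the abstract can be written out as follows. This is a sketch of the standard textbook formulation, not a quotation from the paper: the symbols (cell index s, cell probabilities p_s, cell means mu_s, common covariance Sigma, group index i) are notation assumed here for illustration. The categorical variables jointly define m cells, and the continuous vector y of dimension p is Gaussian within each cell, with a mean that depends on the cell and a covariance matrix that does not (the "homogeneous" case). The second display gives the corresponding two-group allocation rule under the further assumption of equal prior probabilities and equal misclassification costs.

```latex
% Location model: cell s of the categorical variables occurs with
% probability p_s, and y given cell s is N_p(mu_s, Sigma).
\[
  f(s, \mathbf{y}) \;=\; p_s \,(2\pi)^{-p/2}\,\lvert\boldsymbol{\Sigma}\rvert^{-1/2}
  \exp\!\Bigl\{-\tfrac{1}{2}\,(\mathbf{y}-\boldsymbol{\mu}_s)^{\top}
  \boldsymbol{\Sigma}^{-1}(\mathbf{y}-\boldsymbol{\mu}_s)\Bigr\},
  \qquad s = 1,\dots,m, \quad \sum_{s=1}^{m} p_s = 1 .
\]

% Two groups i = 1, 2 with parameters (p_{is}, mu_{is}) and common Sigma:
% an observation (s, y) is allocated to group 1 when
\[
  (\boldsymbol{\mu}_{1s}-\boldsymbol{\mu}_{2s})^{\top}\boldsymbol{\Sigma}^{-1}
  \Bigl\{\mathbf{y}-\tfrac{1}{2}\,(\boldsymbol{\mu}_{1s}+\boldsymbol{\mu}_{2s})\Bigr\}
  \;\ge\; \log\frac{p_{2s}}{p_{1s}},
\]
% and to group 2 otherwise.
```

Under these assumptions the rule is a separate linear discriminant function for each cell, with a cell-specific cut-off determined by the ratio of the cell probabilities in the two groups.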



Cite this article

Krzanowski, W.J. The location model for mixtures of categorical and continuous variables. Journal of Classification 10, 25–49 (1993). https://doi.org/10.1007/BF02638452


  • DOI: https://doi.org/10.1007/BF02638452
