Non-parametric smoothing of the location model in mixed variable discrimination

Statistics and Computing

Abstract

The location model is a familiar basis for discriminant analysis of mixtures of categorical and continuous variables. Its usual implementation involves second-order smoothing, using multivariate regression for the continuous variables and log-linear models for the categorical variables. In spite of the smoothing, these procedures still require many parameters to be estimated and this in turn restricts the categorical variables to a small number if implementation is to be feasible. In this paper we propose non-parametric smoothing procedures for both parts of the model. The number of parameters to be estimated is dramatically reduced and the range of applicability thereby greatly increased. The methods are illustrated on several data sets, and the performances are compared with a range of other popular discrimination techniques. The proposed method compares very favourably with all its competitors.
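The idea of smoothing both parts of the location model can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration rather than the procedure of the paper: binary categorical variables define 2^q cells, each cell mean and cell probability is estimated from all training observations of a group with weights `lam ** d` (where `d` is the Hamming distance between cells), a single pooled covariance matrix is used, and group priors are taken as equal. The names `fit_smoothed_location_model`, `predict` and `lam` are hypothetical.

```python
import numpy as np


def fit_smoothed_location_model(Z, Y, labels, lam=0.5):
    """Fit an illustrative smoothed location model.

    Z: (n, q) binary array, Y: (n, p) continuous array, labels: (n,) group ids.
    lam in (0, 1] is an assumed single smoothing parameter.
    """
    assert 0.0 < lam <= 1.0
    Z = np.asarray(Z, dtype=int)
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    q = Z.shape[1]
    # Row m holds the binary pattern of cell m (bit j of m is variable j).
    cells = np.array([[(m >> j) & 1 for j in range(q)] for m in range(2 ** q)])
    bit_weights = 1 << np.arange(q)
    means, probs, residuals = {}, {}, []
    for g in groups:
        Zg, Yg = Z[labels == g], Y[labels == g]
        # Hamming distance between every cell and every observation's cell.
        d = (cells[:, None, :] != Zg[None, :, :]).sum(axis=2)
        w = lam ** d  # weights decay with distance between cells
        means[g] = (w @ Yg) / w.sum(axis=1, keepdims=True)  # smoothed cell means
        probs[g] = w.sum(axis=1) / w.sum()  # smoothed cell probabilities
        residuals.append(Yg - means[g][Zg @ bit_weights])
    R = np.vstack(residuals)
    cov = R.T @ R / (len(R) - len(groups))  # rough pooled covariance estimate
    return {"groups": groups, "means": means, "probs": probs, "cov": cov}


def predict(model, Z, Y):
    """Allocate each observation to the group with the largest score."""
    Z = np.asarray(Z, dtype=int)
    Y = np.asarray(Y, dtype=float)
    idx = Z @ (1 << np.arange(Z.shape[1]))  # cell index of each observation
    cov_inv = np.linalg.inv(model["cov"])
    scores = []
    for g in model["groups"]:
        diff = Y - model["means"][g][idx]
        maha = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
        # Log cell probability minus half the Mahalanobis distance;
        # equal group priors are assumed.
        scores.append(np.log(model["probs"][g][idx]) - 0.5 * maha)
    return model["groups"][np.argmax(np.array(scores), axis=0)]
```

In this sketch only `lam` controls the smoothing, which reflects the abstract's point that the number of parameters to be estimated can be kept small even when the number of cells is large. In practice `lam` would have to be chosen from the data, for example by minimising a leave-one-out error rate.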

Cite this article

Asparoukhov, O., Krzanowski, W.J. Non-parametric smoothing of the location model in mixed variable discrimination. Statistics and Computing 10, 289–297 (2000). https://doi.org/10.1023/A:1008973308264

  • DOI: https://doi.org/10.1023/A:1008973308264