Variable selection in discriminant analysis based on the location model for mixed variables

  • Regular Article
  • Advances in Data Analysis and Classification

Abstract

Non-parametric smoothing of the location model is a potential basis for discriminating between groups of objects using mixtures of continuous and categorical variables simultaneously. However, it may lead to unreliable parameter estimates when too many variables are involved. This paper proposes a method for performing variable selection on the basis of the distance between groups as measured by smoothed Kullback–Leibler divergence. Search strategies using forward, backward and stepwise selection are outlined, and corresponding stopping rules derived from asymptotic distributional results are proposed. Results from a Monte Carlo study demonstrate the feasibility of the method. Examples on real data show that the method is generally competitive with, and sometimes better than, other existing classification methods.
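
To make the search strategy concrete, the following is a minimal sketch of forward selection driven by a between-group distance, not the authors' implementation. Everything named here is an assumption for illustration: `toy_divergence` is a stand-in criterion (the symmetric Kullback–Leibler divergence between two Gaussian groups with a shared diagonal covariance, which reduces to a sum of squared standardized mean differences), whereas the paper uses a smoothed divergence estimated under the location model for mixed variables. The fixed `min_gain` threshold is likewise a placeholder for the stopping rules that the paper derives from asymptotic distributional results.

```python
from statistics import mean, variance


def toy_divergence(group1, group2, subset):
    """Stand-in criterion (assumption, not the paper's smoothed estimate):
    symmetric KL divergence between two Gaussian groups that share a
    diagonal covariance, summed over the variables in `subset`."""
    n1, n2 = len(group1), len(group2)
    total = 0.0
    for j in subset:
        x1 = [row[j] for row in group1]
        x2 = [row[j] for row in group2]
        # Pooled within-group variance of variable j.
        pooled = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
        total += (mean(x1) - mean(x2)) ** 2 / pooled
    return total


def forward_select(group1, group2, n_vars, divergence, min_gain):
    """Greedy forward search: at each step add the variable that most
    increases the divergence; stop when the best gain is below `min_gain`
    (a hypothetical threshold standing in for a formal stopping rule)."""
    selected = []
    current = 0.0
    remaining = set(range(n_vars))
    while remaining:
        best_j, best_value = None, current
        for j in sorted(remaining):
            value = divergence(group1, group2, selected + [j])
            if value > best_value:
                best_j, best_value = j, value
        if best_j is None or best_value - current < min_gain:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current = best_value
    return selected, current


# Invented toy data: variable 0 separates the groups strongly,
# variable 1 weakly, variable 2 not at all.
group1 = [(0.0, 1.0, 5.0), (1.0, 2.0, 6.0), (2.0, 3.0, 4.0), (1.0, 2.0, 5.0)]
group2 = [(10.0, 3.0, 5.0), (11.0, 2.0, 4.0), (12.0, 4.0, 6.0), (11.0, 3.0, 5.0)]

selected, value = forward_select(group1, group2, 3, toy_divergence, min_gain=1.0)
print(selected, round(value, 3))
```

On this toy data the search keeps variables 0 and 1 and discards variable 2, whose group means coincide. Passing the criterion as a callable keeps the search loop independent of how the divergence is estimated, so a backward or stepwise variant would reuse the same criterion while removing variables or alternating between addition and removal.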

Corresponding author

Correspondence to Nor Idayu Mahat.

Cite this article

Mahat, N.I., Krzanowski, W.J. & Hernandez, A. Variable selection in discriminant analysis based on the location model for mixed variables. ADAC 1, 105–122 (2007). https://doi.org/10.1007/s11634-007-0009-9

  • DOI: https://doi.org/10.1007/s11634-007-0009-9
