Abstract
Normal-distribution-based discriminant methods have been used to classify new entities into groups using a discriminant rule constructed from a learning set. In practice, if the groups are not homogeneous, the mixture discriminant analysis of Hastie and Tibshirani (J R Stat Soc Ser B 58(1):155–176, 1996) is a useful approach; it assumes that the feature vectors within each group follow a mixture of multivariate normal distributions. In this paper a new logistic regression model for a learning set with heterogeneous group structure is proposed, based on penalized multinomial mixture logit models. Simulation studies compared this approach with standard mixture discriminant analysis, using the probability of misclassification as the criterion. The comparison showed a slight reduction in the average probability of misclassification for the penalized multinomial mixture logit model relative to the classical discriminant rules. The model also produced smaller errors when applied to real-life data problems.
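The abstract does not state which penalty is used or how subclass memberships are handled, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the authors' estimator. It fits a multinomial logit to subclass labels with a ridge (L2) penalty on the slopes, in the spirit of ridge logistic estimators (Schaefer et al. 1984), and turns the fitted subclass probabilities into class posteriors by summing over each class's subclasses. The subclass labels are assumed known here (in a genuine mixture model they are latent and would typically be estimated, for example with the EM algorithm of Dempster et al. 1977), and all function names, the toy data, and the tuning values `lam`, `lr` and `n_iter` are hypothetical.

```python
import math

# Hypothetical sketch: ridge-penalized multinomial logit fitted on SUBCLASS labels,
# with class posteriors obtained by summing subclass probabilities.
# Subclass labels are assumed known; this is not the paper's estimator.


def softmax(scores):
    """Numerically stable softmax of a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_scores(W, x):
    """Score for each subclass: intercept + slopes . x."""
    xa = [1.0] + list(x)
    return [sum(w * v for w, v in zip(Wk, xa)) for Wk in W]


def fit_ridge_multinomial_logit(X, labels, n_labels, lam=0.1, lr=0.1, n_iter=3000):
    """Minimize mean negative log-likelihood + (lam / 2) * (sum of squared slopes)
    by plain gradient descent. Label 0 is the reference category (coefficients fixed at 0)."""
    p = len(X[0])
    n = len(X)
    W = [[0.0] * (p + 1) for _ in range(n_labels)]  # W[k] = [intercept, slope_1, ..., slope_p]
    for _ in range(n_iter):
        grad = [[0.0] * (p + 1) for _ in range(n_labels)]
        for x, lab in zip(X, labels):
            xa = [1.0] + list(x)
            probs = softmax(linear_scores(W, x))
            for k in range(1, n_labels):
                err = probs[k] - (1.0 if lab == k else 0.0)
                for j in range(p + 1):
                    grad[k][j] += err * xa[j] / n
        for k in range(1, n_labels):
            for j in range(p + 1):
                penalty = lam * W[k][j] if j > 0 else 0.0  # intercepts are not penalized
                W[k][j] -= lr * (grad[k][j] + penalty)
    return W


def class_probabilities(W, x, subclass_to_class, n_classes):
    """Class posterior = sum of the fitted probabilities of that class's subclasses."""
    out = [0.0] * n_classes
    for s, pr in enumerate(softmax(linear_scores(W, x))):
        out[subclass_to_class[s]] += pr
    return out


def misclassification_rate(W, X, y, subclass_to_class, n_classes):
    """Fraction of points whose highest-posterior class differs from the true class."""
    errors = 0
    for x, c in zip(X, y):
        probs = class_probabilities(W, x, subclass_to_class, n_classes)
        predicted = max(range(n_classes), key=lambda k: probs[k])
        if predicted != c:
            errors += 1
    return errors / len(X)


# Toy heterogeneous data: class 0 has two subclasses (near -3 and +3), class 1 sits near 0.
X = [[-3.2], [-3.0], [-2.8], [-0.2], [0.0], [0.2], [2.8], [3.0], [3.2]]
y = [0, 0, 0, 1, 1, 1, 0, 0, 0]  # class labels
z = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # subclass labels (assumed known)
subclass_to_class = [0, 1, 0]

W = fit_ridge_multinomial_logit(X, z, n_labels=3)
rate = misclassification_rate(W, X, y, subclass_to_class, n_classes=2)
print("training misclassification rate:", rate)
```

The toy data mimic a heterogeneous group: class 0 occupies both ends of the line and class 1 the middle. A single two-class logit makes P(class 1 | x) monotone in x, so it cannot assign the middle to class 1 and both ends to class 0. Splitting class 0 into two subclasses lets a linear multinomial model separate the three regions, and the training misclassification rate is a simple empirical stand-in for the probability-of-misclassification criterion.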
References
Agresti A (1990) Categorical data analysis. John Wiley & Sons, Inc, New York
Albert A, Anderson JA (1984) On the existence of maximum likelihood estimates in logistic regression models. Biometrika 71: 1–10
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood estimation from incomplete data via the EM algorithm (with discussion). J R Stat Soc Ser B 39: 1–38
Dudoit S, Fridlyand J, Speed T (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 97(457): 77–87
Everitt BS, Hand DJ (1981) Finite mixture distributions. Chapman & Hall, London
Firth D (1993) Bias reduction of maximum likelihood estimates. Biometrika 80: 27–38
Frank IE, Friedman JH (1993) A statistical view of some chemometric regression tools. Technometrics 35: 109–148
Friedman JH (1989) Regularized discriminant analysis. J Am Stat Assoc 84: 165–175
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439): 531–537
Good IJ, Gaskins RA (1971) Nonparametric roughness penalties for probability densities. Biometrika 58: 255–277
Hastie T, Tibshirani R (1996) Discriminant analysis by Gaussian mixtures. J R Stat Soc Ser B 58(1): 155–176
Heinze G, Schemper M (2002) A solution to the problem of separation in logistic regression. Stat Med 21: 2409–2419
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12: 55–67
McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman & Hall, London
McCulloch R, Rossi PE (1994) An exact likelihood analysis of the multinomial probit model. J Econom 64: 207–240
McFadden D (1974) Conditional logit analysis of qualitative choice behavior. In: Zarembka P (ed) Frontiers in econometrics. Academic Press, New York, pp 105–142
Murphy PM, Aha DW (1995) UCI repository of machine learning databases. Department of Information and Computer Science, University of California, Irvine, California. http://www.ics.uci.edu/~mlearn/MLRepository.html
Peng F, Jacobs RA, Tanner MA (1996) Bayesian inference in mixtures-of-experts and hierarchical mixtures-of-experts models with an application to speech recognition. J Am Stat Assoc 91(435): 953–960
Ripley BD (1996) Pattern recognition and neural networks. Cambridge University Press, Cambridge
Schaefer R, Roi L, Wolfe R (1984) A ridge logistic estimator. Commun Stat Theory Methods 13(1): 99–113
Schmidt PJ, Strauss RP (1975) The prediction of occupation using multiple logit models. Int Econ Rev 16: 471–486
Theil H (1969) A multinomial extension of the linear logit model. Int Econ Rev 10: 251–259
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58: 267–288
Wilhelm MS, Carter EM, Hubert JJ (1998) Multivariate iterative re-weighted least squares, with applications to dose–response data. Environmetrics 9: 303–315
Wolberg WH, Mangasarian OL (1990) Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proc Natl Acad Sci USA 87: 9193–9196. ftp://ftp.ics.uci.edu/pub/machine-learning-databases/breast-cancer-wisconsin/
Acknowledgments
The breast cancer database was obtained from the University of Wisconsin Hospitals, Madison (Wolberg and Mangasarian 1990).
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Cite this article
Bashir, S., Carter, E.M. Penalized multinomial mixture logit model. Comput Stat 25, 121–141 (2010). https://doi.org/10.1007/s00180-009-0165-9