Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
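The fitting procedure named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single, fixed degrees-of-freedom value `nu` shared by all components, whereas the ECM algorithm described in the paper can also estimate the degrees of freedom. The function names (`mahalanobis_sq`, `t_logpdf`, `fit_t_mixture`), the initialisation and the small ridge terms are hypothetical choices made for illustration. The point of interest is the weight `u`, which shrinks towards zero for observations far from a component centre, so that atypical points have less influence on the estimated means and scatter matrices than they would under normal components.

```python
import math
import numpy as np

def mahalanobis_sq(Y, mu, Sigma):
    """Squared Mahalanobis distance of each row of Y from mu."""
    diff = Y - mu
    sol = np.linalg.solve(Sigma, diff.T).T
    return np.einsum("ij,ij->i", diff, sol)

def t_logpdf(Y, mu, Sigma, nu):
    """Log density of the multivariate t distribution at each row of Y."""
    p = Y.shape[1]
    delta = mahalanobis_sq(Y, mu, Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    const = (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
             - 0.5 * p * math.log(nu * math.pi) - 0.5 * logdet)
    return const - 0.5 * (nu + p) * np.log1p(delta / nu)

def fit_t_mixture(Y, g, nu=4.0, n_iter=100, seed=0):
    """EM-type fit of a g-component t mixture with fixed nu (assumption)."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    pi = np.full(g, 1.0 / g)
    # Hypothetical initialisation: random data points as centres,
    # the pooled sample covariance as every scatter matrix.
    mu = Y[rng.choice(n, size=g, replace=False)].astype(float)
    Sigma = np.stack([np.cov(Y, rowvar=False) + 1e-6 * np.eye(p)] * g)
    for _ in range(n_iter):
        # E-step: posterior membership probabilities tau (n x g).
        logw = np.stack(
            [np.log(pi[i]) + t_logpdf(Y, mu[i], Sigma[i], nu) for i in range(g)],
            axis=1,
        )
        logw -= logw.max(axis=1, keepdims=True)
        tau = np.exp(logw)
        tau /= tau.sum(axis=1, keepdims=True)
        # E-step: robustness weights u; small for distant observations.
        u = np.stack(
            [(nu + p) / (nu + mahalanobis_sq(Y, mu[i], Sigma[i])) for i in range(g)],
            axis=1,
        )
        # M-step: weighted updates of proportions, means and scatter matrices.
        pi = tau.mean(axis=0)
        for i in range(g):
            w = tau[:, i] * u[:, i]
            mu[i] = w @ Y / w.sum()
            diff = Y - mu[i]
            Sigma[i] = (w[:, None] * diff).T @ diff / tau[:, i].sum()
            Sigma[i] += 1e-8 * np.eye(p)  # small ridge for numerical stability
    # tau and u are those computed in the last E-step.
    return pi, mu, Sigma, tau, u
```

Calling `fit_t_mixture(Y, g=2)` on an `n x p` array with `p >= 2` returns the mixing proportions, component centres, scatter matrices, posterior probabilities and weights. Assigning each observation to the component with the largest `tau` gives a clustering, and unusually small values of `u` flag candidate atypical observations.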
Peel, D., McLachlan, G.J. Robust mixture modelling using the t distribution. Statistics and Computing 10, 339–348 (2000). https://doi.org/10.1023/A:1008981510081
Keywords
- finite mixture models
- normal components
- multivariate t components
- maximum likelihood
- EM algorithm
- cluster analysis