Robust mixture modelling using the t distribution

Statistics and Computing

Abstract

Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
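To illustrate why t components are less affected by atypical observations, the following is a minimal sketch of one EM iteration for a univariate t mixture with the degrees of freedom held fixed. It is not the authors' implementation: the function names (`t_logpdf`, `em_step`), the restriction to one dimension, and the fixed `nu` are assumptions made for brevity, and the ECM algorithm described in the paper would include a further step for the degrees of freedom. The key quantity is the weight `u`, which shrinks towards zero for points far from a component's location, so such points contribute little to the updated location and scale.

```python
from math import lgamma, log, exp, pi

def t_logpdf(y, mu, sigma2, nu):
    """Log density of a univariate t distribution with location mu,
    squared scale sigma2 and nu degrees of freedom."""
    delta = (y - mu) ** 2 / sigma2  # squared standardized distance
    const = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(pi * nu * sigma2)
    return const - 0.5 * (nu + 1) * log(1 + delta / nu)

def em_step(ys, pis, mus, sigma2s, nu):
    """One EM iteration for a g-component univariate t mixture (nu fixed).

    Returns updated mixing proportions, locations, squared scales,
    and the per-point, per-component weights u.
    """
    g, n = len(pis), len(ys)
    tau, u = [], []
    for y in ys:
        # E-step (a): posterior probabilities of component membership,
        # computed on the log scale for numerical stability.
        logs = [log(pis[i]) + t_logpdf(y, mus[i], sigma2s[i], nu)
                for i in range(g)]
        m = max(logs)
        w = [exp(v - m) for v in logs]
        total = sum(w)
        tau.append([wi / total for wi in w])
        # E-step (b): weights that down-weight points far from each component.
        u.append([(nu + 1) / (nu + (y - mus[i]) ** 2 / sigma2s[i])
                  for i in range(g)])

    new_pis, new_mus, new_sigma2s = [], [], []
    for i in range(g):
        tau_sum = sum(tau[j][i] for j in range(n))
        w = [tau[j][i] * u[j][i] for j in range(n)]
        # M-step: weighted location, then weighted squared scale.
        mu = sum(wj * y for wj, y in zip(w, ys)) / sum(w)
        s2 = sum(wj * (y - mu) ** 2 for wj, y in zip(w, ys)) / tau_sum
        new_pis.append(tau_sum / n)
        new_mus.append(mu)
        new_sigma2s.append(s2)
    return new_pis, new_mus, new_sigma2s, u
```

Calling `em_step` repeatedly from reasonable starting values gives a simple fitting loop under these assumptions. With normal components every point would carry weight 1, whereas here a gross outlier receives a weight close to zero.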

Cite this article

Peel, D., McLachlan, G.J. Robust mixture modelling using the t distribution. Statistics and Computing 10, 339–348 (2000). https://doi.org/10.1023/A:1008981510081
