
Smooth Tests of Fit for Gaussian Mixtures

Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

Model-based clustering and classification are often based on a finite mixture distribution, and the most popular choice for the mixture component distribution is the Gaussian distribution (Fraley and Raftery, J Stat Softw 18(6):1–13, 2007). Many tests, for example those based on goodness-of-fit measures, focus on determining the order of the mixture. Diagnostic tests that confirm the distributional assumptions, however, are often neglected, and this may lead to invalid conclusions from the cluster analysis.

Smooth tests (Rayner et al., Smooth tests of goodness of fit: using R, 2nd edn. Wiley, Singapore, 2009) can be used to test distributional assumptions against the so-called general smooth alternatives in the sense of Neyman (Skandinavisk Aktuarietidskr 20:150–199, 1937). To test for a mixture distribution, we present smooth tests that have the additional advantage of permitting sub-hypotheses to be tested through their components. The test statistics are asymptotically chi-squared distributed. The results of the simulation study show that bootstrapping is needed for small to medium sample sizes to keep P(type I error) at the nominal level, and that the proposed tests have high power against various alternatives. Finally, the tests are illustrated on a data set giving the average precipitation, in inches, for each of 70 cities in the United States and Puerto Rico (McNeil, Interactive data analysis. Wiley, New York, 1977).
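To make the ingredients named in the abstract concrete (components, an asymptotic chi-squared reference distribution, and bootstrap calibration), the following is a minimal sketch of the classical smooth test for a single Gaussian, i.e. the one-component special case. It is not the authors' mixture test. Assumptions made here, none of which come from the chapter: the mean and variance are estimated by maximum likelihood, which makes the first two components vanish; the components of orders 3 and 4 are built from normalized (probabilists') Hermite polynomials; and the bootstrap is parametric, exploiting the location-scale invariance of the statistic. All function names are hypothetical.

```python
import math
import random


def smooth_components(x):
    """Return (V3, V4) for a smooth test of normality with estimated mean and variance."""
    n = len(x)
    mu = sum(x) / n
    # Maximum likelihood standard deviation (divisor n), so V1 = V2 = 0 exactly.
    sd = math.sqrt(sum((xi - mu) ** 2 for xi in x) / n)
    z = [(xi - mu) / sd for xi in x]
    # Orthonormal (probabilists') Hermite polynomials of orders 3 and 4 under N(0, 1).
    h3 = [(zi ** 3 - 3 * zi) / math.sqrt(6) for zi in z]
    h4 = [(zi ** 4 - 6 * zi ** 2 + 3) / math.sqrt(24) for zi in z]
    return sum(h3) / math.sqrt(n), sum(h4) / math.sqrt(n)


def smooth_statistic(x):
    """Sum of squared components; each component is asymptotically N(0, 1) under the null."""
    v3, v4 = smooth_components(x)
    return v3 ** 2 + v4 ** 2


def asymptotic_p_value(stat):
    # The chi-squared survival function with 2 degrees of freedom is exp(-s / 2).
    return math.exp(-stat / 2)


def bootstrap_p_value(x, n_boot=999, seed=1):
    """Parametric bootstrap p-value (hypothetical helper, single-Gaussian null only)."""
    rng = random.Random(seed)
    n = len(x)
    observed = smooth_statistic(x)
    exceed = 0
    for _ in range(n_boot):
        # The statistic is location-scale invariant, so sampling from N(0, 1)
        # is equivalent to sampling from the fitted N(mu_hat, sd_hat^2).
        xb = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if smooth_statistic(xb) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    normal_data = [rng.gauss(10.0, 2.0) for _ in range(70)]
    # An equal-weight two-component mixture with well-separated means, for contrast.
    mixture_data = [rng.gauss(0.0, 1.0) if rng.random() < 0.5 else rng.gauss(6.0, 1.0)
                    for _ in range(70)]
    for name, data in [("normal", normal_data), ("mixture", mixture_data)]:
        s = smooth_statistic(data)
        print(name, round(s, 3), round(asymptotic_p_value(s), 4), bootstrap_p_value(data))
```

With maximum likelihood estimates, V3² + V4² coincides with the Jarque-Bera statistic, so its asymptotic null distribution is chi-squared with 2 degrees of freedom, and V3 and V4 can be inspected separately to see whether a departure lies in skewness or kurtosis. Comparing the asymptotic and bootstrap p-values at small n illustrates why calibration by bootstrap can matter. For a mixture null, one natural extension would use polynomials orthonormal with respect to the fitted mixture density; this sketch does not attempt that.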

References

  1. Chen, J., & Li, P. (2009). Hypothesis test for normal mixture models: The EM approach. The Annals of Statistics, 37, 2523–2542.
  2. Fraley, C., & Raftery, A. E. (2007). Model-based methods of classification: Using the mclust software in chemometrics. Journal of Statistical Software, 18(6), 1–13. http://www.jstatsoft.org/.
  3. Li, P., & Chen, J. (2010). Testing the order of a finite mixture. Journal of the American Statistical Association, 105(491), 1084–1092.
  4. Li, P., Chen, J., & Marriott, P. (2009). Non-finite Fisher information and homogeneity: An EM approach. Biometrika, 96(2), 411–426.
  5. Lo, Y., Mendell, N. R., & Rubin, D. B. (2001). Testing the number of components in a normal mixture. Biometrika, 88(3), 767–778.
  6. McNeil, D. R. (1977). Interactive data analysis. New York: Wiley.
  7. Neyman, J. (1937). Smooth test for goodness of fit. Skandinavisk Aktuarietidskrift, 20, 150–199.
  8. Rayner, J. C. W., Thas, O., & Best, D. J. (2009). Smooth tests of goodness of fit: Using R (2nd ed.). Singapore: Wiley.
  9. Thas, O. (2010). Comparing distributions. New York: Springer.

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. National Institute for Applied Statistics Research Australia, University of Wollongong, Wollongong, Australia
  2. School of Mathematical and Physical Sciences, University of Newcastle, Callaghan, Australia
  3. Department of Applied Mathematics, Biometrics and Process Control, Ghent University, Gent, Belgium
