Volume 69, Issue 3, pp 481–498

Model based clustering of large data sets: Tracing the development of spelling ability

Application Reviews And Case Studies


There are two main theories with respect to the development of spelling ability: the stage model and the model of overlapping waves. In this paper exploratory model based clustering will be used to analyze the responses of more than 3500 pupils to subsets of 245 items. To evaluate the two theories, the resulting clusters will be ordered along a developmental dimension using an external criterion. Solutions for three statistical problems will be given: (1) an algorithm that can handle large data sets and only renders non-degenerate clusters; (2) a goodness-of-fit test that is not affected by the fact that the number of possible response vectors far outweighs the number of observed response vectors; and (3) a new technique, data expunction, that can be used to evaluate goodness-of-fit tests if the missing data mechanism is known.
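As an illustration of what model based clustering of incomplete item responses involves, the following is a minimal sketch of a latent class model for binary (correct/incorrect) items in which each pupil answers only a subset of the items. It is not the algorithm of this paper: it uses plain EM rather than Bayesian computation, it assumes that items not administered can simply be skipped in the likelihood, and the function name `fit_latent_classes`, the `None` coding for missing responses, and the pseudo-counts are all assumptions made for the example.

```python
import math
import random


def fit_latent_classes(data, n_classes, n_iter=200, seed=0):
    """EM for a latent class model with binary items.

    data: list of response vectors; 1 = correct, 0 = incorrect,
          None = item not administered to this pupil (assumed ignorable).
    Returns (class weights, per-class item probabilities, posteriors).
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # probs[c][j]: probability that a pupil in class c answers item j correctly
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    post = []
    for _ in range(n_iter):
        # E-step: posterior class membership for each pupil,
        # using only the items that pupil actually answered.
        post = []
        for row in data:
            logp = []
            for c in range(n_classes):
                lp = math.log(weights[c])
                for j, x in enumerate(row):
                    if x is None:
                        continue
                    p = probs[c][j]
                    lp += math.log(p if x == 1 else 1.0 - p)
                logp.append(lp)
            m = max(logp)
            unnorm = [math.exp(v - m) for v in logp]
            total = sum(unnorm)
            post.append([u / total for u in unnorm])
        # M-step: update weights and item probabilities.
        # The pseudo-counts (an assumption of this sketch) keep every
        # estimate strictly inside (0, 1).
        for c in range(n_classes):
            weights[c] = (sum(r[c] for r in post) + 1.0) / (len(data) + n_classes)
            for j in range(n_items):
                num, den = 0.5, 1.0
                for r, row in zip(post, data):
                    if row[j] is not None:
                        num += r[c] * row[j]
                        den += r[c]
                probs[c][j] = num / den
    return weights, probs, post
```

A call such as `fit_latent_classes(data, n_classes=2)` returns, for every pupil, a vector of posterior class probabilities. Ordering the resulting classes along a developmental dimension would still require an external criterion, as described above.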

Key words

Bayesian computational statistics; data expunction; developmental stages; latent class analysis; model based clustering; spelling





Copyright information

© The Psychometric Society 2004

Authors and Affiliations

  1. Department of Methodology and Statistics, University of Utrecht, TC Utrecht, Netherlands
  2. Free University Amsterdam, The Netherlands
