Semiparametric Regression Analysis of Grouped Data

  • Jaroslaw Harezlak
  • David Ruppert
  • Matt P. Wand
Part of the Use R! book series (USE R)


Grouped data arise in several diverse contexts in statistical design and analysis. Examples include medical studies in which patients are followed over time and measurements on them recorded repeatedly, educational studies in which students grouped into classrooms and schools are scored on examinations, and sample surveys in which the respondents to questionnaires are grouped within geographical districts.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Jaroslaw Harezlak (1)
  • David Ruppert (2)
  • Matt P. Wand (3)
  1. School of Public Health, Indiana University Bloomington, Bloomington, USA
  2. Department of Statistical Science, Cornell University, Ithaca, USA
  3. School of Mathematical and Physical Sciences, University of Technology Sydney, Ultimo, Australia
