Abstract
The regression modeling techniques considered in the previous chapters assume that all relevant covariates are known and included in the analysis. In practice, however, typically only a limited number of potentially influential variables are known, so part of the heterogeneity in the population often remains unobserved. In survival modeling, ignoring this “unobserved heterogeneity” may cause severe artifacts. This chapter presents several approaches to account for unobserved heterogeneity in discrete time-to-event models. We first consider the discrete hazard frailty model, which incorporates random intercept terms to account for subject-specific variation caused by unobserved covariate information (Sects. 9.1 and 9.2). Section 9.3 extends discrete hazard frailty models to smooth, nonlinear covariate effects; the resulting model class is the discrete additive hazard frailty model. Because model misspecification is a critical issue in random-effects models, Sect. 9.4 presents data-driven strategies for variable selection in discrete hazard frailty models. Sections 9.5 and 9.6 present alternative approaches to incorporating unobserved heterogeneity into discrete time-to-event models, namely penalized fixed-effects modeling and finite mixture modeling, respectively. Finally, Sect. 9.7 investigates the connection between discrete hazard frailty models and sequential models in item response theory.
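The kind of artifact mentioned above can be illustrated with a small simulation. The following is a minimal sketch, not taken from the chapter: it assumes a logistic discrete hazard with a normally distributed random intercept per subject, and the function name, parameter values, and seed are hypothetical choices made for illustration. Each subject's own hazard is constant over time, yet the hazard observed in the population declines, because subjects with high frailty tend to leave the risk set early.

```python
import math
import random


def simulate_population_hazard(n_subjects=20000, n_periods=6,
                               intercept=-1.0, frailty_sd=1.5, seed=1):
    """Empirical population hazard per period under a random-intercept model.

    Assumed model (illustrative only): subject i has the time-constant hazard
    logistic(intercept + b_i), with b_i drawn from N(0, frailty_sd^2).
    """
    rng = random.Random(seed)
    at_risk = [0] * n_periods   # subjects still event-free at the start of period t
    events = [0] * n_periods    # events observed in period t

    for _ in range(n_subjects):
        b = rng.gauss(0.0, frailty_sd)                      # unobserved frailty
        hazard = 1.0 / (1.0 + math.exp(-(intercept + b)))   # constant in t
        for t in range(n_periods):
            at_risk[t] += 1
            if rng.random() < hazard:
                events[t] += 1
                break  # the subject leaves the risk set after the event

    # Population hazard: events divided by number at risk, per period
    return [e / r for e, r in zip(events, at_risk)]


if __name__ == "__main__":
    with_frailty = simulate_population_hazard()
    without_frailty = simulate_population_hazard(frailty_sd=0.0)
    print("with frailty:   ", [round(h, 3) for h in with_frailty])
    print("without frailty:", [round(h, 3) for h in without_frailty])
```

Setting `frailty_sd=0.0` removes the heterogeneity, in which case the estimated population hazard should stay roughly flat across periods. A model fitted without a frailty term to the heterogeneous data would attribute the decline to time itself.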
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Tutz, G., Schmid, M. (2016). Frailty Models and Heterogeneity. In: Modeling Discrete Time-to-Event Data. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-28158-2_9
DOI: https://doi.org/10.1007/978-3-319-28158-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28156-8
Online ISBN: 978-3-319-28158-2
eBook Packages: Mathematics and Statistics (R0)