Estimation strategy of multilevel model for ordinal longitudinal data
Abstract
This paper considers the shrinkage estimation of multilevel models that are appropriate for ordinal longitudinal data. These models can accommodate multiple random effects and, additionally, allow for a general form of model covariates that are related to the overall level of the responses and changes to the response over time. The likelihood inference for multilevel models is computationally burdensome due to intractable integrals. A maximum marginal likelihood (MML) method with Fisher’s scoring procedure is therefore followed to estimate the random and fixed effects parameters. In real life data, researchers may have collected many covariates for the response. Some of these covariates may satisfy certain constraints which can be used to produce a restricted estimate from the unrestricted likelihood function. The unrestricted and restricted MMLs can then be combined optimally to form the pretest and shrinkage estimators. Asymptotic properties of these estimators including biases and risks will be discussed. A simulation study is conducted to assess the performance of the estimators with respect to the unrestricted MML estimator. Finally, the relevance of the proposed estimators will be illustrated with a real data set.
Keywords
Longitudinal data, Maximum marginal likelihood, Multilevel, Pretest, Ordinal regression, Shrinkage
1 Introduction
Longitudinal data with ordinal responses occur frequently in the health and social sciences. To diagnose patients, it is common practice to assign them to ordered categories corresponding to various degrees of a medical condition, or to classify them based on the risk of developing a disease-related response. Many longitudinal studies of children are interested in maturation from childhood to adulthood. Naturally, these studies involve a monotonically increasing maturation process, and sexual maturation is often measured with an ordinal outcome (Albert et al. 1997). In social studies, the response could be the level of agreement with a particular question, such as strongly agree, agree, no opinion, disagree, or strongly disagree. It is important to take advantage of multilevel models that are appropriate for ordinal longitudinal data, rather than simply analyzing ordinal responses as measurements using the available software. Observations in these studies are made in hierarchical structures with repeated observations over time (level-1) nested within subjects (level-2). Level-1 measurements are generally not independent, and this dependence should be accounted for in a model even if it is not of primary interest. One analytic technique that accounts for within-subject dependence uses marginal modeling, where the regression coefficients are interpreted on a population rather than an individual-subject basis. This approach has been studied more extensively than the alternative approach of conditional models, in which the population-averaged effects are not directly specified and the effect of a covariate on the response is conditional on the random effects (Lee and Daniels 2008). In this paper, we consider a conditional model approach for ordinal longitudinal responses. Lee and Nelder (2004) argued that the conditional model is fundamental and benefits from being able to make marginal predictions, as well as conditional predictions.
Numerous studies have suggested modeling ordinal responses by using conditional models. Hedeker and Gibbons (1994) proposed a random effects model for analyzing ordinal responses in longitudinal studies that uses probit and logit link functions and is phrased in multilevel terminology. A maximum marginal likelihood solution is described in that paper using multidimensional Gauss–Hermite quadrature to numerically integrate over the distribution of random effects. An important issue is the number of quadrature points needed to ensure accurate estimation of the model parameters. We used the Newton–Raphson method with the Fisher scoring algorithm for an iterative solution to the likelihood equations. Hedeker and Gibbons (2006) provided a comparable discussion from the multilevel regression perspective that accommodates multiple random effects for analyzing longitudinal ordinal responses, where they included variables to explain inter- and intra-individual variations. Snijders and Bosker (2012) gave a comprehensive discussion of multilevel models for data organized in a nested structure. These multilevel models are defined by a set of regression models in which the variation at different levels, namely within-subject and between-subject variability, is explicitly modeled. Albert et al. (1997) proposed a methodology for analyzing monotonically increasing ordinal data with an application to sexual maturation data from the National Heart, Lung, and Blood Institute Growth and Health Study (NGHS). In particular, they developed an EM algorithm for maximum likelihood estimation that incorporates covariates and randomly missing data. Excellent introductions to multilevel modeling include Raudenbush and Bryk (2002), Skrondal and Rabe-Hesketh (2004), Agresti (2010), Goldstein (2010), and Hox et al. (2017).
In this paper, we are interested in using the model proposed by Hedeker and Gibbons (1994) and applying the pretest and shrinkage methods to estimate the fixed effects when prior knowledge or auxiliary information is available in the form of possible linear restrictions on the parameters. In longitudinal surveys, many covariates are often collected, and variable selection is a key issue in modeling such data. We assume that some prior or auxiliary information is available from sources such as a similar study, a prior study whose results need updating, or a search for a sparsity pattern through variable selection methods. Using this knowledge, we impose linear restrictions on the fixed effects, while treating the random effects as nuisance parameters. To use this information, we test whether some parameters are not significant through a pretesting strategy. We also explore the shrinkage estimator as an alternative to pretesting, in the hope of improving the inference provided by the model. The simulation studies and the application to a real data set presented in this paper clearly illustrate the importance of the multilevel model for longitudinal ordinal outcome data.
To the best of our knowledge, there is no published research in the reviewed literature that applies pretest and shrinkage estimators to multilevel regression models in the longitudinal setting with a repeated ordinal response. This paper fills that gap by implementing the unrestricted, restricted, pretest, shrinkage, and positive shrinkage estimators. The literature on shrinkage estimation is enormous, and we mention only a few of the most relevant contributions. Thomson and Hossain (2018) developed the James–Stein shrinkage and LASSO methods and compared their performance with the maximum likelihood estimate for generalized linear mixed models when some of the covariates may be subject to a linear restriction. Hossain et al. (2016) developed pretest and shrinkage estimation methods for the analysis of longitudinal data under a partially linear model when some parameters are subject to certain restrictions. Zeng and Hill (2016) explored the properties of pretest and shrinkage estimators for random parameters logit models. Many articles have been devoted to the study of pretest and shrinkage estimators in parametric and semiparametric linear models for uncorrelated data, including Thomson et al. (2016), Hossain et al. (2015), and Lian (2012), among others.
The remainder of this paper is organized into seven more sections. Section 2 introduces the multilevel mixed effects ordinal regression model. Section 3 outlines the marginal maximum likelihood estimate (MMLE). Section 4 defines the pretest and shrinkage estimators for the fixed effects parameters. Section 5 discusses the asymptotic bias and risk under the alternative hypothesis. In Sect. 6 we conduct a simulation study. Section 7 applies the pretest and shrinkage estimators to a real data set, and Sect. 8 gives concluding remarks.
2 Multilevel mixed effects ordinal regression model
2.1 Observed ordinal response
The unobserved response \(\varvec{y}_i\) for the ith subject in model (2) can be related to the observed ordinal response through the “threshold concept”. We denote the observed ordinal response by \(\varvec{Y}_{i}\); its value is determined by a series of strictly increasing thresholds \(\gamma _1< \cdots < \gamma _{K-1}\), where K is the number of ordered categories. The observed response is \(Y_{ij}=k\) if \(\gamma _{k-1}\le y_{ij}< \gamma _{k}\) for the latent variable \(y_{ij}\), with \(\gamma _0=-\infty\) and \(\gamma _K=\infty\). In the dichotomous response setting, it is common to set a threshold to zero to fix the location of the latent variable. This is usually done in terms of the first threshold (i.e. \(\gamma _1=0\)).
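The threshold rule can be made concrete with a short sketch. The function name `latent_to_ordinal` and the numerical thresholds are illustrative assumptions; only the rule that category k is observed when the latent value lies in \([\gamma_{k-1}, \gamma_k)\) is taken from the text.

```python
import numpy as np

def latent_to_ordinal(y_latent, thresholds):
    """Map latent values to categories 1..K.

    thresholds holds the K-1 interior cut points gamma_1 < ... < gamma_{K-1};
    gamma_0 = -inf and gamma_K = +inf are implicit.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    if np.any(np.diff(thresholds) <= 0):
        raise ValueError("thresholds must be strictly increasing")
    # side="right" counts the thresholds that are <= y, so a latent value
    # equal to gamma_{k-1} falls in category k (gamma_{k-1} <= y < gamma_k).
    return np.searchsorted(thresholds, y_latent, side="right") + 1

# Hypothetical example with K = 4 categories and gamma_1 fixed at 0.
y = np.array([-0.5, 0.0, 0.5, 1.0, 2.5])
print(latent_to_ordinal(y, [0.0, 1.0, 2.0]))
```

Passing only the interior thresholds keeps the infinite end points out of the data structure, which avoids special cases for the first and last categories.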
3 Marginal maximum likelihood estimation
3.1 Numerical solution to the MML estimate
To evaluate the above integral numerically, Gauss–Hermite quadrature is used: we sum over Q quadrature nodes for each dimension of integration (in our case, \(Q^q\) nodes in total) and weight these points optimally. The optimal weights for the standard normal univariate density are used and are given in Stroud and Sechrest (1966). For more details about this process, see Hedeker and Gibbons (1994). Note that the integral has been approximated by Gauss–Hermite quadrature, but the Laplace approximation can also be used, as it is equivalent to adaptive Gauss–Hermite quadrature with one quadrature point (Liu and Pierce 1994). Once the likelihood has been maximized as above, we denote the estimate of the parameter \(\varvec{\theta }\) by \(\hat{\varvec{\theta }}_F = (\hat{\varvec{\beta }}_F^{\mathsf {T}}, \hat{\varvec{\eta }}_F^{\mathsf {T}})^{\mathsf {T}}\), which we refer to as the unrestricted marginal maximum likelihood (UMML) estimate. Although we obtain the estimate \(\hat{\varvec{\eta }}_F\), our primary focus is on the fixed effects parameter \(\varvec{\beta },\) while the variance–covariance components of the random effects and other parameters are treated as nuisance parameters.
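As an illustration of the quadrature step, the sketch below approximates an expectation over a single standard normal random effect. It uses NumPy's `hermegauss` rule (weight function \(e^{-x^2/2}\)); the function name `normal_expectation`, the default of 20 nodes, and the intercept and scale in the usage line are assumptions made for the example, not values from the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def normal_expectation(f, n_nodes=20):
    """Approximate E[f(Z)] for Z ~ N(0, 1) by Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_nodes)
    # The weights integrate against exp(-x^2 / 2); dividing by sqrt(2*pi)
    # turns that weight function into the standard normal density.
    return np.sum(weights * f(nodes)) / np.sqrt(2.0 * np.pi)

# Hypothetical use: marginal probability under a logit link with a random
# intercept, P = E[expit(0.4 + 0.8 * Z)].
marginal_prob = normal_expectation(lambda z: 1.0 / (1.0 + np.exp(-(0.4 + 0.8 * z))))
print(marginal_prob)
```

With q random effects the same rule is applied along each dimension, which is why the number of evaluation points grows as \(Q^q\).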
3.2 Information matrix and restricted MML estimate
3.3 Likelihood ratio test
In the next section, we define the shrinkage and pretest estimators for the fixed effects parameter vector \(\varvec{\beta },\) while treating the other parameters in the model as nuisance.
4 Pretest and shrinkage estimators
5 Asymptotic bias and risk under the alternative hypothesis
Theorem 5.1
Theorem 5.2
Remark 5.1
To compare the ABs of the estimators, let \(\varvec{\omega } =\varvec{\zeta }/\varDelta\), so that all AB expressions are written in terms of the scalar factor \(\varDelta\) along with \(\varvec{\omega }\). Thus, the bias of \(\hat{\varvec{\beta }}_R\) increases and is unbounded as \(\varDelta \rightarrow \infty\). On the other hand, the scalar factors in the ABs of \(\hat{\varvec{\beta }}_{P}\), \(\hat{\varvec{\beta }}_{S}\), and \(\hat{\varvec{\beta }}_{S+}\) are bounded in \(\varDelta ,\) as \(\text{ E }(\chi _{p_2+2}^{-2}(\varDelta ))\) is a decreasing log-convex function of \(\varDelta\). The AB of \(\hat{\varvec{\beta }}_{S}\) starts from the origin at \(\varDelta =0\), first grows monotonically, reaches a maximum, and then decreases back towards 0. Similar characteristics can be found for \(\hat{\varvec{\beta }}_{P}\) and \(\hat{\varvec{\beta }}_{S+}\). Further, the bias curve of \(\hat{\varvec{\beta }}_{S+}\) remains below the curve of \(\hat{\varvec{\beta }}_{S}\) up to a certain value of \(\varDelta\) and then merges with the curve of \(\hat{\varvec{\beta }}_{S}\).
Theorem 5.3
Theorem 5.4
Proof
Similar proofs can be found in Thomson and Hossain (2018). \(\square\)
Remark 5.2
Under \(H_0: \varvec{A}\varvec{\beta }=\varvec{h}\), that is, when \(\varvec{\delta }=\varvec{0}\), \(\hat{\varvec{\beta }}_R\) is the best choice and it strongly dominates \(\hat{\varvec{\beta }}_F\). Note that \(\text{ trace } (\varvec{W} \varvec{J}^{*})>0\), as the eigenvalues of \(\varvec{W} \varvec{J}^{*}\) are all positive. Also, \(\varvec{\zeta }^{\mathsf {T}} \varvec{W}\varvec{\zeta }>0\), provided that \(\varvec{\delta }\ne \varvec{0}\). However, when \(\varvec{\delta }\) moves away from \(\varvec{0}\), the AR of \(\hat{\varvec{\beta }}_R\) grows and becomes unbounded as \(\varvec{\zeta }^{\mathsf {T}} \varvec{W}\varvec{\zeta }\) increases, whereas the risk of \(\hat{\varvec{\beta }}_F\) remains bounded. This clearly indicates that the performance of \(\hat{\varvec{\beta }}_R\) depends on the validity of \(\varvec{A}\varvec{\beta }=\varvec{h}\). The AR of \(\hat{\varvec{\beta }}_P\) increases monotonically to a maximum, crossing the risk function of \(\hat{\varvec{\beta }}_R\), then decreases monotonically to the value \(\text{ trace } \, (\varvec{W} \varvec{I}^{-1})\) as \(\varvec{\delta }\) moves further away from \(\varvec{0}\). It is therefore difficult to make a clear-cut recommendation in favour of \(\hat{\varvec{\beta }}_{P}\) over \(\hat{\varvec{\beta }}_F\). It can also be shown that \(\text{ AR }\left( \hat{\varvec{\beta }}_{S+}; \varvec{W}\right) \le \text{ AR }\left( \hat{\varvec{\beta }}_{S}; \varvec{W}\right) \le \text{ AR }\left( \hat{\varvec{\beta }}_{F}; \varvec{W}\right)\). Hence, the SE dominates \(\hat{\varvec{\beta }}_F\), and the risk of \(\hat{\varvec{\beta }}_{S+}\) is asymptotically no larger than that of \(\hat{\varvec{\beta }}_{S}\) over the entire parameter space induced by \(\varDelta\). For details of similar comparisons, see Hossain et al. (2015).
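The qualitative behaviour described in this remark follows from how the estimators combine \(\hat{\varvec{\beta }}_F\) and \(\hat{\varvec{\beta }}_R\). The sketch below assumes the standard forms used in the pretest and shrinkage literature: the pretest estimator switches between the two estimates according to a test of \(H_0\), the shrinkage estimator weights their difference by \(1-(p_2-2)/D_n\), and the positive shrinkage estimator truncates that weight at zero. The function name, the argument `d_n` for the test statistic, and the numbers in the usage lines are hypothetical.

```python
import numpy as np

def combine_estimators(beta_f, beta_r, d_n, p2, crit):
    """Return (pretest, shrinkage, positive-shrinkage) estimates.

    beta_f, beta_r : unrestricted and restricted estimates (1-D arrays)
    d_n            : test statistic for H0: A beta = h (assumed positive)
    p2             : number of restrictions; shrinkage needs p2 > 2
    crit           : upper-alpha critical value of the chi-square(p2) law
    """
    beta_f = np.asarray(beta_f, dtype=float)
    beta_r = np.asarray(beta_r, dtype=float)
    if p2 <= 2:
        raise ValueError("shrinkage estimators require p2 > 2")
    if d_n <= 0:
        raise ValueError("the test statistic must be positive")
    diff = beta_f - beta_r
    # Pretest: keep the restricted estimate unless H0 is rejected.
    beta_p = beta_r.copy() if d_n <= crit else beta_f.copy()
    # Shrinkage: the weight can become negative for small d_n (over-shrinkage).
    weight = 1.0 - (p2 - 2) / d_n
    beta_s = beta_r + weight * diff
    # Positive shrinkage: truncate the weight at zero to avoid sign reversal.
    beta_sp = beta_r + max(0.0, weight) * diff
    return beta_p, beta_s, beta_sp

# Hypothetical values; 7.815 is the 5% critical value of chi-square(3).
bp, bs, bsp = combine_estimators([1.0, 2.0, 0.5], [1.2, 1.8, 0.0], d_n=10.0, p2=3, crit=7.815)
print(bp, bs, bsp)
```

When the test statistic is smaller than \(p_2-2\), the shrinkage weight is negative and the shrinkage estimate moves past the restricted estimate; the positive-part version then returns the restricted estimate, which is the source of its risk advantage near the restriction.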
6 Simulation

Restricted Model: We considered the restricted model \(H_0: \varvec{A} \varvec{\beta } = \varvec{h}\) with \(\varvec{A}=\left[ \varvec{I}_{p_2}, \varvec{0}_{p_2\times (p - p_{2})}\right]\) and \(\varvec{h}=\varvec{0}_{p_2\times 1}\), so that \(H_0: \varvec{A} \varvec{\beta } = \varvec{0}\), where \(\varvec{0}_{a\times b}\) is an \(a\times b\) matrix of zeros and \(\varvec{\beta }_R=(\varvec{\beta }_{1R}^{\mathsf {T}}, \varvec{\beta }_{2R}^{\mathsf {T}})^{\mathsf {T}}\). The dimensions of \(\varvec{\beta }_{1R}\) and \(\varvec{\beta }_{2R}\) are \(p_1 \times 1\) and \(p_2 \times 1\), respectively, such that \(p = p_1 + p_2\). We assume the restriction \(\varvec{\beta }_{2R}=\varvec{0}\), where \(\varvec{0}\) is a \(p_2\times 1\) vector of zeros (we considered \(p_2=3\), 6 and 12).

Simulation Model: We consider the simulation model with \(\varDelta = \Vert \varvec{\beta }-\varvec{\beta }_R\Vert ^2\), where \(\varvec{\beta } = (\varvec{\beta }_1^{\mathsf {T}}, \varvec{\beta }_2^{\mathsf {T}})^{\mathsf {T}}\) and \(\Vert \cdot \Vert\) is the Euclidean norm. Under the local alternative (13), \(\varDelta\) measures the distance between the restricted model and the simulation model, and the two models are identical if \(\varDelta =0\).
We specify \(p_1 = 4\) for this simulation study, as the true coefficients are assumed to be \(\varvec{\beta _1} = (1.30,0.93,1.40, 1.26)^{\mathsf {T}}\) with \(\varvec{\mu } = 0.40\) as the global intercept term. The weight matrix \(\varvec{W}\) in the quadratic loss function from Sect. 5 was set to \(\varvec{I}_{p \times p}\). We consider the two simulation procedures for the probit and logit links of model (2).
Each of the p fixed effect covariates was generated from a separate \(n_i\)-variate normal distribution with mean vector \(\varvec{10}\) and covariance matrix \(\sigma _x^2\varvec{\rho }_x\), where \(\sigma _x^2 = 0.38\) and \(\varvec{\rho }_x\) is an exchangeable correlation matrix with \(\rho = 0.6\). The error term \(\varvec{\varepsilon }\) was generated with mean vector \(\varvec{0}\) and covariance matrix \(\varvec{\varLambda } = \sigma _{\varepsilon }^2\varvec{\rho }_{\varepsilon }\), where \(\sigma _{\varepsilon }^2= 2.4\) and \(\varvec{\rho }_{\varepsilon }\) is the correlation structure with parameter \(\rho _{\varepsilon } = 0.6\). The random effects were generated from a bivariate normal distribution (as there were two random effects) with means equal to 0, variances equal to 0.65 and covariances set to 0. The responses were generated for different values of \(\varDelta\), namely \(\varDelta \in \{0.0, 0.1, 0.2, 0.4, 0.7, 0.9, 1.2, 1.5, 2.0\}\).
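The covariate-generation step can be sketched as follows. The exchangeable covariance \(\sigma _x^2[(1-\rho )\varvec{I} + \rho \varvec{1}\varvec{1}^{\mathsf {T}}]\) and the values 0.38, 0.6 and 10 are read from the description above; the helper names, the seed, and the choice \(n_i = 3\) are assumptions made for the example.

```python
import numpy as np

def exchangeable_cov(n, sigma2, rho):
    """Covariance sigma2 * [(1 - rho) * I + rho * 1 1^T] of dimension n x n."""
    return sigma2 * ((1.0 - rho) * np.eye(n) + rho * np.ones((n, n)))

def draw_covariate(rng, n_i, mean, sigma2, rho):
    """Draw one covariate vector of length n_i for a single subject."""
    cov = exchangeable_cov(n_i, sigma2, rho)
    return rng.multivariate_normal(np.full(n_i, mean), cov)

rng = np.random.default_rng(2024)   # arbitrary seed for reproducibility
x_i = draw_covariate(rng, n_i=3, mean=10.0, sigma2=0.38, rho=0.6)
print(exchangeable_cov(3, 0.38, 0.6))
print(x_i)
```

Repeating the draw p times per subject, with \(n_i\) set to that subject's number of visits, gives the full covariate matrix for the subject.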
Specifically, we generated 1000 data sets for each of \(N = 100\), 150, and 200 subjects (level-2), with the number of observations per subject (level-1) varying from \(n_i = 2\) to \(n_i=4\). The number of observations per subject was drawn from a uniform distribution with a minimum of 2 and a maximum of 4 observations. Each subject also had two random effect covariates, one to allow for a different baseline response (intercept) and another to allow for differing response profiles (slope). At each visit, a four-category ordinal response was generated which took the values 1, 2, 3 and 4 with probabilities 0.2, 0.3, 0.3, and 0.2, respectively. For better visualization, we summarize the results in the following subsections using tables and figures. Some of the tables and figures are provided in the Electronic Supplementary Material.
6.1 Quadratic Bias of UMML, RMML, PT, SE and PSE when \(\varDelta \ge 0\)
6.2 Risk analysis when \(\varDelta = 0\)
Table 1 Relative MSEs of the RMML, PT, SE, and PSE with respect to the UMML when the restricted parameter space is correct (i.e. \(\varDelta =0\)), \(p_1=4\), and \(n_i=3\)

| Estimators | \(N=100\), \(p_2=3\) | \(N=100\), \(p_2=6\) | \(N=100\), \(p_2=12\) | \(N=200\), \(p_2=3\) | \(N=200\), \(p_2=6\) | \(N=200\), \(p_2=12\) |
| --- | --- | --- | --- | --- | --- | --- |
| Probit model | | | | | | |
| RMML | 1.64 | 2.24 | 4.17 | 1.43 | 1.82 | 3.27 |
| PT | 1.48 | 1.92 | 1.35 | 1.29 | 1.61 | 2.78 |
| SE | 1.14 | 1.55 | 2.63 | 1.11 | 1.47 | 2.50 |
| PSE | 1.21 | 1.62 | 2.76 | 1.15 | 1.56 | 2.74 |
| Logit model | | | | | | |
| RMML | 1.28 | 1.54 | 2.37 | 1.23 | 1.44 | 1.89 |
| PT | 1.24 | 1.49 | 2.07 | 1.16 | 1.38 | 1.79 |
| SE | 1.08 | 1.22 | 1.89 | 1.05 | 1.26 | 1.62 |
| PSE | 1.12 | 1.29 | 2.03 | 1.07 | 1.33 | 1.74 |
Table 2 Relative MSEs of the RMML, PT, SE, and PSE with respect to the UMML when the restricted parameter space is correct (i.e. \(\varDelta =0\)), \(p_1=4\), and \(n_i\) varies from 2 to 4

| Estimators | \(N=100\), \(p_2=3\) | \(N=100\), \(p_2=6\) | \(N=100\), \(p_2=12\) | \(N=200\), \(p_2=3\) | \(N=200\), \(p_2=6\) | \(N=200\), \(p_2=12\) |
| --- | --- | --- | --- | --- | --- | --- |
| Probit model | | | | | | |
| RMML | 1.58 | 2.15 | 3.13 | 1.43 | 1.77 | 2.14 |
| PT | 1.30 | 1.73 | 2.40 | 1.34 | 1.64 | 2.00 |
| SE | 1.15 | 1.59 | 2.32 | 1.13 | 1.21 | 1.78 |
| PSE | 1.16 | 1.64 | 2.42 | 1.15 | 1.56 | 1.94 |
| Logit model | | | | | | |
| RMML | 1.27 | 2.06 | 2.84 | 1.24 | 1.38 | 2.42 |
| PT | 1.20 | 1.52 | 1.96 | 1.17 | 1.37 | 2.19 |
| SE | 1.08 | 1.43 | 2.09 | 1.07 | 1.18 | 1.94 |
| PSE | 1.10 | 1.49 | 2.15 | 1.09 | 1.27 | 2.11 |
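The relative MSE in these tables can be read as the simulated risk of the UMML divided by the simulated risk of the competing estimator, so that values above one favour the competitor. A minimal sketch, assuming quadratic loss with \(\varvec{W}=\varvec{I}\) (as in this simulation) and Monte Carlo replicates stored one per row; the function and argument names are hypothetical, and the toy arrays are not simulation output.

```python
import numpy as np

def simulated_mse(estimates, beta_true):
    """Average squared Euclidean distance between replicates and the truth."""
    estimates = np.asarray(estimates, dtype=float)
    return np.mean(np.sum((estimates - beta_true) ** 2, axis=1))

def relative_mse(competitor_reps, umml_reps, beta_true):
    """MSE(UMML) / MSE(competitor); a value above 1 means lower risk than UMML."""
    return simulated_mse(umml_reps, beta_true) / simulated_mse(competitor_reps, beta_true)

# Toy replicates (two replicates, two coefficients) to show the calculation.
beta_true = np.array([0.0, 0.0])
umml_reps = np.array([[1.0, 0.0], [-1.0, 0.0]])
shrunk_reps = np.array([[0.5, 0.0], [-0.5, 0.0]])
print(relative_mse(shrunk_reps, umml_reps, beta_true))
```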
6.3 Risk analysis when \(\varDelta \ge 0\)
7 Real data applications
In this section, we present an application of multilevel models for ordinal responses to longitudinal data. We applied the proposed shrinkage and pretest methods to the longitudinal knee-based data from the Osteoarthritis Initiative (OAI) database. The data were obtained from the OAI, an online and publicly available database (http://www.oai.ecsf.edu/). Specifically, we used data sets from Enrollees version 25, and AllClinical datasets versions 0.2.3, 3.2.1, 5.2.1, 8.2.1, and 10.2.2. The OAI is a cohort study of 4796 subjects who were between 45 and 79 years of age at enrollment and who either have symptomatic knee osteoarthritis (OA), do not have the condition (Control), or are at risk of developing the condition. Enrollment took place between February 2004 and September 2006, with annual follow-up until 2013 to 2015 depending on the date of enrollment. Subjects were followed at 5 time points: 0, 24, 36, 72, and 96 months after enrollment. The purpose of the OAI study was to improve public health. We included OAI participants with established symptomatic radiographic knee OA at baseline (\(N = 1668\)), defined by knee symptoms such as pain, aching or stiffness in and around the knee on most days of the month for at least 1 month in the previous year, and excluded the control cohort from our analysis. Some missing observations in the responses and covariates were ignored in our analysis.
The response was categorized using the self-reported Western Ontario McMaster Osteoarthritis Index (WOMAC, 5-point Likert scale; Bellamy et al. 1988), which addresses the severity, in both knees, of the impact of OA symptoms on quality of life. The WOMAC is a disease-specific instrument for measuring the level of pain, joint stiffness and functional ability and was applied for the evaluation of knee OA (Bellamy et al. 1988). The WOMAC score is calculated from 5 items that measure pain, 2 items that measure stiffness and 17 items that measure physical function. For severity of difficulty in both knees, we classified the WOMAC score into three categories (none, mild, and moderate to extreme) on the basis of a five-point scale, with higher scores indicating more severe difficulties. Covariates included race [\(\beta _1\); white vs non-white], age \((\beta _2)\), body mass index [BMI\((\beta _3)\): overweight (\(25\)–\(30\)) vs. healthy weight (\(<25\))], [BMI\((\beta _4)\): obese (\(>30\)) vs. healthy weight (\(<25\))], depression [CESD (\(\beta _5\)): yes vs no], total number of prescription medications [NMED(\(\beta _6)\)], time point for each subject [time (\(\beta _7\))], progression of knee pain/OA status [cohort(\(\beta _8\)): progression or not], sex [\(\beta _{9}\): male vs female], self-reported diabetes [DIAB(\(\beta _{10}\)): yes vs no], Charlson Comorbidity Index [CCI score (\(\beta _{11})\): \(\ge 1\) vs \(<1\)], and education [\(\beta _{12}\): college graduate vs high school and less; \(\beta _{13}\): university graduate vs high school and less]. The time points used were baseline and the 24, 36, 72, and 96 month follow-up times. The main objective of our study is to find the association of knee OA severity with the above-mentioned covariates and demographic variables, and to see whether our proposed estimators improve on the unrestricted marginal maximum likelihood estimator.
Table 3 Estimates (first row) and standard errors (second row, in parentheses) for race \((\beta _1)\), age \((\beta _2)\), overweight vs. healthy weight \((\beta _3)\), obese vs. healthy weight \((\beta _4)\), CESD \((\beta _5)\), NMED \((\beta _6)\), time \((\beta _7)\), cohort \((\beta _8)\), and sex \((\beta _{9})\)

| Estimators | \(\beta _1\) | \(\beta _2\) | \(\beta _3\) | \(\beta _4\) | \(\beta _5\) | \(\beta _6\) | \(\beta _7\) | \(\beta _8\) | \(\beta _9\) | RMSE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UMML | −0.534 | −0.017 | 0.109 | 0.262 | 0.419 | 0.055 | −0.001 | 1.383 | −0.219 | 1.00 |
| | (0.175) | (0.008) | (0.139) | (0.141) | (0.185) | (0.026) | (0.001) | (0.143) | (0.137) | |
| RMML | −0.571 | −0.016 | 0.117 | 0.278 | 0.439 | 0.060 | −0.001 | 1.365 | −0.229 | 1.78 |
| | (0.165) | (0.008) | (0.133) | (0.136) | (0.189) | (0.022) | (0.001) | (0.140) | (0.138) | |
| PT | −0.549 | −0.017 | 0.109 | 0.262 | 0.418 | 0.055 | −0.001 | 1.383 | −0.219 | 1.15 |
| | (0.138) | (0.006) | (0.121) | (0.129) | (0.168) | (0.021) | (0.001) | (0.125) | (0.127) | |
| SE | −0.549 | −0.017 | 0.109 | 0.262 | 0.419 | 0.055 | −0.001 | 1.383 | −0.220 | 1.05 |
| | (0.174) | (0.008) | (0.138) | (0.137) | (0.183) | (0.025) | (0.001) | (0.143) | (0.136) | |
| PSE | −0.549 | −0.017 | 0.109 | 0.262 | 0.419 | 0.055 | −0.001 | 1.383 | −0.220 | 1.05 |
| | (0.174) | (0.008) | (0.138) | (0.137) | (0.183) | (0.025) | (0.001) | (0.143) | (0.136) | |
To estimate the standard errors of the estimators, we need to apply a bootstrap technique, as we have only one data set. A bootstrap sampling scheme (Wu and Chiang 2000) was therefore carried out to calculate the estimates, standard errors, and RMSEs of the proposed estimators. We generated bootstrap samples of \(N=300\) subjects, drawn with replacement from the original data set of 1668 subjects, and let \(\{\varvec{Y}_i^*, \varvec{X}_i^*, \varvec{Z}_i^*;~~ 1 \le i \le 300, 1 \le j \le 5\}\) be the longitudinal bootstrap samples. Some subjects, together with all of their data values in the original sample, may appear several times in the new sample. We then refit the ordinal regression model to these data, using the same method that was applied in Sect. 3, to obtain bootstrap estimates. The resampling procedure was repeated 1000 times. The point estimates and standard errors of the significant coefficients are reported in Table 3, with the RMSEs in the last column. The RMSEs of RMML, PT, SE, and PSE with respect to UMML are 1.78, 1.15, 1.05 and 1.05, respectively. The results are consistent with the simulation study and theoretical findings and provide general recommendations about the application of the proposed shrinkage and pretest estimation methods.
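The subject-level resampling described above can be sketched as follows. Whole subjects are drawn with replacement, and every row of a drawn subject is carried into the bootstrap sample under a fresh label, so that repeated draws of the same subject are treated as distinct clusters. The function name and the toy identifiers are assumptions for illustration; the model refit itself is not shown.

```python
import numpy as np

def bootstrap_subjects(subject_ids, n_boot, rng):
    """Draw n_boot subjects with replacement from long-format data.

    Returns the row indices of the bootstrap sample and a new subject label
    for each row, so that a subject drawn twice forms two separate clusters.
    """
    subject_ids = np.asarray(subject_ids)
    drawn = rng.choice(np.unique(subject_ids), size=n_boot, replace=True)
    rows, new_ids = [], []
    for new_id, sid in enumerate(drawn):
        idx = np.flatnonzero(subject_ids == sid)  # all visits of this subject
        rows.append(idx)
        new_ids.append(np.full(idx.size, new_id))
    return np.concatenate(rows), np.concatenate(new_ids)

# Toy long-format identifiers: three subjects with 3, 2 and 4 visits.
ids = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3])
rows, new_ids = bootstrap_subjects(ids, n_boot=5, rng=np.random.default_rng(0))
print(rows, new_ids)
```

Indexing the response and covariate arrays with `rows` and using `new_ids` as the grouping variable gives one bootstrap data set; repeating this and refitting each time yields the bootstrap distribution of each estimator.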
8 Conclusion
We have applied the marginal maximum likelihood method to estimate the regression parameters of multilevel models for ordinal longitudinal data and call this estimate the UMML. We also estimate the parameters when some of them are subject to linear restrictions and call this estimate the RMML. The pretest and shrinkage estimators are constructed from the UMML and RMML. We have presented closed-form expressions for the bias and risk, and used a Monte Carlo simulation study to explore the bias and risk properties of the estimators. Our simulation studies show good performance of the pretest and shrinkage methods for different numbers of covariates. The RMML estimator offers numerically superior performance compared to the UMML, pretest, and shrinkage estimators at and near the restriction \(\varvec{A} \varvec{\beta } = \varvec{h}\). However, this superiority diminishes as we move away from the restriction. The risk of the pretest estimator is lower than that of the UMML (that is, its relative MSE with respect to the UMML is higher) at and near the restriction. As \(\varDelta\) increases, the relative MSEs of the PT, SE, and PSE converge to one; near the restriction, the PSE performs better than the SE (although the difference between the two is not easy to spot due to the scale of the plot).
We applied the proposed estimation method to the Osteoarthritis Initiative database. To compare the pretest and shrinkage estimators with respect to UMML, we calculated the RMSE based on the bootstrap resampling method as we cannot calculate RMSE based on one data set. It shows that pretest and shrinkage estimators perform better than the UMML and the RMML performs the best because of its unbiasedness.
This paper has attempted to present the multilevel ordinal regression model for longitudinal data. Certainly, the use of ordinal models is not as popular as the use of normal and binary regression models, despite the fact that ordinal longitudinal outcomes are often obtained. The tools are available in terms of methods and software, so hopefully this situation will change as researchers become more familiar with applications of the multilevel ordinal regression model.
Acknowledgements
The OAI is a public–private partnership comprising five contracts (N01AR22258; N01AR22259; N01AR22260; N01AR22261; N01AR22262) funded by the National Institutes of Health, a branch of the Department of Health and Human Services, and conducted by the OAI Study Investigators. Private funding partners include Merck Research Laboratories; Novartis Pharmaceuticals Corporation; GlaxoSmithKline; and Pfizer, Inc. Private sector funding for the OAI is managed by the Foundation for the National Institutes of Health. This manuscript was prepared using an OAI public use data set and does not necessarily reflect the opinions or views of the OAI investigators, the NIH, or the private funding partners. We express our sincere thanks to the editor, associate editor, and the referees for their constructive and valuable suggestions, which led to an improvement of our original version of the manuscript. Shakhawat Hossain and Saumen Mandal were supported by Discovery Grants from the Natural Sciences and Engineering Research Council of Canada.
Supplementary material
References
Agresti, A. (2010). Analysis of Ordinal Categorical Data. Hoboken, New Jersey: Wiley.
Ahmed, S. E., Doksum, K., Hossain, S., & You, J. (2007). Shrinkage, pretest and LASSO estimators in partially linear models. Australian and New Zealand Journal of Statistics, 49(4), 461–471.
Albert, P. S., Hunsberger, S. A., & Biro, F. M. (1997). Modeling repeated measures with monotonic ordinal responses and misclassification, with applications to studying maturation. Journal of the American Statistical Association, 92, 1304–1311.
Bellamy, N., Buchanan, W. W., Goldsmith, C. H., Campbell, J., & Stitt, L. (1988). Validation study of WOMAC: A health status instrument for measuring clinically-important patient-relevant outcomes following total hip or knee arthroplasty in osteoarthritis. Journal of Orthopedics and Rheumatology, 1, 95–108.
Goldstein, H. (2010). Multilevel Statistical Models. Chichester: Wiley.
Hedeker, D., & Gibbons, R. (1994). A random-effects ordinal regression model for multilevel analysis. Biometrics, 50(4), 933–944.
Hedeker, D., & Gibbons, R. D. (2006). Longitudinal data analysis. New York: Wiley.
Hossain, S., Ahmed, S. E., & Doksum, K. A. (2015). Shrinkage, pretest, and penalty estimators in generalized linear models. Statistical Methodology, 24, 52–68.
Hossain, S., Ahmed, S. E., Yi, Y., & Chen, B. (2016). Shrinkage and pretest estimators for longitudinal data analysis under partially linear models. Journal of Nonparametric Statistics, 28(3), 531–549.
Hox, J., Moerbeek, M., & van de Schoot, R. (2018). Multilevel Analysis: Techniques and Applications, Third Edition (Quantitative Methodology Series). New York: Routledge.
Hox, J. J., Moerbeek, M., & Schoot, R. (2017). Multilevel analysis: Techniques and applications. New York: Routledge.
Lee, K., & Daniels, M. J. (2008). Marginalized models for longitudinal ordinal data with application to quality of life studies. Statistics in Medicine, 27(21), 4359–4380.
Lee, Y., & Nelder, J. A. (2004). Conditional and marginal models: Another view. Statistical Science, 19(2), 219–238.
Lian, H. (2012). Shrinkage estimation for identification of linear components in additive models. Statistics and Probability Letters, 82, 225–231.
Liu, Q., & Pierce, D. A. (1994). A note on Gauss-Hermite quadrature. Biometrika, 81(3), 624–629.
Magnus, J. R. (1988). Linear Structures. Oxford, London: Charles Griffin.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical Linear Models: Applications and data analysis methods. Thousand Oaks: Sage Publications Ltd.
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. London: Chapman and Hall/CRC.
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. London: SAGE Publications Ltd.
Stroud, A. H., & Sechrest, D. (1966). Gaussian quadrature formulas. Upper Saddle River: Prentice-Hall.
Thomson, T., & Hossain, S. (2018). Efficient shrinkage for generalized linear mixed models under linear restrictions. Sankhya A: The Indian Journal of Statistics, 80, 1–26.
Thomson, T., Hossain, S., & Ghahramani, M. (2016). Efficient estimation for time series following generalized linear models. Australian & New Zealand Journal of Statistics, 58, 493–513.
van der Vaart, A. W. (1998). Asymptotic Statistics. New York: Cambridge University Press.
Wu, C. O., & Chiang, C. T. (2000). Kernel smoothing on varying coefficient models with longitudinal dependent variable. Statistica Sinica, 10, 433–456.
Zeng, T., & Hill, R. C. (2016). Shrinkage estimation in the random parameters logit model. Open Journal of Statistics, 6, 667–674.