Japanese Journal of Statistics and Data Science

Volume 2, Issue 2, pp 299–322

Estimation strategy of multilevel model for ordinal longitudinal data

  • Shakhawat Hossain
  • Ian Hiebert
  • Saumen Mandal
Original Paper

Abstract

This paper considers the shrinkage estimation of multilevel models that are appropriate for ordinal longitudinal data. These models can accommodate multiple random effects and, additionally, allow for a general form of model covariates that are related to the overall level of the responses and changes to the response over time. The likelihood inference for multilevel models is computationally burdensome due to intractable integrals. A maximum marginal likelihood (MML) method with Fisher’s scoring procedure is therefore followed to estimate the random and fixed effects parameters. In real life data, researchers may have collected many covariates for the response. Some of these covariates may satisfy certain constraints which can be used to produce a restricted estimate from the unrestricted likelihood function. The unrestricted and restricted MMLs can then be combined optimally to form the pretest and shrinkage estimators. Asymptotic properties of these estimators including biases and risks will be discussed. A simulation study is conducted to assess the performance of the estimators with respect to the unrestricted MML estimator. Finally, the relevance of the proposed estimators will be illustrated with a real data set.

Keywords

Longitudinal data · Maximum marginal likelihood · Multilevel · Pretest · Ordinal regression · Shrinkage

1 Introduction

Longitudinal data with ordinal responses occur quite frequently in the health and social sciences. To diagnose patients, it is common practice to assign them to ordered categories corresponding to various degrees of a medical condition, or to classify them based on the risk of developing a disease-related response. Many longitudinal studies of children are interested in maturation from childhood to adulthood. Naturally, these studies involve a monotonically increasing maturation process, and sexual maturation is often measured with an ordinal outcome (Albert et al. 1997). In social studies, the response could be the level of agreement with a particular question, such as strongly agree, agree, no opinion, disagree, or strongly disagree. It is important to take advantage of multilevel models that are appropriate for ordinal longitudinal data, rather than simply analyzing ordinal responses as continuous measurements using the available software. Observations in these studies are made in hierarchical structures, with repeated observations over time (level-1) nested within subjects (level-2). Level-1 measurements are generally not independent, and this dependence should be accounted for in a model even if it is not of primary interest. One analytic technique that accounts for within-subject dependence is marginal modeling, where the regression coefficients are interpreted at the population level rather than on an individual-subject basis. This approach has been studied more extensively than the alternative approach of conditional models, in which the population-averaged effects are not directly specified and the effect of a covariate on the response is conditional on the random effects (Lee and Daniels 2008). In this paper, we consider a conditional model approach for ordinal longitudinal responses. Lee and Nelder (2004) argued that the conditional model is fundamental and benefits from being able to make marginal predictions as well as conditional predictions.

Numerous studies have suggested modeling ordinal responses with conditional models. Hedeker and Gibbons (1994) proposed a random effects model, in multilevel terminology, for analyzing ordinal responses in longitudinal studies using probit and logit link functions. They described a maximum marginal likelihood solution that uses multidimensional Gauss–Hermite quadrature to numerically integrate over the distribution of the random effects. An important issue is the number of quadrature points necessary to ensure accurate estimation of the model parameters. We use the Newton–Raphson method with the Fisher scoring algorithm for an iterative solution to the likelihood equations. Hedeker and Gibbons (2006) provided a comparable discussion from the multilevel regression perspective that accommodates multiple random effects for analyzing longitudinal ordinal responses, including variables to explain inter- and intra-individual variation. Snijders and Bosker (2012) discussed multilevel models comprehensively for data organized in a nesting structure. These multilevel models are defined by a set of regression equations in which the variation at the different levels, namely within-subject and between-subject variability, is explicitly modeled. Albert et al. (1997) proposed a methodology for analyzing monotonically increasing ordinal data, with an application to sexual maturation data from the National Heart, Lung, and Blood Institute Growth and Health Study (NGHS). In particular, they developed an EM algorithm for maximum likelihood estimation that incorporates covariates and randomly missing data. Excellent introductions to multilevel modeling include Raudenbush and Bryk (2002), Skrondal and Rabe-Hesketh (2004), Agresti (2010), Goldstein (2010), and Hox et al. (2017).

In this paper, we are interested in using the model proposed by Hedeker and Gibbons (1994) and applying the pretest and shrinkage methods to estimate the fixed effects when prior knowledge or auxiliary information is available in the form of possible linear restrictions on the parameters. In longitudinal surveys, many covariates are often collected, and variable selection is a key issue in modeling these data. We assume that some prior or auxiliary information is available through a variety of sources, such as a similar study, a prior study whose results need updating, or a search for a sparsity pattern via variable selection methods. Using this knowledge, we impose linear restrictions on the fixed effects, while treating the random effects as nuisance parameters. To use this information, we test whether some parameters are not significant through a pretesting strategy. We also explore the shrinkage estimator as an alternative to pretesting, in the hope of improving the inference provided by the model. The simulation studies and the application to real data presented in this paper clearly illustrate the importance of the multilevel model for longitudinal ordinal outcome data.

To the best of our knowledge, there is no published research in the reviewed literature that applies pretest and shrinkage estimators to multilevel regression models in the longitudinal setting with a repeated ordinal response. The contribution of this paper is to fill this gap by implementing the unrestricted, restricted, pretest, shrinkage, and positive shrinkage estimators. The literature on shrinkage estimation is enormous, and we mention only a few of the most relevant contributions. Thomson and Hossain (2018) developed the James–Stein shrinkage and LASSO methods and compared their performance with the maximum likelihood estimator for generalized linear mixed models when some of the covariates may be subject to a linear restriction. Hossain et al. (2016) developed pretest and shrinkage estimation methods for the analysis of longitudinal data under a partial linear model when some parameters are subject to certain restrictions. Zeng and Hill (2016) explored the properties of pretest and shrinkage estimators for random parameters logit models. Many articles have been devoted to the study of pretest and shrinkage estimators in parametric and semi-parametric linear models for uncorrelated data, including Thomson et al. (2016), Hossain et al. (2015), and Lian (2012), among others.

The remainder of this paper is organized into seven more sections. Section 2 introduces the multilevel mixed effects ordinal regression model. Section 3 outlines the marginal maximum likelihood estimate (MMLE). Section 4 defines the pretest and shrinkage estimators for the fixed effects parameters. Section 5 discusses the asymptotic bias and risk under the alternative hypothesis. In Sect. 6 we conduct a simulation study. Section 7 applies the shrinkage estimators to a real data set, and Sect. 8 gives concluding remarks.

2 Multilevel mixed effects ordinal regression model

Multilevel mixed effects models have become very popular for the analysis of longitudinal data, as they are flexible and widely applicable. They assume that measurements from a subject share a set of latent, unobserved random effects, which generate an association structure between the repeated measurements. To set the notation, let j denote the level-1 units and i denote the level-2 units. Assume that there are \(i=1, \ldots , N\) level-2 units and \(j=1, \ldots , n_i\) level-1 units nested within each level-2 unit. The total number of level-1 observations across level-2 units is \(n=\sum _{i=1}^{N} n_i\). Let \(y_{ij}\) be the unobserved latent response associated with level-1 unit j nested within level-2 unit i, and \(\varvec{y}_i=(y_{i1}, \ldots , y_{in_i})^{\mathsf {T}}\). We also let \(\varvec{X}_i =(\varvec{x}_{i1}, \ldots , \varvec{x}_{in_i})^{\mathsf {T}}\) and \(\varvec{Z}_i =(\varvec{z}_{i1}, \ldots , \varvec{z}_{in_i})^{\mathsf {T}}\) be the \(n_i \times p\) and \(n_i \times q\) covariate matrices for the fixed and random effects, respectively. Let \(\varvec{\beta }=(\beta _1,\ldots ,\beta _p)^{\mathsf {T}}\) be the \(p \times 1\) vector of regression parameters and \(\varvec{u}_i=(u_{i1}, \ldots ,u_{iq})^{\mathsf {T}}\) be the \(q \times 1\) vector of unknown random effects for level-2 unit i. Then the random effects regression model for the latent response can be written as:
$$\begin{aligned} \varvec{y}_i = \varvec{X}_i \varvec{\beta } + \varvec{Z}_i \varvec{u}_i +\varvec{\varepsilon }_i. \end{aligned}$$
(1)
The rows of the matrices \(\varvec{X}_i\) and \(\varvec{Z}_i\) correspond to the level-1 units nested within level-2 unit i. Some researchers consider the model given in (1) as a multilevel model. This model can be shown to be multilevel by decomposing it into the following within level-2 unit model (between level-1 unit model),
$$\begin{aligned} \varvec{y}_i = \varvec{X}_{(1)i} \varvec{\beta }_{(1)} + \varvec{Z}_{(1)i} \varvec{u}_i + \varvec{\epsilon }_i, \end{aligned}$$
(2)
and between level-2 unit model,
$$\begin{aligned} \varvec{u}_i = \varvec{\mu } + \varvec{X}_{(2)i}\varvec{\beta }_{(2)} +\varvec{\delta }_i, \end{aligned}$$
(3)
where \(\varvec{X}_{(1)i}\) and \(\varvec{\beta }_{(1)}\) denote the fixed level-1 covariates and their effects, \(\varvec{X}_{(2)i}\) and \(\varvec{\beta }_{(2)}\) are the fixed level-2 covariates and their effects, and \(\varvec{Z}_{(1)i}\) are the level-1 covariates allowed to vary at level-2. Since the level-2 random effects \(\varvec{u}_i\) vary across level-2 units, they can be modelled in terms of subject-level variables \(\varvec{X}_{(2)i}\) to represent the effect of subject characteristics on the initial level \((\varvec{\mu })\) and on change over time, in addition to unexplained subject-level random variation \((\varvec{\delta }_i)\). The error vector \(\varvec{\epsilon }_i\) and random effect vector \(\varvec{\delta }_i\) account for the within-subject and between-subject variation and follow multivariate normal distributions with mean \(\varvec{0}\) and variance–covariance matrices \(\varvec{\varLambda }_i\) and \(\varvec{\Sigma _u}\), respectively. The models given in (2) and (3) can now be combined into the following model:
$$\begin{aligned} \varvec{y}_i = \varvec{X}_{(1)i} \varvec{\beta }_{(1)} + \varvec{Z}_{(1)i} (\varvec{\mu } + \varvec{X}_{(2)i}\varvec{\beta }_{(2)} +\varvec{\delta }_i) + \varvec{\epsilon }_i, \end{aligned}$$
(4)
which can be simplified as (1) by setting \(\varvec{u}_i = \varvec{\mu } + \varvec{\delta }_i\), \(\varvec{X}_i = [\varvec{Z}_{(1)i} \otimes \varvec{X}_{(2)i}~\vdots ~ \varvec{X}_{(1)i}]\), and \(\varvec{\beta } = [\varvec{\beta }_{(2)}~\vdots ~\varvec{\beta }_{(1)}]\), where \(\otimes\) is the Kronecker product. Usually, some of the level-2 covariates \(\varvec{X}_{(2)i}\) may not be significantly related to all of the q level-2 effects \(\varvec{u}_i\). In this case, the corresponding elements of the \(\varvec{Z}_{(1)i} \otimes \varvec{X}_{(2)i}\) partition of the covariate matrix \(\varvec{X}_i\) are removed.
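As a concrete illustration of this construction, the following sketch assembles \(\varvec{X}_i\) from hypothetical level-1 and level-2 design matrices; the dimensions and covariate values are invented for illustration, not taken from the paper:

```python
import numpy as np

# Hypothetical example: n_i = 4 occasions, q = 2 random effects (intercept
# and slope on time), one level-2 covariate (e.g. a group indicator).
n_i = 4
time = np.arange(float(n_i))
X1 = time.reshape(-1, 1)                    # X_(1)i: fixed level-1 covariates
Z1 = np.column_stack([np.ones(n_i), time])  # Z_(1)i: random-effects design
x2 = np.array([1.0])                        # X_(2)i: level-2 covariates

# X_i = [ Z_(1)i (x) X_(2)i : X_(1)i ] as in the combined model (4),
# applying the Kronecker product row by row.
cross = np.vstack([np.kron(Z1[j], x2) for j in range(n_i)])
Xi = np.hstack([cross, X1])
print(Xi.shape)  # (4, 3)
```

Columns of the cross-level block that correspond to level-2 covariates unrelated to a given random effect would simply be dropped, as described above.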

2.1 Observed ordinal response

The unobserved response \(\varvec{y}_i\) for the ith subject in model (2) can be related to the observed ordinal response through the “threshold concept”. We denote the observed ordinal response as \(\varvec{Y}_{i}\), and its value is determined by a series of strictly increasing thresholds \(\gamma _1< \cdots < \gamma _{K-1}\), where K is the number of ordered categories. Specifically, \(Y_{ij}=k\) if \(\gamma _{k-1}\le y_{ij}< \gamma _{k}\) for the latent variable \(y_{ij}\), with \(\gamma _0=-\infty\) and \(\gamma _K=\infty\). As in the dichotomous response setting, it is common to fix a threshold at zero to set the location of the latent variable; this is usually done with the first threshold (i.e. \(\gamma _1=0\)).
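The threshold rule can be sketched in a few lines; the threshold values below are hypothetical and chosen only to illustrate the mapping from a latent response to an observed category:

```python
import numpy as np

# Hypothetical thresholds gamma_1 < gamma_2 < gamma_3 for K = 4 ordered
# categories, with gamma_1 = 0 fixed to set the location of the latent scale.
gammas = np.array([0.0, 1.2, 2.5])

def ordinal_category(y_latent):
    """Y = k iff gamma_{k-1} <= y < gamma_k, with gamma_0 = -inf and
    gamma_K = +inf, so the category is 1 + (number of thresholds <= y)."""
    return int(np.searchsorted(gammas, y_latent, side="right")) + 1

print([ordinal_category(y) for y in (-0.5, 0.3, 2.0, 3.1)])  # [1, 2, 3, 4]
```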

For the model given in (2), the probability that the response for level-1 unit j of individual i falls into category k, conditional on the parameters \(\varvec{\beta }\) and \(\varvec{u}\), is given by:
$$\begin{aligned} P(Y_{ij} = k|\varvec{\beta },\varvec{u})=F(\gamma _k-D_{ij})-F(\gamma _{k-1}-D_{ij}), \end{aligned}$$
where F is a cumulative distribution function (CDF) and \(D_{ij} = \varvec{X}_{ij}^{\mathsf {T}} \varvec{\beta } + \varvec{Z}_{ij}^{\mathsf {T}} \varvec{u}_i\), where \(\varvec{X}_{ij}\) is the \(p\times 1\) covariate vector and \(\varvec{Z}_{ij}\) is the design vector for the q random effects. The CDF \(F(\cdot )\) is the inverse of the link function, which could be the probit, logit, cloglog, loglog, or cauchit link; in this paper we consider the cases where F is the standard normal CDF \(\varvec{\Phi }(\cdot )\) (corresponding to the probit link) or the standard logistic CDF \(\varPsi (\cdot )\) (corresponding to the logit link). For the logistic response, \(\varPsi (\cdot )\) replaces \(\varvec{\Phi }(\cdot )\), and the standard logistic distribution has standard deviation \(\pi /\sqrt{3}\). In the next section, we derive the marginal maximum likelihood for the probit response function and note the changes necessary if the logistic function is used.
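The category probability above is a difference of two CDF values; a minimal sketch (hypothetical thresholds and linear predictor value) using SciPy:

```python
import numpy as np
from scipy.stats import norm, logistic

# Minimal sketch of P(Y_ij = k | beta, u) via CDF differences; the
# thresholds and the linear predictor value D are hypothetical, K = 4.
gammas = np.array([-np.inf, 0.0, 1.2, 2.5, np.inf])  # gamma_0, ..., gamma_K

def category_prob(k, D, link="probit"):
    """F(gamma_k - D_ij) - F(gamma_{k-1} - D_ij) with F the standard
    normal (probit) or standard logistic (logit) CDF."""
    F = norm.cdf if link == "probit" else logistic.cdf
    return F(gammas[k] - D) - F(gammas[k - 1] - D)

D = 0.8
probs = [category_prob(k, D) for k in range(1, 5)]
print(np.isclose(sum(probs), 1.0))  # the K category probabilities sum to 1
```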

3 Marginal maximum likelihood estimation

Let \(\varvec{Y}_i\) be the vector of ordinal responses from level-2 unit i for the \(n_i\) level-1 units nested within it. The probability of any response pattern \(\varvec{Y}_i\), given \(\varvec{\beta }\) and \(\varvec{u}\), is equal to the product of the probabilities of the level-1 responses, which can be written as
$$\begin{aligned} \ell ( \varvec{Y}_i|\varvec{\beta },\varvec{u})= \prod _{j=1}^{n_i} \prod _{k=1}^K \left( \varvec{\Phi }\left( \frac{\gamma _k-D_{ij}}{\sigma }\right) -\varvec{\Phi }\left( \frac{\gamma _{k-1}-D_{ij}}{\sigma }\right) \right) ^{I_{ijk}}, \end{aligned}$$
where \(I_{ijk} = 1\) if \(Y_{ij}=k\) and 0 otherwise. We also set \(\sigma = 1\) and, as noted above, \(\gamma _1 = 0\).
Let \(h(\varvec{u})\) denote the prior density of the random effects \(\varvec{u}\). The marginal density of \(\varvec{Y}\) can be expressed as the product, over the N level-2 units, of the integral of the conditional likelihood \(\ell (\cdot )\) weighted by the prior density \(h(\cdot )\),
$$\begin{aligned} {\mathcal {L}}(\varvec{Y}|\varvec{\beta },\varvec{u}) = \prod _{i=1}^{N} \int _{\varvec{u} } \ell (\varvec{Y}_i|\varvec{\beta },\varvec{u})h(\varvec{u})\mathrm{{d}}\varvec{u}. \end{aligned}$$
(5)
Hedeker and Gibbons (1994) transformed the integral in the ordinal response setting so that numerical quadrature could be used to evaluate the marginal likelihood equation, as there is no closed form solution. To improve stability during estimation, the random effects can be transformed as \(\varvec{u} = \varvec{G\psi } + \varvec{\mu }\), where \(\varvec{G}\varvec{G}^{\mathsf {T}}=\varvec{\Sigma _u}\) and \(\varvec{G}\) is the Cholesky factor of \(\varvec{\Sigma _u}\). After applying the transformation, the transformed model is \(D_{ij} = \varvec{X}_{ij}^{\mathsf {T}} \varvec{\beta } + \varvec{Z}_{ij}^{\mathsf {T}} (\varvec{G\psi } + \varvec{\mu })\) and the transformed likelihood is
$$\begin{aligned} {\mathcal {L}}(\varvec{Y}|\varvec{\beta },\varvec{u})&= \prod _{i=1}^{N} \int _{\varvec{\psi } } \ell (\varvec{Y}_i|\varvec{\beta },\varvec{\psi })h(\varvec{\psi })d\varvec{\psi }, \end{aligned}$$
where \(h(\varvec{\psi })\) is the multivariate standard normal density. Using the Cholesky factor \(\varvec{G}\) instead of \(\varvec{\Sigma _u}\) during iterations ensures that \(\hat{\varvec{\Sigma }}_u\) is positive definite at all times. It also simplifies the numerical integration (Hedeker and Gibbons 1994).
The marginal log-likelihood from the N level-2 units, \(\log {L} = \log [{\mathcal {L}}(\varvec{Y}|\varvec{\beta },\varvec{u})] = \sum _{i=1}^{N} \log f(\varvec{Y}_i)\), is then maximized to yield the maximum likelihood estimates. Here the conditional likelihood is \(\ell (\varvec{Y}_i|\varvec{\beta },\varvec{u})\) and the marginal density is \(f(\varvec{Y}_i)\), with
$$\begin{aligned} f(\varvec{Y}_i) = \int _{\varvec{u} } \ell (\varvec{Y}_i|\varvec{\beta },\varvec{u})h(\varvec{u})d\varvec{u}= \int _{\varvec{\psi } } \ell (\varvec{Y}_i|\varvec{\beta },\varvec{\psi })h(\varvec{\psi })d\varvec{\psi }, \end{aligned}$$
(6)
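For a single random intercept (q = 1), the integral in (6) can be approximated with probabilists' Gauss–Hermite quadrature, since \(h(\psi )\) is the standard normal density. The following sketch uses hypothetical parameter and data values:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss  # probabilists' rule
from scipy.stats import norm

# Minimal sketch of f(Y_i) in (6) for one subject with a single random
# intercept u_i = sigma_u * psi, psi ~ N(0,1), probit link, and K = 3
# categories. All parameter and data values are hypothetical.
gammas = np.array([-np.inf, 0.0, 1.0, np.inf])   # gamma_0, ..., gamma_K
beta, sigma_u = 0.5, 0.8
x_i = np.array([0.0, 1.0, 2.0])                  # covariate, n_i = 3
y_i = np.array([1, 2, 3])                        # observed ordinal responses

def cond_lik(psi):
    """Conditional likelihood: product over level-1 units of
    Phi(gamma_k - D_ij) - Phi(gamma_{k-1} - D_ij) at the observed k."""
    D = x_i * beta + sigma_u * psi
    return np.prod(norm.cdf(gammas[y_i] - D) - norm.cdf(gammas[y_i - 1] - D))

nodes, weights = hermegauss(20)  # 20-node rule for the weight e^{-x^2/2}
f_Yi = sum(w * cond_lik(t) for t, w in zip(nodes, weights)) / np.sqrt(2 * np.pi)
print(0.0 < f_Yi < 1.0)
```

Dividing by \(\sqrt{2\pi }\) converts the Hermite weight function into the standard normal density, so the sum approximates the expectation over \(\psi\).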
and the proof is given in the Appendix. Let \(\varvec{\theta }\) be the \((K-2+p+q+s) \times 1\) parameter vector to be estimated, consisting of the \(K-2\) threshold values \(\gamma _k\) (\(k=2,\ldots , K-1\)), the p-dimensional fixed effects parameter vector \(\varvec{\beta }\), the overall mean \(\varvec{\mu }\), and \(v(\varvec{G})\), which contains the unique elements of the Cholesky factor, with q and \(s \le q(q+1)/2\) elements, respectively. That is, we can write the total parameters in the model as \(\varvec{\theta }= (\varvec{\beta }^{\mathsf {T}}, \varvec{\eta }^{\mathsf {T}})^{\mathsf {T}}\), where \(\varvec{\eta }= (\varvec{\gamma }^{\mathsf {T}}, \varvec{\mu }^\mathsf {T}, \varvec{v}^{\mathsf {T}})^{\mathsf {T}}\), \(\varvec{\gamma }\) is the \((K-2)\times 1\) vector of threshold parameters, and \(\varvec{v}\) is the vector of variance–covariance parameters of the random effects \(\varvec{u}\). Differentiating \(\log {L}\) with respect to the parameter \(\varvec{\theta }\), we have
$$\begin{aligned} \frac{\partial \log {L}}{\partial \varvec{\theta }} = \sum _{i=1}^{N}{f(\varvec{Y}_i)}^{-1}\int _{\varvec{\psi }} \sum _{j=1}^{n_i}\sum _{k=1}^{K}{I_{ijk}} \xi (\Phi ,\phi ) \ell ( \varvec{Y}_i|\varvec{\psi },\varvec{\beta })h(\varvec{\psi })\frac{\partial D_{ij}}{\partial \varvec{\theta }}\mathrm{{d}}\varvec{\psi }, \end{aligned}$$
(7)
where
$$\begin{aligned} \xi (\Phi ,\phi ) = \frac{\varvec{\phi }(\gamma _k-D_{ij}){w_{kk^\prime }} -\varvec{\phi }(\gamma _{k-1}-D_{ij}){w_{(k-1)k^\prime }}}{\varvec{\Phi }(\gamma _k-D_{ij})-\varvec{\Phi }(\gamma _{k-1}-D_{ij})}, \end{aligned}$$
and substituting the parameters back in for \(\varvec{\theta }\),
$$\begin{aligned} \frac{\partial D_{ij}}{\partial \varvec{\gamma }_{k^\prime }} = 1, \ \frac{\partial D_{ij}}{\partial \varvec{\beta }} = \varvec{x}_{ij}, \ \frac{\partial D_{ij}}{\partial \varvec{\mu }} = \varvec{z}_{ij}, \ \frac{\partial D_{ij}}{\partial v(\varvec{G})} = (\varvec{\psi }\otimes \varvec{z}_{ij} )\varvec{J}_q, \end{aligned}$$
where \(w_{kk^\prime }\) and \(w_{(k-1)k^\prime }\) are indicator variables such that \(w_{kk^\prime } = 1 \ \text {for} \ k = k^\prime\) when the derivative is taken with respect to the threshold values, \(-1\) when the derivative is taken with respect to any parameter other than the threshold values, and 0 otherwise. \(\varvec{J}_q\) is the transpose of the transformation matrix of Magnus (1988) which makes the resulting matrix lower triangular. Note that, for the cumulative logit case, the logistic response function \(\varPsi (\cdot )\) replaces the normal response function \(\varvec{\Phi }\), and the product \(\varPsi (\cdot ) \times (1-\varPsi (\cdot ))\) replaces the standard normal density function \(\phi (\cdot )\) in equations (5) and (7).

3.1 Numerical solution to the MML estimate

The Newton–Raphson method with the Fisher scoring algorithm can be used to solve the log-likelihood equation \(\partial \log {L}/\partial \varvec{\theta }=\varvec{0}\). For this, the working estimate \(\hat{\varvec{\theta }}\) of \(\varvec{\theta }\) is improved at the lth iteration as follows:
$$\begin{aligned} \varvec{\theta }_{l+1} = \varvec{\theta }_l + \left[ -E\left( \frac{\partial ^2 \log {L}}{\partial \varvec{\theta }_l\partial \varvec{\theta }_l^{\mathsf {T}}} \right) \right] ^{-1}\frac{\partial \log {L}}{\partial \varvec{\theta }_l}, \end{aligned}$$
where the information matrix, the negative expectation of the matrix of second derivatives, is consistently estimated by
$$\begin{aligned} -E\left( \frac{\partial ^2 \log {L}}{\partial \varvec{\theta }_l\partial \varvec{\theta }_l^{\mathsf {T}}} \right) = \sum _{i=1}^{N}f^{-2}(\varvec{Y}_i)\frac{\partial f(\varvec{Y}_i) }{\partial \varvec{\theta }} \left( \frac{\partial f(\varvec{Y}_i)}{\partial \varvec{\theta }}\right) ^{{\mathsf {T}}}. \end{aligned}$$
(8)
Lee and Daniels (2008) noted that the matrix of second derivatives has a complex form; the sample empirical covariance matrix in (8), which involves only first derivatives, is a consistent estimator of the information matrix provided the model is correctly specified. Once convergence has been achieved, the large-sample covariance matrix of the estimates can be obtained by inverting the information matrix.

To evaluate the numerical integration above, Gauss–Hermite quadrature is used, summing over Q quadrature nodes per dimension of integration (\(Q^q\) points in total in our case) with the corresponding weights. The optimal weights for the standard normal univariate density are given in Stroud and Sechrest (1966). For more details about this process, see Hedeker and Gibbons (1994). Note that the integral has been approximated by Gauss–Hermite quadrature, but the Laplace approximation can also be used, as it is equivalent to Gauss–Hermite quadrature with one quadrature point (Liu and Pierce 1994). Once the likelihood has been maximized as above, we denote the estimate of the parameter \(\varvec{\theta }\) by \(\hat{\varvec{\theta }}_F = (\hat{\varvec{\beta }}_F^{\mathsf {T}}, \hat{\varvec{\eta }}_F^{\mathsf {T}})^{\mathsf {T}}\), the unrestricted marginal maximum likelihood (UMML) estimate. Although we also obtain the estimate \(\hat{\varvec{\eta }}_F\), our primary focus is on the fixed effects parameter \(\varvec{\beta }\), while the variance–covariance components of the random effects and the other parameters are treated as nuisance parameters.
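A minimal sketch of the scoring iteration with the empirical (outer-product-of-scores) information of (8). A toy normal-mean likelihood stands in for the marginal ordinal likelihood, so `score_i` is a hypothetical placeholder; only the algebra of the update is the point:

```python
import numpy as np

# Minimal sketch of the Fisher-scoring update of Sect. 3.1 with the
# empirical information of (8): sum of outer products of per-unit scores.
rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=50)

def score_i(theta):
    # per-unit score d log f(Y_i)/d theta for a stand-in N(theta, 1) model
    return (data - theta).reshape(-1, 1)

theta = np.array([0.0])
for _ in range(50):
    s = score_i(theta)                 # N x p matrix of per-unit scores
    grad = s.sum(axis=0)               # d log L / d theta
    info = s.T @ s                     # empirical information, as in (8)
    theta = theta + np.linalg.solve(info, grad)   # scoring step

print(abs(theta[0] - data.mean()) < 1e-6)  # fixed point is the MLE here
```

At convergence the gradient vanishes, and inverting `info` gives the large-sample covariance matrix mentioned above.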

3.2 Information matrix and restricted MML estimate

For testing a particular hypothesis on \(\varvec{\beta }\), or on a linear combination of \(\varvec{\beta }\), we need the following partition of the observed information matrix:
$$\begin{aligned} \varvec{I}(\varvec{\theta }) = -E\left[ \begin{array}{ll} \frac{\partial ^2 \log {L}}{\partial \varvec{\beta }\partial \varvec{\beta }^{\mathsf {T}}} & \frac{\partial ^2 \log {L}}{\partial \varvec{\beta }\partial \varvec{\eta }^{\mathsf {T}}}\\ \frac{\partial ^2 \log {L}}{\partial \varvec{\eta }\partial \varvec{\beta }^{\mathsf {T}}} & \frac{\partial ^2 \log {L}}{\partial \varvec{\eta }\partial \varvec{\eta }^{\mathsf {T}}}\\ \end{array}\right] . \end{aligned}$$
(9)
We work with such a partition as our inference is centred around \(\varvec{\beta }\). Hence, we consider the hypotheses
$$\begin{aligned} H_0: \varvec{A} \varvec{\beta } = \varvec{h} ~~~\text{ vs. }~~~ H_1: \varvec{A} \varvec{\beta } \ne \varvec{h}, \end{aligned}$$
(10)
where \(\varvec{A}\) is a \(g \times p\) matrix of full row rank, \(g \le p\), and \(\varvec{h}\) is a \(g \times 1\) vector of known constants.
In many practical settings, the fixed effects in (2) may be subject to restrictions \(\varvec{A} \varvec{\beta } = \varvec{h}\), with the random effects and other parameters treated as nuisance. These restrictions typically reflect prior information about the values of the parameters. Rather than using such restrictions only for testing, they can be used to construct a shrinkage estimator, defined in the next section, and thereby improve estimation efficiency. Restricted estimation is, of course, more complicated than unrestricted estimation. We incorporate the restrictions by assuming the hypothesis given in (10). In this situation, maximizing \(\log {L} = \sum _{i=1}^{N} \log {\mathcal {L}}(\varvec{Y}_i|\varvec{\beta },\varvec{\psi })\) subject to \(\varvec{A}\varvec{\beta }=\varvec{h}\) is equivalent to finding
$$\begin{aligned} \hat{\varvec{\theta }}_R = (\hat{\varvec{\beta }}_R^{\mathsf {T}}, \hat{\varvec{\eta }}_R^{\mathsf {T}})^{\mathsf {T}} = \underset{\varvec{\beta }, \varvec{\psi }, \varvec{\gamma },\varvec{\mu }}{\text{ argmax }}\left\{ \sum _{i=1}^{N} \log {\mathcal {L}}(\varvec{Y}_i|\varvec{\beta },\varvec{\psi }): \varvec{A}\varvec{\beta }=\varvec{h}\right\} , \end{aligned}$$
(11)
where \(\hat{\varvec{\theta }}_R\) is the restricted marginal maximum likelihood (RMML) estimator. The objective function (11) can be maximized by using the same numerical method discussed in Section 3.1 with an appropriate choice of matrix \(\varvec{A}\) and the constant vector \(\varvec{h}\).
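Computationally, the restricted estimate (11) is an equality-constrained maximization. The sketch below illustrates the idea with a toy least-squares objective standing in for the marginal ordinal likelihood; the restriction matrix \(\varvec{A}\) and vector \(\varvec{h}\) are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

# Minimal sketch of the restricted estimate (11): maximize a log-likelihood
# subject to A beta = h. A toy least-squares objective stands in for the
# marginal ordinal likelihood; only the constraint machinery is the point.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=100)

A = np.array([[0.0, 0.0, 1.0]])   # single restriction: beta_3 = 0
h = np.array([0.0])

negloglik = lambda b: 0.5 * np.sum((y - X @ b) ** 2)
res = minimize(negloglik, x0=np.zeros(3), method="SLSQP",
               constraints={"type": "eq", "fun": lambda b: A @ b - h})
print(res.success, np.isclose(res.x[2], 0.0, atol=1e-5))
```

In the ordinal model the same constrained search would be run over the full parameter vector, with the nuisance parameters left unrestricted.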

3.3 Likelihood ratio test

Since our proposed model satisfies the proportional odds assumption, we can use the likelihood ratio test for testing \(H_0: \varvec{A}\varvec{\beta }=\varvec{h}\), in which the deviance of the restricted model is compared with the deviance of the unrestricted model. The test statistic is given by:
$$\begin{aligned} {\hat{\Xi }}_L&= -2\left( \log {{\mathcal {L}}(\varvec{Y}|\hat{\varvec{\beta }}_R, \hat{\varvec{\psi }}_R)}~-~\log {{\mathcal {L}}(\varvec{Y} |\hat{\varvec{\beta }}_F,\hat{\varvec{\psi }}_F)}\right) \nonumber \\&= \text {deviance of restricted model (DEVR)} - \text {deviance of unrestricted model (DEVU)}. \end{aligned}$$
(12)
Under the null hypothesis, the asymptotic distribution of the test statistic in (12) is chi-square with degrees of freedom equal to the number of restrictions in \(H_0\), that is, the difference in the number of free parameters between the unrestricted and restricted models (Hedeker and Gibbons 1994).

In the next section, we define the shrinkage and pretest estimators for the fixed effects parameter vector \(\varvec{\beta },\) while treating the other parameters in the model as nuisance.

4 Pretest and shrinkage estimators

The pretest estimator (PT), denoted \(\hat{\varvec{\beta }}_{P}\), for the parameters \(\varvec{{\beta }}\) based on \(\hat{\varvec{\beta }}_F\) and \(\hat{\varvec{\beta }}_R\) is defined as
$$\begin{aligned} \hat{\varvec{\beta }}_{P}=\hat{\varvec{\beta }}_F-I({\hat{\Xi }}_L \le \chi ^2_{g,\alpha })(\hat{\varvec{\beta }}_F-\hat{\varvec{\beta }}_R), \end{aligned}$$
where \(I(\cdot )\) is the indicator function, which chooses the unrestricted or the restricted estimator according to whether \(H_0\) is rejected or not, and g is the number of restrictions in (10). In a certain region of the parameter space, the pretest estimator underperforms the UMML (Ahmed et al. 2007). Moreover, the PT is a discontinuous function of the data that depends on the chosen \(\alpha\)-level. To avoid this discontinuity, we use the shrinkage estimator (SE), a continuous function that combines the UMML and the RMML:
$$\begin{aligned} \hat{\varvec{\beta }}_{S}=\hat{\varvec{\beta }}_R + (1-(g-2){\hat{\Xi }}_L^{-1})(\hat{\varvec{\beta }}_F-\hat{\varvec{\beta }}_R). \end{aligned}$$
This has the form \(\hat{\varvec{\beta }}_{SE}=\lambda \hat{\varvec{\beta }}_F+(1-\lambda ) \hat{\varvec{\beta }}_R\) with \(\lambda \in [0,1]\): when \(\lambda = 1\), no shrinkage occurs and the estimate coincides with the UMML, and when \(\lambda = 0\), the RMML is chosen. The drawback is that the shrinkage factor \((1-(g-2){\hat{\Xi }}_L^{-1})\) can be negative for small values of \({\hat{\Xi }}_L\), a phenomenon known as over-shrinkage. This can be alleviated by taking its positive part, which makes it not only a shrinkage estimator but also a thresholding estimator. The positive part shrinkage estimator (PSE) is defined as:
$$\begin{aligned} \hat{\varvec{\beta }}_{S+}=\hat{\varvec{\beta }}_R+ \left( 1-\frac{(g-2)}{{\hat{\Xi }}_L}\right) _{+} (\hat{\varvec{\beta }}_F-\hat{\varvec{\beta }}_R), ~g\ge 3, \end{aligned}$$
where \(z_+ = \max (0,z)\).
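Given the two estimates and the test statistic, all three estimators are a few lines of arithmetic. The numbers below are hypothetical; g denotes the number of restrictions:

```python
import numpy as np
from scipy.stats import chi2

# Minimal sketch of the pretest, shrinkage, and positive-part shrinkage
# estimators from hypothetical unrestricted (beta_F) and restricted
# (beta_R) estimates and a test statistic Xi_L, with g >= 3 restrictions.
beta_F = np.array([1.10, 0.45, 0.20, -0.15, 0.05])
beta_R = np.array([1.00, 0.50, 0.00, 0.00, 0.00])
Xi_L, g, alpha = 4.2, 3, 0.05

# Pretest: keep the restricted estimate unless H0 is rejected at level alpha.
beta_P = beta_F - (Xi_L <= chi2.ppf(1 - alpha, g)) * (beta_F - beta_R)

# Shrinkage and its positive part (guards against over-shrinkage).
factor = 1.0 - (g - 2) / Xi_L
beta_S = beta_R + factor * (beta_F - beta_R)
beta_SP = beta_R + max(factor, 0.0) * (beta_F - beta_R)

print(np.allclose(beta_P, beta_R))  # Xi_L is below the critical value here
```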

5 Asymptotic bias and risk under the alternative hypothesis

In this section, we derive the asymptotic bias (AB) and asymptotic risk (AR) of the proposed estimators using the local asymptotic normality approach of van der Vaart (1998). Under a fixed alternative \(H_a: \varvec{A}\varvec{\beta } = \varvec{h} + \varvec{\delta }\), where \(\varvec{\delta }= (\delta _1, \delta _2, \ldots , \delta _{p_2})^{\mathsf {T}}\in {\mathbb {R}}^{p_2}\) is a fixed vector, the test statistic \({\hat{\Xi }}_L\) diverges to \(\infty\) as \(n\rightarrow \infty\), and the estimators \(\hat{\varvec{\beta }}_{P}\), \(\hat{\varvec{\beta }}_{S}\), and \(\hat{\varvec{\beta }}_{S+}\) become equivalent in probability to the UMML \(\hat{\varvec{\beta }}_{F}\). On the other hand, the RMML estimator \(\hat{\varvec{\beta }}_{R}\) has unbounded risk. Therefore, in the large-sample situation there is not much to investigate: for any proposed estimator \(\hat{\varvec{\beta }}^*\) of \(\varvec{\beta }\) and under a fixed alternative, the distribution of \(\sqrt{n}(\hat{\varvec{\beta }}^*-\varvec{\beta })\) converges to that of \(\sqrt{n}(\hat{\varvec{\beta }}_F-\varvec{\beta })\). To concentrate on meaningful comparisons, we restrict ourselves to local alternatives of the form
$$\begin{aligned} H_{(n)}: \varvec{A}\varvec{\beta } = \varvec{h} + \frac{\varvec{\delta }}{\sqrt{n}}, ~n>0. \end{aligned}$$
(13)
The following joint asymptotic normality of UMML and RMML under (13) allows the study of the AB and AR of the estimators \(\hat{\varvec{\beta }}_{F}\), \(\hat{\varvec{\beta }}_{R}\), \(\hat{\varvec{\beta }}_{P}\), \(\hat{\varvec{\beta }}_{S}\), and \(\hat{\varvec{\beta }}_{S+}\). Let \(\varvec{\varrho }_{1n} = \sqrt{n}(\hat{\varvec{\beta }}_F - \varvec{\beta })\), \(\varvec{\varrho }_{2n} = \sqrt{n}(\hat{\varvec{\beta }}_R - \varvec{\beta })\), and \(\varvec{\varrho }_{3n}=\sqrt{n}(\hat{\varvec{\beta }}_F - \hat{\varvec{\beta }}_R)\).

Theorem 5.1

Assume \(\varvec{A}=[\varvec{O}, \varvec{{\mathcal {I}}}]\), where \(\varvec{O}\) is a \(p_2\times p_1\) matrix of zeros and \(\varvec{{\mathcal {I}}}\) is the \(p_2\times p_2\) identity matrix of rank \(p_2\), with \(p = p_1 + p_2\). Under the local alternatives (13) and the assumed regularity conditions, we have
$$\begin{aligned} \left( \begin{array}{c} \varvec{\varrho }_{1n} \\ \varvec{\varrho }_{2n}\\ \varvec{\varrho }_{3n} \end{array}\right) \xrightarrow [n\rightarrow \infty ]{\mathcal {L}} \left( \begin{array}{c} \varvec{\varrho }_{1} \\ \varvec{\varrho }_{2}\\ \varvec{\varrho }_{3} \end{array} \right) \sim N_{3p} \left( \left( \begin{array}{c} \varvec{0} \\ \varvec{\zeta } \\ -\varvec{\zeta } \end{array}\right) , \left( \begin{array}{ccc} \varvec{I}^{-1} & \varvec{J}^* & \varvec{I}^{-1} - \varvec{J}^* \\ \varvec{J}^{{*}\mathsf {T}} & \varvec{J}^* & \varvec{0} \\ (\varvec{I}^{-1} - \varvec{J}^*)^{\mathsf {T}} & \varvec{0} & \varvec{I}^{-1} - \varvec{J}^* \end{array} \right) \right) , \end{aligned}$$
where \(\varvec{I}= -\text{E}\left( \frac{\partial ^2 \log {L}}{\partial \varvec{\beta }\partial \varvec{\beta }^{\mathsf {T}}}\right)\) is the Fisher information matrix, \(\varvec{J}^* = \varvec{I}^{-1}- \varvec{I}^{-1}\varvec{A}^{\mathsf {T}}(\varvec{A}\varvec{I}^{-1}\varvec{A}^{\mathsf {T}})^{-1} \varvec{A}\varvec{I}^{-1}\), and \(\varvec{\zeta } = -\varvec{I}^{-1}\varvec{A}^{\mathsf {T}}(\varvec{A} \varvec{I}^{-1}\varvec{A}^{\mathsf {T}})^{-1}\varvec{\delta }\).
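As a numerical sanity check on Theorem 5.1, the matrix \(\varvec{J}^*\) and the vector \(\varvec{\zeta }\) can be computed directly. The sketch below is illustrative only: it substitutes a random positive-definite matrix for the information matrix \(\varvec{I}\) and uses the standard full-row-rank projection form with \(\varvec{A}^{\mathsf {T}}\); it is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
p1, p2 = 4, 3
p = p1 + p2

# A = [O, I]: selects the last p2 components of beta (illustrative choice).
A = np.hstack([np.zeros((p2, p1)), np.eye(p2)])

# Random symmetric positive-definite stand-in for the information matrix I.
M = rng.standard_normal((p, p))
I_mat = M @ M.T + p * np.eye(p)
I_inv = np.linalg.inv(I_mat)

# J* = I^{-1} - I^{-1} A^T (A I^{-1} A^T)^{-1} A I^{-1}
AIA = A @ I_inv @ A.T
J_star = I_inv - I_inv @ A.T @ np.linalg.inv(AIA) @ A @ I_inv

# zeta = -I^{-1} A^T (A I^{-1} A^T)^{-1} delta
delta = rng.standard_normal(p2)
zeta = -I_inv @ A.T @ np.linalg.inv(AIA) @ delta

# Sanity checks: J* is symmetric, the restricted directions carry no
# residual variance (A J* A^T = 0), and A zeta = -delta.
assert np.allclose(J_star, J_star.T)
assert np.allclose(A @ J_star @ A.T, 0.0)
assert np.allclose(A @ zeta, -delta)
```

The identity \(\varvec{A}\varvec{J}^*\varvec{A}^{\mathsf {T}}=\varvec{0}\) reflects that the restricted estimator has zero asymptotic variance along the constrained directions.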
First, we begin with the asymptotic bias of the proposed estimators. Assume that \(\sqrt{n}(\hat{\varvec{\beta }}^* - \varvec{\beta })\) converges in distribution as \(n\rightarrow \infty\) to some integrable random variable \(\varvec{Z}\) with distribution \({\tilde{F}}_z\). Then, the AB of \(\hat{\varvec{\beta }}^*\) is defined by
$$\begin{aligned} \text{ AB }(\hat{\varvec{\beta }}^*) = \int \varvec{z} d{\tilde{F}}_z(\varvec{z}). \end{aligned}$$
Let \(\varPsi _g(x,\varDelta ) = P\left( \chi _{g}^2(\varDelta ) \le x \right)\) be the distribution function of a non-central chi-squared random variable \(\chi _{g}^2(\varDelta )\) with non-centrality parameter \(\varDelta =\varvec{\delta }^{\mathsf {T}}(\varvec{A}\varvec{I}^{-1}\varvec{A}^{\mathsf {T}})^{-1}\varvec{\delta }\) and \(g\) degrees of freedom. Also, let \(\chi ^2_{g, \alpha }\) be the \(\alpha\)-level critical value of the central \(\chi ^{2}_{g}\) distribution.
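The function \(\varPsi _g\) and the critical value \(\chi ^2_{g,\alpha }\) are available in standard libraries; a minimal sketch using SciPy's noncentral chi-square distribution (an implementation choice of this illustration, not of the paper):

```python
from scipy.stats import chi2, ncx2

def Psi(g, x, noncentrality):
    """Psi_g(x, Delta) = P(chi^2_g(Delta) <= x): noncentral chi-square CDF."""
    return ncx2.cdf(x, df=g, nc=noncentrality)

# At Delta = 0 the noncentral distribution reduces to the central chi-square.
assert abs(Psi(5, 3.0, 0.0) - chi2.cdf(3.0, df=5)) < 1e-8

# Psi_g(x, Delta) is decreasing in the noncentrality parameter Delta.
assert Psi(5, 3.0, 1.0) < Psi(5, 3.0, 0.0)

# The alpha-level critical value of the central chi^2_g distribution.
alpha = 0.05
crit = chi2.ppf(1 - alpha, df=5)
```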

Theorem 5.2

If the conditions of Theorem 5.1 hold, then the asymptotic biases of the estimators are
$$\begin{aligned} \text{AB}(\hat{\varvec{\beta }}_R)&= \varvec{\zeta },\\ \text{AB}(\hat{\varvec{\beta }}_{P})&= \varvec{\zeta }\, \varPsi _{p_2 + 2}(\chi ^2_{p_2, \alpha }, \varDelta ),\\ \text{AB}(\hat{\varvec{\beta }}_{S})&= (p_2 - 2)\,\varvec{\zeta }\, \text{E}\big (\chi _{p_2+2}^{-2}(\varDelta )\big ),\\ \text{AB}(\hat{\varvec{\beta }}_{S+})&= \text{AB}(\hat{\varvec{\beta }}_{S}) - (p_2-2)\,\varvec{\zeta }\, \text{E}\big (\chi _{p_2+2}^{-2}(\varDelta )\, I(\chi _{p_2+2}^{2}(\varDelta )<p_2-2)\big ) - \text{AB}(\hat{\varvec{\beta }}_{P}). \end{aligned}$$
Proof: Similar proofs can be found in Thomson and Hossain (2018). \(\square\)

Remark 5.1

To compare the ABs of the estimators, let \(\varvec{\omega } =\varvec{\zeta }/\varDelta\), so that every AB can be written as a scalar factor in \(\varDelta\) multiplying \(\varvec{\omega }\). The bias of \(\hat{\varvec{\beta }}_R\) then increases without bound as \(\varDelta \rightarrow \infty\). On the other hand, the scalar factors in the ABs of \(\hat{\varvec{\beta }}_{P}\), \(\hat{\varvec{\beta }}_{S}\), and \(\hat{\varvec{\beta }}_{S+}\) are bounded in \(\varDelta\), as \(\text{E}(\chi _{p_2+2}^{-2}(\varDelta ))\) is a decreasing log-convex function of \(\varDelta\). The AB of \(\hat{\varvec{\beta }}_{S}\) starts from the origin at \(\varDelta =0\), grows monotonically at first, reaches a maximum, and then decreases back towards 0. Similar behaviour is observed for \(\hat{\varvec{\beta }}_{P}\) and \(\hat{\varvec{\beta }}_{S+}\). Further, the bias curve of \(\hat{\varvec{\beta }}_{S+}\) remains below the curve of \(\hat{\varvec{\beta }}_{S}\) up to a certain value of \(\varDelta\) and then merges with it.
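The boundedness claim rests on \(\text{E}(\chi _{p_2+2}^{-2}(\varDelta ))\) decreasing in \(\varDelta\); this is easy to verify numerically. A Monte Carlo sketch (the sampling approach and parameter values are choices of this illustration):

```python
import numpy as np

def inv_moment(df, noncentrality, n_draws=200_000, seed=1):
    """Monte Carlo estimate of E[1 / chi^2_df(Delta)]."""
    rng = np.random.default_rng(seed)
    if noncentrality > 0:
        draws = rng.noncentral_chisquare(df, noncentrality, size=n_draws)
    else:
        draws = rng.chisquare(df, size=n_draws)
    return np.mean(1.0 / draws)

p2 = 6
vals = [inv_moment(p2 + 2, d) for d in (0.0, 1.0, 4.0, 9.0)]

# The expectation decreases as the noncentrality parameter grows, so the
# scalar factor in AB(beta_S) stays bounded in Delta.
assert all(a > b for a, b in zip(vals, vals[1:]))

# At Delta = 0, E[1/chi^2_g] = 1/(g - 2) exactly; here 1/6.
assert abs(vals[0] - 1.0 / (p2 + 2 - 2)) < 0.005
```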

The bias expressions for the estimators are vector-valued rather than scalar. We therefore convert them into the quadratic form given in Theorem 5.3, which will be used to calculate the quadratic bias numerically in the simulation study of the following section. The quadratic bias (QB) of any proposed estimator \(\hat{\varvec{\beta }}^*\) is given by
$$\begin{aligned} \text{ QB }(\hat{\varvec{\beta }}^*)= \text{ AB }(\hat{\varvec{\beta }}^*)^{\top } \varvec{I} \text{ AB }(\hat{\varvec{\beta }}^*). \end{aligned}$$

Theorem 5.3

If the conditions of Theorem 5.1 hold, the QBs of the proposed estimators are
$$\begin{aligned} \text{QB}(\hat{\varvec{\beta }}_F)&= 0,\\ \text{QB}(\hat{\varvec{\beta }}_R)&= \Upsilon ,\quad \text{where}~~\Upsilon =\varvec{\zeta }^{\mathsf {T}}\varvec{I}\varvec{\zeta },\\ \text{QB}(\hat{\varvec{\beta }}_{P})&= \Upsilon \big (\varPsi _{p_2 + 2}(\chi ^2_{p_2, \alpha }, \varDelta )\big )^2,\\ \text{QB}(\hat{\varvec{\beta }}_{S})&= \Upsilon (p_2 - 2)^2 \big (\text{E}(\chi _{p_2+2}^{-2}(\varDelta ))\big )^2,\\ \text{QB}(\hat{\varvec{\beta }}_{S+})&= \Upsilon \Big ( (p_2 - 2)\,\text{E}\big (\chi _{p_2+2}^{-2}(\varDelta )\big ) - (p_2-2)\,\text{E}\big (\chi _{p_2+2}^{-2}(\varDelta )\, I(\chi _{p_2+2}^{2}(\varDelta )<p_2-2)\big ) - \varPsi _{p_2 + 2}(\chi ^2_{p_2, \alpha }, \varDelta )\Big )^2. \end{aligned}$$
Second, we consider the asymptotic risk (AR) of the estimators. To derive expressions for the AR of the estimators, we define a quadratic loss function
$$\begin{aligned} {\mathcal {L}}(\hat{\varvec{\beta }}^*; \varvec{W}) = n\left( \hat{\varvec{\beta }}^* - \varvec{\beta }\right) ^{\mathsf {T}} \varvec{W}\left( \hat{\varvec{\beta }}^* - \varvec{\beta }\right) , \end{aligned}$$
where \(\varvec{W}\) is a non-negative definite matrix. One choice is the identity matrix, which will be used in the simulation study. Other choices are also available, for example \(\varvec{W}=\varvec{I}^{-1}\), or a general \(\varvec{W}\) giving a loss function that weights each component of \(\varvec{\beta }\) differently. The mean squared error (MSE) matrix for any estimator \(\hat{\varvec{\beta }}^*\) under the quadratic loss function is
$$\begin{aligned} \text {MSE}(\hat{\varvec{\beta }}^*) = \text{E}\left\{ \lim _{n\rightarrow \infty } \big (\sqrt{n}(\hat{\varvec{\beta }}^* - \varvec{\beta })\big )\big (\sqrt{n}(\hat{\varvec{\beta }}^* - \varvec{\beta })\big )^{\mathsf {T}}\right\} = \int \varvec{z} \varvec{z}^{\mathsf {T}}\, d{\tilde{F}}_z(\varvec{z}). \end{aligned}$$
(14)
Then, the AR is defined as
$$\begin{aligned} \text{AR}(\hat{\varvec{\beta }}^*; \varvec{W})= \int \varvec{z}^{\mathsf {T}} \varvec{W} \varvec{z}\, d{\tilde{F}}_z(\varvec{z})= \text{tr}\big (\varvec{W}\,\text {MSE}(\hat{\varvec{\beta }}^*)\big ). \end{aligned}$$
(15)
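Definition (15) can be approximated by simulation. A minimal sketch using the sample mean of a multivariate normal, whose asymptotic covariance plays the role of \(\varvec{I}^{-1}\) (the parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
p, n, reps = 3, 200, 4000

# True parameter and a covariance standing in for I^{-1}.
beta = np.array([1.0, -0.5, 2.0])
I_inv = np.diag([1.0, 2.0, 0.5])
W = np.eye(p)  # the identity weight used in the simulation study

# Simulated risk: E[ n (beta_hat - beta)^T W (beta_hat - beta) ].
losses = np.empty(reps)
for r in range(reps):
    sample = rng.multivariate_normal(beta, I_inv, size=n)
    beta_hat = sample.mean(axis=0)  # the "unrestricted" estimator here
    d = beta_hat - beta
    losses[r] = n * d @ W @ d

# The average loss should approach AR = trace(W I^{-1}) = 3.5.
assert abs(losses.mean() - np.trace(W @ I_inv)) < 0.25
```

This reproduces, in a toy setting, the first identity of Theorem 5.4: the asymptotic risk of the unrestricted estimator equals \(\text{trace}(\varvec{W}\varvec{I}^{-1})\).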

Theorem 5.4

Under the local alternatives \(H_{(n)}\) in (13) and the regularity conditions,
$$\begin{aligned} \text{AR}(\hat{\varvec{\beta }}_F; \varvec{W})&= \text{trace}(\varvec{W} \varvec{I}^{-1}), \\ \text{AR}(\hat{\varvec{\beta }}_R; \varvec{W})&= \text{AR}(\hat{\varvec{\beta }}_F; \varvec{W})-\left( \text{trace}(\varvec{W} \varvec{J}^{*}) - \varvec{\zeta }^{\mathsf {T}} \varvec{W}\varvec{\zeta }\right) ,\\ \text{AR}(\hat{\varvec{\beta }}_{P}; \varvec{W})&= \text{AR}(\hat{\varvec{\beta }}_F; \varvec{W}) - \varPsi _{p_2 + 2}(\chi ^2_{p_2, \alpha }, \varDelta )\, \text{trace}(\varvec{W} \varvec{J}^{*})\\&\quad + \left( 2\varPsi _{p_2+2}(\chi ^2_{p_2, \alpha }, \varDelta ) - \varPsi _{p_2 + 4}(\chi ^2_{p_2, \alpha }, \varDelta )\right) \varvec{\delta }^{\top } \varvec{W} \varvec{\delta }, \\ \text{AR}(\hat{\varvec{\beta }}_{S}; \varvec{W})&= \text{AR}(\hat{\varvec{\beta }}_{F}; \varvec{W}) + \left( (p_2-2)^2\, \text{E}(\varPi _1^{-2}) - 2(p_2-2)\, \text{E}(\varPi _1^{-1})\right) \text{trace}(\varvec{W} \varvec{J}^{*}) \\&\quad + \left( (p_2-2)^2\, \text{E}(\varPi _2^{-2}) + 2(p_2-2)\, \text{E}(\varPi _1^{-1}) - 2(p_2-2)\, \text{E}(\varPi _2^{-1})\right) \varvec{\delta }^{\top } \varvec{W} \varvec{\delta },\\ \text{AR}(\hat{\varvec{\beta }}_{S+}; \varvec{W})&= \text{AR}(\hat{\varvec{\beta }}_{S}; \varvec{W}) - \text{E}\left( (1 - (p_2-2) \varPi _1^{-1})^2\, I(\varPi _1< p_2-2)\right) \text{trace}(\varvec{W} \varvec{J}^{*}) \\&\quad + \Big ( 2\varPsi _{p_2+2}(p_2-2, \varDelta ) - 2(p_2-2)\, \text{E}\left( \varPi _1^{-1}\, I(\varPi _1< p_2-2)\right) \\&\qquad -\, \text{E}\left( (1 - (p_2-2) \varPi _2^{-1})^2\, I(\varPi _2 < p_2-2)\right) \Big ) \varvec{\delta }^{\top } \varvec{W} \varvec{\delta }, \end{aligned}$$
where \(\varPi _1=\chi _{p_2+2}^2(\varDelta )\) and \(\varPi _2=\chi _{p_2+4}^2(\varDelta )\) are non-central chi-square random variables.

Proof

Similar proofs can be found in Thomson and Hossain (2018). \(\square\)

Remark 5.2

Under \(H_0: \varvec{A}\varvec{\beta }=\varvec{h}\), that is, when \(\varvec{\delta }=\varvec{0}\), \(\hat{\varvec{\beta }}_R\) is the best choice and it strongly dominates \(\hat{\varvec{\beta }}_F\). Note that \(\text{trace}(\varvec{W} \varvec{J}^{*})>0\), as the eigenvalues of \(\varvec{W} \varvec{J}^{*}\) are all positive, and \(\varvec{\zeta }^{\mathsf {T}} \varvec{W}\varvec{\zeta }>0\) provided that \(\varvec{\delta }\ne \varvec{0}\). However, as \(\varvec{\delta }\) moves away from \(\varvec{0}\), the AR of \(\hat{\varvec{\beta }}_R\) grows and becomes unbounded as \(\varvec{\zeta }^{\mathsf {T}} \varvec{W}\varvec{\zeta }\) increases, whereas the risk of \(\hat{\varvec{\beta }}_F\) remains bounded. This clearly indicates that the performance of \(\hat{\varvec{\beta }}_R\) depends on the validity of \(\varvec{A}\varvec{\beta }=\varvec{h}\). The AR of \(\hat{\varvec{\beta }}_P\) increases monotonically to a maximum, crossing the risk function of \(\hat{\varvec{\beta }}_R\), then monotonically decreases to the value \(\text{trace}(\varvec{W} \varvec{I}^{-1})\) as \(\varvec{\delta }\) moves further from \(\varvec{0}\). It is therefore difficult to make a clear-cut recommendation in favour of \(\hat{\varvec{\beta }}_{P}\) over \(\hat{\varvec{\beta }}_F\). It can also be shown that \(\text{AR}(\hat{\varvec{\beta }}_{S+}; \varvec{W}) \le \text{AR}(\hat{\varvec{\beta }}_{S}; \varvec{W}) \le \text{AR}(\hat{\varvec{\beta }}_{F}; \varvec{W})\); hence, the SE dominates \(\hat{\varvec{\beta }}_F\), and the risk of \(\hat{\varvec{\beta }}_{S+}\) is asymptotically superior to that of \(\hat{\varvec{\beta }}_{S}\) over the entire parameter space induced by \(\varDelta\). For details of similar comparisons, see Hossain et al. (2015).

6 Simulation

A Monte Carlo simulation study was conducted to evaluate the performance of the proposed estimators relative to the unrestricted marginal maximum likelihood (UMML) estimator using the relative mean squared error (RMSE). In the simulations, we used the multilevel ordinal regression model given in (2) with the probit and logit links. For these simulation studies, we define a restricted model and a simulation model as follows:
  • Restricted Model: We considered the restriction \(H_0: \varvec{A} \varvec{\beta } = \varvec{h}\) with \(\varvec{A}=\left[ \varvec{I}_{p_2}, \varvec{0}_{p_2\times (p - p_{2})}\right]\) and \(\varvec{h}=\varvec{0}_{p_2\times 1}\), where \(\varvec{0}_{a\times b}\) is an \(a\times b\) matrix of zeros and \(\varvec{\beta }_R=(\varvec{\beta }_{1R}^{\mathsf {T}}, \varvec{\beta }_{2R}^{\mathsf {T}})^{\mathsf {T}}\). The dimensions of \(\varvec{\beta }_{1R}\) and \(\varvec{\beta }_{2R}\) are \(p_1 \times 1\) and \(p_2 \times 1\), respectively, such that \(p = p_1 + p_2\). We assume the restriction \(\varvec{\beta }_{2R}=\varvec{0}\), where \(\varvec{0}\) is a \(p_2\times 1\) vector of zeros (we considered \(p_2=3, 6\), and 12).

  • Simulation Model: We consider the simulation model with \(\varDelta = ||\varvec{\beta }-\varvec{\beta }_R||^2\), where \(\varvec{\beta } = (\varvec{\beta }_1^{\mathsf {T}}, \varvec{\beta }_2^{\mathsf {T}})^{\mathsf {T}}\) and \(||\cdot ||\) is the Euclidean norm. Under the local alternatives (13), \(\varDelta\) measures the distance between the restricted model and the simulation model; the two models are identical when \(\varDelta =0\).

In the simulation study, \(\varvec{\beta }_1=\varvec{\beta }_{1R}\) and \(\varvec{\beta }_2=(\sqrt{\varDelta }, \varvec{0}_{1 \times (p_{2} - 1)})^{\mathsf {T}}\). We investigate the behaviour of the proposed estimators both under \(H_0: \varDelta =0\) and under alternatives with \(\varDelta >0\), where \(0\le \varDelta \le 2\).

We specify \(p_1 = 4\) for this simulation study, and the true coefficients are taken to be \(\varvec{\beta }_1 = (1.30, 0.93, -1.40, 1.26)^{\mathsf {T}}\) with \(\mu = 0.40\) as the global intercept term. The weight matrix \(\varvec{W}\) in the quadratic loss function of Sect. 5 was set to \(\varvec{I}_{p \times p}\). We consider the two simulation procedures for the probit and logit links of model (2).

Each of the p fixed-effect covariates was generated from a separate \(n_i\)-variate normal distribution with mean vector \(\varvec{10}\) and covariance matrix \(\sigma _x^2\varvec{\rho }_x\), where \(\sigma _x^2 = 0.38\) and \(\varvec{\rho }_x\) is an exchangeable correlation matrix with \(\rho = 0.6\). The error term \(\varvec{\varepsilon }\) was generated with mean vector \(\varvec{0}\) and covariance matrix \(\varvec{\varLambda } = \sigma _{\varepsilon }^2\varvec{\rho }_{\varepsilon }\), where \(\sigma _{\varepsilon }^2= 2.4\) and \(\varvec{\rho }_{\varepsilon }\) is the correlation structure with parameter \(\rho _{\varepsilon } = 0.6\). The random effects were generated from a bivariate normal distribution (as there were two random effects) with means equal to 0, variances equal to 0.65, and covariances set to 0. The responses were generated using the values \(\varDelta = 0.0, 0.1, 0.2, 0.4, 0.7, 0.9, 1.2, 1.5, 2.0\).

Specifically, we generated 1000 data sets, each consisting of \(N = 100\), 150, and 200 subjects (level-2), with the number of observations per subject (level-1) varying from \(n_i = 2\) to \(n_i=4\). The number of observations per subject was drawn from a uniform distribution, ensuring a minimum of 2 and a maximum of 4 observations. Each subject also had two random-effect covariates, one allowing for a different baseline response (intercept) and another allowing for differing response profiles (slope). At each visit, a four-category ordinal response was generated taking the values 1, 2, 3, and 4 with probabilities 0.2, 0.3, 0.3, and 0.2, respectively. For better visualization, we summarize the results in the following subsections based on the tables and figures. Some of the tables and figures are provided in the Electronic Supplementary Material.
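The data-generating steps above can be sketched as follows. The coefficient values, the latent-variable threshold mechanism, and the helper names are illustrative assumptions of this sketch, not the authors' code; the cutpoints are logistic quantiles giving roughly the stated 0.2/0.3/0.3/0.2 category split when the linear predictor is near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 7              # subjects; p = p1 + p2 covariates
sigma_x2, rho = 0.38, 0.6  # covariate variance and exchangeable correlation
beta_true = np.full(p, 0.1)  # illustrative coefficients, not the paper's values

def exchangeable(dim, r):
    """Exchangeable correlation matrix: 1 on the diagonal, r off the diagonal."""
    return np.full((dim, dim), r) + (1 - r) * np.eye(dim)

data = []
for i in range(N):
    n_i = int(rng.integers(2, 5))  # 2 to 4 visits, drawn uniformly
    # Each covariate follows an n_i-variate normal across visits,
    # mean 10, covariance sigma_x^2 times an exchangeable correlation.
    X = np.column_stack([
        rng.multivariate_normal(np.full(n_i, 10.0),
                                sigma_x2 * exchangeable(n_i, rho))
        for _ in range(p)
    ])
    # Two independent random effects (intercept, slope), variance 0.65 each.
    b = rng.normal(0.0, np.sqrt(0.65), size=2)
    time = np.arange(n_i)
    eta = (X - 10.0) @ beta_true + b[0] + b[1] * time
    # Four-category response via thresholds on a logistic latent variable.
    latent = eta + rng.logistic(size=n_i)
    cuts = np.array([-1.3863, 0.0, 1.3863])
    y = np.digitize(latent, cuts) + 1
    data.append((y, X, time))

assert all(y.min() >= 1 and y.max() <= 4 for y, _, _ in data)
assert all(2 <= len(t) <= 4 for _, _, t in data)
```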

6.1 Quadratic Bias of UMML, RMML, PT, SE and PSE when \(\varDelta \ge 0\)

The quadratic bias (QB) is a scalar quantity and can be compared across the estimators for the probit and logit models when the number of observations per subject varies from \(n_i=2\) to 4. Since \(\varDelta\) is common to the QB expressions in Theorem 5.3 for each estimator except the UMML, we compare them by examining the QB at different values of \(\varDelta\). The QBs of the UMML, RMML, PT, SE, and PSE with \(\alpha =0.05\) are plotted in Figs. 1 and 2 for different values of the non-centrality parameter \(\varDelta\). As expected, the QB of the UMML is 0 for all values of \(\varDelta\) (though, due to sampling variability, its simulated QB is not exactly zero), while the QB of the RMML increases without bound for both models and tends to \(\infty\) as \(\varDelta \rightarrow \infty\). Theoretically, the QB of the PT is a function of the level of significance \(\alpha\), but we only report results for \(\alpha =0.05\). As shown in Figs. 1 and 2, the QB of the PT starts off low, then increases and crosses the QB of the RMML, and finally moves toward the QB of the UMML as \(\varDelta\) increases. The SE and PSE are biased but bounded in \(\varDelta\), and the PSE has lower or equal QB compared with the SE for both models. As \(\varDelta\) increases, more sampling fluctuation occurs in the QBs of all estimators for \(N=100\) than for \(N=200\). We also found that the QB of each estimator for the probit model is lower than the QB of the corresponding estimator for the logit model.
Fig. 1

Simulated quadratic biases of the UMML, RMML, PT, SE, and PSE for probit model when \(\varDelta \ge 0\). Here \(p_1=4\), \(p_2=3, 6, 12\), \(N=100\), 200, and \(n_i=2~\text {to}~4\)

Fig. 2

Simulated quadratic biases of the UMML, RMML, PT, SE, and PSE for logit model when \(\varDelta \ge 0\). Here \(p_1=4\), \(p_2=3, 6, 12\), \(N=100\), 200 and \(n_i=2~\text {to}~4\)

6.2 Risk analysis when \(\varDelta = 0\)

Let \(\text {MSE}_U\) denote the mean squared error of the UMML estimator \(\hat{\varvec{\beta }}\), and let \(\text {MSE}_M\), \(M = 1, 2, 3, 4\), denote the MSEs of the RMML, PT, SE, and PSE, respectively. The relative MSE of the \(M\)th estimator with respect to the UMML is defined by
$$\begin{aligned} R_M= \frac{\text{ MSE }_U(\hat{\varvec{\beta }})}{\text{ MSE }_M(\hat{\varvec{\beta }}_M)}. \end{aligned}$$
(16)
Observe that \(R_M > 1\) indicates that the corresponding estimator (RMML, PT, SE, or PSE) is better than the UMML, that is, has lower risk than the UMML.
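The ratio in (16) is straightforward to compute from Monte Carlo output; a sketch in which the simulated estimator draws are hypothetical stand-ins, not results from the paper's models:

```python
import numpy as np

def relative_mse(beta_true, draws_u, draws_m):
    """R_M = MSE_U / MSE_M; values above 1 favour the competing estimator.

    Each draws array holds one estimate of beta per simulation replicate
    (rows = replicates, columns = components of beta).
    """
    mse_u = np.mean(np.sum((draws_u - beta_true) ** 2, axis=1))
    mse_m = np.mean(np.sum((draws_m - beta_true) ** 2, axis=1))
    return mse_u / mse_m

rng = np.random.default_rng(3)
beta = np.zeros(4)
# Hypothetical draws: the competing estimator has half the per-component
# variance, so R_M should come out near 2.
draws_u = rng.normal(0.0, 1.0, size=(5000, 4))
draws_m = rng.normal(0.0, np.sqrt(0.5), size=(5000, 4))
r = relative_mse(beta, draws_u, draws_m)
assert 1.7 < r < 2.3
```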
We provide the simulated relative mean squared errors (RMSEs) for the probit and logit models in Tables 1 and 2, for \(n_i=3\) and for \(n_i\) varying from 2 to 4 per subject, respectively. These tables show that the proposed estimators improve on the UMML in terms of relative MSE. The improvement is dramatic (more than 200%) for the larger models with 7, 10, and 16 covariates. For \(\varDelta = 0\), the RMML performs best because of its unbiasedness. For smaller numbers of inactive covariates, the PT is superior to the PSE, and for larger numbers of inactive covariates (\(p_2 = 12\)) the opposite is true, subject to random fluctuation. As the number of subjects N increases, the RMSEs of the proposed estimators decrease. The SE and PSE outperform the UMML estimator in terms of risk, and the PSE performs better than the SE. The performance of the estimators for the probit model is better than for the logit model, consistent with the biases of the estimators for the logit model being higher.
Table 1

Relative MSEs of the RMML, PT, SE, and PSE on the UMML when the restricted parameter space is correct (i.e. \(\varDelta =0\)), \(p_1=4\), and \(n_i=3\)

Probit model

| Estimator | \(N=100\), \(p_2=3\) | \(p_2=6\) | \(p_2=12\) | \(N=200\), \(p_2=3\) | \(p_2=6\) | \(p_2=12\) |
|-----------|------|------|------|------|------|------|
| RMML | 1.64 | 2.24 | 4.17 | 1.43 | 1.82 | 3.27 |
| PT   | 1.48 | 1.92 | 1.35 | 1.29 | 1.61 | 2.78 |
| SE   | 1.14 | 1.55 | 2.63 | 1.11 | 1.47 | 2.50 |
| PSE  | 1.21 | 1.62 | 2.76 | 1.15 | 1.56 | 2.74 |

Logit model

| Estimator | \(N=100\), \(p_2=3\) | \(p_2=6\) | \(p_2=12\) | \(N=200\), \(p_2=3\) | \(p_2=6\) | \(p_2=12\) |
|-----------|------|------|------|------|------|------|
| RMML | 1.28 | 1.54 | 2.37 | 1.23 | 1.44 | 1.89 |
| PT   | 1.24 | 1.49 | 2.07 | 1.16 | 1.38 | 1.79 |
| SE   | 1.08 | 1.22 | 1.89 | 1.05 | 1.26 | 1.62 |
| PSE  | 1.12 | 1.29 | 2.03 | 1.07 | 1.33 | 1.74 |

Table 2

Relative MSEs of the RMML, PT, SE, and PSE on the UMML when the restricted parameter space is correct (i.e. \(\varDelta =0\)), \(p_1=4\) and \(n_i\) varies from 2 to 4

Probit model

| Estimator | \(N=100\), \(p_2=3\) | \(p_2=6\) | \(p_2=12\) | \(N=200\), \(p_2=3\) | \(p_2=6\) | \(p_2=12\) |
|-----------|------|------|------|------|------|------|
| RMML | 1.58 | 2.15 | 3.13 | 1.43 | 1.77 | 2.14 |
| PT   | 1.30 | 1.73 | 2.40 | 1.34 | 1.64 | 2.00 |
| SE   | 1.15 | 1.59 | 2.32 | 1.13 | 1.21 | 1.78 |
| PSE  | 1.16 | 1.64 | 2.42 | 1.15 | 1.56 | 1.94 |

Logit model

| Estimator | \(N=100\), \(p_2=3\) | \(p_2=6\) | \(p_2=12\) | \(N=200\), \(p_2=3\) | \(p_2=6\) | \(p_2=12\) |
|-----------|------|------|------|------|------|------|
| RMML | 1.27 | 2.06 | 2.84 | 1.24 | 1.38 | 2.42 |
| PT   | 1.20 | 1.52 | 1.96 | 1.17 | 1.37 | 2.19 |
| SE   | 1.08 | 1.43 | 2.09 | 1.07 | 1.18 | 1.94 |
| PSE  | 1.10 | 1.49 | 2.15 | 1.09 | 1.27 | 2.11 |

6.3 Risk analysis when \(\varDelta \ge 0\)

The purpose of this section is to evaluate the proposed estimators for \(\varDelta >0\), that is, when \(\varvec{\beta }_2 \ne \varvec{0}\), with the number of observations per subject varying from 2 to 4. For \(0 \le \varDelta < 0.4\) and \(N = 100\), Figs. 3 and 4 show that the RMML outperforms the other estimators; as the sample size increases to \(N = 200\), the RMML outperforms the other estimators up to \(\varDelta < 0.9\) for both the probit and logit models. As \(\varDelta\) increases, the relative MSE of the RMML approaches zero, an undesirable property since larger RMSE values are preferred. This implies that severe departures from the restrictions imposed on the model lead to inefficient estimates of the fixed effects under the restricted model. This is not the case for the PT, SE, and PSE: their relative MSEs stay at or above one throughout \(0 \le \varDelta \le 2\) and approach the relative MSE of the UMML as \(\varDelta\) increases, apart from some sampling fluctuations. This agrees with the asymptotic results in Sect. 5. Because the results are similar, we do not report the relative MSEs of the estimators for \(n_i=3\) per subject and \(\varDelta >0\); these results are available upon request.
Fig. 3

Simulated relative MSE of RMML, PT, SE, and PSE for probit model when \(\varDelta \ge 0\). Here \(p_1=4\), \(p_2=3, 6, 12\), \(N=100\), 200, and \(n_i\) varies from 2 to 4

Fig. 4

Simulated relative MSE of RMML, PT, SE, and PSE for logit model when \(\varDelta \ge 0\). Here \(p_1=4\), \(p_2=3, 6, 12\), \(N=100\), 200, and \(n_i\) varies from 2 to 4

7 Real data applications

In this section, we present an application of multilevel models for ordinal responses to longitudinal data. We applied the proposed shrinkage and pretest methods to longitudinal knee-based data from the Osteoarthritis Initiative (OAI), an online and publicly available database (http://www.oai.ecsf.edu/). Specifically, we used the data sets from Enrollees version 25 and AllClinical versions 0.2.3, 3.2.1, 5.2.1, 8.2.1, and 10.2.2. The OAI is a cohort study of 4796 subjects who were between 45 and 79 years of age at enrollment and who either have symptomatic knee osteoarthritis (OA), do not have the condition (control), or are at risk of developing the condition. The study enrolled subjects between February 2004 and September 2006, with annual follow-up until 2013 to 2015, depending on the date of enrollment. Subjects were followed at 5 time points: 0, 24, 36, 72, and 96 months after enrollment. The overall purpose of the OAI study was to improve public health. We included OAI participants with established symptomatic radiographic knee OA at baseline (\(N = 1668\)), defined as knee symptoms such as pain, aching, or stiffness in and around the knee on most days of the month for at least 1 month in the previous year, and excluded the control cohort from our analysis. Missing observations in the responses and covariates were ignored in our analysis.

The response was based on the self-reported Western Ontario McMaster Osteoarthritis Index (WOMAC, 5-point Likert scale; Bellamy et al. 1988), which addresses the severity of difficulties with both knees in the quality of life associated with OA symptoms. The WOMAC is a disease-specific instrument for measuring the levels of pain, joint stiffness, and functional ability, and has been applied to the evaluation of knee OA (Bellamy et al. 1988). The WOMAC score is calculated from 5 items that measure pain, 2 items that measure stiffness, and 17 items that measure physical function. For severity of difficulty in both knees, we classified the WOMAC score into three categories (none, mild, and moderate to extreme) on the basis of a five-point scale, with higher scores indicating more severe difficulties. Covariates included race [\(\beta _1\): white vs non-white], age \((\beta _2)\), body mass index [BMI \((\beta _3)\): overweight (25–30) vs. healthy weight (\(<25\)); BMI \((\beta _4)\): obese (\(>30\)) vs. healthy weight (\(<25\))], depression [CESD (\(\beta _5\)): yes vs no], total number of prescription medications [NMED (\(\beta _6\))], time point for each subject [time (\(\beta _7\))], progression of knee pain/OA status [cohort (\(\beta _8\)): progression or not], sex [\(\beta _{9}\): male vs female], self-reported diabetes [DIAB (\(\beta _{10}\)): yes vs no], Charlson Comorbidity Index [CCI score (\(\beta _{11}\)): \(\ge 1\) vs \(<1\)], and education [\(\beta _{12}\): college graduate vs high school or less; \(\beta _{13}\): university graduate vs high school or less]. The time points used were baseline and the 24-, 36-, 72-, and 96-month follow-ups. The main objective of our study is to assess the association of knee OA severity with the above-mentioned covariates and demographic variables, and to see whether our proposed estimators improve on the unrestricted marginal maximum likelihood estimator.

A sequence of models for the longitudinal ordinal response was fitted using these covariates, where the effect of the covariate CESD was treated as random and the time variable was forcibly included in the model. The AIC and BIC criteria are typically used to compare a range of competing models, and a model was selected using the lowest AIC or BIC value, giving the optimal set of covariates with nonzero coefficients (see Hox et al. 2018, chap. 3). The model with covariates race, age, BMI, CESD, NMED, time, cohort, and sex has the lowest AIC value for explaining the risk factors for knee OA severity. Recall from the simulation study that we obtain the inactive set of parameters from \(H_0: \varvec{\beta }_2 = \varvec{0}\), where \(\varvec{\beta }_2 = (\beta _{10}, \beta _{11}, \beta _{12}, \beta _{13})^{\top }\) is a \(p_2 \times 1\) vector with \(p_2=4\).
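The AIC comparison described above can be sketched generically; the candidate model names, log-likelihoods, and parameter counts below are entirely hypothetical placeholders, not the fitted values from the OAI analysis:

```python
def aic(loglik, k):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * k - 2 * loglik

# Hypothetical fits: (log-likelihood, number of parameters) per candidate.
candidates = {
    "race+age+BMI+CESD+NMED+time+cohort+sex": (-1520.3, 11),
    "all 13 covariates":                      (-1518.9, 15),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}

# The smaller model wins here: its 1.4-unit log-likelihood deficit does not
# offset the 4 extra parameters' penalty.
best = min(scores, key=scores.get)
assert best == "race+age+BMI+CESD+NMED+time+cohort+sex"
```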
Table 3

Estimates (first row) and standard errors (second row) for race \((\beta _1)\), age \((\beta _2)\), overweight vs. healthy weight \((\beta _3)\), obese vs. healthy weight \((\beta _4)\), CESD \((\beta _5)\), NMED \((\beta _6)\), time \((\beta _7)\), cohort \((\beta _8)\), and sex \((\beta _{9})\)

| Estimator | \(\beta _1\) | \(\beta _2\) | \(\beta _3\) | \(\beta _4\) | \(\beta _5\) | \(\beta _6\) | \(\beta _7\) | \(\beta _8\) | \(\beta _9\) | RMSE |
|-----------|------|------|------|------|------|------|------|------|------|------|
| UMML | − 0.534 (0.175) | − 0.017 (0.008) | 0.109 (0.139) | 0.262 (0.141) | 0.419 (0.185) | 0.055 (0.026) | − 0.001 (0.001) | 1.383 (0.143) | − 0.219 (0.137) | 1.00 |
| RMML | − 0.571 (0.165) | − 0.016 (0.008) | 0.117 (0.133) | 0.278 (0.136) | 0.439 (0.189) | 0.060 (0.022) | − 0.001 (0.001) | 1.365 (0.140) | − 0.229 (0.138) | 1.78 |
| PT   | − 0.549 (0.138) | − 0.017 (0.006) | 0.109 (0.121) | 0.262 (0.129) | 0.418 (0.168) | 0.055 (0.021) | − 0.001 (0.001) | 1.383 (0.125) | − 0.219 (0.127) | 1.15 |
| SE   | − 0.549 (0.174) | − 0.017 (0.008) | 0.109 (0.138) | 0.262 (0.137) | 0.419 (0.183) | 0.055 (0.025) | − 0.001 (0.001) | 1.383 (0.143) | − 0.220 (0.136) | 1.05 |
| PSE  | − 0.549 (0.174) | − 0.017 (0.008) | 0.109 (0.138) | 0.262 (0.137) | 0.419 (0.183) | 0.055 (0.025) | − 0.001 (0.001) | 1.383 (0.143) | − 0.220 (0.136) | 1.05 |


To estimate the standard errors of the estimators, we apply a bootstrap technique, as we have only one data set. A bootstrap sampling scheme (Wu and Chiang 2000) was carried out to calculate the estimates, standard errors, and RMSEs of the proposed estimators. We generated bootstrap samples of \(N=300\) subjects, drawn with replacement from the original data set of 1668 subjects, and let \(\{\varvec{Y}_i^*, \varvec{X}_i^*, \varvec{Z}_i^*;~~ 1 \le i \le 300, 1 \le j \le 5\}\) be the longitudinal bootstrap samples. A subject, with its entire set of data values, may appear several times in a bootstrap sample. We then refit the ordinal regression model to each bootstrap sample using the method of Sect. 3 to obtain the bootstrap estimates. The resampling procedure was repeated 1000 times. The point estimates, standard errors, and RMSEs of the significant coefficients are reported in Table 3, with the RMSEs in the last column. The RMSEs of the RMML, PT, SE, and PSE with respect to the UMML are 1.78, 1.15, 1.05, and 1.05, respectively. The results are consistent with the simulation study and the theoretical findings, and provide general guidance for the application of the proposed shrinkage and pretest estimation methods.
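The subject-level resampling can be sketched as follows; here `subjects` is a hypothetical list of per-subject records and `fit` is a stand-in for the estimation routine of Sect. 3, neither of which comes from the paper's code:

```python
import numpy as np

def subject_bootstrap(subjects, fit, n_boot=1000, n_sub=300, seed=0):
    """Resample whole subjects with replacement and refit each replicate.

    `subjects` is a list of per-subject data records; `fit` maps a list of
    records to a coefficient vector. Returns the bootstrap estimates; the
    standard errors are their column-wise standard deviations.
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(subjects), size=n_sub)  # with replacement
        estimates.append(fit([subjects[i] for i in idx]))
    estimates = np.asarray(estimates)
    return estimates, estimates.std(axis=0, ddof=1)

# Toy check: with "fit" simply averaging one stored scalar per subject, the
# bootstrap SE approximates sd/sqrt(n_sub) of the empirical distribution.
data = list(np.random.default_rng(1).normal(0.0, 1.0, size=1668))
est, se = subject_bootstrap(data, lambda recs: np.array([np.mean(recs)]),
                            n_boot=500, n_sub=300)
assert est.shape == (500, 1)
assert 0.03 < se[0] < 0.09
```

Resampling whole subjects, rather than individual visits, preserves the within-subject dependence that the multilevel model is designed to capture.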

8 Conclusion

We have applied the marginal maximum likelihood method to estimate the regression parameters of multilevel models for ordinal longitudinal data, calling this estimator the UMML. We have also estimated the parameters when some of them are under a linear restriction, calling this estimator the RMML. The pretest and shrinkage estimators are constructed from the UMML and RMML. We have presented closed-form bias and risk expressions, and used a Monte Carlo simulation study to explore the bias and risk properties of the estimators. Our simulation studies show good performance of the pretest and shrinkage methods under different numbers of covariates. The RMML estimator offers numerically superior performance compared with the UMML, pretest, and shrinkage estimators at and near the restriction \(\varvec{A} \varvec{\beta } = \varvec{h}\); however, this superiority diminishes as we move away from the restriction. The risk of the pretest estimator is lower than that of the UMML (equivalently, its relative MSE with respect to the UMML is higher) at and near the restriction. As \(\varDelta\) increases, the relative MSEs of the PT, SE, and PSE converge to one; near the restriction, the PSE performs better than the SE (although the difference between the two is not easy to spot due to the scale of the plot).

We applied the proposed estimation method to the Osteoarthritis Initiative database. To compare the pretest and shrinkage estimators with the UMML, we calculated the RMSEs using the bootstrap resampling method, since the RMSE cannot be computed from a single data set. The pretest and shrinkage estimators perform better than the UMML, and the RMML performs best because of its unbiasedness under the imposed restriction.

This paper has attempted to present the multilevel ordinal regression model for longitudinal data. Certainly, the use of ordinal models is not as popular as the use of normal and binary regression models, despite the fact that ordinal longitudinal outcomes are often obtained. The tools are available in terms of methods and software, so hopefully this situation will change as researchers become more familiar with applications of the multilevel ordinal regression model.


Acknowledgements

The OAI is a public–private partnership comprising five contracts (N01-AR-2-2258; N01-AR-2-2259; N01-AR-2-2260; N01-AR-2-2261; N01-AR-2-2262) funded by the National Institutes of Health, a branch of the Department of Health and Human Services, and conducted by the OAI Study Investigators. Private funding partners include Merck Research Laboratories; Novartis Pharmaceuticals Corporation; GlaxoSmithKline; and Pfizer, Inc. Private sector funding for the OAI is managed by the Foundation for the National Institutes of Health. This manuscript was prepared using an OAI public use data set and does not necessarily reflect the opinions or views of the OAI investigators, the NIH, or the private funding partners. We express our sincere thanks to the editor, associate editor, and the referees for their constructive and valuable suggestions, which led to an improvement of our original version of the manuscript. Shakhawat Hossain and Saumen Mandal were supported by Discovery Grants from the Natural Sciences and the Engineering Research Council of Canada.

Supplementary material

Supplementary material 1: 42081_2019_35_MOESM1_ESM.pdf (PDF 127 kb)

References

  1. Agresti, A. (2010). Analysis of ordinal categorical data. Hoboken: Wiley.
  2. Ahmed, S. E., Doksum, K., Hossain, S., & You, J. (2007). Shrinkage, pretest and LASSO estimators in partially linear models. Australian and New Zealand Journal of Statistics, 49(4), 461–471.
  3. Albert, P. S., Hunsberger, S. A., & Biro, F. M. (1997). Modeling repeated measures with monotonic ordinal responses and misclassification, with applications to studying maturation. Journal of the American Statistical Association, 92, 1304–1311.
  4. Bellamy, N., Buchanan, W. W., Goldsmith, C. H., Campbell, J., & Stitt, L. (1988). Validation study of WOMAC: A health status instrument for measuring clinically important patient-relevant outcomes following total hip or knee arthroplasty in osteoarthritis. Journal of Orthopedics and Rheumatology, 1, 95–108.
  5. Goldstein, H. (2010). Multilevel statistical models. Chichester: Wiley.
  6. Hedeker, D., & Gibbons, R. D. (1994). A random-effects ordinal regression model for multilevel analysis. Biometrics, 50(4), 933–944.
  7. Hedeker, D., & Gibbons, R. D. (2006). Longitudinal data analysis. New York: Wiley.
  8. Hossain, S., Ahmed, S. E., & Doksum, K. A. (2015). Shrinkage, pretest, and penalty estimators in generalized linear models. Statistical Methodology, 24, 52–68.
  9. Hossain, S., Ahmed, S. E., Yi, Y., & Chen, B. (2016). Shrinkage and pretest estimators for longitudinal data analysis under partially linear models. Journal of Nonparametric Statistics, 28(3), 531–549.
  10. Hox, J., Moerbeek, M., & van de Schoot, R. (2018). Multilevel analysis: Techniques and applications (3rd ed.). New York: Routledge.
  11. Hox, J. J., Moerbeek, M., & van de Schoot, R. (2017). Multilevel analysis: Techniques and applications. New York: Routledge.
  12. Lee, K., & Daniels, M. J. (2008). Marginalized models for longitudinal ordinal data with application to quality of life studies. Statistics in Medicine, 27(21), 4359–4380.
  13. Lee, Y., & Nelder, J. A. (2004). Conditional and marginal models: Another view. Statistical Science, 19(2), 219–238.
  14. Lian, H. (2012). Shrinkage estimation for identification of linear components in additive models. Statistics and Probability Letters, 82, 225–231.
  15. Liu, Q., & Pierce, D. A. (1994). A note on Gauss–Hermite quadrature. Biometrika, 81(3), 624–629.
  16. Magnus, J. R. (1988). Linear structures. London: Charles Griffin.
  17. Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. Thousand Oaks: Sage.
  18. Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. London: Chapman and Hall/CRC.
  19. Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage.
  20. Stroud, A. H., & Secrest, D. (1966). Gaussian quadrature formulas. Englewood Cliffs: Prentice-Hall.
  21. Thomson, T., & Hossain, S. (2018). Efficient shrinkage for generalized linear mixed models under linear restrictions. Sankhya A: The Indian Journal of Statistics, 80, 1–26.
  22. Thomson, T., Hossain, S., & Ghahramani, M. (2016). Efficient estimation for time series following generalized linear models. Australian & New Zealand Journal of Statistics, 58, 493–513.
  23. van der Vaart, A. W. (1998). Asymptotic statistics. New York: Cambridge University Press.
  24. Wu, C. O., & Chiang, C. T. (2000). Kernel smoothing on varying coefficient models with longitudinal dependent variable. Statistica Sinica, 10, 433–456.
  25. Zeng, T., & Hill, R. C. (2016). Shrinkage estimation in the random parameters logit model. Open Journal of Statistics, 6, 667–674.

Copyright information

© Japanese Federation of Statistical Science Associations 2019

Authors and Affiliations

  1. Department of Mathematics and Statistics, University of Winnipeg, Winnipeg, Canada
  2. Department of Statistics, University of Manitoba, Winnipeg, Canada