1 Introduction

Epidemiological studies in a clustered or longitudinal data setting often generate multivariate (repeated) outcomes that are analyzed under the ubiquitous multivariate normal (MVN) assumptions on the random terms (random effects and within-subject random errors) via standard software, such as SAS or R. However, violations of those assumptions can lead to imprecise parameter estimates (Bandyopadhyay et al. 2010). These non-Gaussian features usually manifest as skewness and/or heavy tails of the response vector. Although achieving near-normality via suitable transformations of the responses (such as log, or Box–Cox) for standard linear mixed model (LMM) analysis is possible, such transformations may be avoided due to their non-universality and the difficulty of interpreting covariate effects on the original scale (Jara et al. 2008). To address this, various flexible (parametric) alternatives to the MVN density exist, such as the multivariate skew-normal density (Azzalini and Capitanio 1999; Gupta et al. 2004; Azzalini 2010), the heavy-tailed multivariate skew-t density (Azzalini and Capitanio 2003), and others, which can accommodate departures from normality without resorting to ad hoc data transformations.

In practice, this setup can be further complicated in the presence of multiple outcomes recorded at each cluster unit. The motivating data example in this paper comes from a clinical study of periodontal disease (PD) conducted on Gullah-speaking African-American Type-2 diabetics (henceforth, GAAD). Here, the multiple outcomes of interest are the tooth-level (mean) probed pocket depth (PPD) and clinical attachment level (CAL), which are recorded (in mm, via a periodontal probe) simultaneously for each tooth nested/clustered within a subject. While PPD quantifies the current PD status, CAL measures the (past) disease history and progression (Page and Eke 2007). An oral clinician may be interested in studying the joint evolution of these outcomes as functions of covariates, and the complexity is induced by two different sources of correlation: (a) between repeated observations of any given outcome (PPD or CAL) measured at a cluster unit (tooth), and (b) between the multiple outcomes (PPD and CAL) measured at the same tooth. The existing literature (both classical and Bayesian) on modeling multiple repeated outcomes is rich (Luo and Wang 2014; Verbeke et al. 2014; Lin and Wang 2013; Michaelis et al. 2018; Bandyopadhyay et al. 2010). However, the vast majority of these models are developed under the restrictive assumption of linearity of the covariate effects on the multivariate responses.

Fig. 1

GAAD Data: Plots of the empirical Bayes’ estimates of random effects (panels a and b), corresponding Q-Q plots (panels c and d), and observed versus estimated (non-linear) curve (panels e and f), obtained from fitting a linear mixed model separately to the PPD and CAL responses

To motivate further, consider Fig. 1, which presents plots of the empirical Bayes’ estimates of random effects (panels a and b), corresponding Q-Q plots (panels c and d), and observed versus estimated (non-linear) curves (panels e and f), obtained from fitting a LMM separately to the PPD and CAL responses in the GAAD data, using the lme function in R. The plots clearly reveal evidence of asymmetry (departures from the Gaussian assumptions), which cannot be explained by a standard LMM fit. In addition, a predictor space restricted to linear combinations of covariates may not provide an adequate picture of their cross-sectional association with the (bivariate) response. Formulating an index for PD (one that handles possible non-linearity, confounding, and interaction effects between the PD outcomes and the covariates) via a single-index model, or SIM (Hardle et al. 1993), can be a clinically elegant alternative. SIMs are a popular class of semiparametric regression models that relax the assumption of linearity and bypass the ‘curse of dimensionality’ by reducing the multi-dimensional predictor space \({\textbf {X}}\) to a univariate (scalar) index \(U = {\textbf {X}}^{T}\varvec{\beta }\). A link function g(.) then connects the covariate space to the response Y, offering a pragmatic compromise between a fully nonparametric (and often non-interpretable) multiple regression and a restrictive (parametric) linear regression. Here, the magnitude of the index coefficient \(\beta _j\) determines the relative importance of the j-th predictor in the index, and g(U) denotes the location of interest on the response curve at the index U. In biomedical research, the recent work by Wu and Tu (2016) develops an adiposity index via a (multivariate) SIM to efficiently predict multiple longitudinal outcomes (systolic and diastolic blood pressure) in children. However, their proposal adopts the usual MVN assumptions for the random terms (errors and effects), and may not accommodate heavy-tailed and other non-Gaussian features well. Furthermore, they did not provide a rigorous theoretical justification.
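As a toy illustration of the single-index idea (a sketch only, not the estimator developed in this paper: it assumes a hypothetical link \(g(u)=\sin (\pi u)\), a crude running-mean smoother in place of the spline step, and a grid search over the angle parametrizing a unit-norm \(\varvec{\beta }\) in two dimensions):

```python
import numpy as np

# toy SIM: y = g(X beta) + noise with unknown g and ||beta|| = 1
rng = np.random.default_rng(2)
beta = np.array([0.8, 0.6])                       # true (unit-norm) index direction
X = rng.uniform(-1, 1, size=(500, 2))
y = np.sin(np.pi * (X @ beta)) + 0.1 * rng.standard_normal(500)

def profile_rss(phi):
    """RSS after smoothing y against the index X beta(phi), beta = (cos phi, sin phi)."""
    b = np.array([np.cos(phi), np.sin(phi)])
    order = np.argsort(X @ b)                     # sort observations along the index
    yhat = np.convolve(y[order], np.ones(25) / 25, mode="same")  # running mean
    return np.sum((y[order] - yhat) ** 2)

phis = np.linspace(0.0, np.pi, 361)
best = phis[np.argmin([profile_rss(p) for p in phis])]
betahat = np.array([np.cos(best), np.sin(best)])
print(betahat)                                    # should point close to (0.8, 0.6)
```

Profiling out g for each candidate direction and minimizing the residual sum of squares recovers the index direction; the methods developed below replace the grid search and running mean with spline approximation and likelihood-based estimation.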

Considering Wu and Tu (2016) as our starting point, we seek to develop an index that can efficiently predict the clustered bivariate (PPD and CAL) PD outcomes. Such a clinical index linking both outcomes is largely absent from the oral health literature. Our bivariate single-index mixed (BV-SIM) model tackles non-Gaussian features in the responses via multivariate asymmetric Laplace density (ALD; Kotz et al. 2001) assumptions on the random terms. The multivariate ALD can accommodate asymmetric, peaked, and heavy-tailed data using fewer parameters than the popular multivariate skew-t density (Gupta 2003). The multivariate symmetric Laplace density (Naik and Plungpongpun 2006), a special case of the ALD, has been applied in other fields, such as speech clustering, classification problems, and image/signal analysis. Under this framework, we consider a polynomial spline approximation to the nonparametric index function, and propose an efficient EM-type algorithm for estimation and inference. The spline approximation and the normal mixture representation of the multivariate ALD yield a computationally efficient and intuitively appealing estimation setup, quantifying correlations from both sources.

The rest of the paper is organized as follows. In Sect. 2, we propose the BV-SIM model under the assumptions of a multivariate asymmetric Laplace density. Using the polynomial spline approximation for the nonparametric (index) functions, we derive the maximum likelihood (ML) estimate and establish the large sample properties of the proposed estimators in Sect. 3, with the detailed technical proofs relegated to the Appendix, where we use the projection method to prove the asymptotic normality of the parametric part. In Sect. 4, we develop an efficient ML estimation procedure based on the EM algorithm. Simulation studies comparing the finite sample performance of our approach to other alternatives appear in Sect. 5, while Sect. 6 illustrates the method via application to the PD dataset. Finally, some concluding remarks are presented in Sect. 7.

2 Statistical Model

We begin with a sketch of the multivariate asymmetric Laplace density and its shifted extension (Kotz et al. 2001), and then develop our SIM mixed-effects framework for bivariate clustered data. The multivariate ALD has the density

$$\begin{aligned} p({\textbf {y}};\varvec{\Sigma },\varvec{\gamma })=\frac{2\exp \{{\textbf {y}}{}^{\textrm{T}}\varvec{\Sigma }^{-1}\varvec{\gamma }\}}{(2\pi )^{d/2}|\varvec{\Sigma }|^{1/2}}\times \left( \frac{{\textbf {y}}{}^{\textrm{T}}\varvec{\Sigma }^{-1}{\textbf {y}}}{2+\varvec{\gamma }{}^{\textrm{T}}\varvec{\Sigma }^{-1}\varvec{\gamma }}\right) ^{\nu /2}K_{\nu }(u), \end{aligned}$$
(2.1)

where \(K_\nu\) is the modified Bessel function of the third kind with index \(\nu\), \(\nu =(2-d)/2\), \(u=\sqrt{(2+\varvec{\gamma }{}^{\textrm{T}}\varvec{\Sigma }^{-1}\varvec{\gamma })({\textbf {y}}{}^{\textrm{T}}\varvec{\Sigma }^{-1}{\textbf {y}})}\), \(\varvec{\gamma }\in \mathbb {R}^d\) is a skewness parameter and \(\varvec{\Sigma }\) is a positive definite (p.d.) scatter matrix with dimension \(d\times d\). We denote (2.1) as \(\textrm{ALD}_d(\varvec{\Sigma },\varvec{\gamma })\). Note, the ALD forces each component density to be joined at the same origin. An extension, the multivariate shifted asymmetric Laplace distribution (SALD; Kotz et al. 2001), has the form

$$\begin{aligned} p({\textbf {y}};\varvec{ \mu }, \varvec{\Sigma },\varvec{\gamma })=\frac{2\exp \{({\textbf {y}}-\varvec{ \mu }){}^{\textrm{T}}\varvec{\Sigma }^{-1}\varvec{\gamma }\}}{(2\pi )^{d/2}|\varvec{\Sigma }|^{1/2}}\times \left( \frac{\delta ({\textbf {y}},\varvec{ \mu },\varvec{\Sigma })}{2+\varvec{\gamma }{}^{\textrm{T}}\varvec{\Sigma }^{-1}\varvec{\gamma }}\right) ^{\nu /2}K_{\nu }(u), \end{aligned}$$
(2.2)

where \(u=\sqrt{(2+\varvec{\gamma }{}^{\textrm{T}}\varvec{\Sigma }^{-1}\varvec{\gamma })\delta ({\textbf {y}},\varvec{ \mu },\varvec{\Sigma })}\), \(\delta ({\textbf {y}},\varvec{ \mu },\varvec{\Sigma })=({\textbf {y}}-\varvec{ \mu }){}^{\textrm{T}}\varvec{\Sigma }^{-1}({\textbf {y}}-\varvec{ \mu })\), and \(\nu ,\varvec{\gamma },\varvec{\Sigma }\) are as defined in (2.1). Here, we write \({\textbf {Y}}\sim \textrm{SAL}_d(\varvec{ \mu },\varvec{\Sigma },\varvec{\gamma })\) to denote a random vector \({\textbf {Y}}\) following the d-dimensional SALD. After some calculation, the mean and variance of the SALD are given by

$$\begin{aligned}\textrm{E}({\textbf {Y}})=\varvec{ \mu }+\varvec{\gamma }\ \ \textrm{and} \ \ \textrm{Var}({\textbf {Y}})=\varvec{\Sigma }+\varvec{\gamma }\varvec{\gamma }{}^{\textrm{T}}.\end{aligned}$$

It is clear that the mean depends on the shifted location parameter \(\varvec{ \mu }\) and skewness parameter \(\varvec{\gamma }\), while the variance depends on the scatter matrix \(\varvec{\Sigma }\) and skewness parameter \(\varvec{\gamma }\). Also, \(\varvec{\Sigma }+\varvec{\gamma }\varvec{\gamma }{}^{\textrm{T}}\) is p.d. whenever \(\varvec{\Sigma }\) is p.d. The parameter \(\varvec{\gamma }\) plays an important role in multivariate asymmetric data analysis, besides the location \(\varvec{ \mu }\) and scatter matrix \(\varvec{\Sigma }\). Note, the multivariate density in (2.2) reduces to (2.1) when \(\varvec{ \mu }=\varvec{0}\), and it further reduces to the multivariate symmetric Laplace distribution (Eltoft et al. 2006) when \(\varvec{\gamma }=\varvec{0}\). Moreover, (2.2) reduces to the univariate ALD when the dimension \(d=1\), \(\gamma =(1-2\tau )/\tau (1-\tau )\) and \(\varvec{\Sigma }_{1\times 1}=2/\tau (1-\tau )\); this density is popularly used in the likelihood framework for quantile regression, with density \(p(y)=\tau (1-\tau )\exp \{-\rho _\tau (y-\mu )\}\), where \(\rho _\tau (u)=u(\tau -I(u<0))\). The SALD in (2.2) has the following stochastic representation

$$\begin{aligned} {\textbf {Y}}=\varvec{ \mu }+V \varvec{\gamma }+\sqrt{V}{\textbf {Z}}, \end{aligned}$$
(2.3)

where V is a random variable from an exponential distribution with mean 1 and \({\textbf {Z}}\sim \textrm{N}_d(0,\varvec{\Sigma })\) is generated independent of V. Using Bayes’s theorem, the density of V given \({\textbf {Y}}={\textbf {y}}\) is generalized inverse Gaussian, with the density

$$\begin{aligned} p_V(v|{\textbf {Y}}={\textbf {y}})=\frac{v^{\nu -1}}{2K_\nu (u)}\left( \frac{\delta ({\textbf {y}}, \varvec{ \mu },\varvec{\Sigma })}{2+\varvec{\gamma }{}^{\textrm{T}}\varvec{\Sigma }^{-1}\varvec{\gamma }}\right) ^{-\nu /2} \exp \left\{ -\frac{1}{2v}\delta ({\textbf {y}}, \varvec{ \mu },\varvec{\Sigma })-\frac{v}{2}(2+\varvec{\gamma }{}^{\textrm{T}}\varvec{\Sigma }^{-1}\varvec{\gamma })\right\} , \end{aligned}$$
(2.4)

where \(\nu , \varvec{\gamma }, \varvec{ \mu }, \varvec{\Sigma }, \delta ({\textbf {y}}, \varvec{ \mu },\varvec{\Sigma })\) and u are as defined in (2.2). The SALD allows for peakedness, heavy tails, and skewness, and hence provides more flexibility in modeling multivariate data with non-Gaussian features. More properties, extensions, and applications of the SALD appear in Kozubowski and Podgórski (2001), Franczak et al. (2014), and Bouveyron and Brunet-Saumard (2014).
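Both the stochastic representation (2.3) and the density formula (2.2) can be checked numerically. The sketch below (Python, with illustrative parameter values) simulates from \(\textrm{SAL}_2\) via (2.3), compares sample moments against \(\varvec{ \mu }+\varvec{\gamma }\) and \(\varvec{\Sigma }+\varvec{\gamma }\varvec{\gamma }{}^{\textrm{T}}\), and verifies that the \(d=1\) density integrates to one:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import kv

def rsald(n, mu, Sigma, gamma, rng):
    """Draw n samples from SAL_d(mu, Sigma, gamma) via (2.3): Y = mu + V*gamma + sqrt(V)*Z."""
    d = len(mu)
    V = rng.exponential(1.0, size=n)                          # V ~ Exp(1)
    Z = rng.multivariate_normal(np.zeros(d), Sigma, size=n)   # Z ~ N_d(0, Sigma), indep. of V
    return mu + V[:, None] * gamma + np.sqrt(V)[:, None] * Z

def sald_pdf(y, mu, Sigma, gamma):
    """SAL_d density (2.2) at a single point y, with nu = (2 - d)/2."""
    d = len(mu)
    nu = (2.0 - d) / 2.0
    Sinv = np.linalg.inv(Sigma)
    z = np.asarray(y, float) - mu
    delta = z @ Sinv @ z
    a = 2.0 + gamma @ Sinv @ gamma
    const = 2.0 * np.exp(z @ Sinv @ gamma) / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))
    return const * (delta / a) ** (nu / 2) * kv(nu, np.sqrt(a * delta))

rng = np.random.default_rng(0)
mu = np.array([1.0, -0.5]); gamma = np.array([0.8, -0.4])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
Y = rsald(200_000, mu, Sigma, gamma, rng)
print(Y.mean(axis=0), np.cov(Y.T))    # approx mu + gamma and Sigma + gamma gamma^T

# for d = 1 the density (2.2) should integrate to one
m1, S1, gam1 = np.array([0.5]), np.array([[1.0]]), np.array([0.5])
total = quad(lambda y: sald_pdf([y], m1, S1, gam1), -40, 40, points=[0.5])[0]
print(total)
```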

2.1 Single-Index Mixed-Effects Model

Let \({\textbf {y}}_{ij}=(y_{ij}^{(1)}, y_{ij}^{(2)}){}^{\textrm{T}}\) be the observed values of two response variables (here, mean PPD and CAL) for the ith subject at the jth location (here, tooth), where \(i=1,\ldots ,n\) and \(j=1,\ldots ,m_i\). We assume

$$\begin{aligned} \left\{ \begin{array}{l} {\textbf {y}}_{ij}=\widetilde{\varvec{ \mu }}_{ij}+\varvec{\epsilon }_{ij}, \ \ \widetilde{\varvec{ \mu }}_{ij}=({\widetilde{\mu }}_{ij}^{(1)},{\widetilde{\mu }}_{ij}^{(2)}){}^{\textrm{T}},\\ \\ {\widetilde{\mu }}_{ij}^{(1)}=g_1(({\textbf {x}}_{ij}^{(1)}){}^{\textrm{T}}\varvec{\beta }_1)+({\textbf {z}}_{ij}^{(1)}){}^{\textrm{T}}{\textbf {b}}_{i1}, \ \ {\widetilde{\mu }}_{ij}^{(2)}=g_2(({\textbf {x}}_{ij}^{(2)}){}^{\textrm{T}}\varvec{\beta }_2)+({\textbf {z}}_{ij}^{(2)}){}^{\textrm{T}}{\textbf {b}}_{i2},\\ \\ \varvec{\epsilon }_{ij} \sim {\textrm{SAL}}_2 (\varvec{0},\varvec{\Sigma },\varvec{\gamma }), i.i.d. \ \ \forall \ \ i, j, \end{array} \right. \end{aligned}$$
(2.5)

where \(g_1\) and \(g_2\) are two unknown nonparametric functions, \({\textbf {x}}_{ij}^{(1)}=(x_{ij1}^{(1)},\ldots ,x_{ijp_1}^{(1)}){}^{\textrm{T}}\), \({\textbf {x}}_{ij}^{(2)}=(x_{ij1}^{(2)},\ldots ,x_{ijp_2}^{(2)}){}^{\textrm{T}}\), and \({\textbf {z}}_{ij}^{(1)}=(1,z_{ij1}^{(1)},\ldots ,z_{ijq_1}^{(1)}){}^{\textrm{T}}\), \({\textbf {z}}_{ij}^{(2)}=(1,z_{ij1}^{(2)},\ldots ,z_{ijq_2}^{(2)}){}^{\textrm{T}}\), \(\varvec{\beta }_k \in \mathbb {R}^{p_k}\) and \({\textbf {b}}_{ik}\in \mathbb {R}^{q_k+1}\) are the (fixed) index coefficients and random effects for the k-th response (\(k=1\) or 2), \(\varvec{\gamma }\) is a \(2\times 1\) vector of skewness parameters, and \(\varvec{\Sigma }\) is the \(2\times 2\) scatter matrix for the random error \(\varvec{\epsilon }\). To accommodate a robust specification, we also assume the random effects \({\textbf {b}}_i=({\textbf {b}}_{i1}{}^{\textrm{T}},{\textbf {b}}_{i2}{}^{\textrm{T}}){}^{\textrm{T}}\sim \textrm{SAL}_{(q_1+q_2+2)}({\varvec{0}}, \varvec{\Omega }, \varvec{0})\), where \(\varvec{\Omega }\) is an unstructured covariance matrix of dimension \((q_1+q_2+2)\times (q_1+q_2+2)\). Note, \(\varvec{\Omega }\) carries information on both the within-response clustering correlation, found on the two diagonal sub-blocks of dimensions \((q_1+1)\times (q_1+1)\) and \((q_2+1)\times (q_2+1)\), and the cross-correlations between responses, found on the off-diagonal sub-blocks. In addition, we further assume the joint density of \((\varvec{\epsilon }_{ij}{}^{\textrm{T}}, {\textbf {b}}_i{}^{\textrm{T}}){}^{\textrm{T}}\) is \(\textrm{SAL}_{(q_1+q_2+4)}(\varvec{0}_{(q_1+q_2+4)}, \textrm{blockdiag}(\varvec{\Sigma }, \varvec{\Omega }), (\varvec{\gamma }{}^{\textrm{T}},\varvec{0}{}^{\textrm{T}}_{q_1+q_2+2}){}^{\textrm{T}})\). We call model (2.5) the single-index mixed-effects (SIME) model for bivariate clustered data.
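A minimal data-generating sketch for model (2.5) (random intercepts only, hypothetical index functions \(g_1=\sin\) and \(g_2=\cos\), and illustrative parameter values) draws a shared subject-level mixing variable \(V_i\sim \textrm{E}(1)\) and then the random effects and errors conditionally normal, reflecting the joint SAL assumption on \((\varvec{\epsilon }_{ij}{}^{\textrm{T}}, {\textbf {b}}_i{}^{\textrm{T}}){}^{\textrm{T}}\) via the representation (2.3):

```python
import numpy as np

def simulate_sime(n, m, beta1, beta2, g1, g2, Sigma, gamma, Omega, rng):
    """Simulate bivariate clustered data from model (2.5), random intercepts only.
    A single mixing variable V_i per subject is shared by the random effects and
    all within-subject errors, so that (eps_ij, b_i) are jointly SAL."""
    p1, p2 = len(beta1), len(beta2)
    data = []
    for i in range(n):
        V = rng.exponential(1.0)                                        # V_i ~ Exp(1)
        b = np.sqrt(V) * rng.multivariate_normal(np.zeros(2), Omega)    # b_i | V_i ~ N(0, V_i Omega)
        X1 = rng.uniform(-1, 1, size=(m, p1))
        X2 = rng.uniform(-1, 1, size=(m, p2))
        for j in range(m):
            mean = np.array([g1(X1[j] @ beta1) + b[0],
                             g2(X2[j] @ beta2) + b[1]])
            eps = V * gamma + np.sqrt(V) * rng.multivariate_normal(np.zeros(2), Sigma)
            data.append((i, j, mean + eps))
    return data

rng = np.random.default_rng(1)
beta1 = np.array([0.8, 0.6]); beta2 = np.array([0.6, 0.8])   # unit-norm index coefficients
Sigma = np.array([[1.0, 0.2], [0.2, 1.0]])
gamma = np.array([0.5, -0.3])
Omega = np.array([[0.5, 0.1], [0.1, 0.5]])
data = simulate_sime(n=100, m=10, beta1=beta1, beta2=beta2,
                     g1=np.sin, g2=np.cos, Sigma=Sigma, gamma=gamma, Omega=Omega, rng=rng)
print(len(data), data[0][2].shape)
```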

For identifiability, we assume that \(\Vert \varvec{\beta }_1\Vert =1\) and \(\Vert \varvec{\beta }_2\Vert =1\), and that the first component of each is positive. In this paper, the popular “delete one component” method is used to avoid the equality constraints (Yu and Ruppert 2002; Cui et al. 2011). Specifically, we write \(\varvec{\beta }_1=((1-\Vert \varvec{\beta }_1^{(-1)}\Vert ^2)^{1/2},\beta _{12},\ldots ,\beta _{1p_1}){}^{\textrm{T}}\), where \(\varvec{\beta }_1^{(-1)}=(\beta _{12},\ldots ,\beta _{1p_1}){}^{\textrm{T}}\). Under this parametrization, \(\varvec{\beta }_1\) is a smooth deterministic function of \(\varvec{\beta }_1^{(-1)}\), with its Jacobian matrix given by

$$\begin{aligned} {\textbf {J}}_1=\frac{\partial \varvec{\beta }_1}{\partial \varvec{\beta }_1^{(-1)}}=\left( \begin{array}{c} -\frac{(\varvec{\beta }_1^{(-1)}){}^{\textrm{T}}}{(1-\Vert \varvec{\beta }_1^{(-1)}\Vert ^2)^{1/2}}\\ {\textbf {I}}_{p_1-1} \end{array} \right) , \end{aligned}$$

where \({\textbf {I}}_{p_1-1}\) is the identity matrix with \(p_1-1\) rows/columns. The true parameter \(\varvec{\beta }_1^{(-1)}\) satisfies the constraint \(\Vert \varvec{\beta }_1^{(-1)}\Vert < 1\), which implies that it is an interior point of the unit ball in \(\mathbb {R}^{p_1-1}\). Therefore, \(\varvec{\beta }_1\) is infinitely differentiable in a neighborhood of \(\varvec{\beta }_1^{(-1)}\). Similarly, we define \(\varvec{\beta }_2^{(-1)}\) and \({\textbf {J}}_2\), and let \(\varvec{\beta }^{(-1)}=((\varvec{\beta }_1^{(-1)}){}^{\textrm{T}},(\varvec{\beta }_2^{(-1)}){}^{\textrm{T}}){}^{\textrm{T}}\) and \({\textbf {J}}=\textrm{blockdiag}({\textbf {J}}_1,{\textbf {J}}_2)\). Applying the stochastic representation in (2.3), model (2.5) admits the following hierarchical structure:

$$\begin{aligned} \left\{ \begin{array}{l} {\textbf {y}}_i|{\textbf {b}}_i,V_i \ \sim \ \textrm{N}_{2 m_i}({\widetilde{\varvec{ \mu }}}_i+V_i(\varvec{1}_{m_i}\otimes \varvec{\gamma }), V_i \varvec{\Lambda }_i),\\ \\ {\textbf {b}}_i|V_i \sim \textrm{N}_{q_1+q_2+2}({\varvec{0}}, V_i\varvec{\Omega }), \ \ V_i \sim \textrm{E} (1), \end{array}\right. \end{aligned}$$
(2.6)

where \({\textbf {y}}_i=({\textbf {y}}_{i1}{}^{\textrm{T}},\ldots ,{\textbf {y}}_{im_i}{}^{\textrm{T}}){}^{\textrm{T}}\), \({\widetilde{\varvec{ \mu }}}_i=({\widetilde{\varvec{ \mu }}}_{i1}{}^{\textrm{T}},\ldots ,{\widetilde{\varvec{ \mu }}}_{im_i}{}^{\textrm{T}}){}^{\textrm{T}}\), \(\textrm{E}\) denotes the exponential distribution, \(\varvec{\Lambda }_i={\textbf {I}}_{m_i}\otimes \varvec{\Sigma }\), where \(\otimes\) denotes the Kronecker product, and \({\varvec{1}}_{m_i}\) is the \(m_i\times 1\) vector of ones. From (2.5) and (2.6), it is clear that, conditional on \(V_i\), \(\varvec{\epsilon }_{ij}\) and \({\textbf {b}}_i\) are independent. Integrating out \({\textbf {b}}_i\) in (2.6), we have the following hierarchical model

$$\begin{aligned} {\textbf {y}}_i|V_i \sim \textrm{N}_{2 m_i} ({\varvec{ \mu }}_i+V_i(\varvec{1}_{m_i}\otimes \varvec{\gamma }), V_i{\textbf {G}}_i), \ \ V_i\sim \textrm{E}(1), \end{aligned}$$
(2.7)

where \(\varvec{ \mu }_i=((\varvec{ \mu }_{i1}){}^{\textrm{T}},\ldots ,(\varvec{ \mu }_{im_i}){}^{\textrm{T}}){}^{\textrm{T}}\) with \(\varvec{ \mu }_{ij}=(g_1(({\textbf {x}}_{ij}^{(1)}){}^{\textrm{T}}\varvec{\beta }_1), g_2(({\textbf {x}}_{ij}^{(2)}){}^{\textrm{T}}\varvec{\beta }_2)){}^{\textrm{T}}\), \({\textbf {Z}}_i=({\textbf {Z}}_{i1},\ldots ,{\textbf {Z}}_{im_i})\), \({\textbf {Z}}_{ij}=\textrm{blockdiag}({\textbf {z}}_{ij}^{(1)},{\textbf {z}}_{ij}^{(2)})\), \({\textbf {G}}_i={\textbf {Z}}_i{}^{\textrm{T}}\varvec{\Omega }{\textbf {Z}}_i+{\varvec{\Lambda }}_i\). Moreover, it follows from (2.7) that the \({\textbf {y}}_i\) are independent and marginally distributed as

$$\begin{aligned} {\textbf {y}}_i \sim \textrm{SALD}_{2 m_i} ({\varvec{ \mu }}_i, {\textbf {G}}_i, \varvec{\gamma }_i^*), \ \ i=1,\ldots ,n, \end{aligned}$$
(2.8)

where \(\varvec{\gamma }_i^*=\varvec{1}_{m_i}\otimes \varvec{\gamma }\). From (2.7) and by the properties of the generalized inverse Gaussian distribution in (2.4), we have

$$\begin{aligned} \mathbb {E}(V_i|{\textbf {y}}_i)=\sqrt{\frac{b_i}{a_i}}R_\nu (\sqrt{a_ib_i}) \ \ \textrm{and} \ \ \mathbb {E}(V_i^{-1}|{\textbf {y}}_i)=\sqrt{\frac{a_i}{b_i}}R_\nu (\sqrt{a_ib_i})-\frac{2\nu }{b_i}, \end{aligned}$$
(2.9)

where \(a_i=2+(\varvec{\gamma }_i^*){}^{\textrm{T}}{\textbf {G}}_i^{-1}\varvec{\gamma }_i^*\), \(b_i=({\textbf {y}}_i-\varvec{ \mu }_i){}^{\textrm{T}}{\textbf {G}}_i^{-1}({\textbf {y}}_i-\varvec{ \mu }_i)\), \(R_\nu (u)=K_{\nu +1}(u)/K_\nu (u)\) and \(\nu =1-m_i\).
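The conditional expectations in (2.9) drive the E-step of the EM-type algorithm of Sect. 4. A numerical sketch (with illustrative values of \(\nu , a_i, b_i\)) evaluates them via the modified Bessel function and cross-checks against direct integration of the generalized inverse Gaussian density (2.4):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import kv

def estep_weights(nu, a, b):
    """Conditional moments (2.9): E(V_i | y_i) and E(V_i^{-1} | y_i)."""
    s = np.sqrt(a * b)
    R = kv(nu + 1.0, s) / kv(nu, s)     # R_nu(u) = K_{nu+1}(u) / K_nu(u)
    return np.sqrt(b / a) * R, np.sqrt(a / b) * R - 2.0 * nu / b

# hypothetical values: m_i = 3 locations, so the response is 6-dimensional and nu = 1 - m_i = -2
nu, a, b = -2.0, 3.5, 1.7
EV, EVinv = estep_weights(nu, a, b)

# cross-check against direct integration of the generalized inverse Gaussian density (2.4)
dens = lambda v: v ** (nu - 1.0) * np.exp(-b / (2.0 * v) - a * v / 2.0)
Z = quad(dens, 0.0, np.inf)[0]
EV_num = quad(lambda v: v * dens(v), 0.0, np.inf)[0] / Z
EVinv_num = quad(lambda v: dens(v) / v, 0.0, np.inf)[0] / Z
print(EV, EV_num, EVinv, EVinv_num)
```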

2.2 Modeling the Index Functions

Since the two functions \(g_1\) and \(g_2\) in (2.5) are unknown, we use polynomial splines to approximate them in the subsequent ML estimation. Polynomial splines are simple yet practical tools, offering computational tractability and statistical efficiency, and have proven to be a powerful method for smoothing.

For simplicity, we assume that the covariates \({\textbf {x}}_{ij}^{(1)}\) and \({\textbf {x}}_{ij}^{(2)}\) are bounded and that the supports of \(({\textbf {x}}^{(1)}){}^{\textrm{T}}\varvec{\beta }_{01}\) and \(({\textbf {x}}^{(2)}){}^{\textrm{T}}\varvec{\beta }_{02}\) are contained in a finite interval [a, b]. Such a compactness assumption is almost always used in nonparametric regression with spline approximation. Let \(t_0=a<t_1<\cdots<t_{K'}<b=t_{K'+1}\) be a partition of [a, b] into subintervals \([t_k,t_{k+1}),k=0,\ldots ,K'\) with \(K'\) internal knots. A polynomial spline of order d is a function whose restriction to each subinterval is a polynomial of degree \(d-1\) and which is globally \(d-2\) times continuously differentiable on [a, b]. The collection of splines with a fixed sequence of knots has a B-spline basis \(\{B_{1}(x),\ldots ,B_{K}(x)\}\), with \({K}=K'+d\). We assume the B-spline basis is normalized to have \(\sum _{k=1}^KB_k(x)=\sqrt{K}\), although any scaling can be used without changing the theoretical results.

Let \({\textbf {B}}_1(\cdot )=(B_1(\cdot ),\ldots ,B_{K_1}(\cdot )){}^{\textrm{T}}\) and \({\textbf {B}}_2(\cdot )=(B_1(\cdot ),\ldots ,B_{K_2}(\cdot )){}^{\textrm{T}}\), where \(K_1=K_1'+d\) and \(K_2=K_2'+d\) with number of knots \(K'_1\) and \(K'_2\) for \(g_1\) and \(g_2\). Then, we have \(g_k(\cdot )\approx {\textbf {B}}_k{}^{\textrm{T}}(\cdot )\varvec{\theta }_k, k=1,2\) where \(\varvec{\theta }_k=(\theta _{k1},\ldots , \theta _{kK_k}){}^{\textrm{T}}, k=1,2\). As a result, we can write

$$\begin{aligned} \mu _{ij}^{(1)}\approx {\textbf {B}}_1{}^{\textrm{T}}(({\textbf {x}}_{ij}^{(1)}){}^{\textrm{T}}\varvec{\beta }_1)\varvec{\theta }_1 \ \ \textrm{and} \ \ \mu _{ij}^{(2)}\approx {\textbf {B}}_2{}^{\textrm{T}}(({\textbf {x}}_{ij}^{(2)}){}^{\textrm{T}}\varvec{\beta }_2)\varvec{\theta }_2 \end{aligned}$$
(2.10)

for \(i=1,\ldots ,n, j=1,\ldots ,m_i\). By letting the number of knots increase with the sample size at an appropriate rate, the spline estimate of the unknown function can achieve the optimal nonparametric convergence rate.
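The approximation (2.10) can be illustrated numerically. The sketch below uses the standard partition-of-unity B-spline scaling (rather than the \(\sqrt{K}\) normalization, which, as noted above, is immaterial) and a hypothetical smooth index function; it fits least-squares spline coefficients and shows the sup-norm approximation error shrinking as the number of knots grows, in the spirit of (3.1):

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_design(u, inner_knots, a, b, order=4):
    """Design matrix of the K = K' + d B-spline basis functions of order d on [a, b]."""
    t = np.r_[np.full(order, a), inner_knots, np.full(order, b)]  # clamped knot sequence
    K = len(inner_knots) + order
    return BSpline(t, np.eye(K), order - 1)(u)                    # shape (len(u), K)

a, b = 0.0, 1.0
g = lambda u: np.sin(2.0 * np.pi * u)          # hypothetical smooth index function
u = np.linspace(a, b, 400)
errs = []
for Kp in (2, 8, 32):                          # K' interior knots, equally spaced
    B = bspline_design(u, np.linspace(a, b, Kp + 2)[1:-1], a, b)
    theta, *_ = np.linalg.lstsq(B, g(u), rcond=None)
    errs.append(np.max(np.abs(B @ theta - g(u))))   # sup-norm error
print(errs)                                    # decreasing in the number of knots
```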

3 Theoretical Properties

In this section, we investigate the theoretical properties of the index parameters and the index functions. In the following, we establish the large sample properties based on the marginal distribution (2.8) of the proposed BV-SIM model in (2.5). For simplicity, we assume \(m_i\equiv m\), with the responses viewed as i.i.d. data, \({\textbf {y}}_i \sim \textrm{SALD}_{2m} ({\varvec{ \mu }}_i, {\textbf {G}}_i, \varvec{\gamma }^*), \ \ i=1,\ldots ,n\). In (2.8), \(\varvec{\gamma }^*=\varvec{1}_m\otimes \varvec{\gamma }\) and \({\textbf {G}}_i={\textbf {Z}}_i{}^{\textrm{T}}\varvec{\Omega }{\textbf {Z}}_i+{\varvec{\Lambda }}\), with \(\varvec{\Lambda }={\textbf {I}}_{m}\otimes \varvec{\Sigma }\). We first introduce some notation.

Let \(\varvec{\beta }_{01}\) and \(\varvec{\beta }_{02}\) be the true index parameters, and \(g_{01}\) and \(g_{02}\) the corresponding true index functions. Let \(\varvec{\beta }_0=(\varvec{\beta }_{01}{}^{\textrm{T}},\varvec{\beta }_{02}{}^{\textrm{T}}){}^{\textrm{T}}\), \(\varvec{\beta }_0^{(-1)}=((\varvec{\beta }_{01}^{(-1)}){}^{\textrm{T}},(\varvec{\beta }_{02}^{(-1)}){}^{\textrm{T}}){}^{\textrm{T}}\), and \(\varvec{ \mu }_i^0=((\varvec{ \mu }_{i1}^0){}^{\textrm{T}},\ldots ,(\varvec{ \mu }_{im_i}^0){}^{\textrm{T}}){}^{\textrm{T}}\) with \(\varvec{ \mu }_{ij}^0=(g_{01}(({\textbf {x}}_{ij}^{(1)}){}^{\textrm{T}}\varvec{\beta }_{01}), g_{02}(({\textbf {x}}_{ij}^{(2)}){}^{\textrm{T}}\varvec{\beta }_{02})){}^{\textrm{T}}\). Denote the support of \(\{{\textbf {X}}_{i}{}^{\textrm{T}}\varvec{\beta }_0\}\) as [a, b], where \(a=\min _{i} \{ {\textbf {X}}_{i}{}^{\textrm{T}}\varvec{\beta }_0\}\) and \(b=\max _{i} \{ {\textbf {X}}_{i}{}^{\textrm{T}}\varvec{\beta }_0\}\), and \({\textbf {X}}_i=({\textbf {X}}_{i1},\ldots ,{\textbf {X}}_{im_i})\) with \({\textbf {X}}_{ij}=\textrm{blockdiag}({\textbf {x}}_{ij}^{(1)},{\textbf {x}}_{ij}^{(2)})\). Let \(\mathcal {H}_s\) be the collection of all functions on the support [a, b] whose l-th order derivative satisfies the Hölder condition of order r, with \(s=l+r\); that is, for each \(g \in \mathcal {H}_s\), there exists a positive constant \(C_0\) such that \(|g^{(l)}(u)-g^{(l)}(v)|\le C_0|u-v|^r, \ \ \forall u, v \in [a,b]\). From De Boor (2001, page 149), there exists a constant C such that

$$\begin{aligned} \sup _{u\in [a,b]}|g_k(u)-{\textbf {B}}_k^T(u)\varvec{\theta }_{0k}|\le C K_k^{-s}, \end{aligned}$$
(3.1)

if \(g_k \in \mathcal {H}_s\), where \(\varvec{\theta }_{0k}=(\theta _{0k1},\ldots ,\theta _{0kK_k})^T, k=1,2\), are the true values of the spline coefficients, which can be viewed as the best-approximation coefficient vectors for \(g_k\).

Let \(\varvec{\delta }=(\varvec{\gamma }{}^{\textrm{T}},\textrm{vech}(\varvec{\Omega }){}^{\textrm{T}},\textrm{vech}(\varvec{\Sigma }){}^{\textrm{T}}){}^{\textrm{T}}\), and let \(\varvec{\Theta }\) denote the parameter space of \(\varvec{\zeta }=(\varvec{\beta }{}^{\textrm{T}},\varvec{\theta }{}^{\textrm{T}},\varvec{\delta }{}^{\textrm{T}}){}^{\textrm{T}}\). Given the covariates \({\textbf {X}}_i\) and \({\textbf {Z}}_i\), let \(\ell _m(\varvec{ \mu }_i,\varvec{\delta }, {\textbf {y}}_i)\) be the log-likelihood of the marginal distribution for the response \({\textbf {y}}_i\) in (2.8), and let \(\ell _m(\varvec{\zeta }, {\textbf {y}}_i)\triangleq \ell _m({\textbf {W}}_i{}^{\textrm{T}}({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta })\varvec{\theta },\varvec{\delta }, {\textbf {y}}_i)\) be the corresponding spline-approximated log-likelihood. Let \(\varvec{\delta }_0\) be the true value of \(\varvec{\delta }\) and \(\varvec{\theta }_0=(\varvec{\theta }_{01}{}^{\textrm{T}},\varvec{\theta }_{02}{}^{\textrm{T}}){}^{\textrm{T}}\). Define the MLE \({\widehat{\varvec{\zeta }}}=({\widehat{\varvec{\beta }}}{}^{\textrm{T}},{\widehat{\varvec{\theta }}}{}^{\textrm{T}},{\widehat{\varvec{\delta }}}{}^{\textrm{T}}){}^{\textrm{T}}\), given by

$$\begin{aligned} {\widehat{\varvec{\zeta }}}=\textrm{argmax}_{\varvec{\zeta }}\sum _{i=1}^n \ell _m({\textbf {W}}_i{}^{\textrm{T}}({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta })\varvec{\theta },\varvec{\delta }, {\textbf {y}}_i), \end{aligned}$$
(3.2)

where \({\textbf {W}}_i({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta })=({\textbf {W}}_{i1},\ldots ,{\textbf {W}}_{im_i})\), \({\textbf {W}}_{ij}=\textrm{blockdiag}({\textbf {B}}_{ij}^{(1)}, {\textbf {B}}_{ij}^{(2)})\) with \({\textbf {B}}_{ij}^{(k)}={\textbf {B}}_k(({\textbf {x}}_{ij}^{(k)}){}^{\textrm{T}}\varvec{\beta }_k), k=1,2\). Define the space of square integrable single-index functions \(\mathcal {G}=\{\textbf{g}: \mathbb {E}\Vert \textbf{g}({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta }_0)\Vert ^2<\infty \}\), where \(\textbf{g}({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta })=(\textbf{g}{}^{\textrm{T}}({\textbf {X}}_{i1}{}^{\textrm{T}}\varvec{\beta }),\ldots ,\textbf{g}{}^{\textrm{T}}({\textbf {X}}_{im_i}{}^{\textrm{T}}\varvec{\beta })){}^{\textrm{T}}\) with \(\textbf{g}({\textbf {X}}_{ij}{}^{\textrm{T}}\varvec{\beta })=(g_1(({\textbf {x}}_{ij}^{(1)}){}^{\textrm{T}}\varvec{\beta }_{1}), g_2(({\textbf {x}}_{ij}^{(2)}){}^{\textrm{T}}\varvec{\beta }_{2})){}^{\textrm{T}}\). Denote \({\textbf {C}}_i(\varvec{ \mu }_i,\varvec{\delta })=-\partial ^2 \ell _m(\varvec{ \mu }_i,\varvec{\delta },{\textbf {y}}_i)/\partial \varvec{ \mu }_i\partial \varvec{ \mu }_i{}^{\textrm{T}}\) and \({\textbf {C}}_i^0={\textbf {C}}_i(\varvec{ \mu }_{i}^0,\varvec{\delta }_0)\). Then, the projection of a 2m-dimensional random vector \(\varvec{\Gamma }\) onto \(\mathcal {G}\), denoted \(\mathbb {E}_{\mathcal {G}}[\varvec{\Gamma }] = \textbf{g}({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta }_0)\), is the minimizer of

$$\begin{aligned}\min _{\textbf{g}\in \mathcal {G}}\mathbb {E}\left[ (\varvec{\Gamma }-\textbf{g}({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta }_0)){}^{\textrm{T}}{\textbf {C}}_i^0 (\varvec{\Gamma }-\textbf{g}({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta }_0))\right] .\end{aligned}$$

Note, the definition of the projection involves the distributions of \({\textbf {X}}_i\), \({\textbf {Z}}_i\), and \(\varvec{\Gamma }\), since the expectation is taken over these random variables. The definition extends to any \(2m\times L\) matrix by column-wise projection. In the following, we list the regularity conditions (Wang et al. 2014; Lian and Liang 2013; Zhao et al. 2017) needed to study the asymptotic behavior of the MLEs.

  1. (A1)

    Both \(g_1(\cdot ) \in \mathcal {H}_s\) and \(g_2(\cdot ) \in \mathcal {H}_s\) for some \(s\ge 2\).

  2. (A2)

    Both \({\textbf {x}}_{ij}^{(1)}\) and \({\textbf {x}}_{ij}^{(2)}\), \(i=1,\ldots ,n, j=1,\ldots , m_i\), are bounded, with density supported on a convex set.

  3. (A3)

    The true parameter point \(\varvec{\zeta }_0\) is an interior point of the parameter space \(\varvec{\Theta }\).

  4. (A4)

    The log-likelihood \(\ell _m(\varvec{\zeta },{\textbf {y}}_i)\) is at least three times differentiable in the parameters \(\varvec{\zeta }\). Furthermore, the second derivatives of the likelihood function satisfy

    $$\begin{aligned}\mathbb {E}\left\{ \left( \frac{\partial \ell _m(\varvec{\zeta },{\textbf {y}}_i)}{\partial \varvec{\zeta }}\right) \left( \frac{\partial \ell _m(\varvec{\zeta },{\textbf {y}}_i)}{\partial \varvec{\zeta }}\right) {}^{\textrm{T}}\right\} = -\mathbb {E}\left\{ \frac{\partial ^2 \ell _m(\varvec{\zeta },{\textbf {y}}_i)}{\partial \varvec{\zeta }\partial \varvec{\zeta }{}^{\textrm{T}}}\right\} .\end{aligned}$$

    Also, there exist functions \(M_{jkl}({\textbf {y}}_i)\) such that

    $$\begin{aligned}\left| \frac{\partial ^3 \ell _m(\varvec{\zeta },{\textbf {y}}_i)}{\partial \varvec{\zeta }_j\partial \varvec{\zeta }_k\partial \varvec{\zeta }_l}\right| \le M_{jkl}({\textbf {y}}_i)\end{aligned}$$

    for \(\varvec{\zeta }\in \varvec{\Theta }\), and \(\mathbb {E}[M_{jkl}({\textbf {y}}_i)]<C_3<+\infty\). Here \(\varvec{\zeta }_j\) denotes the j-th component of \(\varvec{\zeta }\).

  5. (A5)

    The Fisher information matrix \(\mathcal {I}(\varvec{\zeta }_0)=-\mathbb {E}\left. \left\{ \frac{\partial ^2 \ell _m(\varvec{\zeta },{\textbf {y}}_i)}{\partial \varvec{\zeta }\partial \varvec{\zeta }{}^{\textrm{T}}}\right\} \right| _{\varvec{\zeta }_0}\) satisfies the conditions

    $$\begin{aligned}0<C_1<\lambda _{\min }\{\mathcal {I}(\varvec{\zeta }_0)\}\le \lambda _{\max }\{\mathcal {I}(\varvec{\zeta }_0)\}<C_2<+\infty ,\end{aligned}$$

    where \(\lambda _{\min }\) and \(\lambda _{\max }\) denote the smallest and largest eigenvalues of a matrix.

  6. (A6)

    Suppose \(\mathbb {E}_{\mathcal {G}}[{\textbf {X}}_{i}\textrm{diag}\{\dot{\textbf{g}}({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta }_0)\}]=(h_1({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta }_0),\ldots , h_{p_1+p_2}({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta }_0)){}^{\textrm{T}}\). Assume all \(h_j \in \mathcal {H}_{s'}\) with \(s'>1\). We also assume that

    $$\begin{aligned}\mathbb {E}\left[ ({\textbf {J}}{}^{\textrm{T}}{\textbf {X}}_i \textrm{diag}\{\dot{\textbf{g}}({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta }_0)\}- \mathbb {E}_{\mathcal {G}}[{\textbf {J}}{}^{\textrm{T}}{\textbf {X}}_{i}\textrm{diag}\{\dot{\textbf{g}}({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta }_0)\}])^{\otimes 2}\right] \end{aligned}$$

    is positive definite, where \({\textbf {J}}\) is evaluated at \(\varvec{\beta }_0\).

Remark 1

The smoothness condition in (A1) is required to attain the best convergence rate for single-index functions approximated in the spline space. Condition (A2) is widely used in the single-index modeling literature; it ensures that the index functions are defined on a compact set, which facilitates the technical derivations. Conditions (A3) and (A4) are two common assumptions in the literature on maximum likelihood estimation with spline approximations (Wang et al. 2011, 2014), implying that the information matrix of the likelihood function is positive definite. Condition (A5) is slightly stronger than that used in the usual asymptotic likelihood theory; however, it is widely used in the high-dimensional likelihood estimation literature (Fan and Peng 2004). Finally, Condition (A6) is related to the ‘projection’, or ‘orthogonalization’, technique common in semiparametric setups, including the partially linear model (Li 2000), the partially linear additive model (Lian and Liang 2013), and single-index models (Cui et al. 2011; Zhao et al. 2017).

Denote \(K=\max \{K_1,K_2\}\), and let \(r_n=\sqrt{K/n}+K^{-s}\). Then, we have the following result.

Theorem 1

Under Conditions (A1)–(A5), suppose that \(K^4/n\rightarrow 0\) and \(\sqrt{n}K^{-2s+1}\rightarrow 0\). Then we have

$$\begin{aligned}\Vert {\widehat{\varvec{\beta }}}-\varvec{\beta }_0\Vert +\Vert {\widehat{\varvec{\theta }}}-\varvec{\theta }_0\Vert =O_p(r_n).\end{aligned}$$

As an immediate implication of Theorem 1, we have \(\Vert {\widehat{g}}_1-g_1\Vert =O_p(r_n)\) and \(\Vert {\widehat{g}}_2-g_2\Vert =O_p(r_n).\)

Remark 2

Note that if K is chosen at the optimal order \(K\sim n^{1/(2s+1)}\), the rate of convergence for the nonparametric functions is \(O_p(n^{-s/(2s+1)})\), matching the standard rate in the nonparametric and semiparametric literature.
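Indeed, the two terms of \(r_n\) balance at this choice of \(K\):

```latex
% With K \sim n^{1/(2s+1)}:
r_n = \sqrt{K/n} + K^{-s}
    \sim \sqrt{n^{\,1/(2s+1)-1}} + n^{-s/(2s+1)}
    = n^{-s/(2s+1)} + n^{-s/(2s+1)}
    = O\bigl(n^{-s/(2s+1)}\bigr),
% since (1/(2s+1) - 1)/2 = -2s/(2(2s+1)) = -s/(2s+1).
```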

Theorem 2

Under Conditions (A1)–(A6), suppose that \(K^4/n\rightarrow 0\), \(\sqrt{n}K^{-2s+1}\rightarrow 0\) and \(\sqrt{n}K^{-s-s'}\rightarrow 0\). Then, we have

$$\begin{aligned}\sqrt{n}({\widehat{\varvec{\beta }}}^{(-1)}-{\varvec{\beta }}^{(-1)}_0) {\mathop {\longrightarrow }\limits ^{\textrm{d}}} N(\varvec{0},{\varvec{{\Psi }}}^{-1}),\end{aligned}$$

where

$$\begin{aligned}\begin{array}{lll} {\varvec{{\Psi }}}&{}=&{}\mathbb {E}\left[ ({\textbf {J}}{}^{\textrm{T}}{\textbf {X}}_i\textrm{diag}\{\dot{\textbf{g}}({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta }_0)\}-{\textbf {J}}{}^{\textrm{T}}\mathbb {E}_{\mathcal {G}}[{\textbf {X}}_i\textrm{diag}\{\dot{\textbf{g}}({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta }_0)\}])\cdot {\textbf {C}}_i^0\cdot \right. \\ &{}&{}\left. \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ({\textbf {J}}{}^{\textrm{T}}{\textbf {X}}_i\textrm{diag}\{\dot{\textbf{g}}({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta }_0)\}-{\textbf {J}}{}^{\textrm{T}}\mathbb {E}_{\mathcal {G}}[{\textbf {X}}_i\textrm{diag}\{\dot{\textbf{g}}({\textbf {X}}_i{}^{\textrm{T}}\varvec{\beta }_0)\}]){}^{\textrm{T}}\right] \end{array}\end{aligned}$$

and \({\textbf {J}}\) is evaluated at the true \(\varvec{\beta }_0\).

Following Theorem 2 and invoking the Delta method, we have

$$\begin{aligned}\sqrt{n}({\widehat{\varvec{\beta }}}-{\varvec{\beta }}_0) {\mathop {\longrightarrow }\limits ^{\textrm{d}}} N(\varvec{0},{\textbf {J}}{\varvec{{\Psi }}}^{-1}{\textbf {J}}{}^{\textrm{T}}).\end{aligned}$$

4 Maximum Likelihood Estimation

In this section, we develop ML estimation for our BV-SIM model. We utilize EM-type algorithms for obtaining the MLE, based on the two types of missing data in (2.6). The EM algorithm is a popular iterative algorithm for ML estimation in models with incomplete data (Dempster et al. 1977); each iteration consists of two steps, the expectation (E) step and the maximization (M) step. Despite its desirable features, the M-step of the EM algorithm is often difficult to implement for complicated models; in such cases, it is replaced by a sequence of computationally simple conditional maximization (CM) steps, i.e., maximizing over one parameter with the other parameters held fixed. This leads to a simple extension of the EM algorithm, called the ECM algorithm (Meng and Rubin 1993).
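The role that the mixing variable \(V_i\) plays below is easiest to see in a simpler member of the same normal scale-mixture family. The following Python sketch (illustrative only, not the BV-SIM estimator) runs EM for a univariate Student-t location/scale with known degrees of freedom, where the E-step weight `w` plays exactly the role of \(\widehat{d}_i=\mathbb {E}(V_i^{-1}|{\textbf {y}})\): observations far from the center receive small weight, which is the source of the robustness.

```python
import numpy as np

def t_location_em(y, nu=5.0, n_iter=100, tol=1e-8):
    """EM for the location mu and scale sig2 of a univariate Student-t
    with known df nu, written as a normal scale mixture:
    y_i | V_i ~ N(mu, sig2 / V_i), with V_i ~ Gamma(nu/2, rate=nu/2)."""
    mu, sig2 = np.median(y), np.var(y)
    for _ in range(n_iter):
        # E-step: w_i = E[V_i | y_i] down-weights outlying points
        w = (nu + 1.0) / (nu + (y - mu) ** 2 / sig2)
        # M-steps: weighted closed-form updates
        mu_new = np.sum(w * y) / np.sum(w)
        sig2_new = np.mean(w * (y - mu_new) ** 2)
        converged = abs(mu_new - mu) + abs(sig2_new - sig2) < tol
        mu, sig2 = mu_new, sig2_new
        if converged:
            break
    return mu, sig2

rng = np.random.default_rng(0)
y = rng.standard_t(df=5, size=2000) * 1.5 + 3.0   # synthetic t5 data
mu_hat, sig2_hat = t_location_em(y, nu=5.0)
```

The multivariate Laplace model in (2.6) has the same architecture, with an exponential mixing law instead of a gamma one and the extra skewness term \(V_i\varvec{\gamma }\).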

Consider the hierarchical multivariate Laplace model in (2.6), where both \(V_i\) and \({\textbf {b}}_i\) are missing data. Let \({\textbf {y}}=({\textbf {y}}_1{}^{\textrm{T}},\ldots ,{\textbf {y}}_n{}^{\textrm{T}}){}^{\textrm{T}}\), \({\textbf {b}}=({\textbf {b}}_1{}^{\textrm{T}},\ldots ,{\textbf {b}}_n{}^{\textrm{T}}){}^{\textrm{T}}\), \({\textbf {V}}=(V_1,\ldots ,V_n){}^{\textrm{T}}\) and \(\varvec{\theta }=(\varvec{\theta }_1{}^{\textrm{T}},\varvec{\theta }_2{}^{\textrm{T}}){}^{\textrm{T}}\). The log-likelihood for the complete data in the multivariate Laplace single-index mixed-effects model up to an additive constant can be written as

$$\begin{aligned} \ell (\varvec{\beta },\varvec{\theta },\varvec{\gamma },\varvec{\Sigma },\varvec{\Omega }|{\textbf {y}},{\textbf {b}},{\textbf {V}})=\ell _1(\varvec{\beta },\varvec{\theta },\varvec{\gamma },\varvec{\Sigma }|{\textbf {y}},{\textbf {b}},{\textbf {V}})+ \ell _2(\varvec{\Omega }|{\textbf {b}},{\textbf {V}}), \end{aligned}$$
(4.1)

where

$$\begin{aligned} \ell _1(\varvec{\beta },\varvec{\theta },\varvec{\gamma },\varvec{\Sigma }|{\textbf {y}},{\textbf {b}},{\textbf {V}})=-\frac{N}{2}\log |\varvec{\Sigma }|-\frac{1}{2}\sum _{i=1}^n\sum _{j=1}^{m_i} V_{i}^{-1}({\textbf {y}}_{ij}-{\widetilde{\varvec{ \mu }}}_{ij}-V_{i}\varvec{\gamma }){}^{\textrm{T}}\varvec{\Sigma }^{-1}({\textbf {y}}_{ij}-{\widetilde{\varvec{ \mu }}}_{ij}-V_{i}\varvec{\gamma }) \end{aligned}$$

and

$$\begin{aligned}\ell _2(\varvec{\Omega }|{\textbf {b}},{\textbf {V}})=-\frac{n}{2} \log |\varvec{\Omega }|-\frac{1}{2}\textrm{trace}\left( \varvec{\Omega }^{-1} \sum _{i=1}^n V_i^{-1}{\textbf {b}}_i{\textbf {b}}_i{}^{\textrm{T}}\right) ,\end{aligned}$$

where \({\widetilde{\varvec{ \mu }}}_{ij}\) is defined in (2.5) and \(N=\sum _{i=1}^n m_i\). Note that \(\ell _1\) can be further written as

$$\begin{aligned} \ell _1&=-\frac{N}{2}\log |\varvec{\Sigma }|-\frac{1}{2}\sum _{i=1}^n V_{i}^{-1} ({\textbf {y}}_{i}-{\textbf {W}}_{i}{}^{\textrm{T}}\varvec{\theta }){}^{\textrm{T}}\varvec{\Lambda }_i^{-1}({\textbf {y}}_{i}-{\textbf {W}}_{i}{}^{\textrm{T}}\varvec{\theta }) -\frac{1}{2}\sum _{i=1}^n V_{i}^{-1}{\textbf {b}}_i{}^{\textrm{T}}{\textbf {Z}}_{i}\varvec{\Lambda }_i^{-1}{\textbf {Z}}_{i}{}^{\textrm{T}}{\textbf {b}}_i \\&+\sum _{i=1}^nV_i^{-1}({\textbf {y}}_{i}-{\textbf {W}}_{i}{}^{\textrm{T}}\varvec{\theta }){}^{\textrm{T}}\varvec{\Lambda }_i^{-1}{\textbf {Z}}_{i}{}^{\textrm{T}}{\textbf {b}}_i -\sum _{i=1}^n(\varvec{\gamma }^*_i){}^{\textrm{T}}\varvec{\Lambda }_i^{-1}{\textbf {Z}}_{i}{}^{\textrm{T}}{\textbf {b}}_i +\sum _{i=1}^n({\textbf {y}}_{i}-{\textbf {W}}_{i}{}^{\textrm{T}}\varvec{\theta }){}^{\textrm{T}}\varvec{\Lambda }_i^{-1}\varvec{\gamma }_i^* \\&-\frac{1}{2}\sum _{i=1}^n V_i(\varvec{\gamma }_i^*){}^{\textrm{T}}\varvec{\Lambda }_i^{-1}\varvec{\gamma }_i^*. \end{aligned}$$

Denote \(\varvec{ \eta }\) as the full parameter vector to be estimated. We first compute the conditional posterior mean and variance of \({\textbf {b}}_i\) at the current estimate \({\widehat{\varvec{ \eta }}}\), leading to

$$\begin{aligned} \begin{array}{ll} \textrm{Cov}({\textbf {b}}_i|\varvec{ \eta }={\widehat{\varvec{ \eta }}}, {\textbf {y}}, {\textbf {V}})=V_i\left( {{\widehat{\varvec{\Omega }}}}^{-1}+{\textbf {Z}}_{i}{\widehat{\varvec{\Lambda }}}_i^{-1}{\textbf {Z}}_{i}{}^{\textrm{T}}\right) ^{-1}\triangleq V_i \cdot {\widehat{\varvec{\Delta }}}_i, \\ \\ \mathbb {E}({\textbf {b}}_i|\varvec{ \eta }={\widehat{\varvec{ \eta }}}, {\textbf {y}}, {\textbf {V}})={\widehat{\varvec{\Delta }}}_i {\textbf {Z}}_i {\widehat{\varvec{\Lambda }}}_i^{-1}({\textbf {y}}_{i}-{\textbf {W}}_{i}{}^{\textrm{T}}{\widehat{\varvec{\theta }}}-V_i{\widehat{\varvec{\gamma }}}^*_i) \triangleq {\widehat{{\textbf {R}}}}_{i1}-V_i{\widehat{{\textbf {R}}}}_{i2}, \end{array} \end{aligned}$$

for \(i=1,\ldots ,n\), where

$$\begin{aligned} {\widehat{\varvec{\Delta }}}_i=\left( {\widehat{\varvec{\Omega }}}^{-1}+{\textbf {Z}}_{i}{\widehat{\varvec{\Lambda }}}_i^{-1}{\textbf {Z}}_{i}{}^{\textrm{T}}\right) ^{-1}, \ \ {\widehat{{\textbf {R}}}}_{i1}={\widehat{\varvec{\Delta }}}_i {\textbf {Z}}_i {\widehat{\varvec{\Lambda }}}_i^{-1}({\textbf {y}}_{i}-{\textbf {W}}_{i}{}^{\textrm{T}}{\widehat{\varvec{\theta }}}) \ \ \textrm{and} \ \ {\widehat{{\textbf {R}}}}_{i2}={\widehat{\varvec{\Delta }}}_i {\textbf {Z}}_i {\widehat{\varvec{\Lambda }}}_i^{-1}\widehat{\varvec{\gamma }}^*_i. \end{aligned}$$
(4.2)
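In implementation, (4.2) is a few lines of linear algebra. A numpy sketch under assumed stacked shapes (\({\textbf {Z}}_i\) of size \(q\times M_i\), mapping the q random effects to the stacked \(M_i\)-dimensional cluster response; all inputs here are hypothetical):

```python
import numpy as np

def posterior_b_terms(Omega, Z, Lam, y, Wt_theta, gamma_star):
    """Compute Delta_i, R_i1, R_i2 of Eq. (4.2).
    Omega:      (q, q) random-effect covariance
    Z:          (q, M) random-effect design, stacked over cluster members
    Lam:        (M, M) stacked error scatter Lambda_i
    y:          (M,)  stacked response
    Wt_theta:   (M,)  fixed-effect (spline) fit W_i^T theta
    gamma_star: (M,)  stacked skewness vector gamma_i^*."""
    Lam_inv = np.linalg.inv(Lam)
    Delta = np.linalg.inv(np.linalg.inv(Omega) + Z @ Lam_inv @ Z.T)
    R1 = Delta @ Z @ Lam_inv @ (y - Wt_theta)
    R2 = Delta @ Z @ Lam_inv @ gamma_star
    return Delta, R1, R2
```

The conditional moments then follow as \(\textrm{Cov}({\textbf {b}}_i|\cdot )=V_i\,{\widehat{\varvec{\Delta }}}_i\) and \(\mathbb {E}({\textbf {b}}_i|\cdot )={\widehat{{\textbf {R}}}}_{i1}-V_i{\widehat{{\textbf {R}}}}_{i2}\).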

After obtaining the estimates of the conditional mean and covariance of the random effect \({\textbf {b}}_i\), we proceed to calculate \(\mathbb {E}(\ell (\cdot ))=\mathbb {E}_{\textbf {V}}\{\mathbb {E}_{\textbf {b}}[\ell (\cdot )|{\textbf {V}}]\}\). Define the quantities

$$\begin{aligned} \widehat{c}_i=\mathbb {E}(V_i|\varvec{ \eta }={\widehat{\varvec{ \eta }}}, {\textbf {y}}) \ \ \textrm{and } \ \ \widehat{d}_i=\mathbb {E}(V_i^{-1}|\varvec{ \eta }={\widehat{\varvec{ \eta }}}, {\textbf {y}}), \end{aligned}$$
(4.3)

which can be computed from (2.9), using the current estimate \({\widehat{\varvec{ \eta }}}\). After some simple calculations, we have

$$\begin{aligned} \begin{array}{lll} Q_1&{}\triangleq &{} \mathbb {E}\left[ \ell _1(\cdot |{\textbf {y}},{\textbf {b}},{\textbf {V}})|{\textbf {y}}, \varvec{ \eta }={\widehat{\varvec{ \eta }}}\right] \\ \\ &{} = &{} -\frac{N}{2}\log |\varvec{\Sigma }|-\frac{1}{2}\sum _{i=1}^n \widehat{d}_i ({\textbf {y}}_{i}-{\textbf {W}}_{i}{}^{\textrm{T}}\varvec{\theta }){}^{\textrm{T}}\varvec{\Lambda }_i^{-1}({\textbf {y}}_{i}-{\textbf {W}}_{i}{}^{\textrm{T}}\varvec{\theta }) -\frac{1}{2}\sum _{i=1}^n \widehat{c}_i(\varvec{\gamma }_i^*){}^{\textrm{T}}\varvec{\Lambda }_i^{-1}\varvec{\gamma }_i^* \\ \\ &{}&{} -\frac{1}{2}\sum _{i=1}^n \textrm{trace}\left\{ {\textbf {Z}}_{i}\varvec{\Lambda }_i^{-1}{\textbf {Z}}_{i}{}^{\textrm{T}}\left[ \widehat{d}_i{\widehat{{\textbf {R}}}}_{i1}{\widehat{{\textbf {R}}}}_{i1}{}^{\textrm{T}}-{\widehat{{\textbf {R}}}}_{i1}{\widehat{{\textbf {R}}}}_{i2}{}^{\textrm{T}}-{\widehat{{\textbf {R}}}}_{i2}{\widehat{{\textbf {R}}}}_{i1}{}^{\textrm{T}}+\widehat{c}_i{\widehat{{\textbf {R}}}}_{i2}{\widehat{{\textbf {R}}}}_{i2}{}^{\textrm{T}}+{\widehat{\varvec{\Delta }}}_i\right] \right\} \\ \\ &{}&{} +\sum _{i=1}^n \widehat{d}_i ({\textbf {y}}_{i}-{\textbf {W}}_{i}{}^{\textrm{T}}\varvec{\theta }){}^{\textrm{T}}\varvec{\Lambda }_i^{-1}{\textbf {Z}}_{i}{}^{\textrm{T}}{\widehat{{\textbf {R}}}}_{i1} -\sum _{i=1}^n({\textbf {y}}_{i}-{\textbf {W}}_{i}{}^{\textrm{T}}\varvec{\theta }){}^{\textrm{T}}\varvec{\Lambda }_i^{-1}[{\textbf {Z}}_{i}{}^{\textrm{T}}{\widehat{{\textbf {R}}}}_{i2}-\varvec{\gamma }_i^*]\\ \\ &{}&{} -\sum _{i=1}^n(\varvec{\gamma }^*_i){}^{\textrm{T}}\varvec{\Lambda }_i^{-1}{\textbf {Z}}_{i}{}^{\textrm{T}}{\widehat{{\textbf {R}}}}_{i1} +\sum _{i=1}^n \widehat{c}_i(\varvec{\gamma }^*_i){}^{\textrm{T}}\varvec{\Lambda }_i^{-1}{\textbf {Z}}_{i}{}^{\textrm{T}}{\widehat{{\textbf {R}}}}_{i2}, \end{array} \end{aligned}$$
(4.4)

and

$$\begin{aligned} \begin{array}{lll} Q_2&{}\triangleq &{} \mathbb {E}\left[ \ell _2(\cdot |{\textbf {y}},{\textbf {b}},{\textbf {V}})|{\textbf {y}}, \varvec{ \eta }={\widehat{\varvec{ \eta }}}\right] \\ \\ &{} = &{} -\frac{n}{2} \log |\varvec{\Omega }|-\frac{1}{2}\sum _{i=1}^n \textrm{trace}\left\{ \varvec{\Omega }^{-1}\left[ \widehat{d}_i{\widehat{{\textbf {R}}}}_{i1}{\widehat{{\textbf {R}}}}_{i1}{}^{\textrm{T}}-{\widehat{{\textbf {R}}}}_{i1}{\widehat{{\textbf {R}}}}_{i2}{}^{\textrm{T}}-{\widehat{{\textbf {R}}}}_{i2}{\widehat{{\textbf {R}}}}_{i1}{}^{\textrm{T}}+\widehat{c}_i{\widehat{{\textbf {R}}}}_{i2}{\widehat{{\textbf {R}}}}_{i2}{}^{\textrm{T}}+{\widehat{\varvec{\Delta }}}_i\right] \right\} +C. \end{array} \end{aligned}$$
(4.5)
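For scale mixtures of this type, the conditional law of \(V_i\) given the data is typically generalized inverse Gaussian (GIG), so the expectations \(\widehat{c}_i\) and \(\widehat{d}_i\) in (4.3) reduce to ratios of modified Bessel functions. A hedged sketch of the moment computation (the exact GIG parameters p, a, b would be read off (2.9), which is not reproduced here; the sanity check against direct numerical integration uses illustrative values):

```python
import numpy as np
from scipy.special import kv          # modified Bessel function K_p
from scipy.integrate import quad

def gig_moments(p, a, b):
    """E[V] and E[1/V] for V ~ GIG(p, a, b), with density
    f(v) proportional to v^{p-1} exp(-(a*v + b/v)/2), v > 0."""
    w = np.sqrt(a * b)
    c = kv(p, w)
    EV = np.sqrt(b / a) * kv(p + 1, w) / c
    EVinv = np.sqrt(a / b) * kv(p - 1, w) / c
    return EV, EVinv

# sanity check against direct numerical integration (illustrative p, a, b)
p, a, b = 0.5, 2.0, 3.0
f = lambda v: v ** (p - 1) * np.exp(-(a * v + b / v) / 2)
norm = quad(f, 0, np.inf)[0]
EV_num = quad(lambda v: v * f(v), 0, np.inf)[0] / norm
EVinv_num = quad(lambda v: f(v) / v, 0, np.inf)[0] / norm
EV, EVinv = gig_moments(p, a, b)
```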

Next, maximizing \(Q_1\) over the parameters \(\varvec{\theta }\), \(\varvec{\gamma }\), \(\varvec{\beta }\) and \(\varvec{\Sigma }\), and maximizing \(Q_2\) over \(\varvec{\Omega }\), we obtain their estimates, which constitute CM-steps 1–5 of the following ECM algorithm:

  1. E-step

    Given the current parameter estimates, for \(i=1,\ldots ,n\), update \(\widehat{c}_i\) and \(\widehat{d}_i\) using (4.3), and update \(\widehat{\varvec{\Delta }}_i\), \({\widehat{{\textbf {R}}}}_{i1}\) and \({\widehat{{\textbf {R}}}}_{i2}\) by (4.2).

  2. CM-step 1

    Fix \({\widehat{\varvec{\beta }}}, {\widehat{\varvec{\gamma }}}\) and \({\widehat{\varvec{\Sigma }}}\), and update \({\widehat{\varvec{\theta }}}\) by maximizing (4.4) over \(\varvec{\theta }\), which gives

    $$\begin{aligned}{\widehat{\varvec{\theta }}}=\left( \sum _{i=1}^n \sum _{j=1}^{m_i} \widehat{d}_i{\textbf {W}}_{ij}{\widehat{\varvec{\Sigma }}}^{-1}{\textbf {W}}_{ij}{}^{\textrm{T}}\right) ^{-1}\sum _{i=1}^n \sum _{j=1}^{m_i} {\textbf {W}}_{ij}{\widehat{\varvec{\Sigma }}}^{-1}\left[ \widehat{d}_i({\textbf {y}}_{ij}-{\textbf {Z}}_{ij}{}^{\textrm{T}}{\widehat{{\textbf {R}}}}_{i1})+{\textbf {Z}}_{ij}{}^{\textrm{T}}{\widehat{{\textbf {R}}}}_{i2}-\widehat{\varvec{\gamma }}\right] .\end{aligned}$$
  3. CM-step 2

    Fix \({\widehat{\varvec{\beta }}}, {\widehat{\varvec{\theta }}}\) and \({\widehat{\varvec{\Sigma }}}\), and update \({\widehat{\varvec{\gamma }}}\) by maximizing (4.4) over \(\varvec{\gamma }\), i.e.,

    $$\begin{aligned}{\widehat{\varvec{\gamma }}}=\frac{\sum _{i=1}^n\sum _{j=1}^{m_i} ({\textbf {y}}_{ij}-{\textbf {W}}_{ij}{}^{\textrm{T}}{\widehat{\varvec{\theta }}}-{\textbf {Z}}_{ij}{}^{\textrm{T}}{\widehat{{\textbf {R}}}}_{i1}+\widehat{c}_i{\textbf {Z}}_{ij}{}^{\textrm{T}}{\widehat{{\textbf {R}}}}_{i2})}{\sum _{i=1}^n m_i \widehat{c}_i}.\end{aligned}$$
  4. CM-step 3

    Fix \({\widehat{\varvec{\theta }}}\), \({\widehat{\varvec{\gamma }}}\) and \({\widehat{\varvec{\Sigma }}}\), and update \({\widehat{\varvec{\beta }}}\) by maximizing (4.4) over \(\varvec{\beta }\). Since there is no explicit expression for the estimate of the index parameter \(\varvec{\beta }\), we use the Newton–Raphson method to obtain \({\widehat{\varvec{\beta }}}\), leading to the following iterative formula

    $$\begin{aligned}\begin{array}{lll} \left( {\widehat{\varvec{\beta }}}^{(-1)}\right) ^\textrm{new}&{}=&{}\left( {\widehat{\varvec{\beta }}}^{(-1)}\right) ^\textrm{old}+\left( \sum _{i=1}^n\sum _{j=1}^{m_i}\widehat{d}_i{\textbf {H}}_{ij}{\widehat{\varvec{\Sigma }}}^{-1}{\textbf {H}}_{ij}{}^{\textrm{T}}\right) ^{-1}\times \\ \\ &{}&{} \sum _{i=1}^n\sum _{j=1}^{m_i} {\textbf {H}}_{ij}{\widehat{\varvec{\Sigma }}}^{-1}\left[ \widehat{d}_i({\textbf {y}}_{ij}-{\textbf {W}}_{ij}{}^{\textrm{T}}{\widehat{\varvec{\theta }}}-{\textbf {Z}}_{ij}{}^{\textrm{T}}{\widehat{{\textbf {R}}}}_{i1})+{\textbf {Z}}_{ij}{}^{\textrm{T}}{\widehat{{\textbf {R}}}}_{i2}-{\widehat{\varvec{\gamma }}}\right] \end{array} \end{aligned}$$

    where \({\textbf {H}}_{ij}=\left[ \begin{array}{cc} {\textbf {J}}_1{}^{\textrm{T}}{\textbf {x}}_{ij}^{(1)}\{\dot{{\textbf {B}}}_1{}^{\textrm{T}}(({\textbf {x}}_{ij}^{(1)}){}^{\textrm{T}}{\widehat{\varvec{\beta }}}_1^\textrm{old}){\widehat{\varvec{\theta }}}_1\} &{} {\varvec{0}}_{(p_1-1)\times 1}\\ {\varvec{0}}_{(p_2-1)\times 1} &{} {\textbf {J}}_2{}^{\textrm{T}}{\textbf {x}}_{ij}^{(2)}\{\dot{{\textbf {B}}}_2{}^{\textrm{T}}(({\textbf {x}}_{ij}^{(2)}){}^{\textrm{T}}{\widehat{\varvec{\beta }}}_2^\textrm{old}){\widehat{\varvec{\theta }}}_2\} \end{array} \right]\), and \(\dot{{\textbf {B}}}(\cdot )\) denotes the first derivative of the spline basis \({\textbf {B}}(\cdot )\).

  5. CM-step 4

    Fix \({\widehat{\varvec{\beta }}}\), \({\widehat{\varvec{\theta }}}\) and \({\widehat{\varvec{\gamma }}}\), and update \({\widehat{\varvec{\Sigma }}}\) by maximizing (4.4) over \(\varvec{\Sigma }\). Denote

    $$\begin{aligned}{\widehat{{\textbf {D}}}}=\sum _{i=1}^n \sum _{j=1}^{m_i} \left\{ \left[ \widehat{d}_i ({\textbf {y}}_{ij}-{\textbf {W}}_{ij}{}^{\textrm{T}}{\widehat{\varvec{\theta }}}-2{\textbf {Z}}_{ij}{}^{\textrm{T}}\widehat{{\textbf {R}}}_{i1})+2({\textbf {Z}}_{ij}{}^{\textrm{T}}\widehat{{\textbf {R}}}_{i2} -\widehat{\varvec{\gamma }})\right] ({\textbf {y}}_{ij}-{\textbf {W}}_{ij}{}^{\textrm{T}}{\widehat{\varvec{\theta }}}){}^{\textrm{T}}+\widehat{c}_i{\widehat{\varvec{\gamma }}}{\widehat{\varvec{\gamma }}}{}^{\textrm{T}}\right\} +\end{aligned}$$
    $$\begin{aligned}\sum _{i=1}^n \sum _{j=1}^{m_i}{\textbf {Z}}_{ij}{}^{\textrm{T}}\left[ \widehat{d}_i{\widehat{{\textbf {R}}}}_{i1}{\widehat{{\textbf {R}}}}_{i1}{}^{\textrm{T}}-{\widehat{{\textbf {R}}}}_{i1}{\widehat{{\textbf {R}}}}_{i2}{}^{\textrm{T}}-{\widehat{{\textbf {R}}}}_{i2}{\widehat{{\textbf {R}}}}_{i1}{}^{\textrm{T}}+\widehat{c}_i{\widehat{{\textbf {R}}}}_{i2}{\widehat{{\textbf {R}}}}_{i2}{}^{\textrm{T}}+{\widehat{\varvec{\Delta }}}_i\right] {\textbf {Z}}_{ij} +\sum _{i=1}^n \sum _{j=1}^{m_i} ({\textbf {Z}}_{ij}{}^{\textrm{T}}{\widehat{{\textbf {R}}}}_{i1}- \widehat{c}_i{\textbf {Z}}_{ij}{}^{\textrm{T}}{\widehat{{\textbf {R}}}}_{i2}){\widehat{\varvec{\gamma }}}{}^{\textrm{T}}.\end{aligned}$$

    Applying the result in Lemma 1, we obtain \({\widehat{\varvec{\Sigma }}}=\frac{1}{N}{\widehat{{\textbf {D}}}}\).

  6. CM-step 5

    Update \({\widehat{\varvec{\Omega }}}\) by maximizing (4.5) over \(\varvec{\Omega }\), which gives

    $$\begin{aligned}{\widehat{\varvec{\Omega }}}=\frac{1}{n}\sum _{i=1}^n \left[ \widehat{d}_i{\widehat{{\textbf {R}}}}_{i1}{\widehat{{\textbf {R}}}}_{i1}{}^{\textrm{T}}-{\widehat{{\textbf {R}}}}_{i1}{\widehat{{\textbf {R}}}}_{i2}{}^{\textrm{T}}-{\widehat{{\textbf {R}}}}_{i2}{\widehat{{\textbf {R}}}}_{i1}{}^{\textrm{T}}+\widehat{c}_i{\widehat{{\textbf {R}}}}_{i2}{\widehat{{\textbf {R}}}}_{i2}{}^{\textrm{T}}+{\widehat{\varvec{\Delta }}}_i\right] .\end{aligned}$$
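The Newton structure of CM-step 3 (update the free components \(\varvec{\beta }^{(-1)}\), then recover the full unit-norm index through the Jacobian \({\textbf {J}}\)) can be sketched on a toy problem. Everything below is illustrative: the data are synthetic and the objective is ordinary least squares rather than (4.4), but the delete-one-component parameterization and the Gauss-Newton step mirror the update above.

```python
import numpy as np

def J_mat(phi):
    """Jacobian d(beta)/d(phi) of the delete-one-component map
    beta = (sqrt(1 - ||phi||^2), phi) used for unit-norm indices."""
    b1 = np.sqrt(1.0 - phi @ phi)
    return np.vstack([-phi / b1, np.eye(len(phi))])

# toy zero-residual least-squares problem (illustrative data only)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
beta_true = np.array([2.0, 1.0, 1.0]) / np.sqrt(6.0)
y = X @ beta_true

phi = np.array([0.3, 0.3])               # free components beta^{(-1)}
for _ in range(50):                      # Gauss-Newton iterations
    beta = np.concatenate(([np.sqrt(1.0 - phi @ phi)], phi))
    J = J_mat(phi)
    resid = y - X @ beta
    grad = J.T @ X.T @ resid             # score direction w.r.t. phi
    hess = J.T @ X.T @ X @ J             # Gauss-Newton Hessian
    phi = phi + np.linalg.solve(hess, grad)
beta_hat = np.concatenate(([np.sqrt(1.0 - phi @ phi)], phi))
```

Because the full index is rebuilt from \(\varvec{\beta }^{(-1)}\) at every step, the unit-norm identifiability constraint holds exactly throughout the iterations.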

Repeat the above E-step and CM-steps until all parameter estimates meet the desired convergence criterion. Since our estimation procedure requires initial values, we set \({\widehat{\varvec{\gamma }}}^{(0)}=(0,0){}^{\textrm{T}}\) and \({\widehat{\varvec{\Sigma }}}^{(0)}={\textbf {I}}_2\), while the initial estimates \({\widehat{\varvec{\beta }}}_1^{(0)}\), \({\widehat{\varvec{\beta }}}_2^{(0)}\) and \({\widehat{\varvec{\Omega }}}^{(0)}\) are obtained by fitting a linear mixed model via the function lmer in the R package lme4, where \({\textbf {X}}_{ij}=\textrm{blockdiag}({\textbf {x}}_{ij}^{(1)},{\textbf {x}}_{ij}^{(2)})\) and \({\textbf {Z}}_{ij}\) are the design matrices corresponding to the fixed and random effects, respectively. Simulation studies (in Sect. 5) show that this strategy works well.
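Schematically, the whole procedure is an outer loop of the following form. The step functions and the convergence rule (relative change of the stacked parameter vector) are hypothetical placeholders standing in for (4.2)-(4.5) and CM-steps 1-5:

```python
import numpy as np

def ecm_fit(params0, e_step, cm_steps, tol=1e-6, max_iter=500):
    """Generic ECM driver: alternate one E-step with a list of CM-steps
    until the relative change of the stacked parameter vector is small.
    `e_step` maps the current parameter dict to the E-step statistics;
    each entry of `cm_steps` maps (params, stats) to updated params.
    All callables are hypothetical placeholders."""
    params = dict(params0)
    for it in range(max_iter):
        old = np.concatenate([np.ravel(v) for v in params.values()])
        stats = e_step(params)              # E-step
        for cm in cm_steps:                 # CM-steps, in order
            params = cm(params, stats)
        new = np.concatenate([np.ravel(v) for v in params.values()])
        if np.linalg.norm(new - old) <= tol * (1.0 + np.linalg.norm(old)):
            break
    return params, it + 1
```

As a trivial usage check, alternately updating the mean and variance of a Gaussian sample through two CM-steps converges in two sweeps.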

5 Simulation Studies

In this section, we conduct extensive simulation studies using synthetic data to assess the finite-sample performance of the parameter estimates under our proposed method (Simulation 1), and the robustness of our method relative to existing alternatives, with data generated under various settings (Simulation 2).

5.1 Knot Selection

It is well known that the performance of any spline estimation depends on the choice of knots. Here, we employ the Schwarz information criterion (SIC) for adaptive knot selection (Ma and Song 2015; Lu 2017; Zhao et al. 2017). In view of the order \(n^{1/(2s+1)}\) of knots needed to attain the optimal convergence rate of the nonparametric functions (see Remark 2), candidate numbers of knots are selected in a neighborhood of \(n^{1/(2s+1)}\), namely \(\left[ 0.5N_s, \min (5N_s, n^{1/2})\right]\), where \(N_s=\lfloor n^{1/(2s+1)}\rfloor\) and s is the smoothness parameter. We choose \(s=2\) in both the simulation studies and the real data application. For simplicity, we use cubic polynomial splines, and the numbers of interior knots \(K_1=K_2\equiv K\) are taken to be the same for the two nonparametric link functions. The optimal number of knots \(K_\textrm{opt}\) is the minimizer of \(\textrm{SIC}(K)=-\sum _{i=1}^n \log \hat{L}_i^K+\log n\times 2K\), where \(\log \hat{L}_i^K\) denotes the estimated value of the log-likelihood function obtained from (2.8), with the given K knots.
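In code, the knot search described above is a one-dimensional grid search. In the sketch below, `loglik_fn` is a hypothetical callable wrapping a full model fit for a given number of interior knots K:

```python
import numpy as np

def select_knots_sic(n, s, loglik_fn):
    """Grid-search the number of interior knots K by SIC:
    K ranges over a neighborhood of N_s = floor(n^{1/(2s+1)}), and
    SIC(K) = -loglik(K) + log(n) * 2K.  `loglik_fn(K)` is a hypothetical
    callable returning the fitted log-likelihood with K interior knots."""
    Ns = int(np.floor(n ** (1.0 / (2 * s + 1))))
    K_lo = max(1, int(np.ceil(0.5 * Ns)))
    K_hi = int(min(5 * Ns, np.sqrt(n)))
    sic = {K: -loglik_fn(K) + np.log(n) * 2 * K for K in range(K_lo, K_hi + 1)}
    return min(sic, key=sic.get)
```

The \(\log n \times 2K\) penalty charges 2K free spline coefficients (one set per link function), so larger K is admitted only when the likelihood gain exceeds the SIC penalty.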

5.2 Simulation 1: Assessing Finite-Sample Properties

Here, data is generated from the model (2.5), where the two nonparametric functions are \(g_1(u)=2\sin (\pi u)\) and \(g_2(u)=8u(1-u)\), with the true index parameters \(\varvec{\beta }_1=(1/\sqrt{3},-1/\sqrt{3},1/\sqrt{3}){}^{\textrm{T}}\) and \(\varvec{\beta }_2=(2/\sqrt{6},1/\sqrt{6},1/\sqrt{6}){}^{\textrm{T}}\), respectively. Both covariates \({\textbf {x}}_{ij}^{(1)}\) and \({\textbf {x}}_{ij}^{(2)}\) are generated independently from the trivariate uniform distribution \(U^3(0,1)\). The random effects \({\textbf {b}}_i=({\textbf {b}}_{i1}{}^{\textrm{T}},{\textbf {b}}_{i2}{}^{\textrm{T}}){}^{\textrm{T}}\) are generated from \(\textrm{SAL}_{4}({\varvec{0}}, \varvec{\Omega }, \varvec{0})\), with covariance matrix

$$\begin{aligned}\varvec{\Omega }=\left( \begin{array}{cccc} 9 &{} 4.8 &{} 3.6 &{} 0.6\\ 4.8 &{} 4 &{} 2 &{} 1.2\\ 3.6 &{} 2 &{} 4 &{} 1 \\ 0.6 &{} 1.2 &{} 1 &{} 1 \end{array}\right) ,\end{aligned}$$

and the corresponding covariates \({\textbf {z}}_{ij}^{(1)}=(1,z_{ij1}^{(1)}){}^{\textrm{T}}\) and \({\textbf {z}}_{ij}^{(2)}=(1,z_{ij1}^{(2)}){}^{\textrm{T}}\), where \(z_{ij1}^{(1)}\) and \(z_{ij1}^{(2)}\) are generated from the standard normal distribution. The random error \(\varvec{\epsilon }_{ij}\) is generated from \(\textrm{SAL}_2 (\varvec{0},\varvec{\Sigma },\varvec{\gamma })\) with \(\varvec{\Sigma }=\left( \begin{array}{cc}1 &{} 0.6\\ 0.6 &{} 1 \end{array}\right)\) and \(\varvec{\gamma }=(2,1.5){}^{\textrm{T}}\). The sample size n is set to be 50, 100 and 200, and the number of cluster members \(m_i\) in each subject is generated from the discrete uniform distribution on \(5,6,\ldots ,10\). Table 1 presents the averages of bias, absolute bias, and the empirical standard error estimates for the index parameters and the skewness parameter, over 400 replications.
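The SAL draws above can be generated through the usual normal mean-variance mixture representation of the asymmetric Laplace law, \({\textbf {Y}}=\varvec{\gamma }V+\sqrt{V}\,\varvec{\Sigma }^{1/2}{\textbf {Z}}\) with \(V\sim \textrm{Exp}(1)\) and \({\textbf {Z}}\) standard normal; a numpy sketch of the error draws for one simulated dataset:

```python
import numpy as np

rng = np.random.default_rng(42)

def rsal(n, Sigma, gamma, rng):
    """Draw n variates from SAL(0, Sigma, gamma) via the normal
    mean-variance mixture Y = gamma*V + sqrt(V) * L z, with V ~ Exp(1),
    z ~ N(0, I), and L the lower Cholesky factor of Sigma."""
    L = np.linalg.cholesky(Sigma)
    V = rng.exponential(1.0, size=n)
    Z = rng.standard_normal((n, len(gamma)))
    return V[:, None] * gamma + np.sqrt(V)[:, None] * (Z @ L.T)

# the Simulation 1 error setting: Sigma with correlation 0.6, gamma = (2, 1.5)
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
gamma = np.array([2.0, 1.5])
eps = rsal(5000, Sigma, gamma, rng)
```

Since \(\mathbb {E}(V)=1\), the draws have mean \(\varvec{\gamma }\), which is a convenient sanity check on the generator.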

Table 1 Table entries are the average bias (BIAS), average absolute bias (ABIAS), and empirical standard error (ESE) estimates for \(n = 50, 100, 200\), calculated over 400 replications, corresponding to Simulation 1

From Table 1, all biases are close to zero for all sample sizes, implying that our proposed estimators are consistent. Moreover, the absolute biases and the standard errors decrease with increasing sample size, with the estimation performance for the index parameters significantly better than that for the skewness parameter. To further assess the estimation results, we calculate the integrated mean squared error (IMSE), defined as

$$\begin{aligned}\textrm{IMSE}(g_l)=\frac{1}{400 }\sum _{s=1}^{400}\sqrt{\frac{1}{N}\sum _{i=1}^n\sum _{j=1}^{m_i} \{\widehat{g}^{(s)}_l(({\textbf {x}}_{ij}^{(l)}){}^{\textrm{T}}{\widehat{\varvec{\beta }}}_l)-g_l(({\textbf {x}}_{ij}^{(l)}){}^{\textrm{T}}\varvec{\beta }_l)\}^2},\ \ l=1,2,\end{aligned}$$

where \(\widehat{g}_l^{(s)}(\cdot )\) is the spline approximation to \(g_l(\cdot )\) in the sth simulation run. We report the average of the IMSE, \(\textrm{AIMSE}=\frac{1}{2}\sum _{l=1}^2 \textrm{IMSE}(g_l)\), in Table 2. For evaluating the estimation performance for the scatter matrix \(\varvec{\Sigma }\) (corresponding to the bivariate responses) and the covariance matrix \(\varvec{\Omega }\) (for the random effects), we use the Frobenius norm of the matrix of differences between the estimated and true values, i.e., \(\Vert {\textbf {A}}\Vert _F=\sqrt{\textrm{trace}({\textbf {A}}{}^{\textrm{T}}{\textbf {A}})}\), where \({\textbf {A}}\) is either \(\widehat{\varvec{\Sigma }}-\varvec{\Sigma }\) or \({\widehat{\varvec{\Omega }}}-\varvec{\Omega }\). Simulation results, together with the root mean squared error (RMSE) for \(\varvec{\beta }_1\), \(\varvec{\beta }_2\) and \(\varvec{\gamma }\), are listed in Table 2, where the RMSE for an arbitrary parameter \(\varvec{\delta }\) is defined as \(\textrm{RMSE}_{\varvec{\delta }}=\sqrt{({\widehat{\varvec{\delta }}}-\varvec{\delta }){}^{\textrm{T}}({\widehat{\varvec{\delta }}}-\varvec{\delta })}\). It is clear from Table 2 that the finite-sample performance of our proposed estimation procedure improves with increasing sample size. In sum, the simulation results show that the index parameters, the nonparametric functions, and the other parameters associated with the mixed-effects model are all reliably estimated, confirming that our proposed algorithm works well in synthetic data settings.
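Both error summaries reduce to one-liners; a sketch matching the definitions above:

```python
import numpy as np

def frob_err(A_hat, A):
    """Frobenius norm of the estimation error matrix A_hat - A."""
    D = A_hat - A
    return np.sqrt(np.trace(D.T @ D))

def rmse(d_hat, d):
    """Per-replication RMSE as defined in the text:
    sqrt((d_hat - d)^T (d_hat - d)), i.e. the Euclidean error norm."""
    e = np.asarray(d_hat) - np.asarray(d)
    return np.sqrt(e @ e)
```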

Table 2 Table entries are the averages of the IMSE (AIMSE), the Frobenius-norms for \(\varvec{\Sigma }\) and \(\varvec{\Omega }\), and the root of mean squared errors (RMSE) of the model parameters, under various sample sizes \((n = 50, 100, 200)\), calculated over 400 replications, corresponding to Simulation 1

5.3 Simulation 2: Assessing Robustness in Light of Competing Methods

Here, the data is generated similarly to Simulation 1 (from a BV-SIM), except that the random effects and errors are independently generated under the following four distributional assumptions:

  1. Case 1:

    \({\textbf {b}}_i \sim N(\varvec{0}, \varvec{\Omega }), \ \ \varvec{\epsilon }_{ij} \sim N(\varvec{0}, \varvec{\Sigma })\);

  2. Case 2:

    \({\textbf {b}}_i \sim t(\varvec{0},\varvec{\Omega }, v), \ \ \varvec{\epsilon }_{ij} \sim t(\varvec{0},\varvec{\Sigma },v)\);

  3. Case 3:

    \({\textbf {b}}_i \sim \textrm{SAL}_{4}({\varvec{0}}, \varvec{\Omega }, \varvec{0}), \ \ \varvec{\epsilon }_{ij} \sim \textrm{SAL}_{2}({\varvec{0}}, \varvec{\Sigma }, \varvec{0})\);

  4. Case 4:

    \({\textbf {b}}_i \sim 0.8 N(\varvec{0}, \varvec{\Omega })+0.2N(\varvec{0}, 10 \varvec{\Omega }), \ \ \varvec{\epsilon }_{ij} \sim 0.8 N(\varvec{0}, \varvec{\Sigma })+0.2 N(\varvec{0}, 10\varvec{\Sigma })\),

for \(i=1,\cdots ,n, \ \ j=1,\cdots , m_i\).

Here, Case 1 corresponds to random effects and errors independently generated from the multivariate normal distribution. For Case 2, both are generated from the multivariate t-distribution with degrees of freedom v (setting \(v=5\)). For Case 3, the random effects and errors are generated from the multivariate symmetric Laplace distribution with covariance matrices \(\varvec{\Omega }\) and \(\varvec{\Sigma }\), respectively. Finally, Case 4 corresponds to generating both random terms (effects and errors) from multivariate normal mixtures. Note that, for the above four cases, the bivariate clustered response is symmetric, since both the random effects and errors are generated from symmetric distributions. This makes our approach comparable to the following two existing alternatives: (a) the bivariate normal mixed-effects single-index model of Wu and Tu (2016), and (b) a bivariate mixed-effects single-index model using the multivariate t-distribution, which extends the univariate linear mixed model proposal of Pinheiro et al. (2001). In (a), penalized splines were used to approximate the nonparametric index function, whereas we use polynomial splines. At each replication, we use the same dataset to obtain the estimates from these three competing methods. We focus on the estimation of the index parameters and the index functions for the fixed-effect part, which carry the same interpretation in all cases.
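For reference, the Case 4 contaminated-normal draws can be generated as follows (a hedged sketch; `prob` is the mixing proportion 0.8 and `infl` the variance inflation factor 10):

```python
import numpy as np

def rcontam_normal(n, Sigma, prob=0.8, infl=10.0, rng=None):
    """Draw n variates from the contaminated normal of Case 4:
    N(0, Sigma) with probability `prob`, else N(0, infl * Sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(Sigma)
    Z = rng.standard_normal((n, Sigma.shape[0])) @ L.T
    # per-row scale: 1 for the clean component, sqrt(infl) for the outliers
    scale = np.where(rng.random(n) < prob, 1.0, np.sqrt(infl))
    return Z * scale[:, None]
```

The marginal covariance of these draws is \((0.8 + 0.2\times 10)\varvec{\Sigma } = 2.8\,\varvec{\Sigma }\), so the contamination inflates both tails and spread without introducing skewness.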

Table 3 Table entries are the root of mean squared errors (RMSE) of \(\varvec{\beta }_1\) and \(\varvec{\beta }_2\), and the Average Integrated Mean Squared Error (AIMSE) from our model and the 2 competing models (Wu and Pinheiro), for \(n = 50, 100, 200\), with data generated from the 4 cases described in Sect. 5.3

The results are summarized in Table 3. For all cases, the RMSEs and AIMSEs decrease quickly with increasing sample size for all three methods. That said, our proposed method performs well in all four cases, and is significantly better than both alternatives in Cases 3 and 4. The advantages of our method appear more prominent if we further reduce the mixing proportion of the mixture distribution in Case 4 from 0.8 to 0.7, 0.6 or 0.5 (results not reported here). In Cases 1 and 2, the performance of our method is comparable to that of the other two. In particular, our method performs almost identically to Pinheiro's t-distribution method in Case 2 when \(n=200\), while both are better than the normal mixed-effects method of Wu and Tu (2016). To summarize, the performance of our proposed method appears satisfactory in all cases, and is robust to misspecified (non-Gaussian) random effects and errors under a bivariate mixed model framework.

6 Application: GAAD Dataset

In this section, we illustrate our method via application to the GAAD dataset. Here, the tooth-level mean PPD and CAL measures are non-Gaussian bivariate responses representing PD status, and our objective is to evaluate the distribution of PD status for this population, and to quantify the effects of various subject-level covariates, such as Age (in years), body mass index (BMI), Gender (\(1 = \textrm{Female}, 0 = \textrm{Male}\)), Smoking status (\(1 = \textrm{Smoker}, 0 = \mathrm{Never \ Smoker}\)) and glycemic level or HbA1c (\(1 = \mathrm{High/Uncontrolled}, 0 = \textrm{Controlled}\)), on the PD status. For our analysis, we have \(n = 288\) subjects with complete covariate information. About 30% of the subjects are smokers. The mean age of the subjects is about 54 years, with a range of 26–87 years. There is a predominance of female subjects (around 76%) in the data. Around 60% of the subjects are obese (BMI \(\ge 30\)), and 59% have uncontrolled HbA1c. Each subject has a varying number of teeth, ranging from 3 to 28, with a total of 5461 observations. A full dentition constitutes 28 teeth; however, missing teeth are very common in oral health studies, with the actual cause of missingness mostly unknown. Hence, to avoid unverifiable missing data assumptions, we did not resort to missing data analysis, and present only a complete-case analysis.

Fig. 2
figure 2

Bivariate kernel density estimate (left panel) and boxplots (right panel) for PPD and CAL responses, from the GAAD data

As part of the exploratory analysis, we present the bivariate kernel density estimate of the PPD and CAL responses in Fig. 2 (left panel). The plot reveals significant (right) skewness for both responses. Also, the right panel in Fig. 2 indicates the presence of possible outliers. Recent research (Zhao et al. 2018) confirmed a possible non-linear relationship between oral health responses and continuous covariates, such as Age. Motivated by this, we set out to estimate a clinically meaningful single-index structure determining PD for the subjects in this database.

Table 4 Estimates of the index parameters, the skewness parameter and their 95% confidence intervals, corresponding to the PPD and CAL responses from the GAAD study

We consider fitting the following model to the GAAD data

$$\begin{aligned}\left\{ \begin{array}{l} \textrm{PPD}_{ij}=g_1({\textbf {x}}_{ij}{}^{\textrm{T}}\varvec{\beta }_1)+{\textbf {z}}_{ij}{}^{\textrm{T}}{\textbf {b}}_{i1}+\epsilon _{ij1},\\ \\ \textrm{CAL}_{ij}=g_2({\textbf {x}}_{ij}{}^{\textrm{T}}\varvec{\beta }_2)+{\textbf {z}}_{ij}{}^{\textrm{T}}{\textbf {b}}_{i2}+\epsilon _{ij2}, \end{array} \right. i=1,\ldots , 288, j=1,\ldots ,m_i, \end{aligned}$$

where \({\textbf {x}}_{ij}=(x_{ij1},\ldots ,x_{ij5}){}^{\textrm{T}}\) with \(x_{ij1}=\) Age, \(x_{ij2}=\) BMI, \(x_{ij3}=\) Gender, \(x_{ij4}=\) Smoker, \(x_{ij5}=\) HbA1c, and \({\textbf {z}}_{ij}=(1,z_{ij1},z_{ij2},z_{ij3}){}^{\textrm{T}}\) with \(z_{ij1}=\) Gender, \(z_{ij2}=\) Smoker, \(z_{ij3}=\) HbA1c. We further assume \({\textbf {b}}_i=({\textbf {b}}_{i1}{}^{\textrm{T}},{\textbf {b}}_{i2}{}^{\textrm{T}}){}^{\textrm{T}}\sim \textrm{SAL}_8(\varvec{0},\varvec{\Omega },\varvec{0})\) and \(\varvec{\epsilon }_{ij}=(\epsilon _{ij1},\epsilon _{ij2}){}^{\textrm{T}}\sim \textrm{SAL}_2(\varvec{0},\varvec{\Sigma },\varvec{\gamma })\). The estimates of the index parameters and the skewness parameter, along with their 95% confidence intervals, are presented in Table 4, where the confidence intervals are obtained by bootstrap resampling with 200 replications. We observe that all parameter estimates, except that of \(\beta _{13}\) (corresponding to Gender in the PPD regression), are positive and significant. Interestingly, the estimate of Gender (\(\beta _{13}\)) is negative yet significant for PPD, while the corresponding estimate (\(\beta _{23}\)) for CAL is positive and significant, implying that Gender contributes to the index development for the two responses in opposite directions. Figure 3 presents the estimated curves corresponding to the two index functions, along with their 95% bootstrap confidence bands. The 95% band is tighter for PPD than for CAL.
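The bootstrap intervals above are computed at the subject (cluster) level, so that within-mouth dependence is preserved under resampling. A generic sketch with a hypothetical `estimator` callable standing in for a full model refit:

```python
import numpy as np

def cluster_bootstrap_ci(clusters, estimator, B=200, level=0.95, rng=None):
    """Percentile bootstrap CIs respecting a clustered design:
    subjects (clusters) are resampled with replacement.  `clusters` is a
    list of per-subject data objects and `estimator` a hypothetical
    callable returning a parameter estimate from a list of clusters."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(clusters)
    stats = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)           # resample subjects
        stats.append(estimator([clusters[i] for i in idx]))
    stats = np.asarray(stats)
    alpha = (1.0 - level) / 2.0
    return np.quantile(stats, [alpha, 1.0 - alpha], axis=0)
```

Resampling whole subjects, rather than individual teeth, keeps each bootstrap replicate a valid draw from the clustered data-generating mechanism.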

It is immediate that the correlation between PPD and CAL is significant, implying the need to account for both the crosswise correlation between the two responses and the cluster-wise correlation of the responses within the same subject, while modeling the bivariate clustered responses. Furthermore, Fig. 4 presents the bivariate kernel density surface of the estimated residuals (left panel), and the same from random draws of \(n=5461\) observations from the bivariate ALD density \(\textrm{ALD}({\widehat{\varvec{\Sigma }}},{\widehat{\varvec{\gamma }}})\), where \({\widehat{\varvec{\Sigma }}}\) and \({\widehat{\varvec{\gamma }}}\) are plugged-in estimates derived from our fit. We observe that the estimated surfaces are very similar, confirming the adequacy of the model fit to the GAAD dataset.

Fig. 3
figure 3

Estimated curves for the two index functions \(\widehat{g}_1\) and \(\widehat{g}_2\), along with the 95% confidence bands. The left and right panels correspond to PPD and CAL regressions, respectively

The matrices \(\varvec{\Sigma }\) and \(\varvec{\Omega }\) are estimated as:

$$\begin{aligned}\widehat{\varvec{\Sigma }}=\left( \begin{array}{ll} 1.2429 &{}0.7937\\ 0.7937 &{}0.9024 \end{array}\right) \end{aligned}$$

and

$$\begin{aligned}{\widehat{\varvec{\Omega }}}=\left( \begin{array}{cccccccc} 1.6589 &{}-0.0089&{} -0.0461&{} -0.2792&{} 1.5780&{} -0.1832&{} -0.1815&{} -0.5760 \\ -0.0089 &{} 0.8797&{} -0.4081&{} 0.1553&{} -0.1685&{} 0.5379&{} -0.0466&{} 0.4289 \\ -0.0461 &{}-0.4081&{} 0.8423&{} 0.3273&{} -0.0808&{} 0.1296&{} 0.1931&{} 0.1264 \\ -0.2792 &{} 0.1553&{} 0.3273&{} 0.7782&{} -0.4164&{} 0.3802&{} 0.1290&{} 0.6585 \\ 1.5780 &{}-0.1685&{} -0.0808&{} -0.4164&{} 2.1987&{} -0.8975&{} -0.4840&{} -0.8462 \\ -0.1832 &{} 0.5379&{} 0.1296&{} 0.3802&{} -0.8975&{} 1.0517&{} 0.3364&{} 0.6420 \\ -0.1815 &{}-0.0466&{} 0.1931&{} 0.1290&{} -0.4840&{} 0.3364&{} 0.2016&{} 0.1681 \\ -0.5760 &{} 0.4289&{} 0.1264&{} 0.6585&{} -0.8462&{} 0.6420&{} 0.1681&{} 0.8158 \end{array}\right) . \end{aligned}$$
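The cross-response dependence between the PPD and CAL error terms can be read directly off \({\widehat{\varvec{\Sigma }}}\) by standardizing the off-diagonal entry by the diagonal ones. A quick check of the implied correlation:

```python
import numpy as np

Sigma_hat = np.array([[1.2429, 0.7937],
                      [0.7937, 0.9024]])

# correlation implied by the estimated scatter matrix
rho = Sigma_hat[0, 1] / np.sqrt(Sigma_hat[0, 0] * Sigma_hat[1, 1])
print(round(rho, 3))  # -> 0.749
```

The implied correlation of about 0.75 is consistent with the strong PPD–CAL association noted earlier.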
Fig. 4
figure 4

Plots of bivariate kernel density estimates from model residuals (left panel), and from random draws of \(n=5461\) observations following \(\textrm{ALD}({\widehat{\varvec{\Sigma }}},{\widehat{\varvec{\gamma }}})\)

To further evaluate the usefulness of the proposed model, we consider the fitted and prediction errors in light of two alternatives, denoted as “AM1” (bivariate normal, mixed-effects SIM) and “AM2” (bivariate, asymmetric Laplace SIM, without random effects). We randomly partition the data into training and test sets, where the training data is used to fit the three models, and the test data to evaluate the prediction errors. Using varying sizes of training and test data, the average absolute fitted errors (AAFE) and the average absolute prediction errors (AAPE) for the two responses, based on 200 random partitions, are reported in Table 5, where

$$\begin{aligned}\textrm{AAFE}_k=\frac{1}{\sum _{i=1}^{nb}m_i}\sum _{i=1}^{nb}\sum _{j=1}^{m_i}|y_{ijk}-\widehat{y}_{ijk}|\end{aligned}$$

and

$$\begin{aligned}\textrm{AAPE}_k=\frac{1}{\sum _{i=1}^{n-nb}m_i}\sum _{i=1}^{n-nb}\sum _{j=1}^{m_i}|y_{ijk}-\widetilde{y}_{ijk}|,\end{aligned}$$

for \(k=1\) and 2, where \(\widehat{y}_{ijk}\) is the fitted value based on the training data, \(\widetilde{y}_{ijk}\) is the predicted value for the test data, and nb denotes the number of subjects in the training data.
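Both AAFE and AAPE are mean absolute deviations pooled over subjects and teeth; only the data blocks (training vs. test) and the type of predicted value differ. A minimal sketch for one response \(k\), with ragged per-subject observation vectors:

```python
import numpy as np

def mean_abs_error(y_blocks, yhat_blocks):
    """Pooled mean absolute error:
    sum_i sum_j |y_ij - yhat_ij| / sum_i m_i,
    where each block holds the m_i observations of one subject."""
    num = sum(np.abs(np.asarray(y) - np.asarray(yh)).sum()
              for y, yh in zip(y_blocks, yhat_blocks))
    den = sum(len(y) for y in y_blocks)
    return num / den
```

Calling this on the training subjects with fitted values gives \(\textrm{AAFE}_k\); calling it on the test subjects with predicted values gives \(\textrm{AAPE}_k\).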

Table 5 Average absolute fitted and prediction errors for our model and 2 competing models (AM1 and AM2), for the PPD and CAL responses in the GAAD data, based on 200 random partitions

From Table 5, we observe that our model performs the best in terms of AAFE and AAPE, for various sizes of the training and test sets. More specifically, our proposed mixed-effects SIM is superior to the bivariate asymmetric Laplace SIM (excluding random effects), underscoring the need to account for the within-subject correlation. Furthermore, our proposed model is also better than the SIM with the usual multivariate normal specification for the random effects, thereby providing evidence of the gain from accounting for data asymmetry during modeling.

7 Conclusions

Derivation of useful medical indices that correlate with multiple health outcomes is an issue of significant practical importance. In this paper, we propose a single-index mixed-effects regression model for bivariate responses, where both the error terms and random effects are assumed to follow multivariate asymmetric Laplace distributions. Using polynomial spline smoothing for the index functions, we propose a scalable ML estimation method based on an EM-type algorithm, and study the asymptotic properties of the ML estimates under mild conditions. Simulations and real data analysis reveal the potential of the proposed model under data asymmetry, compared to existing alternatives.

There are a number of future directions to pursue. To further improve model fit and prediction, we can consider the joint modeling of the location, skewness, and scatter matrix within a multivariate ALD setup. When the number of covariates is large in both the fixed and random effects, it is of interest to select important variables in both parts to obtain a parsimonious model. Some variable selection methods for linear mixed-effects models are available in the univariate response case; see, for example, Kinney and Dunson (2010); Bondell et al. (2010); Fan and Li (2012); Schelldorfer and Geer (2011); Pan and Huang (2014), and others. However, for single-index mixed-effects models with multivariate responses, there is limited work, and pursuing variable selection there is a non-trivial undertaking. Another extension is to consider mixed-effects quantile regression (Waldmann and Kneib 2015) for bivariate responses. These will be pursued elsewhere.