Single-Index Mixed-Effects Model for Asymmetric Bivariate Clustered Data

Studies/trials assessing status and progression of periodontal disease (PD) usually focus on quantifying the relationship between the clustered (tooth within subjects) bivariate endpoints, such as probed pocket depth (PPD), and clinical attachment level (CAL) with the covariates. Although assumptions of multivariate normality can be invoked for the random terms (random effects and errors) under a linear mixed model (LMM) framework, violations of those assumptions may lead to imprecise inference. Furthermore, the response-covariate relationship may not be linear, as assumed under a LMM fit, and the regression estimates obtained therein do not provide an overall summary of the risk of PD, as obtained from the covariates. Motivated by a PD study on Gullah-speaking African-American Type-2 diabetics, we cast the asymmetric clustered bivariate (PPD and CAL) responses into a non-linear mixed model framework, where both random terms follow the multivariate asymmetric Laplace distribution (ALD). In order to provide a one-number risk summary, the possible non-linearity in the relationship is modeled via a single-index model, powered by polynomial spline approximations for index functions, and the normal mixture expression for ALD. To proceed with a maximum-likelihood inferential setup, we devise an elegant EM-type algorithm. Moreover, the large sample theoretical properties are established under some mild conditions. Simulation studies using synthetic data generated under a variety of scenarios were used to study the finite-sample properties of our estimators, and demonstrate that our proposed model and estimation algorithm can efficiently handle asymmetric, heavy-tailed data, with outliers. Finally, we illustrate our proposed methodology via application to the motivating PD study.


Introduction
Epidemiological studies in a clustered, or longitudinal data setting often generate multivariate (repeated) outcomes that are analyzed under the ubiquitous multivariate normal (MVN) assumptions of the random terms (random effects, and within-subject random errors) via standard software, such as SAS, or R.However, violations of those assumptions can lead to imprecise parameter estimates (Bandyopadhyay et al. 2010).These non-Gaussian features are usually manifested through skewness of the response vector, and/or thick-tails.Although achieving close-to-normality via suitable data transformations of the responses (such as log, or Box-Cox) for standard linear mixed model (LMM) analysis are possible, they maybe avoided due to their non-universality, and difficulty in covariate interpretation on the original scale (Jara et al. 2008).To address this, various flexible (parametric) alternatives to the MVN density exists, such as the multivariate skewnormal density (Azzalini and Capitanio 1999;Gupta et al. 2004;Azzalini 2010), the heavy-tailed multivariate skew t-density (Azzalini and Capitanio 2003), and others, that can accommodate departures from normality without resorting to adhoc data transformations.
In practice, this setup can be further complicated in presence of multiple outcomes recorded at each cluster units/components.The motivating data example in this paper comes from a clinical study of periodontal disease (PD) conducted on Gullah-speaking African-American Type-2 diabetics (henceforth, GAAD).Here, the multiple outcomes of interest are the tooth-level (mean) probed pocket depth (PPD) and clinical attachment level (CAL), which are recorded (in mm, via a periodontal probe) simultaneously for each tooth nested/clustered within a subject.While PPD quantifies the current PD status, CAL measures the (past) disease history and progression (Page and Eke 2007).An oral clinician may be interested in studying the joint evolution of these outcomes over some features of covariates, and the complexity is induced from two different sources of correlation-(a) Between repeated observations of any given outcome (PPD, or CAL) measured at a cluster unit (tooth), and (b) Between multiple outcomes (PPD and CAL) measured at the same tooth.The existing literature (both classical and Bayesian) in this context of multiple repeated outcomes modeling is also very rich (Luo and Wang 2014;Verbeke et al. 2014;Lin and Wang 2013;Michaelis et al. 2018;Bandyopadhyay et al. 2010).However, a vast majority of these models are developed under the restrictive assumption of linearity of the covariate effects over the multivariate responses.
To motivate further, consider Fig. 1, which presents plots of the empirical Bayes' estimates of random effects (panels a and b), corresponding Q-Q plots (panels c and d), and observed versus estimated (non-linear) curve (panels e and f), obtained from fitting a LMM separately to the PPD and CAL responses in the GAAD data, using the lme function in R. The plots clearly reveal evidence of asymmetryq (departures from the Gaussian assumptions), which cannot be explained by a standard LMM fit.In addition, the predictor space restricted to be linear combinations of covariates may not provide an elegant picture of their cross-sectional association with the (bivariate) response.Formulating an index for PD (that handles possible nonlinearity, confounding, and interaction effects between the PD outcomes and the covariates) via a single-index model, or SIM (Hardle et al. 1993) can be a clinically elegant alternative.SIMs are a popular class of semiparametric regression models that relaxes the assumption of linearity, and bypass the 'curse of dimensionality' by reducing the multi-dimensional predictor space X into an univariate (scalar) index U = X T β.A link function g(.) now connects the covariate space to the response Y , offering a pragmatic compromise between a fully nonparametric (and often non-interpretable) multiple regression, and a restrictive (parametric) linear regression.Here, the magnitude of the index coefficient β j determine the relative importance of the j-th predictor on the index, and g(U) denotes the location of interest in the response curve at the index U.In biomedical research, the recent work by Wu and Tu (2016) develops an adiposity index via a (multivariate) SIM to efficiently predict multiple longitudinal outcomes (systolic and diastolic blood pressure) in children.However, their proposal considers the usual MVN assumptions for the random terms (errors and effects), and may not well accommodate heavy tailed and other non-Gaussian features.Furthermore, they did not provide rigorous theoretical justification.
Considering Wu and Tu (2016) as our starting point, we seek to develop an index that can efficiently predict the clustered bivariate (PPD and CAL) PD outcomes.Such a clinical index that links both outcomes is vastly absent in the oral health literature.Our bivariate single-index mixed (BV-SIM) model tackles non-Gaussian features in the responses via the multivariate asymmetric Laplace density (ALD; Kotz et al. 2001) assumptions in the random terms.The multivariate ALD can accommodate asymmetric, peaked, and heavytailed data using fewer number of parameters than the popular multivariate skew-t density (Gupta 2003).The multivariate symmetric Laplace density (Naik and Plungpongpun 2006), a special case of the ALD, has been applied in other fields, such as speech clustering, classification problems, and image/signal analysis.Under this framework, we consider a polynomial spline approximation to the nonparametric index function, and propose an efficient EM-type algorithm for estimation and inference.The spline approximation, and the mixture normal representation of the multivariate ALD presents a computationally efficient, and intuitively appealing estimation setup, quantifying correlations from both sources.
The rest of the paper is organized as follows.In Sect.2, we propose the BV-SIM model under the assumptions of a multivariate asymmetric Laplace density.Using the polynomial splines approximation for the nonparametric (index) functions, we derive the maximum likelihood (ML) estimate, and establish the large sample properties of the proposed estimators in Sect.3, with the detailed technical proofs relegated to the Appendix, where we use the projection method to prove the asymptotic normality of parametric part.In Sect.4, we develop an efficient MLE procedure based on the EM-algorithm.Simulation studies comparing finite sample performance of our approach to other alternatives appear in Sect.5, while Sect.6 illustrates the method via application to the PD dataset.Finally, some concluding remarks are presented in Sect.7.
We begin with a sketch of the multivariate shifted Laplace density (Kotz et al. 2001), and then develop our SIM mixed effects framework for bivariate clustered data.The multivariate ALD has the density where K ν is the modified Bessel function of the third kind with index ν, ν = (2 − d)/2, u = 2 + γ T Σ −1 γ y T Σ −1 y , γ ∈ ℝ d is a skewness parameter and Σ is a positive definite (p.d.) scatter matrix with dimension d × d.We denote (2.1) as ALD d (Σ, γ).Note, the ALD forces each component density to be joined at the same origin.An extension, the multivariate shifted asymmetric Laplace distribution (SALD; Kotz et al. 2001), has the form where u = 2 + γ T Σ −1 γ δ(y, μ, Σ), δ(y, μ, Σ) = (y − μ) T Σ −1 (y − μ), and ν, γ, Σ are defined in (2.1).Here, we use the notation Y ∼ SAL d (μ, Σ, γ) to denote the random variable y following a d-dimensional SALD.After some calculations, the mean and variance of SALD are given by It is clear that the mean depends on the shifted location parameter μ and skewness parameter γ, while its variance depends on scatter matrix Σ and skewness parameter γ.Also, Σ + γγ T must be p.d. if Σ is p.d.The parameter γ plays an important role in multivariate asymmetric data analysis, besides the location μ and scatter matrix Σ. Note, the multivariate density in (2.2) reduces to (2.1) when μ = 0, and it further reduces to the multivariate symmetry Laplace distribution (Eltoft et al. 2006) when γ = 0.Moreover, (2.2) reduces to the univariate ALD when and is popularly used in the likelihood framework for quantile regression with density p(y) = τ(1 − τ)exp −ρ τ (y − μ) , where ρ τ (u) = u(τ − I(u < 0)).The SALD in (2.2) has the following stochastic representation where V is a random variable from an exponential distribution with mean 1 and Z ∼ N d (0, Σ) is generated independent of V .Using Bayes's theorem, the density of V given Y = y is generalized inverse Gaussian, with the density where ν, γ, μ, Σ, δ(y, μ, Σ) and u are as defined in (2.2).The SALD allows for peakedness, heavy tails, and skewness, and hence provides more flexibility in modeling multivariate data with non-Gaussian features.More properties, extensions and applications of SALD appear in Kozubowski and Podgórski (2001); Franczak et al. (2014); Bouveyron and Brunet-Saumard (2014).
For identifiability, we assume both ∥ β 1 ∥ = 1 and ∥ β 2 ∥ = 1, and their first components are positive, respectively.In this paper, the popular "delete one component" method is used to avoid the equality constraints (Yu and Ruppert 2002;Cui et al. 2011).Specifically, we write , with its Jacobian matrix given by where I p 1 − 1 is the identity matrix with p 1 − 1 rows/columns.The true parameter β 1 ( − 1) satisfies the constraint β 1 ( − 1) < 1, which implies that it is a interior point in a unit ball in ℝ p 1 − 1 . Therefore, . Similarly, we define ( − 1) and J 2 , and let β ( − 1) = β 1 ( − 1) T , β 2 ( − 1) T T , J = blockdiag J 1 , J 2 .Applying the stochastic representation in (2.3), model (2.5) admits the following hierarchical structure: where , E denotes the exponential distribution, where ⊗ denotes the kronecker product, and 1 m i is a m i column vector with element 1 .From (2.5) and (2.6), it is clear that conditional on V i , ϵ ij and b i are independent.
Integrating out b i in (2.6), we have the following hierarchical model where , z ij (2) , G i = Z i T ΩZ i + Λ i .Moreover, it follows from (2.7) that the y i are independent and marginally distributed as where γ i * = 1 m i ⊗ γ.From (2.7) and by the properties of the generalized inverse Gaussian distribution in (2.4), we have (2.9)

Modeling the Index Functions
Since the two functions g 1 and g 2 in (2.5) are unknown, we use polynomial splines to approximate them in the subsequent ML estimation.Polynomial splines are simple, yet practical tools with computational tractability and statistical efficiency, and has been proven to be an extremely powerful method for smoothing.
For simplicity, we assume that the covariates x ij (1) and x ij (2) are bounded and the supports of x (1) T β 10 and x (2) T β 20 are contained in the finite interval [a, b].Such a compactness assumption is almost always used in nonparametric regression with spline approximation.We use polynomial splines to approximate the nonparametric functions g 1 and g 2 .Let with K′ internal knots.A polynomial spline of order d is a function whose restriction to each subinterval is a polynomial of degree d − 1 and globally d − 2 times continuously differentiable on [a, b].The collection of splines with a fixed sequence of knots has a Bspline basis B 1 (x), …, B K (x) , with K = K′ + d.We assume the B-spline basis is normalized to have , although, any scaling can be used without changing the theoretical results.
for g 1 and g 2 .Then, we have As a result, we can write (2.10) By letting the number of knots increase with the sample size at an appropriate rate, the spline estimate of the unknown function can achieve the optimal nonparametric convergence rate.

Theoretical Properties
In this section, we will investigate the theoretical properties for the index parameters and the index functions.In the following we establish the large sample properties based on the marginal distribution (2.8) of the proposed BV-SIM model in (2.5).For simplicity, we assume m i ≡ m, with the response viewed as i.i.d.data, We first introduce some notations.
Let β 01 and β 02 be the true index parameters, and g 01 and g 02 the corresponding true index functions.Let . Denote the support of X i T β 0 as [a, b], where , x ij (2) .Let ℋ s be the collection of all functions on the support [a, b] whose l-th order derivative satisfies the Hölder condition of the order r with s = l + r.Then, for each g ∈ ℋ s , there exists a positive ∀u, v ∈ [a, b].From De Boor (2001), there exists a constant C (see page 149) such that are the true value of spline coefficients, which can be viewed as the best approximation coefficient vectors for g k .
Denote δ = γ T , vech(Ω) T , vech(Σ) T T and Θ as the parameter space of ζ = β T , θ T , δ T T .Given the covariates X i and Z i , let ℓ m μ i , δ, y i be the log-likelihood of the marginal distribution for response y i in (2.8) and ℓ m ζ, y i ≜ ℓ m W i T X i T β θ, δ, y i be the corresponding splineapproximated log-likelihood.Let δ 0 be the true value of δ and θ 0 = θ 01 T , θ 02 T T .Define as the MLE, given by where Define the space of square integrable single-index functions G = g : E∥ g X i Note, the definition of projection involves the distributions of both X i , Z i and Γ since we take the expectation over these random variables.This definition can be extended to any 2m × L matrix by column-wise projection.In the following, we list the regularity conditions (Wang et al. 2014;Lian and Liang 2013;Zhao et al. 2017) that are necessary to study the asymptotic behavior of the MLEs.

(A3)
The true parameter point ζ 0 is an interior point of the parameter space Θ.
(A4) The log-likelihood ℓ m ζ, y i is at least thrice differentiable on parameters ζ.
Furthermore, the second derivatives of the likelihood function satisfy the equations Also, there exists functions M jkl y i , such that Here ζ j denotes the j-th component of ζ.

(A5)
The Fisher information matrix satisfies the conditions where λ min and λ max denote the smallest and largest eigenvalues of a matrix.
. Assume all ℎ j ∈ ℋ s′ with s′ > 1.We also assume that is positive definite, where J is evaluated at β 0 .
Remark 1 The smoothness condition in (A1) is a requirement to attain the best convergence rate for single-index functions approximated in the spline space.Condition (A2) is widely used in the single-index modeling literature, ensuring that the index functions are defined in a compact set and thus facilitates the technical derivations.Conditions (A3) and (A4) are two common assumptions in the literature of maximum likelihood estimation with spline approximations (Wang et al. 2011(Wang et al. , 2014)), implying that the information matrix of the likelihood function is positive definite.Condition (A5) is slightly stronger than that used in the usual asymptotic likelihood theory, however, widely used in high-dimensional likelihood estimation literature Fan and Peng (2004).Finally, Condition (A6) is related to the 'projection', or the 'orthogonalization' technique common in a semiparametric setup, which includes partially linear model (Li 2000), partially linear additive model (Lian and Liang 2013), and single-index models (Cui et al. 2011;Zhao et al. 2017).
Denote K = max K 1 , K 2 , and let r n = K /n + K −s .Then, we have the following result.
Remark 2 Note that the rate of convergence for nonparametric functions is O p n −s/(2s + 1) if the optimal K ∼ n 1/(2s + 1) , which is the same as that found in the nonparametric and semiparametric literature.
Following Theorem 2 and invoking the Delta method, we have

Maximum Likelihood Estimation
In this section, we develop the ML estimation for our BV-SIM model.We utilize EM-type algorithms for obtaining the MLE, based on two types of missing data structures in (2.6).
The EM algorithm is a popular iterative algorithm for MLE in models with incomplete data (Dempster et al. 1977), where each iteration of the EM algorithm consists of two steps, the expectation (E) step and the maximization (M) step.Despite desirable features, the M-step in the EM algorithm is often difficult to implement for complicated models, and is replaced with a sequence of computationally simple conditional maximization (CM) steps, i.e. maximizing over one parameter with the other parameters held fixed.This leads to a simple extension of the EM algorithm, called the ECM algorithm (Meng and Rubin 1993).
Consider the hierarchical multivariate Laplace model in (2.6), where both V i and b i are . The log-likelihood for the complete data in the multivariate Laplace single-index mixed-effects model up to an additive constant can be written as where and where μ ij is defined in (2.5) and N = ∑ i = 1 n m i .Note that ℓ 1 can be further written as Denote η as the full parameter vector to be estimated.We firstly compute the conditional posterior mean and variance of b i at the current estimate η ^, leading to After obtaining the estimates of the conditional mean and conditional covariance of the random effect b i , we proceed to calculate the expectation of which can be computed from (2.9), using the current estimate η ^.After some simple calculations, we have and Next, maximizing Q 1 over parameters θ, γ, β and Σ, and maximizing Q 2 over Ω, we can obtain their estimates, which constitutes the CM-steps 1-5 in the following ECM algorithm: E-step Given current parameter estimates, for i = 1, …, n, update c i and d i using (4.3), and update Δ i , R i1 and R i2 by (4.2).
CM-step 1 Fix β, γ ^ and Σ ^, and update θ ^ by maximizing (4.4) over θ, which gives CM-step 2 Fix β ^, θ ^ and Σ ^, update γ ^ by maximizing (4.4) over γ, i.e., ^ and Σ ^, and update β ^ by maximizing (4.4) over β.Since there is no explicit expression for the estimate of the index parameter β, we use the Newton-Raphson method to obtain β, leading to the following iterative formula where ^2 , and B ˙( ⋅ ) denotes the first derivative of the spline basis B( ⋅ ).
CM-step 5 Update Ω by maximizing (4.5) over Ω, which gives Repeat the above E-step and CM-steps, until all parameters achieve the desired convergence criterion.Since our estimation procedure requires initial values, we set γ ^(0) = (0,0) T , Σ (0) = I 2 , and the estimates of β 1 (0) , β 2 (0) and Ω (0) are obtained from fitting a linear mixed model via the R package lmer, where , x ij (2) and Z ij are the design matrices corresponding to the fixed effects and random effects, respectively.Simulation studies (in Sect.5) show that the above strategy works well.

Simulation Studies
In this section, we conduct extensive simulation studies using synthetic data to study the finite-sample performance of the model parameters in our proposed method (Simulation 1), and the robustness of our method when compared to existing alternatives, under data generated under various settings (Simulation 2).

Knots Selection
It is well-known that the performance of any spline estimation depends on the knots selection.Here, we employed Schwartz information criteria (SIC) for adaptive know selection (Ma and Song 2015;Lu 2017;Zhao et al. 2017).In view of the order n 1/(2s + 1) (of knots) to attain optimal convergence rate of nonparametric functions in 1, a sequence of knots are selected in a neighborhood of n 1/(2s + 1) , such as 0.5N s , min 5N s , n 1/2 , where N s = n 1/(2s + 1) , and s is the smoothing parameter.We choose s = 2 in both simulation studies and real data application.For simplicity, we use cubic polynomial splines and the number of interior knots K 1 = K 2 ≡ K are the same for the two nonparametric link functions.
The number K opt corresponding to the minimum SIC value is defined as the optimal number of knots SIC(K) = − ∑ i = 1 n logL ^i K + logn × 2K, where logL ^i K denotes the estimated value of the log-likelihood function obtained from(2.8),with the given K knots.
From Table 1, all biases are close to zero for all sample sizes, implying our proposed estimators are consistent.Moreover, the absolute biases and the standard errors are smaller with increasing sample sizes, with the estimation performance of index parameters significantly better than the skewness parameters.To further assess the estimation results, we calculate the integrated mean squared error (IMSE), defined as where g ^l (s) ( ⋅ ) is the spline approximation to g l ( ⋅ ) in the sth simulation run.We report the average of the IMSE as AIMSE = 1 2 ∑ l = 1 2 IMSE g l in Table 2.For evaluating the estimation performances of the scatter matrix Σ (corresponding to the bivariate responses) and the covariance matrix Ω (for the random effects), we use the Frobenius-norm of the matrix of differences between the estimated and true values, i.e. ∥ A ∥ F = trace A T A , where A is either Σ − Σ or Ω − Ω. Simulation results, together with the root of mean square error (RMSE) for β 1 , β 2 and γ are listed in Table 2, where the RMSE for an arbitrary parameter δ is defined as It is clear from Table 2 that the finite-sample performances of our proposed estimation procedures are satisfactory, with increasing sample sizes.In sum, the simulation results show that both index parameters, the non-parametric functions, and other parameters associated with the mixed effect models are reliably estimated, thereby confirming that our proposed algorithm works well in synthetic data settings.

Simulation 2: Assessing Robustness, in Light of Competing Methods
Here, the data is generated similar to Simulation 1 (from a BV-SIM), except that the random effects and errors are independently generated under the following four distributional assumptions: Case 4: b i ∼ 0.8N(0, Ω) + 0.2N(0, 10Ω), ϵ ij ∼ 0.8N(0, Σ) + 0.2N(0, 10Σ), Here, Case 1 corresponds to random effects and errors independently generated from the multivariate normal distribution.For Case 2, both are generated from the multivariate t-distribution with degree of freedom v (setting v = 5).For Case 3, the random effects and errors are generated from the multivariate symmetric Laplace distribution with covariance matrix Ω and Σ, respectively.Finally, Case 4 corresponds to generating both the random terms (effects and errors) from multivariate normal mixtures.Note, for the above four cases, the bivariate clustered response is symmetric, since both the random effects and errors are generated from symmetric distributions.This is to make our approach comparable to the following two existing alternatives, (a) The bivariate normal mixed effect single-index model of Wu and Tu (2016), and (b) The bivariate mixed effect single-index model using the multivariate t-distribution, which extends the univariate linear mixed model proposal of (Pinheiro et al. 2001).In (a), penalized splines were used to approximate the nonparametric index function, whereas we use polynomial splines.At each replication, we use the same dataset to obtain the estimates from these three competing methods.We focus on the estimation of the index parameters and the index functions for the fixed effect part, with the same interpretation for all cases.
The results are summarized in Table 3.For all cases, RMSEs and AIMSEs decrease quickly as the sample size increases for all three methods.That said, our proposed method performs well for all four cases, and is significantly better than both the alternatives for Cases 3 and 4. The advantages of our method appears more prominent if we further reduce the mixing proportion of the mixture distribution in Case 4 from 0.8 to 0.7, 0.6 or 0.5 (results not reported here).In Cases 1 and 2, the performances of our method is comparable to the two others.In particular, our method performs almost similar to Pinheiro's t-distribution method in Case 2 when n = 200, while they are both better than the normal mixed-effects method of Wu and Tu (2016).To summarize, the performance of our proposed method appears to be satisfactory in all cases, and is robust to misspecified (non-Gaussian) random effects and errors, under a bivariate mixed model framework.

Application: GAAD Dataset
In this section, we illustrate our method via application to the GAAD dataset.Here, the tooth-level mean PPD and CAL measures are non-Gaussian bivariate responses representing PD status, and our objective is to evaluate the distribution of PD status for this population, and quantify the effects of various subject-level covariates such as Age (in years), body mass index (BMI), Gender (1 = Female, 0 = Male), Smoking status (1 = Smoker, 0 = Never Smoker) and glycemic level or HbA1c (1 = High/Uncontrolled), 0 = Controlled) on the PD status.For our analysis, we have n = 288 subjects with complete covariate information.About 30% of the subjects are smokers.The mean age of the subjects is about 54 years with a range from 26-87 years.There is a predominance of female subjects (around 76%) in the data.Around 60% of subjects are obese (BMI ≥ 30), and 59% are with uncontrolled HbA1c.Each subject has varying number of teeth, ranging from 3 to 28, with a total of 5461 observations.A full dentition will constitute 28 teeth, however, missing tooth is very common in any oral health studies, with the actual cause of missingness mostly unknown.Hence, in order to avoid unverifiable missing data assumptions, we did not resort to missing data analysis, and present only complete case analysis.
As part of explanatory analysis, we present the bivariate kernel density estimate of the PPD and CAL responses in Fig. 2 (left panel).The plot reveals significant (right) skewness for both responses.Also, the right panel in Fig. 2 indicates presence of possible outliers.
Recent research (Zhao et al. 2018) confirmed possible non-linear relationship between oral health responses, and continuous covariates, like Age.Motivated by this, we set forward to estimate a clinically meaningful single-index structure determining PD for the subjects in this database.
We consider fitting the following model to the GAAD data where x ij = x ij1 , …, x ij5 T with x ij1 = Age, x ij2 = BMI, x ij3 = Gender, x ij4 = Smoker, x ij5 = HbA1c and z ij = 1, z ij1 , z ij2 , z ij3 T with z ij1 = Gender, z ij2 = Smoker, z ij3 = HbA1c.We further assume T T ∼ SAL 8 (0, Ω, 0) and ϵ ij = ϵ ij1 , ϵ ij2 T ∼ SAL 2 (0, Σ, γ).The estimates for index parameters, skewness parameter and their 95% confidence intervals are presented in Table 4, where the 95% confidence intervals are obtained by bootstrap resampling with 200 replications.We observe that all parameters (except β 13 corresponding to Gender for the PPD regression) were positive and significant.Interestingly, the estimate of Gender β 13 is negative yet significant for PPD, while, the corresponding estimate β 23 for CAL is positive and significant, implying that Gender is contributing to the index development for the two responses in opposite directions.Figure 3 presents the estimated curves corresponding to the two index functions, along with their 95% confidence bands using bootstrap method.
Compared to the CAL, the 95% band is tighter for the PPD.
It is immediate that the correlation between PPD and CAL are significant, implying the need to account for the crosswise correlation between the two responses, and the cluster-wise correlation of the responses within the same subject, while modeling the bivariate clustered responses.Furthermore, Fig. 4 presents the bivariate kernel density surface of the estimated residuals (left panel), and the same from random draws of n = 5461 observations from the bivariate ALD density ALD(Σ, γ), where Σ and γ ^ are plugged-in estimates derived from our fit.We observe that the estimated surfaces are very similar, confirming the adequacy of model fit to the GAAD dataset.
Correlation matrices Σ and Ω are estimates as: To further evaluate the usefulness of our proposed new model, we consider the fitted and prediction errors in light of two alternatives, denoted as "AM1" (bivariate normal, mixed effects SIM) and "AM2" (bivariate, asymmetric Laplace SIM, without random effects).We randomly partition the data into training and testing sets, where the training data is used to fit the 3 models, and the test data to evaluate the prediction errors.Using varying sizes of training and testing data, the average absolute fitted errors (AAFE), and the average absolute prediction errors (AAPE) for the two responses, based on 200 random partitions, are reported in Table 5, where and for k = 1 and 2, with y ^ijk , the fitted value based on training data, and y ijk , the predicted value based on the test data, and nb denote the number of subjects in the training data.
From Table 5, we observe that our model performs the best in terms of AAFE and AAPE, for various sizes of the training and testing set.More specifically, our proposed mixed-effects SIM model is superior to the bivariate asymmetric Laplace SIM (excluding random effects), implying the necessity to account for the within-subject correlation.Furthermore, our proposed model is also better than the SIM with the usual multivariate normal specification for the random effects, thereby providing evidence of the gain in accounting for data asymmetry during modeling.

Conclusions
Derivation of useful medical indices that correlate with multiple health outcomes is an issue of significant practical importance.In this paper, we propose a single-index mixed-effects regression model for bivariate responses, where both the error term and random effect are assumed to follow multivariate asymmetric Laplace distribution.By the polynomial spline smoothing for index functions, we proposed a scalable ML estimation method based on EM-type algorithm, and study the asymptotic properties of the ML estimates under some mild conditions.Simulations and real data analysis reveal the potential of the proposed model under data asymmetry, compared to existing alternatives.
There exists a number of future directions to pursue.To further improve model fit and prediction, we can consider the joint modeling of the location, skewness, and scatter matrix, within a multivariate ALD setup.When the number of covaiates is large in both fixed effects and random effects, it is of interest to select important variables in both parts to obtain a concise model.Some existing variable selection work of linear mixed effects model are available for univariate response case; see, for example, Kinney and Dunson (2010); Bondell et al. (2010); Fan and Li (2012); Schelldorfer and Geer (2011); Pan and Huang (2014), and others.However, for the case of single-index mixed effects models for multivariate responses, there is limited work, and pursuing the variable selection is a non-trivial journey.
Another extension is to consider mixed effects quantile regression (Waldmann and Kneib 2015) for bivariate responses.These will be pursued elsewhere.
Proof of Lemma 1 See proofs in Anderson (1984).□ According to the MLE defined in ( 12), the likelihood estimating equations for β and θ can be written as Denote ℓ ˙m β, θ, δ, y i ≜ ∂ ℓ m μ i , δ, y i ∂μ i μ i = W i T X i T β θ and ℓ ˙m g X i T β , δ, y i ≜ ℓ m μ i , δ, y i ∂μ i μ i = g X i T β .
Then we have the following Lemma 2.

Proof of Lemma 2
We firstly prove (19).To obtain the bound, we only need to calculate the conditional variance of the left term in ( 19) since the conditional expection E ℓ ˙m g X i T β 0 , δ 0 , y i X i , Z i = 0.By the Condition (A4), the eigenvalues of the conditional variance for Var ℓ ˙m g X i T β 0 , δ 0 , y i X i , Z i are bounded, hence we only need to obtain the bound of By the properties of spline basis, we have where both β* and β** lies on the line segment connecting β and β 0 .As a result, Then the order of ( 19) is O p nK 3/2 r n = o p ( n) since d > 2.
We next prove (18).By the Taylor's expansion and regularity conditions, it is clear that Applying the results of ( 19) and ( 20), the order of ( 18) Lemma 3 Assume that Condition (A1)-(A6) hold, the singular values of the matrix are bounded and bounded away from zero with probability approaching one.
Proof of Lemma 3 Note that we can replace W ˙iT X i T β 0 θ 0 with g ˙Xi T β 0 in above expression with only a difference of o p (1) since ∥ W ˙iT X i T β 0 θ 0 − g ˙Xi T β 0 ∥ ≤ CK −s + 1 .Therefore we next only need to show that the eigenvalues of are bounded and bounded away from zero.
By the Condition (A6), there exists a It is obvious that the singular values of are bounded and bounded away from zero.
Thus, by pre-/post-multiplying this matrix, we only need to prove that the singular values of are bounded and bounded away from zero.Apply the approximation of splines again, we only need to show that the singular values of are bounded, and bounded away from zero.By the law of large numbers, we only need to show its expectation has eigenvalues bounded and bounded away from zero.This is true by checking Conditions (A5) and (A6).□ Proof of Theorem 1 By Lemma 2 and Taylor's expansion, it is easy to show By direct variance calculation Moreover, by Lemma 3, the singular values of the matrix are bounded and bounded away from zero with probability approaching one.

Fig. 1 .
Fig. 1.GAAD Data: Plots of the empirical Bayes' estimates of random effects (panels a and b), corresponding Q-Q plots (panels c and d), and observed versus estimated (non-linear) curve (panels e and f), obtained from fitting a linear mixed model separately to the PPD and CAL responses

Fig. 2 .
Fig. 2. Bivariate kernel density estimate (left panel) and boxplots (right panel) for PPD and CAL responses, from the GAAD data

Table 2
Table entries are the averages of the IMSE (AIMSE), the Frobenius-norms for Σ and Ω, and the root of mean squared errors (RMSE) of the model parameters, under various sample sizes (n = 50, 100, 200), calculated over 400 replications, corresponding to Simulation 1

Table 3
Table entries are the root of mean squared errors (RMSE) of β 1 and β 2 , and the Average Integrated Mean Squared Error (AIMSE) from our model and the 2 competing models(Wu and Pinheiro), for n = 50, 100, 200, with data generated from the 4 cases described in Sect.5.2