On composite likelihood in bivariate meta-analysis of diagnostic test accuracy studies

The composite likelihood (CL) is amongst the computational methods used for estimation of the generalized linear mixed model (GLMM) in the context of bivariate meta-analysis of diagnostic test accuracy studies. Its advantage is that the likelihood can be derived conveniently under the assumption of independence between the random effects, but there has not been a clear analysis of the merit or necessity of this method. For synthesis of diagnostic test accuracy studies, a copula mixed model has been proposed in the biostatistics literature. This general model includes the GLMM as a special case and can also allow for flexible dependence modelling, different from assuming simple linear correlation structures, normality and tail independence in the joint tails. A maximum likelihood (ML) method, which is based on evaluating the bi-dimensional integrals of the likelihood with quadrature methods has been proposed, and in fact it eases any computational difficulty that might be caused by the double integral in the likelihood function. Both methods are thoroughly examined with extensive simulations and illustrated with data of a published meta-analysis. It is shown that the ML method has no non-convergence issues or computational difficulties and at the same time allows estimation of the dependence between study-specific sensitivity and specificity and thus prediction via summary receiver operating curves.


Introduction
Synthesis of diagnostic test accuracy studies is the most common medical application of multivariate metaanalysis; we refer the interested reader to the surveys by Jackson et al. (2011); Mavridis and Salanti (2013); Ma et al. (2016).These data have two important properties.The first is that the estimated sensitivities and specificities are typically negatively associated across studies, because studies that adopt less stringent criterion for declaring a test positive invoke higher sensitivities and lower specificities (Jackson et al., 2011).The second important property of the data is the substantial between-study heterogeneity in sensitivities and specificities (Chu et al., 2012).Nikoloulopoulos (2015a), to deal with the aforementioned properties, proposed a copula mixed model for bivariate meta-analysis of diagnostic test accuracy studies and made the argument for moving to copula random effects models.This general model includes the generalized linear mixed model (Chu and Cole, 2006;Arends et al., 2008) as a special case and can also operate on the original scale of sensitivity and specificity.Chen et al. (2014Chen et al. ( , 2016b) ) proposed a composite likelihood (CL) method for estimation of the the generalized linear mixed model (hereafter GLMM) and the Sarmanov beta-binomial model (Chu et al., 2012).Note in passing that both models are special cases of a copula mixed model (Nikoloulopoulos, 2015a).The composite likelihood can be derived conveniently under the assumption of independence between the random effects.The CL method has been recommended by Chen et al. (2014Chen et al. ( , 2016b) ) to overcome practical 'issues' in the joint likelihood inference such as computational difficulty caused by a double integral in the joint likelihood function, and restriction to bivariate normality.
However, (a) GLMM can only be unstable if there are too many parameters in the covariance matrix of the random effects or too many random effects for a small sample, which is not the case in this application domain.Furthermore, Chen et al. (2014Chen et al. ( , 2016b) ) restrict themselves to SAS PROC NLMIXED which is a general routine for random effect models and thus gives limited capacity.The CL method is well established as a surrogate alternative of maximum likelihood when the joint likelihood is too difficult to compute (Varin et al., 2011), which is apparently not the case in the synthesis of diagnostic test accuracy studies.The general model in Nikoloulopoulos (2015a) includes the GLMM as a special case and its numerical evaluation has been implemented in the package CopulaREMADA (Nikoloulopoulos, 2016) within the open source statistical environment R (R Core Team, 2015).
(b) The random effects distribution of a copula mixed model can be expressed via other copulas (other than the bivariate normal) that allow for flexible dependence modelling, different from assuming simple linear correlation structures, normality and tail independence.
The contribution of this paper is to examine the merit of the CL method in the context of diagnostic test accuracy studies and compare it to the ML method in Nikoloulopoulos (2015a).The remainder of the paper proceeds as follows.Section 2 summarizes the copula mixed model for diagnostic test accuracy studies.Section 3 discusses both maximum and composite likelihood for estimation of the model parameters.Section 4 contains small-sample efficiency calculations to compare the two methods.Section 5 presents applications of the likelihood estimation methods to several data frames with diagnostic studies.We conclude with some discussion in Section 6.

The copula mixed model
We first introduce the notation used in this paper.The focus is on two-level (within-study and between-studies) cluster data.The data are are (y ij , n ij ), i = 1, ..., N, j = 1, 2, where j is an index for the within study measurements and i is an index for the individual studies.The data, for study i, can be summarized in a 2 × 2 table with the number of true positives (y i1 ), true negatives (y i2 ), false negatives (n i1 − y i1 ), and false positives (n i2 − y i2 ).
The within-study model assumes that the number of true positives Y i1 and true negatives Y i2 are conditionally independent and binomially distributed given X = x, where X = (X 1 , X 2 ) denotes the bivariate latent pair of (transformed) sensitivity and specificity.That is where l(•) is a link function.
The stochastic representation of the between studies model takes the form where C(•; θ) is a parametric family of copulas with dependence parameter θ and F (•; l(π), δ) is the cdf of the univariate distribution of the random effect.The copula parameter θ is a parameter of the random effects model and it is separated from the univariate parameters, the univariate parameters π 1 and π 2 are the meta-analytic parameters for the sensitivity and specificity, and δ 1 and δ 2 express the between-study variabilities.The models in (1) and ( 2) together specify a copula mixed model (Nikoloulopoulos, 2015a) with joint likelihood where c(u 1 , u 2 ; θ) = ∂ 2 C(u 1 , u 2 ; θ)/∂u 1 ∂u 2 is the copula density and g y; n, π = n y π y (1 − π) n−y , y = 0, 1, . . ., n, 0 < π < 1, is the binomial probability mass function (pmf).The choices of the F •; l(π), δ and l are given in Table 1.
Table 1: The choices of the F •; l(π), δ and l in the copula mixed model.
3 Estimation methods

Maximum likelihood method
Estimation of the model parameters (π 1 , π 2 , δ 1 , δ 2 , θ) can be approached by the standard maximum likelihood (ML) method, by maximizing the logarithm of the joint likelihood in (3).For mixed models of the form with joint likelihood as in (3), numerical evaluation of the joint pmf is easily done with the following steps (Nikoloulopoulos, 2015a): 1. Calculate Gauss-Legendre quadrature points {u q : q = 1, . . ., n q } and weights {w q : q = 1, . . ., n q } in terms of standard uniform distribution (Stroud and Secrest, 1966).Our comparisons with more quadrature points show that n q = 15 is adequate with good precision to at least at four decimal places (Nikoloulopoulos, 2015a, Appendix).
The inverse conditional copula cdfs C −1 (v|u; θ) are given in Table 2 for the sufficient list of parametric families of copulas for meta-analysis of diagnostic test accuracy studies in Nikoloulopoulos (2015a,b).Since the copula parameter θ of each family has different range, in the sequel we re-parametrize them via their Kendalls τ ; that is comparable across families.
Table 2: Parametric families of bivariate copulas and their Kendall's τ as a strictly increasing function of the copula parameter θ.

Composite likelihood method
The composite likelihood method assumes independence between the random effects.Hence, it is identical for any copula mixed model, since all the parametric families of copulas in Table 2 contain the independence copula as a special case.This subsection summarizes the composite likelihood estimating equations and the asymptotic covariance matrix for the estimator that solves them in the context of diagnostic test accuracy studies.

Composite likelihood estimator
Chen et al. ( 2014) and Chen et al. (2016b) proposed the composite likelihood method for estimation of the copula mixed model with normal and beta margins, respectively.Composite likelihood is a surrogate likelihood which leads asymptotically to unbiased estimating equations obtained by the derivatives of the composite log-likelihoods.Estimation of the model parameters can be approached by solving the marginal estimating equations or equivalently by maximizing the sum of composite (univariate) likelihoods.
By using composite likelihood the authors are assuming between-study independence in sensitivities and specificities and thus the joint likelihood in (3) reduces to: since under the independence assumption the copula density c(•) is equal to 1.Note that the joint likelihood reduces to the product of two univariate likelihoods and the evaluation of univariate integrals, thus the computational effort (if any) is subsided.Essentially, for beta margins the univariate likelihoods L j , j = 1, 2 result in a closed form since is the pmf of a Beta-Binomial(n, π, γ) distribution.
Composite likelihood estimates can be obtained by maximizing the logarithm of the joint likelihood in (4) over the univariate parameters.The efficiency of the composite likelihood estimates has been studied and shown in a series of a papers (Varin, 2008;Varin et al., 2011).However, CL ignores the dependence at the estimation of the univariate marginal parameters, thus it is expected to be worse as the dependence increases.
4 Small-and moderate-sample efficiency-misspecification of the univariate distribution of the random effect In this section an extensive simulation study with two different scenarios is conducted (a) to assess the performance of the CL and ML methods, and (b) to investigate in detail the effect of the misspecification of the parametric margin of the random effects distribution.The CL method assumes the independence copula and its focus is on marginal parameters and apparently not the choice of the copula.Hence in the simulations we only investigate the effect of the misspecification of the parametric margin of the random effects distribution.
We refer the interested reader to Nikoloulopoulos (2015a) for a study on the misspecification of the parametric family of copulas of the random effects distribution.We use the simulation process in Nikoloulopoulos (2015a) and set the univariate parameters and disease prevalence to mimic the telomerase data in Section 5.The details are given below: 1. Simulate the study size n from a shifted gamma distribution, i.e., n ∼ sGamma(α = 1.2, β = 0.01, lag = 30) and round off to the nearest integer.
2. Simulate (u 1 , u 2 ) from a parametric family of copulas C(; τ ); τ is converted to the dependence parameter θ via the relations in Table 2.
4. Draw the number of diseased n 1 from a B(n, 0.534) distribution.
5. Set n 2 = n − n 1 and generate y j from a B(n j , x j ) for j = 1, 2.
In the first scenario the simulated data are generated from the BVN copula mixed model with normal margins, logit link (the resulting model is the same with the GLMM) and true marginal parameters (π 1 , π 2 , σ 1 , σ 2 ) = (0.79, 0.91, 0.43, 1.83), while in the second scenario the simulated data are generated from the BVN copula mixed model with beta margins and true marginal parameters (π 1 , π 2 , γ 1 , γ 2 ) = (0.76, 0.81, 0.03, 0.28).The number of studies is set to N = 10 and N = 20 to represent a relatively small and moderate meta-analysis, and the Kendall's τ association between study-specific sensitivity and specificity is set to τ = −0.5 and τ = −0.8 to represent moderate and strong negative dependence.As stated in Chen et al. (2014Chen et al. ( , 2016b) ) one advantage of the CL method is that the problem of nonconvergence is avoided, so we also report on the non-convergence of different methods in Table 3.To summarize the simulated data, we report the resultant biases, root mean square errors (RMSE), and standard deviations (SD), along with average theoretical variances for the ML and CL estimates of the univariate parameters under different marginal choices based on iterations in which all four competing approaches converged in Table 4 and Table 5.Following Chen et al. (2016b) we also summarize the diagnostic odds ratio, that is dOR= . Clearly, this is a function of the univariate parameters; its value ranges from zero to infinity, with a higher value indicating better discriminatory power.
Conclusions from the values in the table are the following: • The CL method is nearly as efficient as the 'gold standard' ML method.
• The meta-analytic ML and CL estimates and SDs are not robust to the margin misspecification.
• The ML method has negligible non-convergence issues.
• The CL method in Chen et al. ( 2014) has a non-convergence rate between 10% to 16%.
• The CL method in Chen et al. (2016b) has no non-convergence issues at all as expected since the log L has a closed form.The simulation results indicate that for both methods the effect of misspecifying the marginal choice can be seen as substantial for both the univariate parameters and the parameters that are functions of them, such as the dOR.This is in line with Nikoloulopoulos (2015a,b) for the ML method.Here we also show that the CL method is not robust to the misspecification of the margin.This agrees with the conclusions of Xu and Reid (2011) and Ogden (2016) who argue that if the marginal distribution of the random effects is misspecified then the CL estimator no longer retains robustness.This has not been revealed before, since Chen et al. (2014) (Chen et al., 2016b) focused solely on a beta (normal) margin and didn't study the effect of misspecification of the marginal random effect distribution.The focus in the CL method is on marginal parameters and their functions (e.g., dOR).Since these are univariate inferences, all that matters, as regard as to the bias, is the univariate model.

Tumor markers for bladder cancer
In this section we illustrate the methods with data of the published meta-analyses in Glas et al. (2003b); also analysed in Chen et al. (2016b).This meta-analyses deal with the most common urological cancer, that is bladder cancer.Several diagnostic markers are assessed including the cytology (N = 26) which is the classical marker for detecting bladder cancer since 1945 and is not expensive compared with the reference standard (that is cystoscopy procedure), but lacks the diagnostic sensitivity.The other markers under investigation to give a better sensitivity are NMP22 (N = 14), BTA (N = 6), BTASTAT(N = 8), telomerase (N = 10), and BTATRAK (N = 5).
For all the meta-analyses, we fit the copula mixed model for all different choices of parametric families of copulas and margins.Sufficient choices of copulas are BVN, Frank, Clayton, and the rotated versions of the latter (Table 2).These families have different strengths of tail behaviour; for more details see Nikoloulopoulos (2015a,b).We use the log-likelihood at estimates as a rough diagnostic measure for goodness of fit between the models and summarize the choice of the copula and margin with the largest log-likelihood, along with the GLMM (BVN copula mixed model with normal margins) as a benchmark.We also estimate the model parameters with the CL method under the assumption of both normal (CL-norm) and beta (CL-beta) margins.In Table 6 we report the resulting maximized ML and CL log-likelihoods, estimates, and standard errors.

NMP22
The log-likelihoods show that a copula mixed model with rotated by 270 degrees Clayton copula and beta margins provides the best fit and the estimates of sensitivity π 1 and specificity π 2 are smaller under this assumption.The CL method performs well since the estimated τ is weak and not significantly different from zero.

BTA
The log-likelihoods show that a copula mixed model with rotated by 180 degrees Clayton copula and normal margins provides the best fit.Chen et al. (2016b) previously restricted to beta margins thus the sensitivity π 1 and dOR were overestimated (CL-beta).

BTASTAT
The log-likelihoods show that a BVN copula mixed model with beta margins provides the best fit and the estimates of specificity π 2 and dOR are smaller and larger, respectively, under this assumption.

Telomerase
Nikoloulopoulos (2015a) has previously analysed these data to illustrate the copula mixed model when there exists negative perfect dependence, and thus there is only one copula: the countermonotonic copula.This is a limiting case for all the parametric families of copulas, when the dependence parameter is fixed to the left boundary of its parameter space.Both models agree on the estimated sensitivity π1 but the estimate of specificity π2 is larger under the standard GLMM.The log-likelihood is −50.37 for normal margins and −51.14 for beta margins, and thus a normal margin seems to be a better fit for the data.In this example the CL method overestimates the dOR, since it ignores the perfect negative dependence at the estimation of the parameters.

BTATRAK
The log-likelihoods show that a copula mixed model with rotated by 270 degrees Clayton copula and beta margins provides the best fit.Note that the CL-norm estimate of the between study variance σ 2 1 was approximately zero, thus for this case the standard errors are unreliable as the between-study variance parameter estimate is on the boundary of the parameter space.-152.08 -152.18Clnω-norm and Clnω-beta denotes a Clayton rotated by ω degrees copula mixed with normal and beta margins, respectively.a The CL-norm estimate of the between study variance σ 2 1 was approximately zero, thus for this case the standard errors are unreliable as the between-study variance parameter estimate is on the boundary of the parameter space.

Cytology
The log-likelihoods show that a copula mixed model with rotated by 90 degrees Clayton copula and beta margins provides the best fit.All models agree on the estimated sensitivity π1 , but the estimated specificity π 2 and dOR are smaller when beta margins are assumed.The CL method performs well on the estimation of the univariate parameters and their functions since the estimated τ is weak and not significantly different from zero.

Discussion
In this paper we have challenged claims made in Chen et al. (2014Chen et al. ( , 2016b) ) about the advantages of using a composite likelihood in meta-analysis of diagnostic test accuracy studies, in terms of convergence and robustness to model misspecification.The usual reason for using a composite likelihood does not apply here, because the full likelihood is straightforward to compute.We have demonstrated that the copula mixed model in Nikoloulopoulos (2015a) does not suffer for computational problems or convergence issues.Nikoloulopoulos (2015a) proposed a numerically stable ML estimation technique based on Gauss-Legendre quadrature; the crucial step is to convert from independent to dependent quadrature points.Furthermore, it has been shown the secondary motivation of robustness of the CL method is not retained in this context if the marginal distributions are misspecified.Hence it is a digression to use the CL methods in Chen et al. (2014Chen et al. ( , 2016b) ) for estimation in meta-analysis of diagnostic test accuracy studies as apparently there is neither computationally difficulty in the calculation of the bivariate log-likelihood nor robustness in the misspecification of the marginal distribution of the random effects.These conclusions hold to any context where clinical trials or observational studies report more than a single binary outcome.
Furtermore, in Chen et al. (2014Chen et al. ( , 2016b) ) the main inference is univariate such as the overall sensitivity or specificity or their functions as a single measure of diagnostic accuracy, e.g., the diagnostic odds ratio (dOR).The dOR for many cases is not useful since cannot distinguish the ability to detect individuals with disease from the ability to identify healthy individuals (Chen et al., 2014).Whenever the balance between false negative and false positive rates is of immediate importance, both the prevalence and the conditional error rates of the test have to be taken into consideration to make a balanced decision; hence the dOR is less useful, as it does not distinguish between the two types of diagnostic mistake (Glas et al., 2003a).
In fact, if the interest is only to overall sensitivity, and specificity, then the overall test accuracy across studies will not be clearly defined.Different studies use different thresholds for a positive test result, thus the overall sensitivity and specificity do not make sense.Instead, some form of the summary receiver operating characteristic (SROC) curve makes much more sense and will help decision makers to assess the actual diagnostic accuracy of a diagnostic test.In an era of evidence-based medicine, decision makers need high-quality procedures such as the SROC curves to support decisions about whether or not to use a diagnostic test in a specific clinical situation and, if so, which test.
An SROC curve is deduced for the copula mixed model in Nikoloulopoulos (2015a) through a median regression curve of X 1 on X 2 .For the copula mixed model, the model parameters (including dependence parameters), the choice of the copula, and the choice of the margin affect the shape of the SROC curve (Nikoloulopoulos, 2015a).However, there is no priori reason to regress X 1 on X 2 instead of the other way around, so Nikoloulopoulos (2015a) also provides a median regression curve of X 2 on X 1 .Apparently, while there is a unique definition of the ROC curve within a study with fixed accuracy, there is no unique definition of SROC curve across multiple studies with different accuracies (Rücker and Schumacher, 2010).As Arends et al. (2008) have pointed out, none of the SROC curves proposed in the literature can be interpreted as an average ROC.Rücker and Schumacher (2009) stated that instead of summarizing data using an SROC, it might be preferable to give confidence regions.Hence, in addition to using just median regression curves, Nikoloulopoulos (2015a) proposed quantile regression curves with a focus on high (q = 0.99) and low quantiles (q = 0.01), which are strongly associated with the upper and lower tail dependence imposed from each parametric family of copulas.These can been seen as confidence regions of the median regression SROC curve.Among the parametric families of copulas in Table 2 the tail dependence varies, and is a property to consider when choosing amongst different families of copulas as affects the shape of SROC curves (Nikoloulopoulos, 2015a).Finally, Nikoloulopoulos (2015a) to reserve the nature of a bivariate response instead of a univariate response along with a covariate, proposed to plot the estimated contour of the random effects distribution.The contour plot can be seen as the predictive region of the estimated pair of sensitivity and specificity.The prediction region of the copula mixed model does not depend on the assumption of bivariate normality of the random effects and has non-elliptical shape.) and x2 := x2(x1, q), respectively; for q = 0.5 solid lines and for q ∈ {0.01, 0.99} dotted lines (confidence region).In case of BTATRAK and telomerase the predictive and confidence region are meaningless since the Kendall's τ association is close to −1.In this case all the quantile regression curves almost coincide, and hence, we depict only the median regression curve for each model.In case of BTA the axes are in logit scale since we also plot the estimated contour plot of the random effects distribution as predictive region; this has been estimated for the logit pair of (Sensitivity, Specificity).
Figure 1 demonstrates these curves and summary operating points (a pair of average sensitivity and speci-ficity) with a confidence and a predictive region from the best fitted copula mixed model for all the metaanalyses in Section 5.Both CL methods in Chen et al. (2014Chen et al. ( , 2016b) cannot be used to produce the SROC curves, since the dependence parameters affect the shape of the SROC curve and these are set to independence by definition.Note in passing that the CL method in Chen et al. (2014) can provide a confidence region but this is restricted to the elliptical shape.Nevertheless, the additional feature of having to estimate the association among the random effects in ML estimation has been found to require larger sample sizes than in CL estimation where this parameter is set to independence.The application example includes cases with an adequate number of individual studies.For meta-analyses with fewer studies the CL methods in Chen et al. (2014Chen et al. ( , 2016b) ) can be recommended if a bivariate copula mixed model is near non-identifiable (or has a flat log-likelihood) and the estimation of an average operating point (summary sensitivity and specificity) is of interest instead of a SROC curve.

Software
The R package CopulaREMADA (Nikoloulopoulos, 2016) has been used to produce the ML estimates (along with their SE) of the parameters from the copula mixed models and plot the SROC curves and summary operating points (a pair of average sensitivity and specificity) with a confidence and a predictive region.The R package xmeta (Chen et al., 2016a) has been used to produce the CL estimates (along with their SE) of the parameters from both methods in Chen et al. (2014Chen et al. ( , 2016b)).

Figure 1 :
Figure1: Contour plots (predictive region) and quantile regression curves from the best fitted copula mixed model for the bladder cancer data.Red and green lines represent the quantile regression curves x1 := x1(x2, q) and x2 := x2(x1, q), respectively; for q = 0.5 solid lines and for q ∈ {0.01, 0.99} dotted lines (confidence region).In case of BTATRAK and telomerase the predictive and confidence region are meaningless since the Kendall's τ association is close to −1.In this case all the quantile regression curves almost coincide, and hence, we depict only the median regression curve for each model.In case of BTA the axes are in logit scale since we also plot the estimated contour plot of the random effects distribution as predictive region; this has been estimated for the logit pair of (Sensitivity, Specificity).

Table 3 :
Times of non-convergence out of 10 4 simulations for the CL and ML methods under different marginal choices in both simulated scenarios.

Table 4 :
Biases, root mean square errors (RMSE) and standard deviations (SD), along with the square root of the average theoretical variances (

Table 5 :
Biases, root mean square errors (RMSE) and standard deviations (SD), along with the square root of the average theoretical variances (

Table 6 :
Estimated parameters, standard errors (SE) and log-likelihood values using the ML and CL methods for bladder cancer data.