INTRODUCTION

Nonlinear mixed effects (NLME) models are widely used to describe clinical data in drug development for learning about the underlying physiological system, confirming drug effects, simulating different scenarios for dose selection, and decision making (1). These models incorporate mathematical description of structural components (fixed effects) and stochastic components (random effects). Random effects account for multiple sources of variability: inter-individual variability (IIV), inter-occasion variability (IOV), and residual unexplained variability (RUV). RUV incorporates intra-individual variability, inaccuracies in dosing and sampling history, measurement errors, and model misspecification. It is assumed that the RUV model allows the transformation of a normal, independent, and identically distributed random variable to capture the full complexity of any heteroscedasticity, non-normality, and dependence in the RUV between the model and data. It has been shown that maximum likelihood estimation of model parameters with misspecified RUV model results in biased parameter estimates (2), and conclusions regarding covariate inclusion or parameter uncertainty based on such fit may not be correct (3,4). Thus, selection of an appropriate RUV model is important for maximum likelihood estimation, real-life-like simulations, better utilization of data, and subsequent model-informed decisions.

Extended RUV models have been developed to various RUV characteristics. An autoregressive (AR1) error model has been used to describe serially correlated residuals which may occur with rich data in particular (2,5). One example of this was a case study of glucose tolerance test, where AR1 errors provided a better description of the data (6). A dynamic transform both sides (dTBS) approach account for skewness and scedasticity of the residuals by estimating a shape parameter (λ) and a power term (ζ) using a Box-Cox transform to both data and model predictions (7,8). Inter-individual variability IIV in the magnitude of RUV relaxes the assumption that all subjects display a common residual error variance (9). A power error model accounts for residual error scedasticity dependent on model predictions and as such offers an alternative to the commonly used additive plus proportional RUV model. A Student’s t-distributed error model allows for symmetric heavy tails in residual distribution and thus introduces outlier robustness in the model (9). Time-varying error magnitude has been observed for example for oral pharmacokinetic profiles with higher error magnitude during the absorption phase (2). A simple implementation of such a RUV extension is through a step function where the error magnitude changes at a certain time (after dose).

Despite the availability of these extended RUV models, they are seldom tested. A factor limiting these extended RUV models from being routinely checked is the absence of a fast, easy-to-use and accurate diagnostic tool, as implementing these extended RUV models one by one can be time-consuming and computationally intensive. This gets even more complicated in simultaneous modeling of multi-dependent variables, where exploring these extended RUV models for each of the dependent variables will result in large number of combinations to be tested. The recommended diagnostic tools for assessing RUV models are limited to plots of residuals versus time/model prediction (10,11). Such graphical diagnostic provide guidance only for selection among the standard RUV models: additive, proportional, or combined error model. Extended RUV models are mostly implemented based on subjective decision (e.g., log transformation of both sides), implementation facility (e.g., numerical instability of parameter estimation), or expected features (e.g., autocorrelation with rich frequent sampling schedule). To avoid this limited and case-dependent RUV modeling, here, we investigate if post-processing of common model-based diagnostics can provide additional advantages. We propose a new diagnostic tool, based on standard output such as conditional weighted residuals (CWRES), which scan these extended RUV models for their ability to improve the description of data, without re-estimating the original NLME model.

METHODS

We choose to illustrate the diagnostic procedure using CWRES which are theoretically expected to be normally distributed with mean 0 and variance 1 for a correct NLME model (12). CWRES are computed based on individual’s empirical Bayes estimates regardless what estimation method is used (Eq. 15):

$$ {\overset{\rightharpoonup }{y}}_i=f\left(\overset{\rightharpoonup }{\theta },{\overset{\rightharpoonup }{\eta}}_i\right)+h\left(f\left(\overset{\rightharpoonup }{\theta },{\overset{\rightharpoonup }{\eta}}_i\right),{\overset{\rightharpoonup }{\varepsilon}}_i\right) $$
(1)
$$ {\overset{\rightharpoonup }{y}}_i\approx f\left(\overset{\rightharpoonup }{\theta },{\overset{\rightharpoonup }{\eta}}_i\right)+{\left.\frac{df}{d{\overset{\rightharpoonup }{\eta}}_i}\right|}_{{\overset{\rightharpoonup }{\eta}}_i={\widehat{\eta}}_i}\ast \left({\eta}_i-{\widehat{\eta}}_i\right)+h\left(f\left(\overset{\rightharpoonup }{\theta },{\overset{\rightharpoonup }{\eta}}_i\right),{\overset{\rightharpoonup }{\varepsilon}}_i\right) $$
(2)
$$ E\left({\overset{\rightharpoonup }{y}}_i\right)=f\left(\overset{\rightharpoonup }{\theta },{\hat{\eta}}_i\right)-{\left.\frac{df}{d{\overset{\rightharpoonup }{\eta}}_i}\right|}_{{\overset{\rightharpoonup }{\eta}}_i={\hat{\eta}}_i}\ast {\widehat{\eta}}_i $$
(3)
(4)
$$ \mathrm{CWRES}=\mathrm{COV}{\left({\overset{\rightharpoonup }{y}}_i\right)}^{-1/2}\ast \left({\overset{\rightharpoonup }{y}}_i-E\left({\overset{\rightharpoonup }{y}}_i\right)\right) $$
(5)

where \( {\overset{\rightharpoonup }{y}}_i \) is the vector of observations for the ith individual, f denotes individual model predictions,\( \overset{\rightharpoonup }{\theta } \) is the vector of population fixed effects, \( {\overset{\rightharpoonup }{\eta}}_i \) is vector of random unexplained individual deviation from the population fixed effects, h is RUV model, \( {\overset{\rightharpoonup }{\varepsilon}}_i \) is the vector of residual errors, \( E\left({\overset{\rightharpoonup }{y}}_i\right) \) is the expectation of the marginal density of the data given the model, 𝜂\( {\widehat{\eta}}_i \) is vector of empirical Bayes estimates, and \( \mathrm{COV}\left({\overset{\rightharpoonup }{y}}_i\right) \) is the covariance of the marginal density of the data given the model. Both random effects, \( {\overset{\rightharpoonup }{\eta}}_i \) and\( {\overset{\rightharpoonup }{\varepsilon}}_i \) are assumed to follow normal distribution with mean 0 and covariance matrix Ω and Σ, respectively. Hence, CWRES are directly linked to the objective function used in FOCE estimation method (12).

CWRES Base Model

As a substitute for assessing extended RUV models on the original data, the extended models were applied to a CWRES distribution. Thus, CWRES calculated from the original NLME model execution were treated as the dependent variable (DV) and modeled first by a linear base model to estimate CWRES distribution’s mean and variance as follow:

$$ {\overset{\rightharpoonup }{y}}_i={\Theta}_1+{\eta}_{1i}+{\overset{\rightharpoonup }{\upvarepsilon}}_{1i} $$
(6)

where \( {\overset{\rightharpoonup }{y}}_i \) is a vector of CWRES data from individual i, Θ1 is the population mean of CWRES, η1i is a univariate random variable describing the unexplained deviation of individual i from Θ1, and η1i is assumed to follow a normal distribution of a mean 0 with variance ω2. \( {\overset{\rightharpoonup }{\varepsilon}}_{1i} \) is a univariate random variable describing residual unexplained variability of individual i; it is assumed to be independent identically normally distributed with a mean 0 and variance σ2. When the original NLME model is describing the data adequately, the expected mean of CWRES Θ1 is 0, the expected value of ω2 is 0, and the expected value of σ2 is 1. This base model (Eq. 6) was then extended with the different RUV models (Supplementary material) as follow:

Autocorrelated Errors

The CWRES calculated from the original NLME model were modeled using an autoregressive error model AR1 as

$$ Corr\ \left({\upvarepsilon}_{1 ij},{\upvarepsilon}_{1 ik}\right)=\exp \left(\left(-\ln (2)/{\mathrm{t}}_{1/2}\right)\ast \left({Time}_j-{Time}_k\right)\right) $$
(7)

where ɛij is the residual error for individual i at time jɛik is the residual error for individual i at time point k, and t1/2 is half-life governing the duration of the correlation. ΔOFVCWRES_AR1 was calculated as the difference between base model objective function value OFVCWRES_Base and AR1 error model objective function value OFVCWRES_AR1.

Dynamic Transform both Sides

The individual predictions (IPREDs) calculated from the original model were used for the dTBS implementation. The dTBS model uses a Box-Cox transformation and as such requires the underlying variable to be positive; this is an issue as CWRES data includes negative observations. Thus, two models were needed to apply dTBS on CWRES data: (1) new base model instead of (Eq. 6) called CWRES dTBS base model and (2) CWRES dTBS model.

In CWRES dTBS base model, CWRES data was first exponentiated (Eq. 8) then a dTBS model (Eq. 9) with both shape parameter λ and power term ζ fixed to zero (log-normal transformation with homoscedastic variance) was fitted to the exponentiated CWRES data to calculate OFVCWRES_dTBS_base.

In CWRES dTBS model, a dTBS model (Eq. 10) was fitted to the exponentiated CWRES data with estimating both λ and ζ, to calculate OFVCWRES_dTBS:

$$ {\overset{\rightharpoonup }{y}}_i=\exp \left({\overset{\rightharpoonup }{y}}_i\right) $$
(8)
$$ \mathit{\ln}\left({\overset{\rightharpoonup }{y}}_i\right)=\ln \left(\overset{\rightharpoonup }{IPRED}\right)+{\overset{\rightharpoonup }{\upvarepsilon}}_{1i}\ast {\left(\overset{\rightharpoonup }{IPRED}\right)}^{\upzeta}\kern1.25em \mathrm{if}\ \lambda =0 $$
(9)
$$ \frac{{\overset{\rightharpoonup }{y}}_i^{\lambda }-1}{\lambda }=\frac{{\overset{\rightharpoonup }{IPRED}}^{\lambda }-1}{\lambda }+{\upvarepsilon}_{1i}\ast {\left(\overset{\rightharpoonup }{IPRED}\right)}^{\upzeta}\kern1.5em \mathrm{Otherwise} $$
(10)

where an estimate of λ = 0 means CWRES data was normally distributed before being exponentiated, λ > 0 means CWRES data was left skewed before being exponentiated, and λ < 0 means CWRES data was right skewed before being exponentiated.

ΔOFVCWRES_dTBS was calculated as the difference between dTBS base model objective function value OFVCWRES_dTBS_base and dTBS model objective function value OFVCWRES_dTBS.

Inter-Individual Variability on RUV

The CWRES data outputted from the original model execution was modeled as

$$ {\overset{\rightharpoonup }{y}}_i={\Theta}_1+{\eta}_{1i}+{\overset{\rightharpoonup }{\upvarepsilon}}_{1i}\ast \exp \left({\eta}_{2i}\right) $$
(11)

where η2i is random deviation of individual i from \( {\overset{\rightharpoonup }{\varepsilon}}_{1i} \), thus allowing different individuals to have different RUV magnitude. ΔOFVCWRES_IIV was calculated as the difference between base model objective function value OFVCWRES_Base and IIV on RUV model objective function value OFVCWRES_IIV.

Power Model

The individual predictions IPRED calculated on normal scale from the original model execution were used to scale individual residual error \( {\overset{\rightharpoonup }{\varepsilon}}_i \) where CWRES was modeled as

$$ {\overset{\rightharpoonup }{y}}_i={\Theta}_1+{\eta}_{1i}+{\overset{\rightharpoonup }{\upvarepsilon}}_{1i}\ast {\left(\overset{\rightharpoonup }{IPRED}\right)}^{\upzeta} $$
(12)

where ζ is the power exponent determining the dependence of \( {\overset{\rightharpoonup }{\varepsilon}}_i \) on model prediction. IPRED should include only positive values for successful implementation. If for instance IPRED included negative predictions because of log transformation, they were exponentiated back to normal scale. ΔOFVCWRES_power was calculated as the difference between OFVCWRES_Base and power error model objective function value OFVCWRES_power.

t-Distribution Error Model

The Laplacian method with user-defined likelihood had to be used to apply a t-distributed residual error. The conditional likelihood L of CWRES data was defined in the control file for a Laplace base model (Eq. 13) and a t-distributed error model (Eq. 14):

$$ L=\left(1/\sqrt{2{\pi \sigma}^2}\right)\exp \left(-\frac{{\mathrm{IWRES}}^2}{2}\right) $$
(13)
$$ L=\frac{\varGamma \left(\frac{\upsilon +1}{2}\right)}{\ \varGamma \left(\frac{\upsilon }{2}\sqrt{\upsilon \pi {\sigma}^2}\right)}{\left(1+\frac{{\mathrm{IWRES}}^2}{\upsilon}\right)}^{-\left(\frac{\upsilon +1}{2}\right)} $$
(14)

where σ2 is the variance of the data, IWRES is the individual weighted residuals, Γ is the gamma function, and υ is the degrees of freedom, which is the additionally estimated parameter when extending the RUV model to a t-distributed error. ΔOFVCWRES_t − dist was calculated as the difference between Laplace base model objective function value OFVCWRES_Laplace_base and t-distribution error model objective function value OFVCWRES_t − dist.

Time-Varying Error

Time, or time after dose, could be used as predictor of residual error magnitude (2), where cutoff time points for the change in variance of ɛi are selected based on data density to cut the data into N equal-size groups. CWRES were modeled as

$$ {\overset{\rightharpoonup }{y}}_i={\Theta}_1+{\eta}_{1i}+{\overset{\rightharpoonup }{\upvarepsilon}}_{1i} $$
(15)
$$ \mathrm{If}\ \left(\mathrm{Time}>X\right)\kern2em {\overset{\rightharpoonup }{y}}_i={\Theta}_1+{\eta}_{1i}+{\overset{\rightharpoonup }{\upvarepsilon}}_{2i} $$
(16)

where ɛ2i is the residual error after the cutoff time point X. The number of cutoff time points is subjective; here, we used three cutoff time points to cut the data into four equal-size groups, each with a separate error magnitude. ΔOFVCWRES_time was calculated as the difference between OFVCWRES_Base and the time-varying error model objective function value OFVCWRES_time.

Evaluations

The agreement in improvement of fit (ΔOFV) between implementing these extended RUV models on the original NLME model (ΔOFVOriginal; conventional analysis) and implementing them on CWRES data (ΔOFVCWRES) was evaluated in both simulated (n = 7) and real (n = 16) data examples as described below. For each data set and RUV model considered, there were thus two models evaluated for the original data (the original model and one with RUV model extension) and two models for the CWRES obtained from the original NLME model (the base CWRES model and the extended RUV model). In addition to CWRES, real data exercise was applied to three other diagnostics: conditional weighted residuals with interaction (CWRESI), individual weighted residuals (IWRES) and normalized prediction distribution errors (NPDE). CWRESI considers the interaction between intra- and inter-individual variability as in proportional error model. IWRES are the differences between the observations and IPRED weighted by the expected standard deviation of the residual variability (σ) only.

$$ \mathrm{IWRES}=\frac{{\overset{\rightharpoonup }{y}}_i-f\left(\overset{\rightharpoonup }{\theta },{\widehat{\eta}}_i\right)}{\sigma } $$
(17)

NPDEs are calculated by applying the inverse function of the normal cumulative density function to the decorrelated prediction discrepancies, and so it is normally distributed by construction without any approximations (13).

Simulations

Stochastic simulations and estimation (SSEs) were performed to investigate the ability of CWRES modeling to identify correctly the true extended RUV model in different scenarios, as well as investigating type I error rates of the likelihood ratio test employed, with null hypothesis being the absence of RUV misspecification The base simulation model was a one-compartment disposition model with first-order absorption, linear elimination, and proportional RUV model. Parameters assigned with inter-individual variability were absorption rate constant, clearance, and volume of distribution, with a correlation between the last two parameters. This base model was extended with the investigated RUV models to produce six additional models. The base model was used to simulate 200 datasets, while each of the extended RUV models was used to simulate 100 datasets; each dataset was estimated with the seven models to calculate mean ΔOFVOriginal for each extended RUV model. CWRES calculated from the estimated base models and modeled as described previously to calculate mean ΔOFVCWRES for each extended RUV model (Fig. 1). The values of parameters governing each of the extended RUV models when simulating were chosen to produce a ΔOFVOriginal of up to 100 between the base model and the true extended RUV model.

Fig. 1
figure 1

Schematic presentation of simulation setup. The base model was used to simulate 200 data sets, while each of the extended RUV models was used to simulate 100 datasets, and each dataset was estimated with the seven models to calculate mean ΔOFVOriginal for each extended RUV model. CWRES outputted from the base model fitting to the simulated data in each of the different scenarios were treated as dependent variables and modeled with same RUV extensions to calculate mean ΔOFVCWRES for each extended RUV model. Note that different base models were needed for dTBS and t-distribution error models

Real Data Examples

Models varied in complexity, amount of available data, and residual error components (Table I). Four examples modeled more than one dependent variable simultaneously. Data was log-transformed in nine examples. Two examples already included IIV on RUV model and this was then considered the base RUV model. All examples were developed using the FOCE method, and the interaction option was added in relevant RUV extended models.

Table I Summary of Real Data Examples Used for Investigation

Software

NONMEM version 7.3 (ICON Development Solutions, Hanover, MD, USA) (29) was used for the analysis with help of PsN (30), and graphs were generated in R (31).

RESULTS

Simulated Data Examples

CWRES modeling identified the same (correct) RUV model as the conventional analysis as shown in Fig. 2. The highest ΔOFVCWRES signaled the correct RUV extension model used in the simulation for AR1, dTBS, IIV, t-distribution, and time-varying RUV error models, e.g., ΔOFVCWRES_AR1was the highest drop between all CWRES models (red bars) when simulating with AR1 error model. Note that the Power and dTBS models are nested and the results reflected this. Since when simulating with power error model only correction for residual scedasticity was needed, both ΔOFVCWRES_power and ΔOFVCWRES_dTBS were of equal magnitude, − 20 and − 22, respectively, so it is clear that dTBS model is not providing any additional advantages over power error model which should be selected as the correct RUV model extension. Type I error rates with CWRES were 1.5, 1, 2, 5.5, 3, and 8.5% for AR1, dTBS, IIV, power, t-distribution, and time-varying RUV models, respectively, taking into account the number of estimated parameters. When simulating with the base model, none of the investigated extension showed an improvement either by CWRES modeling or conventional analysis.

Fig. 2
figure 2

Plot of mean ΔOFV vs extended RUV models show the agreement between mean ΔOFVOriginal (gray) and mean ΔOFVCWRES (red) among all seven scenarios of simulations. Abbreviations: AR1 autocorrelated errors, dTBS dynamic transform both sides, IIV inter-individual variability on RUV, Power power error model, t-dist t-distribution error model, TV time-varying error model

The improvement in goodness-of-fit ΔOFV when applying an extension was in general of similar magnitude whether based on ΔOFVCWRES or ΔOFVOriginal. However, if the method worked perfectly, ΔOFVCWRES would equal exactly ΔOFVOriginal which is not the case especially for AR1 simulation scenario, where ΔOFVCWRES were − 34, while ΔOFVOriginalwas − 100. The parameters governing these extended RUV models showed a good concordance between their estimates from CWRES and conventional analysis, with a correlation coefficient of 0.93 across all RUV extensions except for dTBS as they are on different scale.

Real Data Examples

At least one of the investigated RUV extensions resulted into a significant ΔOFVOriginal in all examples except for Daunorubicin and Digoxin PD models. When significant, the improvement was substantial with ΔOFVOriginal ranging from − 2019 for Gastric emptying with autocorrelated error to − 4 for r-hFSH PK with IIV on RUV, with a median ΔOFVOriginal of − 71 across models with significant improvement. Similar to the conventional analysis, a significant ΔOFVCWRES were found in all examples except for Daunorubicin and Digoxin PD models. The real data examples further supported the good agreement between the true misspecification in error model and what was identified by modeling of CWRES as CWRES modeling identified the most important RUV extension similar to the conventional analysis (Fig. 3). Exceptions were the two models Ethambutol PK and Disufenton sodium, where the order of the 1st (t-distribution) and 2nd most important extensions (IIV on RUV) were reversed. CWRES modeling identified the same RUV extensions to be significant improvements as conventional analysis except for t-distributed error model with Asenapine PD. The ΔOFVOriginal and ΔOFVCWRES displayed a correlation coefficient of 0.88 across all models and a median ratio \( \frac{{\Delta \mathrm{OFV}}_{CWRES}}{{\Delta \mathrm{OFV}}_{Original}} \) of 0.77 among models with significant improvement. It is not surprising that this ratio is below one as when extending the original model, there is always the potential for larger improvement in fit than CWRES as many parameters are re-estimated under the new RUV model.

Fig. 3
figure 3

Plot of absolute ΔOFVOriginalvs absolute ΔOFVCWRES among all real data examples for the six extended RUV models; all points around the line of unity showing the agreement between conventional analysis and CWRES modeling

The evaluation of the other diagnostics CWRESI, IWRES, and NPDE with real data examples is shown in Fig. 4, where CWRES outperformed other diagnostics in predicting ΔOFVOriginal. The root mean-squared errors (RMSE) of ΔOFVCWRES were lower than RMSE of other ΔOFVdiagnostics and so they are reported as relative to RMSE of ΔOFVCWRES as shown in Table II, where RMSE measures the differences between ΔOFVdiagnostics and the actually observed ΔOFVOriginal.

Fig. 4
figure 4

Plot of absolute ΔOFVOriginalvs absolute ΔOFVDiagnostic for CWRES, CWRESI, IWRES, and NPDE among all real data examples for the six extended RUV models

Table II The RMSE of ΔOFVCWRES for Different ΔOFVOriginal Ranges and the Relative RMSE of Other ΔOFVdiagnostics to RMSE of ΔOFVCWRES
$$ \mathrm{RMSE}=\sqrt{E\left({\left({\Delta \mathrm{OFV}}_{diagnostics}-{\Delta \mathrm{OFV}}_{Original}\right)}^2\right)} $$
(18)

DISCUSSION

A new diagnostic tool for post-processing of CWRES was successfully developed and evaluated. CWRES modeling evaluate extended RUV models and assess in a robust and extremely fast way whether extensions are needed to implement. The method accurately identified the correct type of RUV model misspecifications as well as the expected improvement in fit in terms of ΔOFV and the expected parameters of the extended RUV model, similar to conventional analysis. This method typically does not suffer from local minima problems or other estimation-related issues because CWRES models are simple and quick to run with known expected distribution as shown by the agreement between ΔOFVOriginal and ΔOFVCWRES. In no case was modifications of initial estimates needed for the CWRES models applied here.

Simulation results elucidated the ability of different extensions to describe each other and to produce similar improvement in fit. The t-distributed error model allows the incorporation of outlier robustness into the model while IIV on RUV allows individuals to have different RUV magnitude; thus, both extensions can describe outliers and produce similar model fits. This behavior was shown when simulating with IIV on RUV; t-distributed error model was the 2nd most important RUV extension. This behavior also might explain the inflated ΔOFVCWRES_IIV when t-distributed error model was the most important improvement in both simulated and real data examples, and the reversed order of most important RUV extensions for Ethambutol PK and Disfenton sodium.

Both dTBS and Power error model can account for scedasticity; however, only dTBS can correct for both scedasticity and skewness; thus, on simulating from power error model, both extensions had similar ΔOFVCWRES but not when simulating from dTBS; a profound real data example of this was Pefloxacin where ΔOFVOriginal_power(ΔOFVCWRES_power) was − 65 (− 50) and ΔOFVOriginal_dTBS (ΔOFVCWRES_dTBS) was − 70 (− 50) as correction for scedasticity was only in need. On the other hand, a correction for both scedasticity and skewness was needed for Moxonodine PK where ΔOFVOriginal_power(ΔOFVCWRES_power) was − 60 (− 48) and ΔOFVOriginal_dTBS (ΔOFVCWRES_dTBS) was − 243 (− 87).

As mentioned in our results, ΔOFVOriginal is expected to be larger than ΔOFVCWRES, as it is more flexible with more parameters to estimate; this difference between the ΔOFVs was most pronounced with the dTBS approach as shown by most results being in the top left triangle in Fig. 3 when correction for scedasticity and skewness resulted in different non-residual error parameter estimates, for example in Disufenton sodium PK:ΔOFVOriginal_dTBS (ΔOFVCWRES_dTBS) was − 116 (− 80) as implementing dTBS led to a relative change of 126, 12, 30, 10, 6, 16, and 15% in estimates of THETAs’ from 1 to 7 respectively.

One interesting aspect is in case of multiple dependent variables, using CWRES modeling identified accurately which dependent variable needed further improvement which decrease the risk of model over parameterization. For example, the Gastric emptying PK model had four dependent variables; by conventional analysis, only the total ΔOFVOriginal_dTBS (− 433) is obtained but this does not provide information about for which dependent variable an extended RUV model is dominating the improvement in fit. On the other hand, by CWRES modeling, four separate ΔOFVCWRES_dTBS, one for each dependent variable, are obtained and only the 2nd (ΔOFVCWRES_dTBS = − 207) and 4th (ΔOFVCWRES_dTBS = − 139) dependent variables could be identified as important to transform in the original model.

Other common diagnostics CWRESI, IWRES, and NPDE are available for evaluation of model goodness of fit. Similar to CWRES, these diagnostics should be normally distributed when the model adequately describe the data, except for CWRESI because of the interaction between \( {\overset{\rightharpoonup }{\eta}}_i \) and\( {\overset{\rightharpoonup }{\varepsilon}}_i \). It had been concluded previously that IWRES perform poorly with increasing model non-linearity, leading to biased parameter estimates and misguided model development (12), and that CWRES and NPDE give the best diagnostics in different situations even when there was interaction in the model (32); our results support these conclusions with a further favor of CWRES over NPDE as shown in Fig. 4. Noting that NPDE are sensitive to the number of samples and results shown here were with setting ESAMPLE option to 10000, as lower samples resulted in poor predictions of ΔOFVOriginal.

In conclusion, the principle of model-based diagnostics post-processing for automated model building had been demonstrated and successfully applied with CWRES modeling, which is a valuable diagnostic tool for RUV model identification during model development/evaluation process, as it provides guidance for the nature and magnitude of potential RUV model misspecification/improvements.

How to Proceed in Practice

CWRES modeling can be easily implemented in analysis software; it is already implemented in PsN (available as “resmod” from version 4.7.0). The procedure requires the original NLME model file and a table with ID, TIME, CWRES, and IPRED data items from the original NLME model execution. CWRES data item can be replaced with other residuals, and DVID data item is needed in case of multiple dependent variables. TIME data item can be replaced by time after dose TAD if available. IPRED data item should be positive on normal scale for successful testing of dTBS and power error models. Options are available to account for multiple occasions and to manually set the number of time cut off points. The extended RUV models are automatically created and run; the output contains ΔOFVCWRES for each extended RUV model together with the parameters of interest for each RUV extension. Since resmod is a fast and easy to perform procedure with no limitations for continuous data, it is a good idea to always consider applying resmod when selecting between-error models for model development. Also, all parameters of the original NLME model should be re-estimated after implementing any of the extended RUV models. As we showed, these extended RUV models address different problems of the residual variability. In our preliminary results, a combination of these extended RUV models sometimes showed a better description of data; however, that was not further investigated. A previously discussed example of this was a combination of dTBS approach with t-distributed error model to address skewed residuals with outliers (7).