Background

In addition to direct medical care costs, indirect costs (i.e., work productivity loss due to health problems) are an important component when estimating the burden of illness [1,2,3,4] or conducting economic evaluations of health care interventions from a societal perspective [5, 6]. Correspondingly, in recent years, an increasing number of randomized controlled trials (RCTs) have measured the impact of health care interventions on work productivity loss, either as a patient-centered outcome [7,8,9,10,11,12,13,14] or as a component of economic evaluations [15,16,17,18].

Work productivity loss comprises three components: 1) absenteeism; 2) presenteeism; and 3) employment status changes, including reduced routine work hours and work stoppage due to illness [19]. Before being transformed into a monetary amount, work productivity loss due to health problems is usually first expressed as work time loss, i.e., counting the days missed from work (absenteeism), the hours lost due to reduced productivity while working (presenteeism), or the stopped workdays. Productivity loss data can therefore be non-negative count data. When presenteeism is first measured as a percentage of loss (e.g., with the Work Productivity and Activity Impairment Questionnaire (WPAI) [20, 21] or the Valuation of Lost Productivity questionnaire [22, 23]) and then transformed into work time loss by multiplying the percentage of loss by the actual working time, the estimate may instead be non-negative continuous data.

When estimating work time loss in a study population, people can be divided into three groups: I) those with no time loss, II) those with some time loss, and III) those who have lost all work time. Studies have shown that the proportion of Group I (zero loss) is very high [24,25,26,27]. This distribution applies both to the sum of time loss across all three subcomponents and to the time loss from absenteeism and presenteeism individually. For example, among patients with arthritis, one study found that the frequency of ‘0’ values varied by measurement questionnaire: 61% when presenteeism was measured using the Health and Labour Questionnaire, 5% using the Work Limitations Questionnaire, 16% using the World Health Organization Health and Work Performance Questionnaire, and 27% using the WPAI [25]. A clinical trial among employed patients with early rheumatoid arthritis showed that about 50–70% had no paid work productivity loss (sum of all three subcomponents), depending on the follow-up time point, and that about 5–10% stopped working due to health problems at Week 13 or cumulatively at Week 52 [27], which produced high proportions of both Group I and Group III, i.e., inflated zero and maximum values.

Various statistical models have been used to analyze productivity loss. Ordinary Least Squares (OLS) regression was common in previous studies [7,8,9, 28, 29]. Poisson and Negative Binomial (NB) models for count data have also been used [11, 12]. Some studies avoided estimating the mean time loss by using logistic models that treated productivity loss as a binary or categorical variable [10, 30, 31]. These methods may be problematic; for example, the logistic models do not make full use of the continuous data. Various models have been suggested for zero-inflated and bound-inflated data, e.g., two-part models, zero-inflated models, and other mixture models [32,33,34,35,36]. Kleinman et al. used a two-part model to estimate annual lost days due to absenteeism and presenteeism [37]. Zero-inflated models have also been applied to productivity loss outcomes [13, 27]. Each method has its own assumptions, and its estimates can be biased if those assumptions are not satisfied. It is therefore important to assess which analytic method works best for analyzing work productivity loss. This study’s objective was to compare, using simulations, the performance of these commonly used methods for analyzing work time loss data in RCTs.

Methods

Our simulation methods followed the published guidance by Morris et al. [38] on using simulation studies to evaluate statistical methods.

Data-generating mechanisms

Distributions of productivity loss outcome Y

We assumed that the productivity (time) loss outcome Y in an RCT depends on the treatment arm (arm = 0, 1) and a covariate x, denoted by Y(arm, x). For a given arm and value of x, the probabilities of Y(arm, x) being zero (Group I above), all loss Max (Group III), and in (0, Max) (Group II) are denoted by P1(arm, x), P3(arm, x), and P2(arm, x), respectively, where P1(arm, x) + P2(arm, x) + P3(arm, x) = 1. The relationships among treatment arm, covariate x, and the probabilities P1(arm, x), P2(arm, x), and P3(arm, x) are given by the following equations:

$$ {P}_1\left( arm,x\right)=\frac{\exp \left({\alpha}_1+{\beta}_1 arm+{\gamma}_1x\right)}{\exp \left({\alpha}_1+{\beta}_1 arm+{\gamma}_1x\right)+\exp \left({\alpha}_2+{\beta}_2 arm+{\gamma}_2x\right)+1} $$
$$ {P}_2\left( arm,x\right)=\frac{\exp \left({\alpha}_2+{\beta}_2 arm+{\gamma}_2x\right)}{\exp \left({\alpha}_1+{\beta}_1 arm+{\gamma}_1x\right)+\exp \left({\alpha}_2+{\beta}_2 arm+{\gamma}_2x\right)+1} $$
$$ {P}_3\left( arm,x\right)=\frac{1}{\exp \left({\alpha}_1+{\beta}_1 arm+{\gamma}_1x\right)+\exp \left({\alpha}_2+{\beta}_2 arm+{\gamma}_2x\right)+1} $$

where α1, β1, γ1, α2, β2, and γ2 are given parameters at each simulation.
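To make the multinomial-logit mapping concrete, the three probabilities can be computed with a small helper. This is an illustrative Python sketch (the study's own code was written in SAS, and the function name and parameter values below are our own, not from the paper):

```python
import math

def group_probs(arm, x, a1, b1, g1, a2, b2, g2):
    """Multinomial-logit probabilities of zero loss (P1), partial loss (P2),
    and max loss (P3); max loss (Group III) is the reference category."""
    e1 = math.exp(a1 + b1 * arm + g1 * x)  # numerator for P1
    e2 = math.exp(a2 + b2 * arm + g2 * x)  # numerator for P2
    denom = e1 + e2 + 1.0
    return e1 / denom, e2 / denom, 1.0 / denom
```

By construction the three probabilities are positive and sum to one for any parameter values, matching the constraint P1 + P2 + P3 = 1 above.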

We denoted Y(arm, x) truncated at 0 and Max by \( {\left.Y\left( arm,x\right)\right|}_0^{Max} \). We assumed that \( {\left.Y\left( arm,x\right)\right|}_0^{Max} \) follows a truncated NB distribution, denoted by \( {\left. NB\left(r\left( arm,x\right),p\left( arm,x\right)\right)\right|}_0^{Max} \), with mean Er(arm, x), p(arm, x){Y(arm, x)| 0 < Y(arm, x) < Max} and standard deviation SDr(arm, x), p(arm, x){Y(arm, x)| 0 < Y(arm, x) < Max}. We further assumed that

$$ {E}_{r\left( arm,x\right),p\left( arm,x\right)}\left\{Y\left( arm,x\right)|0<Y\left( arm,x\right)<\mathit{\operatorname{Max}}\right\}={E}_{r\left( arm,0\right),p\left( arm,0\right)}\left\{Y\left( arm,0\right)|0<Y\left( arm,0\right)<\mathit{\operatorname{Max}}\right\}+a\cdotp x $$

and p(arm, x) = p(arm, 0) for all x, where a is an assumed parameter. Thus, for given Er(arm, 0), p(arm, 0){Y(arm, 0)| 0 < Y(arm, 0) < Max}, SDr(arm, 0), p(arm, 0){Y(arm, 0)| 0 < Y(arm, 0) < Max}, and parameter a at each simulation, the NB parameters r(arm, x) and p(arm, x) can be derived for any given arm and x (see the detailed derivation in Additional file 1).

For a given arm and x, the mean of the productivity loss outcome Y(arm, x) can be calculated as

$$ E\left\{Y\left( arm,x\right)\right\}={P}_1\left( arm,x\right)\cdotp 0+{P}_2\left( arm,x\right)\cdotp {E}_{r\left( arm,x\right),p\left( arm,x\right)}\left\{Y\left( arm,x\right)|0<Y\left( arm,x\right)<\mathit{\operatorname{Max}}\right\}+{P}_3\left( arm,x\right)\cdotp \mathit{\operatorname{Max}} $$
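This weighted-average form is straightforward to compute once the group probabilities and the conditional Group II mean are known. A minimal sketch (function and argument names are our own, not from the paper):

```python
def mean_outcome(p1, p2, p3, cond_mean, max_loss=60):
    """E{Y} = P1*0 + P2*E{Y | 0 < Y < Max} + P3*Max.
    p1 contributes nothing because Group I has zero loss."""
    return p2 * cond_mean + p3 * max_loss
```

For example, with group probabilities (0.5, 0.4, 0.1) and a conditional Group II mean of 20 days, the overall mean loss is 0.4 * 20 + 0.1 * 60 = 14 days.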

Distribution for covariate x

Since most RCTs do not randomize patients by their productivity loss at baseline, which is highly correlated with the productivity loss outcome, it is common to use regression models to adjust for baseline productivity loss. Therefore, in this study, we assumed x to be the productivity loss at baseline, following a distribution that is zero with probability P (P > 0) and, with probability 1 − P, takes a non-zero value from a NB distribution truncated at 0 and Max, \( {\left. NB\left({r}_x,{p}_x\right)\right|}_0^{Max} \). That is, x is independent of treatment, and all RCT participants are assumed to be working at baseline, so their baseline productivity loss does not equal Max. The truncated NB distribution has mean μx and standard deviation sdx. At each simulation, μx and sdx are given parameters, and rx and px are derived from them.
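A zero-inflated, doubly truncated NB draw for x can be sketched as follows (Python with numpy; the rejection-sampling approach and the parameter names are our assumptions, since the paper derives rx and px analytically from μx and sdx):

```python
import numpy as np

def draw_baseline(n, p_zero, r_x, p_x, max_loss=60, rng=None):
    """Draw n baseline losses: zero with probability p_zero, otherwise a
    NB(r_x, p_x) draw rejected until it falls strictly inside (0, max_loss)."""
    rng = np.random.default_rng(rng)
    x = np.zeros(n)
    for i in range(n):
        if rng.random() >= p_zero:  # non-zero with probability 1 - p_zero
            while True:
                z = rng.negative_binomial(r_x, p_x)
                if 0 < z < max_loss:  # truncate at 0 and Max by rejection
                    x[i] = z
                    break
    return x
```

The rejection loop is a simple (if not the most efficient) way to sample from the truncated distribution; it is adequate here because the untruncated NB rarely falls outside (0, Max) for realistic parameter values.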

Simulation parameters

We assumed the time period for estimating work productivity loss to be 12 weeks and the maximum time loss to be 60 days (Max = 60). We considered three sets of parameters for the multinomial distributions of Y(arm, x), three sets of parameters for \( {\left. NB\left(r\left( arm,x\right),p\left( arm,x\right)\right)\right|}_0^{60} \), and one set of parameters for the baseline productivity loss x (see all parameters in Table 1). The parameters were chosen based on our review of recently published articles that measured absenteeism and presenteeism in an RCT [7,8,9,10,11,12,13,14, 27, 39].

Table 1 Parameters in the simulation study

Number of observations in each arm (Nobs)

Based on common sample sizes in previous RCT studies, we chose Nobs = 50, 100, and 200 participants working at baseline for each arm in each simulation. Sample sizes of 1000 and 2000 were also used to check whether bias (see definition below) varied with sample size.

Simulation algorithm

At each simulation, we generated Nobs samples for each arm in the following steps.

For arm = 0, 1 and i = 1, 2, …, Nobs,

  1. Randomly generate bi from the Bernoulli distribution Bernoulli(P). If bi = 1, let xi = 0. If bi = 0, randomly generate xi from \( {\left. NB\left({r}_x,{p}_x\right)\right|}_0^{60} \).

  2. Randomly generate the vector (K1(arm, xi), K2(arm, xi), K3(arm, xi)) from the multinomial distribution with size 1 and probabilities (P1(arm, xi), P2(arm, xi), P3(arm, xi)), where K1(arm, xi), K2(arm, xi), and K3(arm, xi) are each 0 or 1 and \( {\sum}_{j=1}^3{K}_j\left( arm,{x}_i\right)=1 \).

  3. Randomly generate Z(arm, xi) from \( {\left. NB\left(r\left( arm,{x}_i\right),p\left( arm,{x}_i\right)\right)\right|}_0^{Max} \).

  4. The productivity loss outcome Yi(arm) is defined by

    $$ {Y}_i(arm)=\left\{\begin{array}{c}0\ if\ {K}_1\left( arm,{x}_i\right)=1\\ {}Z\left( arm,{x}_i\right)\ if\ {K}_2\left( arm,{x}_i\right)=1\ \\ {}60\ if\ {K}_3\left( arm,{x}_i\right)=1\ \end{array}\right. $$
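Steps 1–4 can be combined into one data-generating routine. The sketch below (Python with numpy) is an illustrative stand-in for the SAS code in Additional file 1; it is simplified by holding the Group II NB parameters fixed rather than deriving r(arm, x) and p(arm, x) from the truncated moments, and all names are our own:

```python
import numpy as np

MAX = 60  # maximum loss over the 12-week period

def simulate_arm(n, arm, coefs, nb_r, nb_p, p_zero_x, rx, px, seed=None):
    """Generate n productivity loss outcomes for one arm following steps 1-4.
    coefs = (a1, b1, g1, a2, b2, g2) for the multinomial-logit part.
    nb_r, nb_p: Group II NB parameters (fixed here for brevity; the paper
    derives them from the arm- and x-specific truncated mean and SD)."""
    rng = np.random.default_rng(seed)
    a1, b1, g1, a2, b2, g2 = coefs
    y = np.empty(n)
    for i in range(n):
        # Step 1: baseline loss x_i (zero-inflated truncated NB)
        if rng.random() < p_zero_x:
            xi = 0
        else:
            xi = 0
            while not (0 < xi < MAX):
                xi = rng.negative_binomial(rx, px)
        # Step 2: draw the group indicator from the multinomial logit
        e1 = np.exp(a1 + b1 * arm + g1 * xi)
        e2 = np.exp(a2 + b2 * arm + g2 * xi)
        probs = np.array([e1, e2, 1.0]) / (e1 + e2 + 1.0)
        k = rng.choice(3, p=probs)  # 0: zero loss, 1: partial, 2: max loss
        # Steps 3-4: assemble the outcome
        if k == 0:
            y[i] = 0.0
        elif k == 2:
            y[i] = MAX
        else:
            z = 0
            while not (0 < z < MAX):  # truncated NB draw by rejection
                z = rng.negative_binomial(nb_r, nb_p)
            y[i] = z
    return y
```

Running this for arm = 0 and arm = 1 with the chosen parameter sets yields one simulated RCT dataset per repetition.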

Estimation methods

Each simulated dataset was analyzed using the following five regression models:

  1. OLS: ordinary least squares regression;

  2. NB: generalized linear model with a NB distribution;

  3. ZTNB: two-part model – logistic regression for the probability of being zero, and generalized linear regression with a zero-truncated NB distribution (hurdle model) for the non-zeros;

  4. ZG: two-part model – logistic regression for the probability of being zero, and generalized linear regression with a Gamma distribution for the non-zeros;

  5. Three-part model: multinomial logistic regression for the probabilities of being zero and 60, and generalized linear regression with a Beta distribution for those with values in (0, 60) (transformed to (0, 1)).

Estimand

Our estimands in the simulation study were θ(x) = E{Y(1, x)} − E{Y(0, x)} at the values of x of interest. The estimates of the estimand θ(x) in each regression, denoted by \( \hat{\theta}(x) \), were derived from the given x and the estimated regression parameters. We used the bootstrap method with 1000 replications to estimate the standard error (SE) of \( \hat{\theta}(x) \) in all regressions except OLS.
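The bootstrap SE computation is generic: resample subjects with replacement, re-estimate \( \hat{\theta}(x) \) on each resample, and take the SD of the replicates. A sketch with a deliberately simple placeholder estimator (the real procedure would refit the regression model in question on each resample):

```python
import numpy as np

def bootstrap_se(y, arm, estimator, n_boot=1000, seed=0):
    """Nonparametric bootstrap SE: resample subjects with replacement,
    re-estimate theta-hat each time, and take the SD of the replicates."""
    rng = np.random.default_rng(seed)
    n = len(y)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)      # resample subject indices
        reps[b] = estimator(y[idx], arm[idx])  # refit on the resample
    return reps.std(ddof=1)

# Placeholder estimator: unadjusted difference in arm means
def diff_in_means(y, arm):
    return y[arm == 1].mean() - y[arm == 0].mean()
```

In the actual study the estimator step would refit the NB, two-part, or three-part regression and recompute θ̂(x) at the x value of interest from the refitted parameters.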

Performance measures

Our key performance measure of interest was the bias \( \hat{\theta}(x)-\theta (x) \). We assumed \( SD\left(\hat{\theta}(x)\right)\le 4 \) (the standard deviation of the estimator \( \hat{\theta}(x) \)) and considered 0.08 an acceptable Monte Carlo SE of the bias. Accordingly, we needed at least 2500 repetitions based on the published simulation study guidance [38]. We chose a final number of repetitions of nsim = 5000, for which the Monte Carlo SE of coverage would be 0.3 and 0.7 percentage points for coverages of 95 and 50%, respectively, which we considered acceptable.

Our performance measures included bias, coverage, power, empirical and model-based SE, and mean squared error (MSE) for \( \hat{\theta}(x) \) at x = μx, 0, and 30. Their definitions can be found in Morris et al. [38]. All analyses were performed using SAS 9.4. The SAS code used for our simulation study is available in Additional file 1.
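Given the nsim replicate estimates and the true θ(x), the key measures reduce to simple summaries. A sketch following the definitions in Morris et al. (Wald-type 95% intervals are assumed here for coverage; power would similarly count rejections of a zero-effect null):

```python
import numpy as np

def performance(theta_hats, ses, theta_true, z=1.96):
    """Bias, empirical SE, MSE, and coverage of nominal 95% Wald CIs
    across the nsim simulation replicates."""
    theta_hats = np.asarray(theta_hats, dtype=float)
    ses = np.asarray(ses, dtype=float)
    bias = theta_hats.mean() - theta_true
    emp_se = theta_hats.std(ddof=1)
    mse = ((theta_hats - theta_true) ** 2).mean()
    lo, hi = theta_hats - z * ses, theta_hats + z * ses
    coverage = ((lo <= theta_true) & (theta_true <= hi)).mean()
    return {"bias": bias, "emp_se": emp_se, "mse": mse, "coverage": coverage}
```

For instance, replicate estimates (1, 2, 3) with SEs of 1 around a true value of 2 give zero bias, empirical SE 1, MSE 2/3, and full coverage.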

Results

Computational issues

We encountered convergence issues mainly in scenarios with a smaller number of observations, a higher proportion of Group I, and a lower proportion of Group III. For example, when Nobs = 50, the simulated databases based on 80% zero loss could not generate enough samples for Group II and Group III, so we did not compare the five models in this scenario. Similarly, the simulated databases based on 5% max loss could have no samples for Group III, so the three-part model was not considered for such scenarios. For some of the remaining scenarios, quasi-complete separation (in which the maximum likelihood estimate may not exist) was detected while running the multinomial logistic regression for the three-part model. Table 2 presents the number of databases with quasi-complete separation among the 5000 simulated databases, by the three sets of distributions of productivity loss outcomes in the two arms (i.e., the proportions of zero loss, some loss, and max loss in each arm); Nobs = 50, 100, and 200; and whether the two arms had equal scale parameters for the truncated NB distributions of their productivity loss outcomes. The number of simulations with convergence issues was very small.

Table 2 Number of simulated databases with quasi-complete separation

Table 3 presents the proportions of the total 10,000 simulations (5000 with equal scales and 5000 with unequal scales) whose number of bootstraps (out of 1000) with quasi-complete separation fell into the categories 0, 1–10, 11–100, 101–400, 401–700, 701–950, and 951–1000, by the distributions of productivity loss outcomes and Nobs. The issue was most problematic in the scenario with the outcome distribution 80%/15%/5% vs. 60%/30%/10% and 100 observations: only 38.03% of the 10,000 simulations had no bootstraps with quasi-complete separation, 41.97% had 1–10 such bootstraps, and 0.09% had more than 950.

Table 3 The proportions of simulations with quasi-complete separation issues among their 1000 bootstraps

Bias

Figure 1 (for Nobs = 100) and Supplementary Figures S1 (Nobs = 50), S2 (Nobs = 200), S3 (Nobs = 1000), and S4 (Nobs = 2000) (see Additional file 2) present the bias of the five regression models by the value of baseline productivity loss x; the three sets of distributions of productivity loss outcomes in the two arms; and whether the two arms had equal scale parameters for the truncated NB distributions of their productivity loss outcomes.

Fig. 1
figure 1

Mean bias for the number of observations in each arm = 100. Legends: OLS: ordinary least squares; NB: negative binomial; ZTNB: two-part model – logistic regression for the probability of being zero, and generalized linear regression with a zero-truncated NB distribution for the non-zeros; ZG: two-part model – logistic regression for the probability of being zero, and generalized linear regression with a Gamma distribution for the non-zeros; Three-part: multinomial logistic regression for the probabilities of being zero and 60, and generalized linear regression with a Beta distribution for those with values in (0, 60) (transformed to (0, 1))

Covariate x

All models performed similarly well at the mean of baseline productivity loss, x = 14. At the other assessed x values, the NB model performed worst in all scenarios and the three-part model performed best. The two-part models performed almost identically for all x values, regardless of whether the second part was assumed to follow a truncated NB or a Gamma distribution.

Proportions of zero loss and max loss

When the productivity loss outcome had a high proportion of zero loss (> 50% in at least one arm) and a non-trivial proportion of max loss (> 5%), the two-part and three-part models performed better than the OLS and NB models. The performance of OLS improved as the proportion of zero loss decreased (≤ 50% in both arms) and could exceed that of the two-part models at higher x values.

Different scale parameters between the two arms

When the two arms had unequal scale parameters in the assumed truncated NB distributions of their productivity loss outcomes, the three-part model produced biased estimators for all values of x. The two-part models could produce lower bias than the three-part model at lower values of x. OLS performed best, or as well as the two-part and three-part models, in the databases with outcome proportions of 50%/40%/10% vs. 30%/55%/15%.

Sample size

For Nobs = 50, we applied the three-part model only to databases with outcome proportions of 50%/40%/10% vs. 30%/55%/15%. The relative performance of the models in terms of bias did not change as Nobs increased from 50 to 100, 200, 1000, and 2000.

Other performance measures

Similar to bias, the other performance measures, including coverage, power, empirical SE, model-based SE, and MSE, were comparable across the five models when x = 14 (Tables 4, 5 and 6 and Supplementary Tables S1-S3 in Additional file 3). However, when x = 0 or 30, the coverages of the two-part and three-part models were similar, and the coverage of the three-part model was slightly higher than that of OLS and NB, except when the outcome proportions were 50%/40%/10% vs. 30%/55%/15% (Table 4 and Supplementary Tables S1-S3 in Additional file 3). When x = 0, the empirical SE and MSE of the NB, two-part, and three-part models were similar and all lower than those of OLS in all scenarios (Tables 5 and 6 and Supplementary Tables S1-S3 in Additional file 3). In contrast, when x = 30, the empirical SE and MSE of OLS were the lowest, followed by the three-part model, while those of NB were the highest. The two-part models (ZTNB and ZG) performed identically on these measures.

Table 4 Coverage for the number of observations each arm = 100
Table 5 Empirical standard error for the number of observations each arm = 100
Table 6 Mean squared error for the number of observations each arm = 100

Discussion

In this paper, we compared different statistical models for analyzing productivity loss outcomes in RCTs, considering the characteristic inflation of zero and max values in these data. We focused on five commonly used models, OLS, NB, ZTNB, ZG, and the three-part model, adjusting for one covariate. From our simulation results, we found that the NB model performed worst overall. The two-part models, whether the second part was assumed to follow a zero-truncated NB (ZTNB) or a Gamma (ZG) distribution, performed identically in all scenarios. The performance of OLS, the two-part models, and the three-part model varied across scenarios. Based on our results, we offer the following practical recommendations when the treatment effect at given values of a single covariate is of interest:

  1. Check the sample size and the proportions of zero loss and max loss in each arm. If the sample size of each arm (i.e., the number of participants working at baseline) is ≤ 50 and there are ≤ 5 subjects with max loss, the three-part model should not be considered. The two-part models (either ZTNB or ZG) should be used if the proportion of zero loss is > 50% in at least one arm, and OLS should be used if the proportion of zero loss is ≤ 50% in both arms.

  2. If the sample size of each arm is > 50 and the proportion of max loss is > 5% with more than 5 subjects, then check the scale parameter of the productivity loss outcome distribution between zero and max loss (Group II) for each arm, which influences model performance:

    • Our three-part model assumed a Beta distribution for Group II, and thus the scale = α + β, where α and β are the Beta distribution parameters derived from the mean and variance of Group II. If the two arms have similar scales, the three-part model should be used.

    • Otherwise, the two-part models should be used if the proportion of zero loss is > 50% in at least one arm, and OLS should be used if the proportion of zero loss is ≤ 50% in both arms.
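The scale check in recommendation 2 can be done by the method of moments: for a Beta(α, β) variable with mean m and variance v, α + β = m(1 − m)/v − 1. A hypothetical helper applied to Group II values rescaled to (0, 1):

```python
import statistics

def beta_scale(values, max_loss=60):
    """Method-of-moments estimate of the Beta scale (alpha + beta) for
    Group II values, rescaled from (0, max_loss) to (0, 1)."""
    u = [v / max_loss for v in values]
    m = statistics.fmean(u)
    v = statistics.variance(u)      # sample variance
    return m * (1 - m) / v - 1      # equals alpha + beta for a Beta(alpha, beta)
```

Computing this for the Group II values in each arm and comparing the two estimates gives a rough check of whether the "similar scales" condition holds; how close is close enough is a judgment call not quantified in the paper.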

We chose baseline productivity loss as the single covariate in our simulations, but the above practical recommendations would apply to any single-covariate model for analyzing productivity loss outcomes in an RCT in which the covariate is well balanced between treatment arms and associated with the productivity loss outcome.

Our study followed the published guidance by Morris et al. [38] for the design, execution, analysis, reporting, and presentation of simulation studies. To the best of our knowledge, no previous simulation studies have considered data, such as productivity loss outcomes, with excess zero and max values. However, previous simulation studies comparing statistical models for data with excess zeros found that zero-inflated or two-part models performed better than the Poisson model, the NB model, or OLS [34, 35]. Consistent with those findings, our study showed that two-part models performed better than OLS when the data had > 50% zeros in at least one arm of an RCT.

Our study has limitations. First, as mentioned above, we had convergence issues because of quasi-complete separation in simulated databases or their bootstraps. Thus, we did not apply the three-part model for Nobs = 50 in scenarios with 5% max loss. However, for Nobs = 100 and 200, the number of simulated databases with quasi-complete separation was very small (Table 2), which should have minimal impact on our mean bias estimates. The quasi-complete separation detected in bootstraps was also within an acceptable range (Table 3).

Second, we compared five commonly used statistical methods in order to make our recommendations more informative and practical for a broad clinical audience. However, other methods could be used, for example, the Poisson model, zero-inflated models, and other mixture models. We chose the NB model, the two-part models, and the three-part model on the assumption that they would perform similarly to the Poisson model, zero-inflated models, and other mixture models, respectively.

Third, our simulation parameters were determined from published RCTs identified in a rapid literature review of RCTs that measured and reported work productivity loss (absenteeism, presenteeism, or both, or all three subcomponents). The scenarios considered in our simulation study therefore might not cover all possible RCT scenarios. Our simulation method can be used in the future to compare statistical models in scenarios we did not consider.

Conclusions

In summary, we conducted a simulation study to compare five statistical models for analyzing productivity loss outcomes in RCTs. Our findings suggest that the NB model performs worst. If the treatment effect at given values of a single covariate is of interest, the choice among OLS, the two-part models, and the three-part model depends on the sample size, the proportions of zero loss and max loss, and the scale of the productivity loss outcome distribution between zero and max loss in each arm of the RCT.