Introduction

Paid productivity is conceptualised by two distinct but related concepts: absenteeism and presenteeism. Absenteeism refers to the loss of productivity caused by being absent from work because of poor health [1]. Presenteeism describes the impact on productivity whilst at work because of health problems [2]. Presenteeism is broadly interpreted as ‘health-related productivity lost whilst at paid work’ [3]. This interpretation suggests a strong conceptual link between health status and presenteeism. Social pressures and other behavioural factors drive people to come into work even when they are not in ‘full health’ [4]. Presenteeism affects not only the individual in poor health but may also have consequences for co-workers where they must pick up the additional workload.

The debate concerning the inclusion of the impact of productivity in economic evaluations continues. Some jurisdictions, such as the National Health Care Institute (Zorginstituut Nederland, ZIN) in the Netherlands, encourage the inclusion of productivity in economic evaluations [5], but others, including the National Institute for Health and Care Excellence (NICE) in England, explicitly exclude it [6]. Normative arguments largely centre on the distribution of consequences as a result of including or excluding productivity in economic evaluations. It is argued that the inclusion of productivity in economic evaluations may influence funding decisions towards healthcare interventions aimed at particular patient subgroups across the population [7, 8].

Driven by methods guidelines, the most commonly used method of economic evaluation has become cost-effectiveness analysis (CEA) [6, 9]. CEA is often implemented by identifying the consequence of interest as the impact on health status measured using the EQ5D [https://euroqol.org/] and valued using published preference weights [10] to generate quality-adjusted life years (QALYs) for the relevant population. The lack of ‘gold standard’ methods for identifying, measuring, and valuing the impact of presenteeism may have discouraged researchers from collecting such data, further limiting its availability in existing datasets. Prospective studies may be set up to collect presenteeism-related data; however, conducting such studies is almost always an expensive venture. From a pragmatic perspective, an alternative approach may be to develop prediction models for presenteeism based on data already collected, for example health status [11].

Since 2005, two studies have quantified the link between health status (measured using the EQ5D) and presenteeism [11, 12]. Lamers et al. [12] used data from a cluster randomised controlled trial (RCT), designed to evaluate the effectiveness of physiotherapy guidelines for low back pain in a sample of 483 Dutch patients, to assess the relationship between health status and presenteeism caused by low back pain. Health status was measured using the EQ5D-3L (applying UK preference weights), absenteeism using the Health and labour questionnaire (HLQ) [13], and presenteeism using the Quality and quantity (QQ) method that reports efficiency loss [14]. The analysis estimated a mean EQ5D-3L score of 0.48 for individuals who reported absenteeism (off work for a full 2 weeks) and a mean EQ5D-3L score of 0.71 for those who did not report absence from work. Patients who reported zero days absent from work had a mean efficiency loss due to back pain of 0.20. The authors concluded that the study provided evidence of a potential relationship between health status and productivity, with lower mean EQ5D-3L scores for those reporting absences from work compared with those who did not; however, the evidence was insufficient to recommend the prediction of presenteeism using health status.

In a later study by Krol et al. [11], two distinct prediction models linking health status (EQ5D-3L) with productivity (separate models for absenteeism and presenteeism) were developed based on responses from a sample of 1013 employed individuals from the Dutch population. Individuals were presented with 16 EQ5D-3L health states and asked to state their expected (imagined) level of productivity for each given health state. The subsequent prediction model for presenteeism, measured using the Quality and quantity (QQ) method, was estimated using Generalised estimating equations (GEEs). The purpose of the prediction model was to estimate levels of presenteeism and populate datasets that have not recorded such data. To promote wide applicability of the model across multiple datasets, only age and gender were included as covariates in the prediction model [14]. Krol and colleagues [11] assessed the external validity of their prediction model using data collected by Lamers and colleagues [12] and found the model was poor at estimating presenteeism at the individual level but was reasonable when data were aggregated.

Prediction models or ‘mapping’ (also called ‘crosswalking’) algorithms have been produced to develop a quantitative link between non-preference-based, disease-specific measures and generic preference-based measures such as the EQ5D-(3L or 5L) [15]. Franklin and colleagues [16] used mapping methods to quantify the relationship between health status, measured using the EQ5D-3L, and capability, measured using the ICEpop CAPability measure for Older people (ICECAP-O). The study concluded that a clear relationship could not be defined [16]. Nevertheless, the methods used by Franklin and colleagues did indicate the potential for the development of a mapping algorithm that: (1) uses health status as an explanatory variable; and (2) maps from health to a concept beyond health.

An important recommendation for analysts seeking to develop a mapping algorithm is that the first step should be to understand whether there is sufficient conceptual overlap between the constructs being mapped [17]. There is existing qualitative evidence of conceptual overlap between existing measures of health status (the EQ5D and SF6D) and the concept of presenteeism. Jones and colleagues used the results from qualitative semi-structured interviews to show a conceptual link between the impact on health status, as measured by the EQ5D or SF6D, and the potential impact on presenteeism [18]. The study did not, however, provide a quantifiable link between the two concepts of health status and presenteeism, which provides motivation for the development of a mapping algorithm [18]. The aim of this study was to identify whether it is feasible to develop a mapping algorithm that can be used to predict presenteeism using existing multiattribute measures of health status. The goals of the mapping algorithms are twofold: (1) to explore the extent to which health status/capability measures are able to predict presenteeism, allowing for a further understanding of any potential relationship; and (2) to provide a method which allows presenteeism to be retrospectively predicted using health status/capability data in large datasets where such data have not been collected.

Methods

Case study

Rheumatoid arthritis (RA) is a fluctuating chronic inflammatory auto-immune condition that primarily causes stiffness and pain in joints and tendons of the hands and feet. It is the most common inflammatory auto-immune condition in the United Kingdom (UK) and if left untreated can cause permanent damage to joints leaving the individual disabled [19]. Typically, disease onset occurs before the age of 65 years old (the current retirement age in the UK) meaning that individuals are frequently affected during their working lifetime [20]. There is substantial evidence to suggest that RA is significantly associated with increases in presenteeism [21].

Study sample

The relevant population for this study was defined as adults who were currently in work and had a self-reported diagnosis of RA. A sample of adults (18 years and over) with RA who were currently working in full-time or part-time paid positions were invited to take part in the study. The study sample was identified and recruited using an internet panel provider (ResearchNow, now called Dynata; https://www.dynata.com/). A sample size of n = 500 was selected in line with published mapping studies listed on the Health economics research centre (HERC) database of mapping studies (version 7.0) [22].

Data collection

Data were collected using a bespoke online survey. Ethical approval from the University of Manchester was granted (reference number: 16144). Informed consent was taken at the beginning of the survey, before the participant answered any questions. Respondents were informed they could leave the survey at any time without providing a reason; however, it was also explained that once the participant clicked “submit” they would not be able to retrieve and withdraw their responses due to anonymisation. The survey collected data on each individual’s: demographics; job type (sedentary, light, medium and heavy) and employment status (full-time, part-time; employed or self-employed); disease severity, measured by the Routine assessment of patient index data three survey (RAPID3) [23]; medications; health status (EQ5D-5L and SF6D); and presenteeism, measured using the Work productivity and activity impairment (WPAI) questionnaire [24]. The WPAI was selected as the relevant measure of presenteeism for this study because it is recommended for use in patients with RA by the Outcomes measures in rheumatology group (OMERACT) [25], adopts a patient perspective, and is relatively short, thereby reducing participant burden. The WPAI asks: ‘During the past 7 days, how much did your rheumatoid arthritis affect your productivity whilst you were working?’. The WPAI records levels of presenteeism using a zero to ten Likert scale where zero indicates ‘RA had no effect on my work’ and ten indicates ‘RA completely prevented me from working’. The WPAI has been well tested for its validity and reliability both within RA and other chronic conditions [26, 27]. The EQ5D-5L and SF6D were transformed into index values using the relevant published algorithms available and acceptable for use during the analysis period of this study [28, 29].

Analysis

Data analysis involved three stages in line with published recommendations for producing mapping algorithms [30, 31].

Statistical correlation

Spearman’s rank correlation (r) was used to measure the strength and direction of the association between the measures of health status (EQ5D-5L/SF6D) and presenteeism (WPAI). The strength of the correlation was described by categories of the absolute value of r defined prior to the start of the study: very weak (r = 0 to 0.19); weak (r = 0.20 to 0.39); moderate (r = 0.40 to 0.59); strong (r = 0.60 to 0.79); and very strong (r = 0.80 to 1) [32]. If a sufficient correlation (defined as moderate or above) was identified between the EQ5D-5L [33] and/or SF6D [34] and the WPAI [24], then those measures of health status would be taken forward and developed to form a mapping algorithm for presenteeism. Supplementary Appendix 1 describes the approach used to understand the performance of the WPAI in this study sample in terms of reliability (internal consistency), measured using Cronbach’s alpha.
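The correlation step can be illustrated with a minimal sketch. It assumes ties are handled with average ranks and that the strength categories apply to the absolute value of r; the function names are illustrative and are not taken from the study's analysis code.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    mean_x, mean_y = mean(rank_x), mean(rank_y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    spread_x = sum((a - mean_x) ** 2 for a in rank_x)
    spread_y = sum((b - mean_y) ** 2 for b in rank_y)
    return covariance / (spread_x * spread_y) ** 0.5


def strength_category(r):
    """Map |r| to the pre-specified strength categories."""
    size = abs(r)
    if size < 0.2:
        return "very weak"
    if size < 0.4:
        return "weak"
    if size < 0.6:
        return "moderate"
    if size < 0.8:
        return "strong"
    return "very strong"


def sufficient_for_mapping(r):
    """A measure is taken forward if the correlation is moderate or above."""
    return strength_category(r) in {"moderate", "strong", "very strong"}
```

In practice a statistical package would normally be used; the sketch only makes the ranking and the category thresholds explicit.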

Regression model and specification

Potentially suitable regression methods for producing a mapping algorithm were defined prior to analysis of the data. The dependent variable was defined as the level of presenteeism (using WPAI) and the independent variables included a measure of health status (EQ5D-5L or SF6D) with covariates for age and gender. This study took a parsimonious approach to the inclusion of additional covariates to allow for wider applicability of a specified mapping algorithm; a method used in published algorithms [11, 35]. Age was collected in pre-defined age bands and gender (male; female) was treated as a dummy variable.

There are many potential regression models that can be used to generate a mapping algorithm. Published guidelines for developing a mapping algorithm state that the selection of model type depends on the characteristics of the dependent variable (categorical, ordinal, etc.) and its distribution [30]. Longworth and Rowen [30] explain the need to take into account the bimodal distribution of the EQ5D for algorithms attempting to predict EQ5D values. However, the focus of this study is to develop an algorithm that predicts levels of presenteeism and not utilities for the EQ5D. Presenteeism, measured using the WPAI, can take values from zero to ten in increments of one, and its distribution typically has most of its mass at the lower end of the scale (many zeros). No formal guidelines exist to inform the model type for predicting presenteeism; therefore, five types of regression model were selected as potential candidates to develop the mapping algorithm: (1) Ordinary Least Squares (OLS); (2) Tobit; (3) Censored Least Absolute Deviation (CLAD); (4) Ordinal Logit (Ologit); and (5) Multinomial Logit (mlogit).

OLS models a linear relationship and assumes equal distance between values of the dependent variable; this is consistent with the interpretation of the levels (zero to ten) included in the WPAI. OLS is an unbounded regression model and may produce inconsistent estimators when dealing with censored (left or right) dependent variables [36]. Tobit models are a potentially useful alternative when data are censored. Tobit models allow the analyst to set upper and lower limits for the dependent variable, for example 0 ≤ y ≤ 10. However, Tobit models are highly sensitive to heteroscedasticity, which can lead to inconsistent estimates and affect the standard errors [37]. Therefore, the use of a CLAD model was explored because it is less sensitive to skewed data and is robust to heteroscedasticity, while still allowing for censoring at a lower value of zero [38].
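To make the role of the limits concrete, the following sketch evaluates the log-likelihood of a standard two-limit Tobit model with limits of 0 and 10. It assumes normally distributed, homoscedastic errors (the assumption to which Tobit is sensitive); the function and argument names are illustrative and are not the study's code.

```python
from math import log
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def tobit_loglik(y, mu, sigma, lower=0.0, upper=10.0):
    """Log-likelihood of a two-limit Tobit model.

    y     : observed scores (for example WPAI levels from 0 to 10)
    mu    : fitted latent means, one per observation (x'beta)
    sigma : common standard deviation of the latent error term
    """
    total = 0.0
    for y_i, mu_i in zip(y, mu):
        if y_i <= lower:
            # Censored from below: probability that the latent score is <= lower.
            total += log(STANDARD_NORMAL.cdf((lower - mu_i) / sigma))
        elif y_i >= upper:
            # Censored from above: probability that the latent score is >= upper.
            total += log(1.0 - STANDARD_NORMAL.cdf((upper - mu_i) / sigma))
        else:
            # Uncensored: ordinary normal density of the observed score.
            total += log(STANDARD_NORMAL.pdf((y_i - mu_i) / sigma) / sigma)
    return total
```

Maximising this function over the regression coefficients and sigma yields the Tobit estimates; observations at the limits contribute probabilities rather than densities, which is what distinguishes the model from OLS.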

Ordinal logit regression models were used for their ability to predict an ordinal dependent variable, which presenteeism, as measured by the WPAI, can be considered to be. Ordinal logit models estimate the cumulative probability of observing an outcome using specified explanatory variables. The multinomial logit model, which is related to the ordinal logit model but estimates the probability of each outcome level relative to a reference level without assuming an ordering, was selected for its ability to generate predictions across multiple outcome levels. The observed outcome of the WPAI may take one of multiple levels ranging from zero to ten.
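The cumulative-probability structure of the ordinal logit model can be sketched as follows, assuming the common proportional odds parameterisation in which P(y ≤ k) = logistic(cutpoint_k − x'β). The cutpoints and linear predictor below are hypothetical values chosen for illustration, not estimates from this study.

```python
from math import exp


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + exp(-z))


def ordinal_logit_probabilities(linear_predictor, cutpoints):
    """Probability of each ordered level under a proportional odds model.

    With K cutpoints there are K + 1 levels; P(y <= k) = logistic(cutpoint_k - x'beta),
    and level probabilities are differences of adjacent cumulative probabilities.
    """
    cumulative = [logistic(c - linear_predictor) for c in cutpoints] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Hypothetical values: ten increasing cutpoints give the eleven WPAI levels (0 to 10).
HYPOTHETICAL_CUTPOINTS = [-2.0, -1.2, -0.5, 0.0, 0.6, 1.3, 1.9, 2.6, 3.4, 4.5]
probs = ordinal_logit_probabilities(0.8, HYPOTHETICAL_CUTPOINTS)

# One way to turn probabilities into a predicted level is to take the most likely one.
most_likely_level = max(range(len(probs)), key=probs.__getitem__)
```

Using the most likely level as the prediction is one option among several (the probability-weighted mean level is another); the source does not state which was used.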

Six model specifications (see Table 1) were run for each of the five regression models. In total, 60 potential mapping models were specified to test their ability to predict presenteeism. The EQ5D-5L and SF6D were incorporated into separate mapping models as: (1) index scores; and (2) dummy variables for each level of severity associated with each domain.
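The difference between entering a measure as an index score and as domain-level dummy variables can be sketched as below for the EQ5D-5L, which has five domains with five severity levels each. Treating level 1 (no problems) as the reference category is an assumption made for illustration, and the variable names are hypothetical.

```python
EQ5D5L_DOMAINS = [
    "mobility",
    "self_care",
    "usual_activities",
    "pain_discomfort",
    "anxiety_depression",
]


def dummy_code(profile, domains=EQ5D5L_DOMAINS, n_levels=5):
    """Expand a domain profile into 0/1 indicators, one per domain and level.

    Level 1 is the omitted reference category (an assumption for this sketch),
    so a five-domain, five-level measure yields 5 x 4 = 20 dummy variables.
    """
    row = {}
    for domain in domains:
        level = profile[domain]
        if not 1 <= level <= n_levels:
            raise ValueError(f"{domain} level must be between 1 and {n_levels}")
        for candidate in range(2, n_levels + 1):
            row[f"{domain}_{candidate}"] = int(level == candidate)
    return row


# Hypothetical respondent profile (levels from 1 = no problems to 5 = extreme problems).
example = dummy_code(
    {
        "mobility": 2,
        "self_care": 1,
        "usual_activities": 3,
        "pain_discomfort": 4,
        "anxiety_depression": 1,
    }
)
```

The dummy specification lets each domain and severity level have its own coefficient, at the cost of many more parameters than the single coefficient estimated for an index score.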

Table 1 Model specifications

Model performance

The Root mean square error (RMSE) was used as the metric by which to judge each model’s relative ability to predict presenteeism; a lower RMSE reflects smaller prediction errors. The RMSE was selected as the measure of prediction accuracy because it penalises to a greater extent those predictions that are further away from the actual observed value [39]. The RMSE is an appropriate measure of error where predicted levels of presenteeism that are further away from the observed value are interpreted to be considerably worse than those that are closer to it. The Mean bias error (MBE) was used to estimate the average bias of the model (under- or over-prediction, as indicated by a negative or positive sign) and may be used to inform measures to correct the bias [40].
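Both metrics can be written in a few lines. The sketch below assumes that errors are defined as predicted minus observed, so that a positive MBE indicates over-prediction; the source does not state the sign convention.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error: large errors are penalised more than small ones."""
    errors = [p - o for o, p in zip(observed, predicted)]
    return sqrt(sum(e * e for e in errors) / len(errors))


def mbe(observed, predicted):
    """Mean bias error: positive means over-prediction on average (assumed convention)."""
    errors = [p - o for o, p in zip(observed, predicted)]
    return sum(errors) / len(errors)
```

Because positive and negative errors cancel in the MBE, a model can have an MBE of zero while still having a non-zero RMSE; the two metrics answer different questions.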

To calculate the RMSE and MBE for each model, the K-fold cross-validation method was used to split the sample. There is no ‘gold standard’ method for selecting the most appropriate number of folds; however, ten folds is common practice [41], and therefore K = 10 was used in this study. The RMSE results are reported using graphical plots and across quartiles of the WPAI’s range.
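A minimal sketch of K-fold cross-validation is given below. For brevity it fits a simple linear regression with a single predictor (for example a utility index) rather than the full set of candidate models, and it pools the out-of-fold errors before taking the square root; both choices are assumptions for illustration, as are the function names.

```python
import random
from statistics import mean


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mean_x, mean_y = mean(x), mean(y)
    spread = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / spread
    return mean_y - slope * mean_x, slope


def cross_validated_rmse(x, y, k=10, seed=0):
    """Fit on k-1 folds, predict the held-out fold, and pool the squared errors."""
    squared_errors = []
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            squared_errors.append((intercept + slope * x[i] - y[i]) ** 2)
    return mean(squared_errors) ** 0.5
```

Each observation is predicted exactly once by a model that did not see it during fitting, so the resulting RMSE is less optimistic than one computed on the estimation sample.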

Results

A total of 514 individuals completed the survey. A total of 42 individuals were dropped from the sample. The primary outcome, level of presenteeism as measured by the WPAI, was missing for 42 observations (8% of the total sample).

Dynata (ResearchNow) recommends rejecting surveys where participants take less than 33% of the average time taken to complete the survey; in this study, that threshold equated to 4.29 min. Therefore, a further 13 observations were dropped from the sample because they completed the entire questionnaire in less than 4.29 min. Two observations were dropped because they reported contradictory answers to two separate questions that asked about their current work status. One respondent reported being on maternity leave, and one reported having worked longer hours than are available in one week. The final sample consisted of 472 individuals working with RA. Table 2 describes the key characteristics of the study sample.

Table 2 Key characteristics of sample

Figure 1 illustrates the distributions of the two measures of health status (EQ5D-5L and SF6D) and of presenteeism (WPAI). The distribution for the EQ5D is highly skewed to the right whereas the distribution of the SF6D appears on visual inspection to be normally distributed. The distribution for presenteeism is slightly skewed to the left and negative; however, there is a spike in the number of people reporting the value of ‘five’ as their level of presenteeism. Testing for heteroscedasticity is reported in Supplementary Appendix 2. In this study, the internal consistency of the WPAI, measured using Cronbach’s alpha, was 0.899, suggesting sufficiently high reliability for this measure in this sample (see Supplementary Appendix 1).

Fig. 1 Distribution of health status and presenteeism

Statistical correlation

Spearman’s rank correlation suggested a strong and negative correlation between the WPAI and EQ5D-5L (r = −0.64) and between the WPAI and SF6D (r = −0.60), providing evidence that, in theory, mapping algorithms could be produced using either of these measures of health status.

Model selection

Table 3 presents information on the predictive ability (RMSE) of all the models run to predict presenteeism using EQ5D-5L or SF6D data. The MBE for all models was zero, indicating zero bias in the models. Overall, the models that used dummy variables for each of the domains of the EQ5D-5L and SF6D produced more accurate estimates compared with those that used the index score. Typically, models that included covariates (age and gender) also performed better than models that did not.

Table 3 RMSE of all potential model specifications for predicting presenteeism

Table 3 reports the RMSE for each model. The model with the smallest RMSE (1.7858) was the OLS model using SF6D domain-level dummies with age and gender interacted (model 36). The model with the next smallest RMSE was the OLS model using EQ5D domain-level dummies with age and gender interacted (model 33), which had an RMSE fractionally larger than that of model 36 (RMSE = 1.7859). The full algorithms for models 36 and 33 are presented in Supplementary Appendix 4. The observed and predicted values of the two model specifications (33 and 36) are illustrated in Fig. 2. The graphical plots suggest the two mapping algorithms were able to predict presenteeism scores with some degree of accuracy. However, both models tended to over-predict levels of presenteeism at observed levels between zero and four and under-predict levels of presenteeism at observed levels of five and over.

Fig. 2 Observed and predicted levels of presenteeism using EQ5D-5L and SF6D. The size of the circles represents the volume of observed and predicted values of presenteeism

The RMSEs of models 33 and 36 were compared across the quartiles of the range of the presenteeism scale (Table 4). Model 36 had lower RMSEs in three of the four quartiles, suggesting that, overall, model 36 generates more accurate predictions of presenteeism compared with model 33.
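A comparison of this kind can be sketched by grouping observations by their observed score and computing the RMSE within each group. The band boundaries below (0 to 2, 3 to 5, 6 to 7 and 8 to 10) are a hypothetical split of the zero to ten range into four parts; the boundaries actually used are those reported in Table 4.

```python
from math import sqrt

# Hypothetical split of the 0-10 scale into four bands; not taken from the study.
HYPOTHETICAL_BANDS = [(0, 2), (3, 5), (6, 7), (8, 10)]


def rmse_by_band(observed, predicted, bands=HYPOTHETICAL_BANDS):
    """RMSE within each band of the observed score; None where a band is empty."""
    results = {}
    for low, high in bands:
        squared = [
            (p - o) ** 2
            for o, p in zip(observed, predicted)
            if low <= o <= high
        ]
        results[(low, high)] = sqrt(sum(squared) / len(squared)) if squared else None
    return results
```

Reporting the RMSE by band shows where on the scale a model predicts well or poorly, which a single pooled RMSE cannot reveal.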

Table 4 RMSE across subsets of WPAI (presenteeism) range

Discussion

This study aimed to develop a mapping algorithm that predicts levels of presenteeism, measured by the WPAI, using health-related quality of life (HRQoL) data. The study tested a wide range of potential models. The top six models, based on the lowest RMSE, were similar in that they all used OLS regression and dummy variable data. However, given the mean scores and widely overlapping RMSE confidence intervals, it is clear from this study that no model outperforms any other. Descriptively, the model using SF6D domain-level dummies with age and gender interacted (model 36, the model with the lowest RMSE) would be a potential candidate model that could be tested further. That said, the range of RMSE values (minimum and maximum) for each model does not increase uniformly across models as the RMSE increases (see Table 3), suggesting the potential need to conduct a study with a larger sample size.

The top two models, which use the SF6D or EQ5D-5L domain-level dummies with age and gender interacted (models 36 and 33, see Table 3), differed only fractionally in predictive ability, as measured by the RMSE. Examining the graphical plots of the predicted levels of presenteeism estimated by these two models (Fig. 2) reveals little difference in predictions between the two models. It is reasonable to suggest that the SF6D and the EQ5D-5L may have the potential to predict presenteeism to a similar degree of accuracy; a pragmatic result for populating those datasets that house only EQ5D-5L or SF6D data.

The qualitative study that explored the conceptual validity between measures of health status, captured by the SF6D and EQ5D, and presenteeism suggested both measures of health status were able to capture important factors of RA that increase levels of presenteeism [18]. The results of this study suggest the same; however, further research is needed to confirm the predictive ability of the SF6D and EQ5D for levels of presenteeism.

Strengths

To our knowledge, this is the second of two studies that have applied mapping algorithms to quantitatively link health status with a concept beyond health and is the first to apply mapping methods to predict levels of presenteeism. Prediction models using health status data for presenteeism are limited and have focussed their efforts on developing models using EQ5D-3L data [11, 12], whereas the study presented here is the first to develop a prediction model for presenteeism using EQ5D-5L and SF6D data.

A strength of this study is that the results are based on data collected from individuals who were working with RA at the time of the study. The results capture the reality of working with RA, including nuances such as the ability to adapt to and manage a chronic condition. This is in direct contrast with the study design used by Krol and colleagues [11], where individuals were asked to imagine their levels of presenteeism given a specific health state. A potential reason why Krol and colleagues [11] did not find a strong relationship between health status and presenteeism is that individuals who have no experience of working with a chronic condition have little understanding of the actual impact it may have on their ability to work. However, it must be noted that the results from this study are far from conclusive and an external validation study is needed to establish confidence in the algorithms generated in this study.

Limitations

The developed mapping algorithm must be understood in light of some limitations. Few observations for presenteeism at levels 9 and 10 (very severe levels of presenteeism) meant that the mapping algorithm struggled to predict these high levels of presenteeism (see Supplementary Appendix 3).

This preliminary study used a complete case analysis of a dataset comprising data from all completed surveys. We did not use multiple imputation methods to generate estimates of ‘missing’ data because the literature is currently unclear regarding how to combine multiple imputation with predictive modelling. Research into multiple imputation methods is currently very active, with researchers exploring issues related to: the assumptions made when applying imputation methods [42]; how to account for imputation uncertainty and its impact on subsequent statistical testing [43]; and how model selection is affected after having applied multiple imputation methods [44]. Using a complete case analysis approach will not affect the observed estimated mapping algorithms but may affect the generalisability of the results.

Developing a prediction model based on few observations is not recommended; therefore, we considered the possibility of collapsing observed presenteeism levels eight, nine and ten into one group. Ultimately, this approach was decided against because the primary purpose of the mapping algorithm was to enable a prediction of presenteeism at all levels. It may be the case that the lack of observations for very high levels of presenteeism (nine and ten) reflects the current health and work status of the individuals sampled in this study, where those individuals who are able to continue working do so because they know they are, broadly, able to keep pace with their work and therefore report low, mild or moderate levels of presenteeism. Individuals who might report severe presenteeism may be struggling to remain productive at work and are potentially less likely to engage with studies such as this, making them a difficult subgroup to reach. Further research is required to study the characteristics of individuals who work with severe levels of presenteeism. Furthermore, the evidence presented in this study may help towards an improved understanding of inter-individual differences in levels of presenteeism; however, further research is needed to quantify absolute productivity losses. Potential new methods, such as Productivity adjusted life years (PALYs), as discussed by Ademi et al. [45], aim to quantify productivity loss and incorporate productivity explicitly in cost-effectiveness studies. A mapping algorithm linked to PALY utilities may be useful, particularly to populate those datasets where PALY utilities have not been collected.

To promote the use of a mapping algorithm, it must be rigorously tested using an external dataset [30]. Unfortunately, and to our knowledge, there is no dataset that has SF6D, EQ5D-5L and WPAI data that can be used to externally validate the algorithms.

The mapping algorithms were developed using an RA population only. Further research is needed to understand whether the models could be used in: (1) populations with diseases similar to RA, for example ankylosing spondylitis; and (2) populations with any other form of chronic physical condition that makes working difficult, for example chronic pain.

Conclusion

The results of this study suggest there is a quantifiable relationship between health status, measured using the EQ5D-5L and SF6D, and presenteeism, measured using WPAI. This study indicates the potential to develop a mapping algorithm to populate large datasets that have health status data, EQ5D-5L or SF6D, but do not currently possess presenteeism data; a pragmatic and inexpensive solution towards generating estimates of presenteeism where such data are scarce. However, it is not possible to recommend the mapping algorithms developed in this study due to the lack of external validity. Further research is needed to assess the external validity and understand the generalisability of the mapping algorithms in populations working with different chronic conditions.