Key Points for Decision Makers

There is a large body of literature on methods to handle missing data in randomized controlled trials, but such trials are not the only data source used in health economics and outcomes research.

Researchers should report the presence of missing data and consider the mechanism of missingness.

Researchers should conduct sensitivity analyses and report those results in the study.

1 Introduction

Missing data are defined as “values that are not available and that would be meaningful for analysis if they were observed” [1]. Unfortunately, some study protocols and/or statistical analysis plans used in health economics and outcomes research (HEOR) studies place little emphasis on the approach to assessing missing data. Data can be missing for various reasons, including patient attrition during study follow-up and unavailability of clinical information and/or patient-reported outcomes (PROs) [2]. Observations with missing information on costs or other outcomes may be systematically different from those with complete information. Missing data can bias the inference in HEOR studies because of incomplete information on costs and/or health outcomes and on confounding variables, which in turn can lead to inappropriate policies [3]. The HEOR studies considered in the context of the present article are real-world evidence studies that conducted a secondary/post-hoc analysis using randomized controlled trial (RCT) data, and within-trial cost-utility analyses in which the outcome of interest was costs or PROs, including preference-based utilities (e.g., EQ-5D). Studies could also be observational by design, provided they used specific analytical approaches to address missing utilities, costs, or scores. However, this article did not include studies that addressed other healthcare resource utilization, such as hospitalizations, physician visits, or telemedicine costs.

There is a large body of literature on handling missing data in RCTs, but RCTs are not always the data source used in HEOR [1, 4]. A review of issues of Health Technology Assessment from 2013 to 2015 found that the most common approach in RCT-based cost-effectiveness analyses was to restrict the analysis to observations with complete data [5]. A previous study provided a structured approach and practical guidance on managing missing data in the context of a cost-effectiveness analysis based on RCT data, to address the uncertainty associated with missing data and quantify its impact on results [3]. However, in HEOR studies, which are based on large, observational, often longitudinal and/or multi-country data sources, missing data may cause substantially more bias, as more data on outcomes, exposure, and/or other confounding variables may be missing [6]. The causes and patterns of missing data in the observational data sources considered in HEOR studies tend to differ from those observed in RCTs. For example, many HEOR studies use data sources in which patients were not randomized to treatment groups, and the presence of missing data may further confound relationships between treatment and outcomes, which complicates the ability to determine the effect of missingness on outcomes. It is also more challenging to disentangle potential bias due to missing data from other types of bias in observational studies. Though the presence of missing data can produce considerable bias in HEOR studies, it is often not given adequate attention.

Methods that were specifically developed to account for missing data generated from RCTs might not be immediately applicable to studies based on observational data. Efficacy and safety are the principal endpoints of RCTs, whereas HEOR studies based on observational data sources jointly model costs and outcomes, which might make it more challenging to apply modern analytical techniques to handle missing data. Moreover, in an RCT, one might have access to individual patient-level data, whereas in a HEOR study based on observational data one might have access to only summary statistics, which does not enable the assessment of the potential mechanisms and individual patterns of missingness. Yet few studies have attempted to address the issues of handling missing data in HEOR based on different data sources [7]. This study bridges that gap by focusing on handling missing data within the context of HEOR.

The objectives of the present study are to provide an overview of the different types of missing data mechanisms encountered in HEOR studies and to report the findings of a systematic literature review (SLR) of specific methods used to deal with missing data in studies derived from trials (post-hoc or secondary analyses using RCT data, and within-trial cost-utility analyses), observational studies, reviews, and previously published guidance. This review specifically focused on handling missing data within the HEOR field, emphasizing costs and PRO measures, including preference-based utilities (e.g., EQ-5D). Note that the objective was not to quantify how often the different methods to handle missing data were used, or how often researchers ignore the issue of missing data, in the HEOR studies considered in the specific context of this article.

The remainder of this article is structured as follows. We first provide a brief summary on the types of missing data and the most common approaches used in HEOR studies. Then, we describe the methods of the SLR and its main findings. In the final section, we discuss handling missing data in HEOR studies based on the findings of the review.

1.1 Background on Missing Data Classifications and Mechanism for Data Missingness

This section uses the classification of missing data problems as proposed by Rubin according to the following types of mechanism for missing data: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) [8,9,10]. An example of each of those mechanisms is available in Sections 1.1 and 1.2 of the Electronic Supplementary Material (ESM).

1.1.1 Missing Completely at Random (MCAR)

In this case, the observed or unobserved values of all variables in a study do not have any influence on the probability of an observation being missing [11].

1.1.2 Missing at Random (MAR)

In this case, the probability of missing data for a particular variable is associated with observed values in the dataset (either observed values of other variables or observed values of the same variable at previous timepoints), but not with the missing data themselves [11]. When researchers have no control over missingness and the distribution of missingness is unknown, MAR is the usual assumption. Generally speaking, it is not possible to test whether MAR holds in any given dataset [12].

1.1.3 Missing Not at Random (MNAR)

In this case, the probability of missing data for a particular variable is related to the underlying value of that specific variable, even within strata of observed variables, as well as to other unobserved values within the dataset [11]. An alternative approach to classifying the mechanism of missingness is to place it in two categories: (1) ignorable and (2) non-ignorable (informative) [8, 9, 13, 14]. In the case of ignorable missingness, missing values occur independently of the data collection process; the MCAR and MAR mechanisms belong to this category. If MAR holds, then the missing data model is considered “ignorable” for all practical purposes. To make valid inferences, one needs to consider the factors that might influence the missing data rate [9]. In the case of non-ignorable missingness, there is a structural cause of the missingness mechanism that depends on unobserved variables or the missing value itself and that should be addressed when dealing with the missing data. The MNAR mechanism falls under the non-ignorable category. In the case of non-ignorable missingness, it is important to consider and justify the process that created the missing data [9]. There are situations in real-world HEOR studies in which the missingness mechanism cannot be classified distinctly under the MCAR/MAR/MNAR classes, and therefore MAR/MNAR do not always coincide with the clear distinction between ignorable and non-ignorable. The assumption of ignorability or non-ignorability is important for real-world HEOR studies involving missing data.
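
The three mechanisms can be made concrete with a small numerical illustration. The sketch below uses hypothetical data (not drawn from any study in this review): it generates a utility-like outcome, deletes values under MCAR, MAR, and MNAR, and shows that only under MCAR does the mean of the observed cases remain unbiased.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical data: a fully observed covariate (age) and a utility-like outcome.
age = rng.normal(60, 10, n)
utility = np.clip(0.9 - 0.005 * (age - 60) + rng.normal(0, 0.1, n), -0.5, 1.0)

# MCAR: a fixed 20% of values are missing, independently of everything.
m_mcar = rng.random(n) < 0.20

# MAR: missingness depends only on the observed covariate (older -> more missing).
p_mar = 1 / (1 + np.exp(1.4 - (age - 60) / 5))
m_mar = rng.random(n) < p_mar

# MNAR: missingness depends on the unobserved value itself (lower -> more missing).
p_mnar = 1 / (1 + np.exp(10 * (utility - 0.7)))
m_mnar = rng.random(n) < p_mnar

# Under MCAR the observed mean stays unbiased; under MAR and MNAR it drifts
# upward here, because low-utility observations go missing more often.
mean_full = utility.mean()
mean_mcar = utility[~m_mcar].mean()
mean_mar = utility[~m_mar].mean()
mean_mnar = utility[~m_mnar].mean()
```

Note that the MAR and MNAR scenarios can produce data that look identical to the analyst, which is why the MAR assumption cannot be verified from the observed data alone.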

It is important to highlight that all methods for handling missing data imply some assumptions (implicitly or explicitly) on the mechanisms. In the next section, we provide a summary of the most commonly used approaches to missing data.

1.2 Background on Methods to Deal with Missing Data

The methods to deal with missing data include the following: complete-case analysis (CCA), available-case (AC) analysis, multiple imputation (MI), multiple imputation by chained equation (MICE), and predictive mean matching. MICE is a type of MI technique that can handle both continuous and categorical variables; it uses a series of regressions to model each variable with missing data conditional on the other variables in the dataset [15]. A detailed description of these methods is provided in Sections 1.1 and 1.2 of the ESM.
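
As an illustration of the MI/MICE idea, the following sketch uses scikit-learn's experimental `IterativeImputer`, a chained-equations-style imputer. The variable names, the toy data-generating process, and the choice of five imputations are assumptions for the example, not taken from any reviewed study.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n = 2_000

# Hypothetical dataset: age is fully observed; cost and utility are MAR given age.
age = rng.normal(60, 10, n)
cost = 5_000 + 100 * (age - 60) + rng.normal(0, 1_000, n)
utility = 0.9 - 0.005 * (age - 60) + rng.normal(0, 0.1, n)
X = np.column_stack([age, cost, utility])

# Older patients are more likely to have missing cost and utility (MAR on age).
miss = rng.random(n) < 1 / (1 + np.exp(-(age - 65) / 5))
X_obs = X.copy()
X_obs[miss, 1] = np.nan
X_obs[miss, 2] = np.nan

# Draw M imputed datasets; sample_posterior=True adds between-imputation
# variability, which is what distinguishes MI from single imputation.
M = 5
estimates = []
for m in range(M):
    imputer = IterativeImputer(sample_posterior=True, random_state=m)
    completed = imputer.fit_transform(X_obs)
    estimates.append(completed[:, 1].mean())

# Rubin's rules: the pooled point estimate is the average across imputations
# (a full analysis would also combine within- and between-imputation variances).
pooled_cost = float(np.mean(estimates))
cc_mean = float(X_obs[~miss, 1].mean())  # complete-case mean, biased under MAR
```

In this toy setup the pooled estimate lands close to the full-data mean, while the complete-case mean is pulled toward younger (cheaper) patients. For predictive mean matching specifically, statsmodels' `MICEData` class offers a PMM-based alternative.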

2 Methods

2.1 Literature Search Strategy

PubMed was searched for relevant peer-reviewed journal articles in English published until December 2020. The inclusion criteria focused on studies for which the analysis of interest was the estimation of health-state utility values or health state-related costs, or the quantification of PROs in relation to health states. We excluded primary RCT studies and studies in which the outcome of interest pertained to safety or efficacy only. Studies were also excluded if they did not discuss or review methodologies to account for missing data; if they conducted only a CCA to address missing data without employing any other techniques; or if they did not deal with cost, utility, or PRO data. No minimum sample size was considered as an inclusion criterion in this SLR. The inclusion and exclusion criteria are listed in Table 1. The full search strategy and the keywords used in the search are listed in Table 1 of the ESM.

Table 1 Inclusion and exclusion criteria

It is worthwhile to mention in this context that at the design phase of this review, the feasibility of conducting separate reviews for guidance papers and analytical studies was explored. After running several search strings and identifying initial studies, we found that it was not clear how to distinguish between these two types of articles. Indeed, most methodological guidance documents include an analysis using actual data and can therefore be considered both a guidance document and an analytical study. Conversely, generic guideline documents on the topic of missing data seldom include detailed technical guidance. We therefore decided to conduct a single search to cover all of our studies of interest.

2.2 Selection and Extraction of Studies

Identified records were reviewed independently by two reviewers in accordance with the pre-defined search strategy. All disagreements were resolved with the help of a third reviewer to reach a consensus. The data extraction was performed with specific attention to the type of the study, the presence and type of missing data in costs and health outcomes, and the methods used to tackle missing data.

To make the extraction process homogeneous across volunteers, a pre-designed rubric to identify the type of study, data source, and methods explored to handle missing data was provided to each volunteer. A sample extraction was also provided by one of the reviewers to guide the volunteers in the extraction process.

3 Results

A total of 1433 records were identified. After the exclusion of books (n = 7) and duplicate records (n = 2), the abstracts of 1424 records were screened for eligibility. Following abstract screening, 57 full-text articles were assessed for eligibility, of which 40 met the eligibility criteria (Fig. 1). Included studies are summarized in Table 2 of the ESM.

Fig. 1
figure 1

Preferred reporting items for systematic reviews and meta-analyses (PRISMA) diagram

3.1 Characteristics of Included Studies

There was considerable heterogeneity in the design of the studies, and some were based on a combination of designs (Table 2). Thirteen (32.5%) studies were pharmacoeconomic analyses [16,17,18,19,20,21,22,23,24,25,26,27,28], eight (20%) addressed guidelines [3, 29,30,31,32,33,34,35], seven (17.5%) used simulation [16, 18, 19, 32, 33, 36, 37], and seven (17.5%) were methodological [7, 37,38,39,40,41,42] in nature. Other types of studies included two systematic reviews and two survey research articles [43,44,45,46]. The two survey research studies were designed to gather health-related quality-of-life information from participants. One of the systematic reviews addressed handling missing data in within-trial cost-effectiveness analyses, while the other was an educational review of using cost data in pharmacoeconomic analyses that addressed various methods of handling missing cost data [43, 44].

Table 2 Characteristics of included studies (N = 40)

Twenty-one (52.5%) studies used RCTs [3, 16,17,18,19,20,21,22,23,24,25,26,27,28, 30,31,32,33, 39, 47, 48] and six (15%) studies [7, 36, 41, 46, 51, 52] used retrospective or prospective cohorts as the only source of data. Two review studies [43, 44], one study describing guidelines [35], and one methodological paper [38] did not use any data source (Table 2).

3.2 Methods Used to Handle Missing Data

Thirty (75%) studies used MI for addressing missing data [3, 7, 16, 17, 19,20,21,22,23, 25,26,27,28,29,30,31,32,33,34, 36, 37, 39,40,41, 43,44,45,46, 49, 50]. Among these thirty studies, seven studies used MI with predictive mean matching [16, 19, 23, 27, 28, 31, 32] and 17 studies used MICE [3, 16, 19, 21,22,23, 25,26,27,28, 31, 32, 37, 39, 43, 45, 49], one study used MI based on data augmentation [17], and one study implemented MI by using a fully conditional specification [36]. Fifteen (37.5%) studies used a CCA [3, 7, 16, 17, 19, 20, 30, 32, 34, 35, 40, 43, 44, 46, 51] while three studies used pattern-mixture models [3, 31, 47], three studies used last value carried forward [20, 35, 44] and two studies used an AC analysis [43, 46] (Table 2; Fig. 2).

Fig. 2
figure 2

Frequency of different methods and techniques used to handle missing data. CCA complete-case analysis, LVCF last value carried forward, MI multiple imputation, MICE multiple imputation by chained equation, PMM predictive mean matching

Several other methods were used that included item and aggregate imputation and inverse probability weighting [25, 33, 43]. Seventeen out of 21 studies that utilized the RCT as the only source of data used MI [3, 16, 17, 19,20,21,22,23, 25,26,27,28, 30,31,32,33, 39]. The remaining four studies used a range of methods including the semiparametric model with propensity score method, Bayesian approach with Markov Chain Monte Carlo, and mean value of available response [18, 24, 47, 48]. Six studies based on retrospective or prospective cohorts also used MI, CCA, or Bayesian robust two-stage causal modeling with an instrumental variable approach.

Twenty studies addressed missing data issues by incorporating multiple methods [3, 7, 16,17,18,19,20, 24, 25, 31,32,33,34, 36, 37, 40, 42, 46,47,48], while 11 studies used one single method [21,22,23, 26,27,28, 39, 41, 45, 49, 50]. Nine studies lacked any empirical illustration or were theoretical in nature [29, 30, 35, 38, 43, 44, 51,52,53].

Seventeen studies addressed dealing with missing cost data [3, 16,17,18,19,20,21,22,23, 25,26,27,28, 30, 31, 43, 47], 23 studies dealt with missing health outcomes data [3, 7, 16, 19,20,21,22,23,24,25,26, 28,29,30,31,32,33, 35, 39, 40, 46, 47, 50], which included PROs, and only nine studies addressed dealing with missing covariates [26, 28, 34, 36, 37, 45, 46, 49, 50]. Among those 17 studies dealing with missing cost data, 15 used MI [3, 16, 17, 19,20,21,22,23, 25,26,27,28, 30, 31, 43], 11 used MICE [3, 16, 19, 21, 22, 25,26,27,28, 31, 43], seven used CCA [3, 16, 17, 19, 20, 30, 43], four used predictive mean matching [16, 19, 27, 31], one study used last value carried forward [20], and one study used a pattern-mixture model [31].

Among the 23 studies dealing with missing health outcome data, 19 used MI [3, 7, 16, 19,20,21,22,23, 25, 26, 28,29,30,31,32,33, 39, 40, 50], 11 studies used MICE [3, 16, 19, 21, 22, 25, 26, 28, 31, 32, 39], ten studies used CCA [3, 7, 16, 19, 20, 30, 32, 35, 40, 46], six studies used predictive mean matching [16, 19, 23, 28, 31, 32], one study used last value carried forward [20], and one study mentioned a pattern-mixture model [31]. All of the nine studies dealing with missing data of covariates used MI [26, 28, 34, 36, 37, 45, 46, 49, 50], seven studies used MICE [26, 28, 37, 45, 46, 49, 50], and one study used a CCA [46].

Eleven studies [21,22,23, 26,27,28, 39, 41, 45, 49, 50] used only a single method, without a sensitivity analysis to evaluate the potential impact of the selected method on the results of the analysis. We also observed that only three studies attempted to use non-imputation-based methods, such as propensity score matching, inverse probability weighting, or an instrumental variable [33, 41, 48].

3.3 Findings from Three Guidance Documents

Table 3 presents a list of recommendations from three guidance papers [3, 5, 32] that address missing data in a context within the scope of this review. They emphasized efforts to minimize missing data in the design and data collection phases of a study. For instance, in surveys, questionnaires should be designed in a user-friendly manner, with appropriate modes of administration and the follow-up necessary to improve participants’ engagement and thereby minimize missing data [5]. Researchers should provide a thorough description of the magnitude and patterns of missing data and identify the associations between outcomes, baseline variables, and missingness [3]. Though a CCA may be an appropriate choice in some cases, it has major limitations in studies using repeated measures, and its use can also cause a loss of statistical power [5]. If MI is used, it is recommended that the imputation model include all available variables, even those that are not included in the analysis model. There is also a need to validate the method chosen to handle missing data by comparing the results with those of a different method under the same assumption on the missing data mechanism [3]. Another guidance paper, addressing missing PRO data in RCTs [32], suggested that if the sample size is small and there is a large amount of missing data, imputation at the composite score level might be more useful, whereas if there are many item nonresponses, imputation at the item or subscale level becomes more beneficial. However, imputation at the item level might give rise to non-convergence issues. This paper suggested the use of treatment as an explanatory variable in an MI model to resolve non-convergence, after considering the distributional assumptions of the outcomes in the study [32]. If ordinal logit models are used for imputation, one should be cautious about low counts in the levels of the items to be imputed [32].
Each statistical model to be used in the imputation should be pretested before implementation to ensure that appropriate estimates can be produced.
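
The item-level versus composite-level distinction above can be sketched as follows. The five-item instrument, its 0-4 scoring, and the missingness rate are hypothetical, and `IterativeImputer` stands in for whatever imputation model a study would actually specify.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(3)
n = 500

# Hypothetical PRO instrument: five correlated items scored 0-4; composite = sum.
latent = rng.normal(0, 1, n)
items = np.clip(np.round(2 + latent[:, None] + rng.normal(0, 0.7, (n, 5))), 0, 4)

# Item nonresponse: each item is independently missing with probability 0.10.
obs = items.copy()
obs[rng.random((n, 5)) < 0.10] = np.nan

# Item-level imputation: impute each item from the other items, then sum,
# so every respondent contributes a composite score.
completed = IterativeImputer(random_state=0).fit_transform(obs)
composite_item_level = completed.sum(axis=1)

# Composite-level alternative without imputation: the composite can only be
# scored for respondents who answered every item.
complete_rows = ~np.isnan(obs).any(axis=1)
composite_complete_only = items[complete_rows].sum(axis=1)
```

With scattered item nonresponse, the complete-responder approach discards a large share of the sample (here every row with any missing item), whereas item-level imputation retains all respondents, which is the gain the guidance paper describes.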

Table 3 Practical guidelines on handling missing data

These three guidance papers highlighted the importance of conducting a sensitivity analysis to identify possible departures from the MAR assumption by using alternative approaches, for example, a selection model or a pattern-mixture model [3, 5, 32]. The choice between the two approaches depends on the importance of expressing the difference between observed and unobserved data in a specific research context [3]. In the context of a sensitivity analysis, a selection model considers alternative missing data mechanisms (e.g., MNAR), while a pattern-mixture model recognizes the differences in the distribution of the observed and unobserved data [3].
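
A common way to operationalize a pattern-mixture-style sensitivity analysis is "delta adjustment": shift the values imputed under MAR by a range of offsets representing how much worse (or better) the unobserved cases might be, and check whether conclusions change. The sketch below is illustrative only; the chosen deltas and the crude mean imputation stand in for a study's actual MAR imputation model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000

# Hypothetical utility data with roughly 30% missing values. For simplicity the
# MAR imputation is proxied here by the observed mean; in practice it would be
# the output of the study's actual imputation model (e.g., MI).
utility = rng.normal(0.75, 0.12, n)
missing = rng.random(n) < 0.30
imputed_mar = np.where(missing, utility[~missing].mean(), utility)

# Delta adjustment: assume the unobserved values are systematically worse by
# delta (a pattern-mixture-style MNAR departure) and recompute the estimate.
deltas = [0.0, -0.05, -0.10, -0.15]
means = {}
for delta in deltas:
    shifted = imputed_mar.copy()
    shifted[missing] += delta
    means[delta] = shifted.mean()

# If conclusions are stable across the range of deltas, the analysis is robust
# to this class of MAR departures; if not, MNAR methods deserve attention.
```

Each delta defines one "pattern" for the missing cases, which is why this tipping-point-style analysis is usually presented as a pattern-mixture approach.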

4 Discussion and Practical Considerations

The studies included in this review were very heterogeneous in terms of design. We found that most studies used MI. Although many studies included multiple methods for handling missing data, several did not. The results of the review emphasized the importance of sensitivity analyses. We also found that MI is the preferred technique when the missing data mechanism is considered MAR in economic evaluations based on registry studies [29]. A previous review with recommendations further emphasized that the information collection process, and the assumptions and limitations of the chosen methods, should be reported, and that a sensitivity analysis should be conducted, in cost-effectiveness analysis studies [30].

In the context of using an imputation model for handling missing data in PRO studies, recommendations included considering the prevalence of missing data in the study, the sample size, the number of items, and the number of levels within individual items in the measurement instrument. The recommendation also suggested inclusion of the above factors in the analysis plan [32].

All data analyses rely on assumptions, and assessing those assumptions, including the ones underlying the approach used to handle (or not handle) missing data, is important when making methodological decisions. This requires a thorough assessment of the missingness mechanism when choosing a method to address missing data. Although we did find that MI was one of the most common approaches used for handling missing data, likely because of its flexibility, there was variability in the approach used to impute data. In some cases, other approaches relying on the MAR assumption, such as propensity score methods (either matching or inverse probability weighting), were used in included studies, and these may be more or less helpful depending on the context of the data [43, 48]. If incomplete cases provide little information, then inverse probability weighting may be preferable to MI: inverse probability weighting models the probability that an individual is a complete case, whereas MI models the distribution of the missing values given the observed data [43]. Researchers should be cognizant that the MI technique does not recover the missing values but replaces them with plausible values of the specific variables, so that complete datasets can subsequently be analyzed in a way that incorporates the uncertainty in the imputed values. The literature also suggests that using as many predictors as possible is important for making the MAR assumption more plausible [9]. This has implications for study planning and underscores the importance of having sufficient auxiliary variables: covariates that are used in the imputation models but are commonly not of particular interest in the final analytical model. Inclusion of these variables can improve the missing data handling procedure by making the MAR assumption more plausible or by increasing power through the recapture of some of the missing information [11].
Furthermore, our findings suggest that evaluating the plausibility of the underlying analytic assumptions is important. This implies that researchers should include sensitivity analyses that assess the robustness of the results to violations of MAR. Researchers should also pay attention to the skewness of cost distributions and the boundedness of utility measures when regression models are used for imputation [3].
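
The contrast between inverse probability weighting and MI described above can be sketched as follows. The data-generating process and the use of a logistic model for the probability of being a complete case are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 20_000

# Hypothetical data: cost is missing at random given the observed covariate age.
age = rng.normal(60, 10, n)
cost = 5_000 + 100 * (age - 60) + rng.normal(0, 1_000, n)
complete = rng.random(n) > 1 / (1 + np.exp(-(age - 65) / 5))  # older -> more missing

# Step 1: model the probability of being a complete case from observed covariates.
ps_model = LogisticRegression(max_iter=1000).fit(age.reshape(-1, 1), complete)
p_complete = ps_model.predict_proba(age.reshape(-1, 1))[:, 1]

# Step 2: weight each complete case by 1 / P(complete); under MAR this reweights
# the observed sample back toward the full population.
weights = 1.0 / p_complete[complete]
ipw_mean = float(np.average(cost[complete], weights=weights))
cc_mean = float(cost[complete].mean())  # unweighted complete-case mean, biased
```

Unlike MI, nothing is imputed here: only complete cases enter the analysis, reweighted so that, under MAR, they represent the full sample. This is why the approach is attractive when incomplete cases carry little usable information.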

We did find that a substantial minority of 15 studies used a CCA as one of the methods to deal with missing data. A previous literature review of handling missing data in the context of RCTs found that the most common method was the CCA, followed by simple imputation [4]. In contrast, this SLR found that among real-world evidence studies and within-trial cost-utility analyses that used specific methods, in conjunction with a CCA or alone, to handle missing data, MI was the most common approach. It is important to note that though a CCA is straightforward to implement, it is generally only valid (i.e., leads to unbiased estimates) under the MCAR assumption, which is a very strong assumption that is unlikely to be met in many HEOR applications. Recently, there has been some suggestion that a CCA may be preferable to MI when exposure and confounding variables in a main analysis are MNAR, because many applications of MI assume data are MAR [54]. However, even in circumstances where a CCA would lead to unbiased results, it is an inefficient approach, as it does not use all the information available from the incomplete records [54].

Researchers should report the presence and patterns of missing data in outcomes, confounders, and auxiliary variables. For instance, the fraction of missing information should also be considered, as it can provide valuable information for choosing the auxiliary variables to be used in imputation models [55].

Consistent with the guideline documents reviewed, we also recommend that researchers assess the proportion and patterns of missingness of all variables and use alternative missing data handling methods in a sensitivity analysis [6]. Note that there is no general consensus on a specific threshold for the proportion of missingness above which a CCA is not warranted; suggested thresholds vary from 2 to 10%. One limitation of this SLR is that it was restricted to studies published in English. Another limitation is that, because the primary objective was to identify specific analytical methods beyond the default approach of the CCA, this SLR excluded studies that used the CCA as the only method to handle missing data, as those studies did not explicitly acknowledge the presence of missing data. Finally, we acknowledge that the SLR included articles published until December 2020; this is because the main findings of the review served as the foundation for a series of discussions, which informed the development of the guidance presented in this paper.

5 Conclusions

There is a large amount of literature on handling missing data in an RCT context; however, very few studies have attempted to address missing data in HEOR studies based on other data sources. This is the first SLR that attempts to bridge this gap in the HEOR literature by identifying the specific analytical methods used to handle missing data in HEOR studies based on different data sources. Based on the literature review, this study further delineates recommendations for researchers handling missing data in the context of observational or real-world evidence studies.

All of our recommendations are based on current practices that were reported in the HEOR literature, as found within the scope of this review, and serve as an important stepping stone to advance the field on the topic. We found that most studies used MI as an analytic approach relying on the MAR assumption. Based on our review of what is currently reported in the literature including existing recommendations, we suggest that HEOR researchers consider the missingness mechanism and include sensitivity analyses when devising analysis plans.