Key Points for Decision Makers

Larger incentive rewards can reduce program participation from certain subgroups such as African Americans.

Financial incentive design is a product with multiple interdependent components; therefore, the best fit design should not focus only on reward amount.

Offering a menu of financial incentive conditions for participants to choose from will increase program participation more than offering larger reward amounts.

1 Background

The obesity epidemic and its related negative impact on public health, both on quality of life and the national economy, have given rise to an increase in workplace, healthcare, and community efforts to move evidence-based interventions (EBIs) into typical practice [1, 2]. However, there is limited evidence that available EBIs are effectively engaging subgroups of the population that experience disparities in obesity, and obesity-related outcomes, for the duration necessary to achieve a clinically meaningful weight loss (i.e., ≥ 5% of initial weight) [3]. Some subgroups, such as African American, Latino, and lower socioeconomic status groups [4], have been shown to be more likely to experience obesity but less likely to engage in weight control EBIs [3, 5]. Understanding how best to reach populations that experience obesity-related health disparities in real-world settings is a key area of need in the field of dissemination and implementation science. Furthermore, it is a vital step towards achieving the full public health potential of weight-loss EBIs and eliminating obesity-related health disparities [3].

There is emerging evidence, and a strong theoretical rationale, supporting the use of financial incentives as an implementation strategy to improve the reach of EBIs focused on weight loss [6,7,8,9,10,11,12,13,14,15]. The use of incentives is also endorsed by the WHO [16] as an important component of behavioral weight control programs and was included in the US Patient Protection and Affordable Care Act (ACA). Starting in 2014, Section 2705 of the ACA allowed employers to use up to 30% of the total amount of health insurance premiums to provide outcome-based wellness incentives [17]. Nevertheless, financial incentives used to engage participants in wellness and weight loss programs may not be achieving the desired outcomes for program reach [18,19,20]. For example, participation rates in weight control programs do not align with the rates that one would expect based on economic theory; that is, a well-calibrated financial incentive should engage a larger proportion of the intended audience [19]. The RAND Health report showed that the uptake of worksite wellness programs is limited, with eligible employee participation rates ranging from 21% (for fitness programs) to 7% (for smoking cessation programs). The weight loss program participation rate, at 10%, was the second lowest rate identified in the report [21].

One of the largest challenges in understanding how financial incentives can improve the reach of weight loss programs is the limited variety of incentive designs used within and across studies. It is likely that heterogeneity exists in preferences towards different financial incentive attributes such as magnitude, form, frequency, and contingency [20]. Consider an adult man and woman who are obese and from different racial groups: when presented with an opportunity to participate in a weight loss program, each considers the competing demands, interests, and values relative to budget constraints. These constraints are likely different for the two individuals due to different experiences and motivations that are related to their respective backgrounds and characteristics (e.g., health literacy levels and information exposure differences). Changes to a healthier diet could be perceived as more costly by one person than by the other. The changes themselves could also be perceived differently by these individuals. One may see participation as negatively impacting satisfaction with a new status quo due to a shift from more preferred, less healthful foods to less preferred, more healthful foods, which is a disincentive to participation. The addition of a financial incentive to the equation may offset the time and psychological costs each faces (e.g., limiting foods they enjoy, initiating new physical activities). However, too high an incentive may lead to health behaviors that are only sustained as long as the incentive is provided [22, 23]. Modest incentives are intended to provide temporary extrinsic motivation that gives way to intrinsic motivation as a participant begins to realize the benefits of weight loss. The amount, type, timing, and location of incentives needed will vary across these individuals. Considering only the amount of the incentive, one of the hypothetical people in our example may be more satisfied with the status quo (i.e., not initially motivated to lose weight) and would need a higher amount. The other may be dissatisfied with the status quo and therefore more motivated to change. This person may not need any incentive, and applying one could disrupt initial intrinsic motivation.

While there is strong justification to examine the potential heterogeneity of preferences for incentive models to improve the reach of evidence-based weight loss programs, most of the research in this area has examined the effect of different incentive amounts on program effectiveness and maintenance, with only a limited number of studies examining incentives as an implementation strategy to improve reach (i.e., program enrollment/participation rate). Furthermore, the current literature reports mixed evidence on how financial incentives may effectively improve program reach [24,25,26,27]. The mixed evidence may arise because studies did not thoroughly consider incentive constructs. This motivates the objective of this study: to inform better incentive designs and implementation through understanding the preferences of individuals towards different financial incentive attributes.

2 Experimental Method and Implementation

The development of tailored incentive designs for subgroups of the population relies on an understanding of target population preferences towards different incentive attributes. We used a discrete choice experiment (DCE) to understand public preferences towards different financial incentive attributes. The DCE has been used widely in health economics, and the studies that examine preferences towards weight loss program attributes are especially relevant to this study [28,29,30]. The application of a choice experiment allows us to examine a wider range of incentive attributes (i.e., more and broader ranges of levels within each attribute), overcoming the limitations of actual field experiments (i.e., limited scope, limited representativeness, and limited incentive designs). It further allows larger sample sizes for sufficient statistical power and oversampling of the at-risk groups that are often difficult to recruit into weight loss programs and randomized controlled trials.

2.1 Survey Instrument Development

To serve the purposes of customization and targeted design, care was taken to (1) identify and describe the incentive attributes that are plausible and relevant to those at-risk subgroups; and (2) adjust the survey questionnaire’s readability to accommodate low literacy levels in order to ensure survey accessibility and data quality (i.e., minimize data noise caused by difficulty comprehending the survey itself). At-risk subgroups were identified a priori and were based on the literature documenting disparities in obesity-related health outcomes and program participation for these groups (e.g., gender, race, and socio-economic status) [4]. The research team involved stakeholders in the survey development through a 2-hour listening session with 22 program assistants and nutrition outreach instructors from the Expanded Food and Nutrition Education Program (EFNEP). Those educators worked with low-income populations who experience a high rate of overweight and obesity. Based on the information gained from the listening session, the initial questionnaire was drafted, and subsequently a total of four focus groups were held, approximately 1 month apart, to test, refine, and pilot the survey instruments. Focus group participants were recruited from EFNEP and Carilion Clinic (a healthcare organization in southwest Virginia partnering with the research team), were overweight or obese, and the majority of them were low income (i.e., Medicaid eligible) (Footnote 1). The list of financial incentive attributes and their levels was formed and finalized during the process. This use of qualitative evidence in survey design follows recommended practice [31]. The initial formation of the list was also aided by an extensive literature review and a pilot study of the project. The final list of incentive attributes and their levels used in our DCE is presented in Table 1 (Footnote 2). Appendix Figs. 5 and 6 show the survey description of the weight control program and incentive schemes.

Table 1 Attributes and attribute levels used in constructing the financial incentive

A total of 96 choice sets (each containing two incentive designs comprising unique combinations of incentive attributes and an opt-out option) were constructed from the final list of attributes and associated levels using the D-efficiency criterion in the software package Ngene. D-efficient designs have been shown to provide more statistically efficient parameter estimates, on average, than designs that are less D-efficient [32]. Each survey includes four choice sets. The number of choice sets per respondent can generally range from 1 to 18, and the decision should be informed by contextual knowledge and the trade-off between respondents’ burden and correlation within and between subjects [33]. We chose to ask each participant to respond to four choice sets based on what we learned in the focus group interviews about participant burden, considering the numeric nature of our tasks. Appendix Fig. 7 presents an example of the choice question (see ESM).

2.2 Survey Implementation

Survey participants were identified through the electronic patient database of our research team partner, Carilion Clinic (Footnote 3). A total of 7554 adults (aged 18 years or older) with the most recent clinic record of non-missing BMI (used to identify and oversample those with BMI ≥ 25) and non-missing mailing information were identified. Those patients were mailed introductory letters, signed by the Carilion Clinic primary care branch director, that described the study goals and the expected time to complete the mail survey (15–25 min), announced the upcoming phone recruitment, and included the standard confidentiality statement. The phone recruitment and mail survey implementation were all conducted by the Virginia Tech Survey Research Center following standard recruitment rules in 2012. Among the 7554 identified individuals, 6657 (88%) had valid phone numbers on record or were reachable by the recruiters over the phone. The phone recruitment resulted in a total of 2737 individuals (41% response rate) who consented to participation in the mail survey. Those individuals later received the survey instruments via mail along with US$2 as a thank you for participation. Two to three weeks after the first round of survey mail-out, those who had not responded were mailed the same survey a second time. A total of 1232 participants completed the survey. During the recruitment process, specific emphasis was placed on oversampling low-income (identified by the Medicaid eligibility information in the patient electronic health records), African-American, and male participants.
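The recruitment funnel can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in this section; the completion rate among consenting individuals and the overall yield are not stated in the text and are derived here purely for illustration.

```python
# Counts reported in the text for each recruitment stage.
identified = 7554   # adults with non-missing BMI and mailing information
reachable = 6657    # valid phone number on record or reached by phone
consented = 2737    # agreed by phone to take part in the mail survey
completed = 1232    # completed the mail survey


def share(part: int, whole: int) -> float:
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


reach_rate = share(reachable, identified)        # reported as 88%
consent_rate = share(consented, reachable)       # reported as 41%
completion_rate = share(completed, consented)    # derived, not reported
overall_yield = share(completed, identified)     # derived, not reported

for label, value in [
    ("reachable / identified", reach_rate),
    ("consented / reachable", consent_rate),
    ("completed / consented", completion_rate),
    ("completed / identified", overall_yield),
]:
    print(f"{label}: {value:.1f}%")
```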

2.3 Empirical Analysis

To inform customization of financial incentives, there are several layers of information to be acquired. First, we established the statistical presence of preference heterogeneity towards incentive attributes among individuals (i.e., differences among individuals in their likes and dislikes) to provide support for the need for customization. We used a random parameter logit model assuming that preference parameters follow certain probability distributions (such as a normal or log-normal distribution) in order to allow them to differ across individuals. Second, we used a conditional logit model with interaction terms between attributes and demographic variables to investigate subgroup preference differences in the attribute ranking. The choice of demographic variables was based on the a priori identification of at-risk subgroups using evidence from the literature and includes gender, race, income, employment, education, and overweight/obesity status. All chosen demographic variables were included in the models simultaneously. Attributes were ranked by comparing the log-likelihood of the full model with that of a partial model that omits the attribute: attributes that contribute more explanatory power to the model (and are thus in some sense more ‘important’) result in larger differences between the full and partial log-likelihood values. Third, rankings of levels within each attribute provide further insights into the design of subgroup-tailored incentive programs. We utilized the conditional logit models estimated in the second step to conduct a counterfactual analysis. We systematically varied levels within a given attribute one at a time in a program condition (A) while holding other attributes at the same levels as the alternative program condition (B) and predicted the probabilities of each program being chosen relative to the opt-out choice. We used the differences of the predicted probabilities between A and B relative to the opt-out choice as the measure of the ‘marginal effect’ of a given attribute level on the program participation probability.
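The counterfactual step can be illustrated with a minimal sketch of conditional logit choice probabilities. All coefficient values and attribute names below are hypothetical and are not the estimates reported in this paper; the opt-out utility is normalized to zero, and the ‘marginal effect’ is read here as the difference in predicted choice probability between programs A and B within a choice set that also contains the opt-out, which is one plausible interpretation of the procedure described above.

```python
import math

# Hypothetical coefficients for illustration only (NOT the paper's estimates).
BETA = {
    "log_reward": 0.30,
    "loc_workplace": 0.10,
    "loc_community": 0.40,
    "pay_weekly": 0.25,
}


def utility(program):
    """Linear-in-attributes utility; the opt-out utility is normalized to 0."""
    return sum(BETA[name] * value for name, value in program.items())


def choice_probabilities(utilities):
    """Conditional logit probabilities over the alternatives in one choice set."""
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Program B is the reference; program A differs in one level only (location).
program_b = {"log_reward": math.log(48), "loc_workplace": 1, "pay_weekly": 1}
program_a = {"log_reward": math.log(48), "loc_community": 1, "pay_weekly": 1}

# Choice set: program A, program B, opt-out.
p_a, p_b, p_opt_out = choice_probabilities(
    [utility(program_a), utility(program_b), 0.0]
)
marginal_effect = p_a - p_b

print(f"P(A)={p_a:.3f}  P(B)={p_b:.3f}  P(opt-out)={p_opt_out:.3f}")
print(f"marginal effect of community center vs workplace: {marginal_effect:+.3f}")
```

In the paper's models the coefficients also interact with demographic variables, so the same calculation would be repeated for each individual profile.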

This last step involves probability prediction for individuals with specific demographic and health profiles such as an employed, Caucasian female with high education level, high income, and overweight. Those predicted probabilities are within-sample individual profile-specific predictions, which means that we need to have population-level compositions of those different at-risk subgroups to provide meaningful population-level reach information. Therefore, after predicting participation probabilities within our sample for each individual, we constructed sample weights to aggregate those individual probabilities to weighted average profile-level (e.g., national overall or subgroup levels) participation probability predictions to help inform incentive design selections. This approach follows Hensher et al. [33] and creates a synthetic population based on the adult US population, and predicts market shares (i.e., incentivized weight control program participation rates) in that synthetic population. To accomplish this goal, we used 2009–2010 National Health and Nutrition Examination Survey (NHANES) as a supplementary data set for the construction of the profile-specific weights. The NHANES is a nationally representative survey that contains demographic information and health outcomes of a national random sample of individuals. Each observation in the NHANES data set is assigned a sample weight that is designed as a measure of the number of people in the population represented by that sample person. We summed the individual sample weights provided in the NHANES data set across all NHANES individuals sharing the same individual profiles to generate profile-specific weights. To arrive at national level or more aggregated subgroup-level predictions, we predicted profile-specific participation probabilities and then used the profile weights to generate a weighted average of participation probability across profiles.
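The aggregation described above amounts to summing survey weights within each profile and then taking a weighted average of the profile-level predictions. The following sketch uses toy records and made-up probabilities; the profile definitions, weights, and function names are assumptions for illustration and do not reproduce the NHANES data or the paper's predictions.

```python
from collections import defaultdict

# Toy NHANES-like records: (profile, sample weight). Values are invented.
nhanes_rows = [
    (("female", "african_american", "low_income"), 1000.0),
    (("female", "african_american", "low_income"), 3000.0),
    (("male", "white", "high_income"), 6000.0),
]

# Hypothetical predicted participation probability for each profile.
predicted = {
    ("female", "african_american", "low_income"): 0.40,
    ("male", "white", "high_income"): 0.60,
}


def profile_weights(rows):
    """Sum individual sample weights across records sharing the same profile."""
    totals = defaultdict(float)
    for profile, weight in rows:
        totals[profile] += weight
    return dict(totals)


def weighted_participation(probabilities, weights):
    """Weighted average of profile-level probabilities over shared profiles."""
    shared = [profile for profile in probabilities if profile in weights]
    total_weight = sum(weights[profile] for profile in shared)
    return sum(probabilities[p] * weights[p] for p in shared) / total_weight


weights = profile_weights(nhanes_rows)
national_rate = weighted_participation(predicted, weights)
print(f"weighted participation probability: {national_rate:.2f}")
```

Subgroup-level predictions would follow the same pattern after restricting both dictionaries to the profiles that belong to the subgroup.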

To facilitate a summary of the results, we used a ranking method to accommodate the nature of categorical attributes. For each individual profile, we rank-ordered levels within each attribute by each level’s ‘marginal effect’ on the participation probability (ranked from 1 to 4 in order of importance or magnitude of effect). Then, within each weighted aggregate subgroup of interest, we calculated the proportion of that subgroup ranking each level of each attribute the same way. To confirm the statistical presence of preference heterogeneity towards incentive attributes in the sample, we estimated a random parameter logit model assuming that all coefficients follow normal distributions. To capture a log-linear relationship between reward magnitude and choice preference documented in the literature [34], we log-transformed the nonzero reward magnitude variable.
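A minimal sketch of the ranking summary is given below. The marginal effects, profile names, weights, and the ‘monthly’ level are hypothetical placeholders; the sketch only shows how levels could be ranked within a profile and how the weighted share of a subgroup assigning a given rank to a level could be computed.

```python
def rank_levels(effects):
    """Map each level to its rank (1 = largest marginal effect)."""
    ordered = sorted(effects, key=effects.get, reverse=True)
    return {level: position for position, level in enumerate(ordered, start=1)}


def share_by_rank(profile_effects, weights, level, rank):
    """Weighted share of profiles in a subgroup that give `level` this rank."""
    total = sum(weights[profile] for profile in profile_effects)
    hits = sum(
        weights[profile]
        for profile, effects in profile_effects.items()
        if rank_levels(effects)[level] == rank
    )
    return hits / total


# Hypothetical marginal effects of payment-frequency levels for two profiles.
profile_effects = {
    "profile_1": {"weekly": 0.08, "monthly": 0.05, "quarterly": 0.03, "end of program": 0.01},
    "profile_2": {"weekly": 0.02, "monthly": 0.06, "quarterly": 0.04, "end of program": 0.00},
}
weights = {"profile_1": 3.0, "profile_2": 1.0}  # hypothetical profile weights

weekly_first = share_by_rank(profile_effects, weights, "weekly", 1)
end_last = share_by_rank(profile_effects, weights, "end of program", 4)
print(f"share ranking weekly first: {weekly_first:.0%}")
print(f"share ranking end-of-program last: {end_last:.0%}")
```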

To investigate how many incentive options could maximize population reach and to what extent, we examined the program participation probability. We predicted the program participation probabilities under choice sets comparing single incentive options with multiple options. To maintain manageable option sets for our investigation, we fixed the only continuous attribute, reward amount, while we conducted the probability imputation. Then we changed the reward amount and fixed it while expanding the choice sets again. This way, we were able to see the distributions of participation probabilities change as the menu size changed under a given total amount of rewards.

3 Results

3.1 Summary Statistics and Attribute Preference Heterogeneity

Summary statistics of key variables of our sample compared with Virginia’s census information are presented in Table 2. The study sample contains a higher percentage of African-American, male, low-income, and unemployed individuals. These differences confirm the success of our strategy of oversampling those rarely studied groups (i.e., males, low socioeconomic status individuals). We also confirmed the existence of preference heterogeneity (Table 3). In summary, 60% of the coefficients (9 out of 15) have statistically significant standard deviations (SDs), signaling extensive preference heterogeneity within the sample. Furthermore, all incentive attributes have at least one level with a statistically significant SD, signaling the existence of preference heterogeneity in all attributes.

Table 2 Comparison of study sample characteristics with Virginia state demographic characteristics
Table 3 Random parameter logit model estimation results

3.2 Rankings of Attributes

This overall sample model (i.e., without interaction with demographic variables) shows that on average people mostly prefer a larger reward magnitude, to be paid weekly in the form of a debit card with contingency of attending weigh-ins and the program held in a community center. Next, we examined specific subgroup differences of their preferences towards each attribute. Figure 1 presents the attribute preference ranking results for the full sample and by subgroups that were classified by profile variables such as race, gender, income, and weight status. The rank ordering of attribute importance confirms the subgroup preference heterogeneity. The top two most important attributes in people’s program participation decision-making are dollar amount and program location. However, the relative ranking differs across subgroups. The obese subgroup and female subgroup exhibit the same preferences as the full sample. However, program location is deemed more important than dollar amount for subgroups of African Americans, low-income females, and males. Among those subgroups, the African-American subgroup exhibits the largest preference towards program location as compared with reward amount. This result is consistent with our experiences during focus groups, where African-American participants were particularly concerned with program location when discussing their choices (due to a number of reasons, such as convenience/accessibility, travel time, and privacy/stigma). Payment frequency is the least important attribute for most subgroups except for low-income females and females (whose least preferred attribute is reward condition).

Fig. 1 Attributes’ relative importance (i.e., preference ranking) by populations of interest

3.3 Rankings of Levels Within Attributes

As shown in Table 3, the reward magnitude is positively associated with individuals’ willingness to participate in an incentivized weight control program, holding everything else constant (i.e., mean effect of 0.36 with p < 0.01). However, the effect’s SD is sizable and statistically significant (SD 0.45; p < 0.01). This means that there exist subgroups of individuals who dislike larger incentive amounts. We followed the profile participation probability generation process described above to predict participation probability with only the total reward amount varying from $0 to $300 for the national aggregate and several subgroups of interest (i.e., profiles of obese individuals, African Americans, low-income females, males, and females). We present the prediction results in Fig. 2. Each panel shows the predicted participation probability distribution at each dollar amount for a specific aggregated population. The solid line shows the average participation probability. The average participation response to an increasing reward amount is positive. However, even the average response rises sharply at very small incentive amounts and then quickly plateaus (i.e., beyond a small dollar amount threshold, the participation probabilities are no longer increasing at a high rate). At the national level, the program is projected to reach 50% participation with an approximately $40 total award for 6 months; however, a $100 total award (more than double the cost) is projected to increase the participation rate by only about 6%. This response trend is consistent across all populations examined.
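To give a sense of scale for the heterogeneity in the reward coefficient, the sketch below computes the share of a normal distribution with the reported mean (0.36) and SD (0.45) that falls below zero. This is a back-of-the-envelope illustration that assumes the normality specified for the random parameter logit model; it is not a result reported in the paper.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


mean_coef = 0.36  # mean effect of (log) reward magnitude, from Table 3
sd_coef = 0.45    # SD of that effect, from Table 3

# Share of individuals whose implied reward coefficient would be negative,
# under the assumed normal distribution of the coefficient.
share_negative = normal_cdf(0.0, mean_coef, sd_coef)
print(f"implied share with a negative reward coefficient: {share_negative:.1%}")
```

Under these assumptions roughly one in five individuals would have a negative coefficient, which is consistent with the statement that some subgroups dislike larger incentive amounts.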

Fig. 2 The impacts of total reward amounts on program participation probability by populations of interest

We present the ranking results in Fig. 3. Each row of the figure corresponds to one incentive attribute. The first row is program location, the second row is payment form, the third row is reward contingency, and the fourth row is payment frequency. The four bars within the small panel correspond to the four populations of interest. A darker color patch signals a higher ranking (i.e., larger importance).

Fig. 3 Rankings by the impacts of attribute levels on program participation probability: program location, payment form, reward contingency, and payment frequency (by populations of interest)

Using workplace as an example, the figure shows that almost 50% of the national population would rank workplace as the most important program location level, while it was ranked first by < 25% of the obese and African-American subgroups. For the obese subgroup, clinic was the most preferred location choice, while for the African-American and low-income women subgroups, community centers were the most preferred location. Program location is the one attribute that exhibits quite heterogeneous preferences across populations of interest.

Within the payment form attribute, the preference towards debit card was consistent across subpopulations except for African Americans. Almost 50% of African Americans responded most strongly to the gym pass payment form, while about 35% responded most strongly to the debit card payment form. The least preferred payment form was the medical debit card.

Similarly, most subgroups preferred rewards to be contingent on attending weigh-ins only (i.e., participants will earn rewards as long as they show up to weigh-in). However, < 20% of African Americans preferred an attendance-only contingency. Instead, the African-American subgroup highly preferred incentives for weight loss (> 60% prefer weight contingency). The least preferred contingency condition is compliance. In terms of preference for payment frequency, it is quite consistent across all groups that weekly payment is the most desirable level with the obese subgroup showing the strongest preference towards this frequency level. The only notable difference is that the obese subgroup strongly disliked quarterly payments while the other subgroups rank quarterly payments the second-most preferred option. It is clear that payment at the end of the program is universally disliked.

3.4 Incentive Options

As shown in Table 3, all levels within program location and payment form show significant mean effects and most SDs are statistically significant (Footnote 4). The other two attributes have only one significant level and one significant SD within each. Reward contingency obtained a relatively larger statistical significance than payment frequency in terms of p-values. However, it is of policy importance to look beyond the significance of individual attributes and levels for program reach and to answer the following questions: (a) whether or not offering more than one incentive program option will achieve higher reach and, if so, to what extent; and (b) how many options are needed to provide the maximum population reach.

To answer those questions, we first ranked all the attributes (except for reward amount) by their relative importance and the dominance of heterogeneity informed by our model shown in Table 3. The order is program location, payment form, reward contingency, and then payment frequency. After ordering the attributes, we ordered levels within each attribute as well. We then examined the menu of options for each reward amount level.

For a given total reward amount, we constructed five option sets (menus) to examine the probability changes. The first menu contains only one incentive option and is designed with the first ranked level for each attribute (i.e., location of clinic, payment form of debit card, reward earned based on weigh-in attendance, reward paid at weekly frequency). The next menu offers two incentive designs that vary only by location level (i.e., clinic vs community center), while the other attributes maintain the same levels as in the first menu. The third menu offers four incentive designs, allowing both program location and payment form to take on two levels each, and so on until the fifth menu, which offers 16 incentive programs that allow all four attributes to take on two levels each.
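The menu construction can be sketched as a Cartesian product in which one more attribute is allowed to take two levels at each step. Only the first-ranked levels and the two location levels are named in the text; the other second levels below are placeholders, and the participation function is a simple logit with an opt-out utility of zero, included as an assumption to show how a menu's utilities would map to a participation probability.

```python
import itertools
import math

# First-ranked level, then the level added when the attribute is allowed to vary.
# Second levels other than "community center" are placeholders, not from the text.
ATTRIBUTES = [
    ("location", ("clinic", "community center")),
    ("payment form", ("debit card", "alternative form")),
    ("contingency", ("weigh-in attendance", "alternative contingency")),
    ("frequency", ("weekly", "alternative frequency")),
]


def build_menu(n_varied):
    """Let the first n_varied attributes take two levels; fix the rest at level one."""
    level_sets = [
        levels if index < n_varied else levels[:1]
        for index, (_, levels) in enumerate(ATTRIBUTES)
    ]
    return list(itertools.product(*level_sets))


def participation_probability(utilities):
    """Logit probability of choosing any program over an opt-out with utility 0."""
    denominator = 1.0 + sum(math.exp(u) for u in utilities)
    return 1.0 - 1.0 / denominator


menus = [build_menu(n) for n in range(len(ATTRIBUTES) + 1)]
menu_sizes = [len(menu) for menu in menus]
print("menu sizes:", menu_sizes)
print("single-option menu:", menus[0][0])
```

Each menu doubles the number of designs offered relative to the previous one, because exactly one additional attribute is allowed to take a second level.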

We chose two levels of reward amounts to show the findings (shown in Fig. 4a [total reward amount is set at $48] and Fig. 4b [total reward amount is set at $576]). Both figures show the distributions of national-level participation probabilities with dots and lines representing average response levels for the national level and for those subgroups of interest (distributions for all subgroups show similar results, therefore we do not show them in one figure to maintain clarity). Both figures reveal a similar trend—as more incentive options are offered, the mean participation rate increases while the probability distributions shrink in range and in degree of dispersion. This result confirms that providing more incentive options for any given amount of rewards will have three benefits in population reach: first, it will increase program reach on average; second, it will reach more subpopulations and reduce program participation disparities; and third, it is easier for the program to recruit those hard-to-reach populations.

Fig. 4 a The impact of number of program options on the distribution of participation probability when total reward amount is set at $2/week for 24 weeks by populations of interest. b The impact of number of program options on the distribution of participation probability when total reward amount is set at $24/week for 24 weeks

To elaborate on the third benefit, we examined the low end of the distributions. Offering two programs instead of one (two program locations vs one design) while setting the total reward amount at $48 would reach 54% of those hard-to-reach individuals instead of only 34% (a 59% increase in reach for those individuals whose willingness to participate is at the low end of the distribution) (Fig. 4a). The story remains the same when the total reward amount increases to $576: offering two programs reaches 46% of those individuals, while offering one program reaches only 30% of them (a 53% increase). Both figures confirm that the largest marginal gain from expanding the choice set comes from offering two programs rather than one. This conveys an important policy implication: if the goal is to increase population reach, offering two options will be the most cost-effective way.
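The relative gain quoted for the $48 reward level can be reproduced directly from the two reach figures. The sketch below is plain arithmetic on the reported percentages; the helper name is an illustrative choice.

```python
def relative_increase(before: float, after: float) -> float:
    """Percentage increase from `before` to `after`."""
    return 100.0 * (after - before) / before


# Reach among hard-to-reach individuals at a $48 total reward (Fig. 4a).
one_option = 34.0   # percent reached with a single incentive design
two_options = 54.0  # percent reached when two program locations are offered

gain_points = two_options - one_option
gain_relative = relative_increase(one_option, two_options)

print(f"absolute gain: {gain_points:.0f} percentage points")
print(f"relative gain: {gain_relative:.0f}%")
```

The distinction matters when comparing reward levels: the same relative gain can correspond to different absolute gains in percentage points.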

Comparing Fig. 4a and b confirms the finding that ‘paying more’ does not necessarily do a better job at program reach. Offering a much higher reward amount does not gain more in reaching those hard-to-reach subgroups (a 53% increase at the $576 level vs a 59% increase at the $48 level). Furthermore, offering a larger reward results in larger disparities in program reach (i.e., the distribution range is larger than that with the smaller reward amount). For example, African Americans show a much lower mean participation rate than other subgroups when the dollar amount is set at $576 as compared with $48, signaling a much lower participation rate at the low end of the distribution for this subgroup.

3.5 Results Validation: A Case Study

To further examine the validity of our incentive design effect prediction and follow the recommendations of the literature [35, 36], we utilized a unique first-hand data set from a cluster randomized, controlled, worksite weight loss study [37, 38]. This study followed a two-group cluster randomized design. Worksites were randomly assigned into a comparison condition that did not include financial incentives for weight loss (i.e., Living My Way [LMW] condition) or an intervention condition that included a modest financial incentive tied to weight loss but intended to improve program reach (i.e., INCENTA condition; for details see [37]). The program lasted for 12 months. A total of 28 worksites were recruited and randomized into one of the two conditions. The study collected brief health information including self-reported weight and height and socio-economic and demographic information prior to the program recruitment, which provided a unique data set for analyzing program reach.

The incentive design for the study can be summarized as follows. For the LMW group, the reward amount was $0, weigh-ins were monthly, and the program location was the workplace. For the INCENTA group, the maximum reward was set at $20/month, the program was offered at the workplace, the reward was contingent on percentage of weight loss at weigh-ins and was paid by check, and payment was quarterly. We used the baseline health survey data on all eligible employees (BMI ≥ 24.5), coupled with the attribute levels in our model closest to the two conditions, to predict the participation rate. The observed participation rates were 27% for control worksites and 36% for incentive worksites. Our model predicted that control worksites would have an approximately 25% participation rate (95% confidence interval [CI] 15–39) and the incentive worksites would have an approximately 41% participation rate (95% CI 30–51). The prediction provides initial evidence of the validity of our model.
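The validation comparison reduces to checking whether each observed participation rate falls inside the corresponding predicted 95% CI. The sketch below encodes the figures reported above; the class and field names are illustrative choices.

```python
from typing import NamedTuple


class Validation(NamedTuple):
    """Observed vs predicted participation rate (percent) for one study arm."""
    observed: float
    predicted: float
    ci_low: float
    ci_high: float

    def covers_observed(self) -> bool:
        """True if the observed rate lies within the predicted 95% CI."""
        return self.ci_low <= self.observed <= self.ci_high

    def error(self) -> float:
        """Prediction error in percentage points (predicted minus observed)."""
        return self.predicted - self.observed


arms = {
    "control (LMW)": Validation(observed=27, predicted=25, ci_low=15, ci_high=39),
    "incentive (INCENTA)": Validation(observed=36, predicted=41, ci_low=30, ci_high=51),
}

for name, arm in arms.items():
    print(f"{name}: error {arm.error():+.0f} points, "
          f"observed within CI: {arm.covers_observed()}")
```

Both observed rates lie within the predicted intervals, although the intervals are wide, which is why the text describes the evidence as initial.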

4 Discussion and Policy Implications

There are sound scientific rationales for why incentives are effective and evidence from controlled clinical trials across a wide range of different applications supports their efficacy. Nevertheless, there are many important questions that remain to be answered regarding the use of incentives for health-related behavior changes, including optimal incentive values, appropriate intervention durations for different problems and populations, and cost effectiveness [6, 18, 19, 39,40,41]. This gap in the literature is mainly due to the rareness of market data on the supply and demand of different variations of incentivized weight loss programs.

Our study presents a rare effort to examine preferences for financial incentive designs in order to understand how population preferences may influence the reach of an incentivized weight control program. We considered a greater number of attributes and levels than previous studies and oversampled populations that were at risk for overweight/obesity and were understudied in the existing literature. To overcome the sparsity of market data, we conducted an attribute-based choice experiment that enabled us to elicit and quantify target consumers’ preferences for five types of attributes: reward amount, program location, payment form, reward condition, and payment frequency. As the results show, there is substantial preference heterogeneity across the populations of interest, not only across attributes but also across levels within each attribute. Even for the dollar amount, the common-sense notion that ‘paying more will never hurt’ does not hold if the goal is to reach hard-to-reach populations. Different incentive designs appeal to different types of individuals. Therefore, the natural conclusion is to offer a choice set of incentive options instead of a one-size-fits-all approach, with the goal of maximizing population reach and/or reaching the subgroups in greatest need.

Our random parameter logit model confirmed that all five attributes have a statistically significant impact on individuals’ decisions about participating in an incentivized weight control program. Among these attributes, reward amount, program location, and payment form were the most important for all target subpopulations: the obese subgroup, African Americans, low-income women, females, and males. The order of importance among them varies by subpopulation. Our results reveal that paying more does not help with the goal of reaching some subgroups. For example, program location is more important than reward amount in decision making for African Americans, low-income females, and males. African Americans specifically showed a negative response to larger reward amounts; paying them more may actually push some of them away from the program instead of attracting them. Although decreasing willingness to participate with increasing reward amount seems counterintuitive, the phenomenon matches focus group observations and previous literature [42]. Some individuals in the focus groups reacted negatively to the presence of a financial reward, expressing the sentiment that individuals should lose weight for their health, not for financial gain. Indeed, financial incentives in behavioral health interventions have already raised a degree of ethical controversy [43,44,45,46,47,48,49,50,51,52]. Our findings suggest that low reward amounts are appropriate and more cost effective at closing the program reach disparity gap because they minimize the impact of negative responses while still capturing most of the participation gained from positive responses.

Our results further show that preference heterogeneity exists even within each attribute. For example, among program location levels, the workplace was the most preferred location for the national population, while it was less preferred by some subgroups such as the obese subgroup (for whom a clinic was the most preferred location) and the African-American subgroup (for whom a community center was most preferred). Similarly, debit card and gym pass were the two top-ranked levels within payment form, and their order varied by subgroup. Preferences were relatively more consistent for levels within reward contingency and payment frequency, where contingency on weigh-in attendance and a weekly payment schedule were the most preferred options. Nevertheless, one or two subgroups differed markedly on these attributes. For example, African Americans mostly prefer rewards that are directly contingent on weight loss outcomes. There is universal dislike of medical card as a payment form and of payment at the end of the program.

Furthermore, we used our models to conduct a systematic counterfactual analysis by predicting participation probabilities weighted by NHANES national sample weights, which yields meaningful external predictions. This exercise revealed several policy-relevant insights: offering more than one incentive option will not only increase the program’s population reach but will also increase the participation rate among hard-to-reach subgroups; moving from a single program to a menu of two programs yields the largest boost in participation; and offering a larger reward amount will work against the goal of reaching hard-to-reach subgroups.
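The logic of a weighted counterfactual prediction can be illustrated with a minimal sketch. It assumes a simplified fixed-coefficient logit with the opt-out utility normalized to zero, not the random parameter logit estimated in the study, and every utility and weight below is hypothetical and chosen only for illustration.

```python
import math


def participation_prob(utilities):
    """Probability of choosing any offered program in a logit model
    where the opt-out (no participation) utility is normalized to 0."""
    expsum = sum(math.exp(v) for v in utilities)
    return expsum / (1.0 + expsum)


def weighted_participation(people, menu):
    """Survey-weighted mean participation probability for a menu of designs."""
    total_weight = sum(p["weight"] for p in people)
    weighted_sum = sum(
        p["weight"] * participation_prob([p["utility"][d] for d in menu])
        for p in people
    )
    return weighted_sum / total_weight


# Hypothetical respondents with opposite tastes for two hypothetical designs.
# "weight" stands in for a survey sample weight; none of these numbers
# come from the study.
people = [
    {"weight": 3.0, "utility": {"A": 0.5, "B": -1.0}},
    {"weight": 1.0, "utility": {"A": -1.0, "B": 0.5}},
]

single = weighted_participation(people, ["A"])     # one design offered
menu = weighted_participation(people, ["A", "B"])  # menu of two designs

print(f"single design: {single:.3f}, menu of two: {menu:.3f}")
```

In a plain logit of this form, adding an option can never lower predicted participation; the size of the gain depends on how much preferences differ across respondents, which is why heterogeneity matters for the menu result.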

Several limitations should be kept in mind when interpreting our findings. First, our sample was recruited from one local electronic medical record, which limits the generalizability of our findings. The synthetic population constructed using NHANES sample weights helps to mitigate this sampling limitation but cannot address it fully, since it only controls for observable constructs. If unobserved local differences (e.g., norms) drive preference heterogeneity, our findings cannot be generalized to other contexts. Second, because the discrete choice experiment is hypothetical, it is subject to hypothetical bias. However, we followed the best practices of the discrete choice experiment literature to mitigate this bias by using ‘cheap talk’ [53], a text script presented to respondents before they see the choice questions that highlights the real-world impact of their answers and alerts them to possible bias. Furthermore, our results focus on relative preference differences between subgroups; therefore, as long as the bias is homogeneous across subgroups, it would cancel out or be minimized in the relative differences.

Future work is needed to investigate the causes behind the preferences for incentive designs that we found; for example, investigating the causes of the negative response to reward amount could inform recruitment protocols that ameliorate such unintended effects. Previous cases of repugnance have been successfully addressed with in-kind rather than monetary transactions [54], but given the overwhelming preference for payment by debit card in our study, switching to a less desirable payment form may reduce participation probability by more than its non-monetary nature increases it. Future work could also investigate program location more closely, since it is the second-most important attribute and also displays a great deal of heterogeneity. Program location itself could be decomposed into constituent attributes, which may provide a more coherent picture of location preferences. Understanding why some individuals prefer some locations over others could suggest locations not previously studied or new locations that could be built, or could help predict which existing location would be most suitable for a given target population and geographic area.

Although we considered more incentive components than previous studies, there are still more ways in which financial incentives can vary. Evidence suggests that deposit contracts lower participation rates [24], and future work could verify this result as well as check for individual heterogeneity. Escalating payments, in which later payments are larger than earlier payments, may increase effectiveness [55] or self-weighing [56], but their effect on participation is unknown. Adding uncertainty to the incentive design may also increase effectiveness [26, 57], but again, the effect on participation is unknown. Previous studies suggest that group-based incentives may be more powerful than individual incentives [19, 58], but their effect on participation and the role of individual heterogeneity are unknown. Exploiting group dynamics may be an inexpensive way to increase reach and effectiveness.

5 Conclusion

Effective weight control programs will achieve high population health impact only if they can reach the subpopulations in greatest need. Incentivized weight control programs can increase program participation, but the incentive schemes should be tailored to subgroups that exhibit different incentive preferences in order to achieve their population health impact. This study disentangles preferences for incentive attributes with the goal of improving implementation effectiveness. We confirmed the existence of preference heterogeneity and the importance of targeted incentive design in maximizing population reach and closing health disparities. Our findings also show that offering a menu of incentive designs (more than one design option) for participants to choose from would increase the program participation rate more than offering higher reward amounts. This indicates that offering more choices will be more cost-effective at increasing program reach.