Background

Maintaining a healthy diet poses a considerable challenge for many people. There are myriad complex social, ecological, and psychological barriers to maintaining a healthy diet, including the higher cost of healthy foods, inadequate or incorrect nutritional knowledge, the enjoyment derived from less healthy foods, and a natural bias towards the present and short-term payoffs (e.g., the convenience of fast food) over potential long-term health benefits [1,2,3,4]. Past studies have demonstrated that financial savings can promote healthier food choices [5, 6]. Randomized interventions have demonstrated that participants who receive incentives, discounts or vouchers for healthy food items purchase greater quantities of fruits and vegetables [7, 8]. Prior work also supports the ability of tailored feedback to increase healthy eating behaviors; compared to generic information and messages, tailored nutrition-related messaging has a greater beneficial impact on individuals’ dietary behaviors [9, 10]. Currently lacking is information on how financial incentives and tailored messaging strategies can be combined to best promote healthier food choices during grocery shopping trips.

Discovery is a South Africa-based health insurance provider serving over 2.6 million members. Available to all Discovery members is the Vitality wellness program, a voluntary, low-cost (329 South African Rand [R] [≈£18] per family/year) incentivized health promotion program. One of the largest programs is the HealthyFood benefit (HF), a three-tiered incentive program offering monthly cash back payments (up to a maximum of R1000 [≈£55]) to incentivize the purchase of healthier food items at partner grocery chains. To allow Vitality to determine cash back payments, HF members have a membership card or linked credit card that, when swiped during checkout at the partner grocers, transmits purchase details to Vitality. Upon HF activation, members are eligible for 10% cash back on their healthy food expenditures at participating grocery stores (e.g., R10 cash back received if R100 spent on healthy foods during the month). “Healthy foods” include most fresh and frozen fruits and vegetables, low-fat dairy, whole grains, legumes, seeds, nuts, and selected oils (full catalog available at https://bit.ly/2MDHtXJ). This cash back percentage can increase to 15% upon the completion of an online health questionnaire and from 15 to 25% with completion of an in-person wellness check (i.e., blood pressure measurement, diabetes screening). As it stands, HF members have no financial disincentive for purchasing unhealthy foods and only receive feedback on their purchasing behaviors via a small notation on shopping receipts and through monthly cash back deposit notifications. Past work (analyses of member surveys and grocery scanner data) has demonstrated that HF enrollment and HF cash back amount are associated with healthier food purchasing [11, 12]. These results sparked interest in further exploring and rigorously testing whether changes to the HF benefit design could make the program more salient or motivating to members who may be less engaged with the current program and increase the healthiness of their food purchasing.

The purpose of this randomized controlled trial (RCT) was to test the effectiveness of differing financial incentive structures and text messaging feedback strategies in increasing healthy food purchases amongst HF members. We hypothesized that two strategies, increasing the salience of the financial incentive earned for healthy food purchases by increasing its size and highlighting the financial losses incurred from a new disincentive for unhealthy food purchases, would most effectively shift participants’ food purchasing.

Methods

Study design

The protocol was approved by the University of Witwatersrand Ethics Committee (Johannesburg, South Africa) and designated as exempt by the University of Pennsylvania Institutional Review Board. Potentially eligible members were identified by the Vitality team using their member databases. While no formal consent process was required given existing language in the Vitality membership agreement, all individuals were given an opportunity to opt out of participation prior to randomization. The study biostatistician generated a randomization list that was sent to the Vitality team, who then linked it with the member database by study ID. Using a simple randomization scheme, members were assigned to one of the six study arms with equal probability. Following randomization, all members who did not opt out were sent an email describing and providing examples of any changes they would experience with their HF program during the intervention period. All operational aspects of the study (e.g., sending text messages and emails, administering monthly cash back payments) were performed by the Vitality team. The analytic team had no contact with participating HF members and was blinded to study arm assignment. Since this study addressed differing financial incentives and text messages by study arm, participant blinding after randomization was not feasible.

Study population

Eligible individuals were adult Vitality members who activated the HF benefit in 2014 but had remained at the baseline 10% cash back level through the beginning of October 2015 (i.e., likely less engaged as they had not completed the tasks to move to a higher cash back amount). At the time of HF program enrollment, individuals chose one of two available grocery store chain partners as their preferred grocer. While members’ monthly cash back is determined based on healthy food spending at both partner grocers (i.e., not just the member’s preferred grocer), to accommodate our rapid feedback message design, we focused on individuals who had chosen a specific large grocery store chain as their preferred grocery store, as that grocer had the shortest lag between purchases and data transfer to Vitality. To ensure that the intervention targeted members who regularly shopped at this selected grocer (vs. made occasional small purchases there), we screened participants for a minimum of R1000 (≈£55) spent on groceries in the month prior to randomization, with at least 90% of this spending at this selected grocer. Finally, given the nature of the enrollment process and the tested interventions, we excluded those without an available email address and mobile phone number.

Study outcomes

In the HF program, all food items are categorized as healthy, neutral, or unhealthy. Healthy foods are marked with a Vitality sticker (on shelves) and noted on shopping receipts. The primary study outcome was the average monthly percent healthy food spending at the selected grocer during the Full Intervention period. Monthly percent healthy food spending was determined by dividing the amount of money spent on healthy foods at the selected grocer during a month by the total amount spent on all food items at this grocer during that month. Secondary outcomes also examined shopping behavior at the selected grocer and included the average monthly percent unhealthy food spending (unhealthy food expenditures/total food expenditures), the average monthly percent healthy items (healthy food items/total food items), and the average monthly percent unhealthy food items (unhealthy food items/total food items).
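
As an illustration of these definitions, the following minimal Python sketch computes the four monthly percentages for a single participant-month. The `Purchase` record, its field names, and the function name are hypothetical conveniences for this example and are not drawn from the Vitality data systems; the category labels are assumed to be the three HF categories.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Purchase:
    """One purchased food line (hypothetical record layout)."""
    category: str   # assumed to be "healthy", "neutral", or "unhealthy"
    amount: float   # amount spent, in Rand
    items: int = 1  # number of items on this line


def monthly_percentages(purchases: List[Purchase]) -> Optional[Dict[str, float]]:
    """Percent healthy/unhealthy spending and items for one month.

    Returns None when there were no food purchases in the month, so the
    caller can decide how to treat a month without shopping data.
    """
    total_spend = sum(p.amount for p in purchases)
    total_items = sum(p.items for p in purchases)
    if total_spend <= 0 or total_items <= 0:
        return None

    result: Dict[str, float] = {}
    for category in ("healthy", "unhealthy"):
        spend = sum(p.amount for p in purchases if p.category == category)
        items = sum(p.items for p in purchases if p.category == category)
        # Denominators include all food, i.e. healthy + neutral + unhealthy.
        result[f"pct_{category}_spend"] = 100.0 * spend / total_spend
        result[f"pct_{category}_items"] = 100.0 * items / total_items
    return result
```

For a hypothetical month with R300 of healthy, R500 of neutral, and R200 of unhealthy purchases, the sketch gives 30% healthy and 20% unhealthy spending; neutral foods count only toward the denominators.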

Member involvement

Vitality members were not involved in the research design or in the selection of outcome measures.

Study intervention

After removing members who opted out, the remaining individuals were randomized to one of the six intervention arms that differed in the combination of financial incentive structures, weekly text messages, and monthly text messages (Table 1): Arm 1 (Usual Care): 10% cash back, no weekly text, standard monthly text; Arm 2: 10% cash back, generic weekly text, standard monthly text; Arm 3: 10% cash back, personalized weekly text, standard monthly text; Arm 4: 25% cash back, personalized weekly text, standard monthly text; Arm 5: 10 + 15%NET cash back, personalized weekly text, standard monthly text; and, Arm 6: 10 + 15%NET cash back, personalized weekly text, unbundled monthly text.

Table 1 Study arms and comparisons of interest

The tested financial incentive structures were: 1) remaining at 10% cash back, 2) an increase to 25% cash back, or 3) 10 + 15%NET cash back. Under the 10 + 15%NET structure, participants could earn additional cash back above the baseline 10% level, but would also experience a financial penalty for unhealthy food purchases. This additional cash back was 15% of the difference between participants’ monthly healthy and unhealthy purchases. The rationale for this 10 + 15%NET structure was twofold. First, participants would only receive additional cash back (above the 10%) if they spent more on healthy foods than on unhealthy foods and would maximize their monthly cash back by purchasing no unhealthy foods. Second, no participant would earn less than the baseline 10% cash back amount (Vitality requirement to ensure there was no penalty to members for participating in the study).
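
The three structures can be compared with a short worked sketch. It assumes that the "difference" in the 10 + 15%NET structure is measured in monthly spending (Rand) and that the additional amount is floored at zero, which is our reading of the description above; the function name and the structure labels are hypothetical, and the program's monthly cash back cap is deliberately not modelled.

```python
def monthly_cash_back(healthy_spend: float, unhealthy_spend: float, structure: str) -> float:
    """Illustrative monthly cash back (in Rand) under each tested structure.

    Assumptions (not confirmed by the study text): amounts are monthly
    spending in Rand, and any monthly cap on cash back is ignored.
    """
    base = 0.10 * healthy_spend  # baseline 10% on healthy food spending

    if structure == "10%":
        return base
    if structure == "25%":
        return 0.25 * healthy_spend
    if structure == "10+15%NET":
        # Extra 15% of (healthy - unhealthy) spending, never negative,
        # so the payment cannot fall below the baseline 10% amount.
        net = max(healthy_spend - unhealthy_spend, 0.0)
        return base + 0.15 * net
    raise ValueError(f"unknown incentive structure: {structure!r}")
```

For a hypothetical member spending R1000 on healthy and R400 on unhealthy foods in a month, the three structures would pay R100, R250, and R190, respectively. With no unhealthy purchases the 10 + 15%NET payment equals the 25% payment (R250), and once unhealthy spending reaches or exceeds healthy spending it falls back to the baseline R100.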

The tested weekly text messaging strategies were: 1) no message (usual care for HF members), 2) a generic weekly text message (general information on the HF program and healthy eating habits, with different messages for each week of intervention), and 3) a personalized weekly text message (detailed individual feedback including the monetary and item breakdown of healthy vs. unhealthy purchases made at the selected grocer during the prior week) (Table 2). All weekly text messages were sent on Fridays. We were not able to confirm that these messages were opened and read.

Table 2 Examples of Weekly and Monthly Text Messages

In the HF program, all members receive a monthly text message informing them of their monthly cash back deposit. Given the introduction of the 10 + 15%NET incentive structure, which included a penalty for unhealthy purchases, we added a new monthly text messaging strategy to highlight financial losses incurred due to unhealthy food purchases (referred to as the unbundled monthly text message) (Table 2).

The weekly text messages began in November 2015 (referred to as start of the Partial Intervention). The Full Intervention (financial incentive changes, weekly text, and monthly text) was delayed due to unexpected Vitality operational issues and began in January 2016 and lasted through July 2016.

Statistical analysis

Participant characteristics and historical shopping patterns were summarized by study arm. We had six pre-specified pairwise comparisons of interest, each chosen to isolate different financial incentive or text messaging strategy comparisons (Table 1). For example, comparing Arm 1 vs. Arm 2 highlighted the difference between no weekly message and the generic weekly message, while comparing Arm 2 vs. Arm 3 isolated the potential difference between the generic vs. personalized weekly text messages. Given this number of planned comparisons, we applied the Holm-Bonferroni procedure to correct for multiple comparisons. This step-wise procedure compares the k-th smallest p-value out of the six pre-specified comparisons to a threshold of 0.05/(6 − k + 1), starting with the smallest p-value (threshold of 0.05/6) and stopping the first time the threshold for significance is not met, at which point no further hypotheses are declared significant [13]. We powered the study using the initial p-value threshold for statistical significance of 0.05/6 = 0.008. With 450 participants per arm, we had 80% power to detect an absolute difference of 3.7 percentage points (e.g., 29.4% versus 33.1%) in the percent healthy spending between study arms, assuming the control mean (standard deviation) of 29.4% (15.7%).
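
For readers less familiar with the correction, the following is a minimal sketch of the standard Holm step-down rule, in which the k-th smallest of m p-values is compared with α/(m − k + 1). It is a generic illustration written for this text, not the code used in the study; the function name and the example p-values in the note below are hypothetical.

```python
from typing import List


def holm_bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each p-value (in input order), whether it is declared
    significant under the Holm step-down procedure."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m

    for rank, idx in enumerate(order, start=1):
        threshold = alpha / (m - rank + 1)  # alpha/m, alpha/(m-1), ..., alpha/1
        if p_values[idx] <= threshold:
            significant[idx] = True
        else:
            # First failure: this and all larger p-values are not significant.
            break
    return significant
```

With six comparisons the first threshold is 0.05/6 ≈ 0.008. For the hypothetical inputs `[0.001, 0.009, 0.04, 0.02, 0.5, 0.3]`, the two smallest p-values pass (0.001 ≤ 0.05/6 and 0.009 ≤ 0.05/5), the third smallest (0.02) exceeds 0.05/4, and testing stops there. If the smallest p-value already exceeds 0.05/6, no comparison is declared significant.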

We used the Wilcoxon rank-sum test to assess between-arm differences in the primary and secondary outcomes. Some participants did not shop at the selected grocer in every month of the Full Intervention period. To handle these missing monthly shopping data points most conservatively, we assumed a worst-case scenario. Missing percent monthly healthy spending or percent monthly healthy items were designated as 0%; the rationale was that a percent healthy spending or items of 0% reflected a complete lack of engagement with the HF program, while a non-zero percent healthy spending reflected the degree of engagement. Conversely, when monthly shopping data were missing, the percent unhealthy spending or unhealthy items were designated as 100% unhealthy. As an alternative approach to handling these missing shopping data points, we applied the multiple imputation using chained equations (MICE) algorithm [14, 15]. The imputation approach estimated the joint distribution for the participants’ shopping pattern for the 12 months prior to the intervention period, monthly shopping during the intervention period itself, and the shopping pattern from the month following the intervention, as well as available participant demographic variables (age, gender, household size, length of HF membership, geographic region), baseline transaction count (number of shopping trips at selected grocer during month before intervention), and assigned study arm, in order to impute the missing shopping data. Differences between arms were averaged across 25 imputations and the variance for inference was calculated using Rubin’s rules [14].
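
The worst-case rule can be made concrete with a small sketch that averages a participant's monthly percentages after filling months without shopping data. The function name, the use of `None` to mark a missing month, and the example values are hypothetical choices for this illustration.

```python
from typing import List, Optional


def average_with_worst_case(monthly_values: List[Optional[float]], worst_case: float) -> float:
    """Average monthly percentages, replacing missing months (None) with a
    worst-case value: 0.0 for healthy outcomes, 100.0 for unhealthy ones."""
    if not monthly_values:
        raise ValueError("at least one month is required")
    filled = [worst_case if value is None else value for value in monthly_values]
    return sum(filled) / len(filled)


# Hypothetical participant with two months (out of seven) without shopping data.
pct_healthy = [30.0, None, 28.0, 26.0, None, 31.0, 25.0]
pct_unhealthy = [20.0, None, 30.0, 25.0, None, 35.0, 40.0]

avg_healthy = average_with_worst_case(pct_healthy, worst_case=0.0)
avg_unhealthy = average_with_worst_case(pct_unhealthy, worst_case=100.0)
```

In this hypothetical case the observed months average 28% healthy spending, but the two filled months pull the participant's average down to 20%, while the unhealthy average rises from 30% to 50%. This is why averages computed under the worst-case rule reflect both the healthiness of purchases and how consistently a participant shopped at the selected grocer.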

Additional analyses

We conducted two additional pre-specified analyses. First, to examine longitudinal trends in the primary and secondary outcomes, we applied a linear mixed model to the monthly data. The model included fixed effects for study arm, a linear time trend and an interaction term between study arm and time trend, as well as a random effect for participant. We tested the differences in the study arm and time trend interaction terms to determine whether slopes differed between study arms. Second, we repeated the main analysis of the primary and secondary outcomes using multivariable regression models to control for baseline participant demographics and their baseline shopping behavior (during the 12 months prior to the intervention). We also conducted an additional post-hoc (not pre-specified) analysis to examine higher-level themes across the study arms. We compared the primary and secondary outcomes between those getting the lower (10%) incentive amount (Arms 1–3) vs. those getting the higher (10 + 15%NET or 25%) incentive amount (Arms 4–6) and between those who received no/generic weekly message (Arms 1 and 2) vs. those who received a personalized weekly message (Arms 3–6). All analyses were conducted in Stata 12.1 (StataCorp).

Results

Figure 1 shows the Consolidated Standards of Reporting Trials (CONSORT) diagram for the study. After removing the 20 individuals who opted out of participating, the remaining 7314 people were randomized to one of the six study arms. Due to a technical error in the identification process, this randomized cohort erroneously included 3108 members who did not have the selected grocer as their preferred shopping partner and an additional 1365 people who did not meet at least one of the other pre-specified inclusion criteria. This error was discovered following data collection. To adhere to our original inclusion and exclusion criteria, these non-eligible participants were removed from the final analyzed cohort (reasons for exclusion detailed in Fig. 1). The number of people removed from each arm due to this error did not differ (p = 0.77) and those removed did not differ in age (p = 0.47), family size (p = 0.17), or length of Vitality membership (p = 0.94) from the final analytic cohort (Additional file 1: Table S1). While the proportions of women (35.3% of those removed vs. 46.5% of the final cohort, p < 0.01) and of people living in the Gauteng region (58.2% of those removed vs. 65.7% of the final cohort, p < 0.01) differed between the removed and final populations, these differences are difficult to interpret as the recorded gender is that of the primary Vitality member (not the person doing the grocery shopping) and the concentration of the selected grocer’s store locations varies by region. The final analytic sample consisted of 2841 eligible participants.

Fig. 1 CONSORT Diagram. Flow diagram of member recruitment, randomization, inclusion criteria, and analytic sample

The age, gender, household size, and geographic region distribution of members were similar between all study arms (Table 3). In the overall analytic cohort, 46.5% of participants were female, the mean (SD) age was 47.8 (11.8) years, and the mean (SD) household size was 2.9 (1.2) members. The participants in the six arms were similar in their baseline shopping behaviors during the 12 months prior to the intervention (Table 3). For the entire analyzed cohort, the mean (SD) number of monthly shopping transactions at the selected grocer prior to the intervention was 8.0 (4.2), and the baseline mean (SD) healthy food spending amount was R883.4 (R541.9), approximately £50. At baseline, the mean (SD) percent monthly healthy and unhealthy spending for the entire cohort were 27.7% (10.7%) and 17.5% (7.4%), respectively (Table 3). The baseline percent healthy spending and items are higher than those in the intervention period, and the unhealthy spending and items lower, reflecting our aforementioned conservative approach to missing data (i.e., missing = 0% healthy and 100% unhealthy).

Table 3 Participant demographic and baseline shopping characteristics

During the intervention period, 2351 participants (82.5%) had complete shopping data (at least one transaction at selected grocer during each month of the Full Intervention), and 209 participants (7.4%) were missing only one month of shopping data. Only 67 participants (2.4%) were missing shopping data for all the months. The average percent monthly healthy spending (reflecting program engagement and level of spending, i.e., missing monthly shopping data = 0% healthy spending) ranged from 24.8% (11.4%) in Arm 1 to 26.8% (12.9%) in Arm 2 (Table 4). This difference between Arm 1 and Arm 2 was the largest in magnitude, but the associated p-value of 0.093 did not meet the Holm-Bonferroni corrected threshold of 0.008 needed to be considered statistically significant. Repeating the examination of average percent monthly healthy spending using imputed missing shopping data yielded similar results, with no statistically significant differences between any of the arms (Additional file 1: Table S2).

Table 4 Average monthly spending and items by Arm (mean [SD])

No statistically significant between-arm differences were noted for any of the secondary outcomes either (Table 4). The analysis using imputed missing shopping data also yielded similar results for the secondary outcomes, with no statistically significant between-arm differences (Additional file 1: Table S2).

In the additional analyses, we examined the trend in the primary and secondary outcomes using the monthly data during the Full Intervention period. A sharp peak in the holiday months (December–January) followed by a downward trend was noted across all study arms (Fig. 2). As in the primary analysis, the widest gap between trend lines emerged for Arm 1 and Arm 2; however, the downward slopes of all arms’ trends were approximately 0.01, representing a decrease of 1% in healthy spending for each month that passed (Additional file 1: Table S3). None of the between-arm slopes differed significantly. Next, when controlling for baseline participant demographics and shopping behaviors in a multiple regression, we found Arm 2 had a 1.3% higher average percent monthly healthy spending than Arm 1, but the associated p-value of 0.017 was not significant after Holm-Bonferroni correction (Additional file 1: Table S3). In this regression analysis, Arm 2 had a 1.6% higher average percent monthly healthy items than Arm 1, but this difference was also not statistically significant (p = 0.014) (Additional file 1: Table S3). Finally, no differences in the shopping outcomes were noted when grouping the arms by higher-level themes (lower vs. higher incentive, no/generic weekly message vs. personalized weekly message) (Additional file 1: Table S4 and Additional file 1: Table S5).

Fig. 2 Percent healthy spending over time by Study Arm. Arm 1 (Usual care): 10% cash back, no weekly text, standard monthly text; Arm 2: 10% cash back, generic weekly text, standard monthly text; Arm 3: 10% cash back, personalized weekly text, standard monthly text; Arm 4: 25% cash back, personalized weekly text, standard monthly text; Arm 5: 10 + 15%NET, personalized weekly text, standard monthly text; Arm 6: 10 + 15%NET, personalized weekly text, unbundled monthly text

Discussion

In this RCT of adult members of an insurer-grocery store partnered healthy food promotion program, none of the tested financial incentive and text messaging combinations differentially affected the examined shopping outcomes (monthly percent healthy and unhealthy spending and items).

The absence of any appreciable change in food purchasing among those whose cash back was more than doubled (increase from 10 to 25%) and those who experienced a new financial disincentive for unhealthy foods (10 + 15%NET) is noteworthy. Past work has repeatedly demonstrated the power of financial incentives to influence health-related behaviors, including fruit and vegetable purchases, weight loss, smoking cessation, and vaccination rates [7, 16,17,18]. The effect of financial disincentives on health-related behaviors has also been established in the literature; observational and simulation studies demonstrate the effectiveness of taxes on decreasing the purchasing of unhealthy items, including sugar sweetened beverages, less-healthy foods, alcohol, and cigarettes [19,20,21,22,23,24,25,26].

There are several possible reasons why the tested financial incentive structures did not affect members’ purchasing behaviors. First, the absence of any effect may reflect the temporal separation between the shopping and the cash back payments. To overcome the present bias that can contribute to unhealthy food choices, as well as to leverage the regret aversion that can also influence choices, more immediate savings or penalties, experienced at the time of purchase, may be more effective than what was implemented in this study. Our original study design called for immediate feedback after each shopping trip; unfortunately, lags in the processing of the shopping data made this design unworkable. Second, we focused on HF members who were still at the baseline 10% cash back level. This lower engagement with HF may reflect lower cost sensitivity since more cost sensitive individuals would likely have completed the simple tasks necessary to get to the highest cash back level of 25%. In a cost-insensitive population, the monetary difference between the 10 and 25% cash back levels and the potential financial losses in the 10 + 15%NET Arms may have been of insufficient size to shift purchasing behaviors. Of note, the participants in the 10 + 15%NET Arms (Arms 5 and 6) were still getting 10% cash back regardless of any unhealthy food purchases. The effect of this incentive structure might have been different if unhealthy food choices had resulted in participants’ cash back amount being below their baseline 10% level (cash back below 10% was not tested because Vitality did not want to penalize members for study participation). Finally, the lack of observed effect of the financial incentives and disincentives may reflect a different valuation system. It is possible that individuals who are consuming unhealthy foods derive considerable utility from doing so, and this utility may be equally or more valuable than the financial gains and losses resulting from their shopping choices.

Given the literature supporting the value of tailored, individualized nutritional feedback over generic feedback, the absence of differences between Arm 1 (no weekly message) and Arm 3 (personalized weekly message) and between the no/generic weekly message arms (Arms 1 and 2) and the personalized message arms (Arms 3–6) was unexpected [9, 10]. One possibility is that the personalized weekly text feedback was confusing, not motivating, contained the “wrong” information, or required too much mental accounting. To maximize the potential of tailored messaging strategies, future work is needed to improve message quality by identifying the ideal message frequency, timing, and content to optimally support engagement in programs like HF. While only noted in the adjusted regression analyses and bordering on statistical significance, there was a suggestion of differences in the monthly percent healthy food spending and monthly percent healthy food items between Arm 1 (no weekly message) and Arm 2 (generic weekly message). One possible explanation for this unanticipated, possible difference was that any message, even a generic one, served as a reminder for HF members to take advantage of the program.

The study design had several limitations. First, the generalizability of the study findings to other contexts was limited by the current uniqueness of Vitality and the HF program, as well as the limited demographic information available on participants. While we had no information on participants’ income, a likely contributor to their sensitivity to financial incentives and penalties, we do know that, in South Africa, having private insurance, like Discovery Vitality, is associated with higher income, suggesting this population may be relatively cost insensitive. Second, the weekly text messages only reflected participants’ shopping at the selected grocer. Members likely shopped at multiple stores or only purchased certain foods at certain stores. Given this, the weekly text messages may not have accurately described their shopping behaviors, thereby diminishing the potential impact of this personalized feedback. Third, the classification of foods as healthy, unhealthy, and neutral is an inherently noisy signal in this context for many reasons. While healthy foods are labeled on store shelves, unhealthy foods are not, limiting participants’ ability to actively avoid them. Further, there are no clear guidelines regarding the ideal make up of a grocery cart (i.e., what is the ideal percent healthy and unhealthy?) and no data on what a “normal” percent healthy basket is for different populations (i.e., were enrolled participants already at the high end of basket healthiness?). Fourth, the incentive structure changes, particularly the 10 + 15%NET, may have been confusing for participants or entirely missed by participants, limiting any effect of these changes on participants’ food purchasing behaviors. Unfortunately, we were not able to assess members’ awareness or understanding of any changes to their benefit design. Fifth, we were unable to determine if the weekly and monthly text messages were opened and read by participants.

The study design reveals how existing programs can become laboratories to study approaches to improve health. This study was conducted in the same setting in which interventions would potentially be implemented. And, while this pragmatic design led to some unexpected challenges (e.g., delay in start of the Full Intervention, protocol deviation during recruitment that reduced sample size and limited statistical power) and made certain design features unfeasible (e.g., immediate feedback on shopping, cash back below 10%), it also offered the chance to assess evidence-based interventions in a real-world setting.

Conclusions

The results of this RCT suggest that merely changing the amount of a delayed financial incentive or introducing a small financial penalty for unhealthy food choices is insufficient when trying to shift food purchasing behaviors among “low utilizing” members of a healthy food promotion program. While it is possible that improving people’s dietary habits merely requires larger and more immediate financial benefits or consequences or better messaging, it is more likely that this complex, diverse, multi-dimensional problem requires equally multi-dimensional and tailored solutions. The United Kingdom Behavioral Insights team’s EAST framework emphasizes that interventions to change behaviors need to be easy, attractive, social, and timely [27]. Building on this framework and expanding beyond financially-centered motivators, effective interventions to promote healthier diets may require a combination of creative strategies, including changing default options (e.g., a default healthy basket for online grocery shoppers), creating choice architecture environments that support healthier choices (e.g., more salient in-store signage and displays), leveraging intrinsic motivators (e.g., aligning HF incentives with participants’ own goals), and utilizing technology-based tools (e.g., new Vitality app that allows users to scan foods and quickly identify healthier alternatives). Given the importance of a healthy diet for chronic disease prevention and management, innovative strategies to support optimal food choices at the point of purchase will remain a focus for health behavior change researchers and public health practitioners. Partnerships with food retailers and insurers hold great potential to offer real-world opportunities for implementing and evaluating these novel strategies.