Background

Overweight and obesity remain a substantial public health challenge worldwide, particularly in higher-income countries [1]. Behavioral weight management programs, which use behavioral skills training to help individuals change their eating and activity habits, remain the first-line strategy for overweight and obesity management [2]. Self-monitoring of dietary intake, physical activity, and weight plays a key role in these programs [3, 4], and greater adherence to self-monitoring has been shown to be the best predictor of weight loss success [5,6,7,8]. In particular, early and consistent engagement in self-monitoring is important for weight management success [9, 10]. To support greater adherence to self-monitoring and to help individuals interpret the data they collect so that they can set effective behavioral goals, participants in weight management programs are typically provided feedback based on their self-monitoring [11].

Within Social Cognitive Theory, provision of feedback is postulated to provide positive reinforcement for successful goal attainment, insight into potential barriers and challenges, and support for problem-solving and effective development of future goals [12, 13]. Moreover, Supportive Accountability Theory [14, 15] posits that interventionist support, which often takes the form of feedback, is essential for promoting engagement with health behavior change interventions. However, beyond these basic theoretical principles, surprisingly little empirical evidence exists to guide the crafting of feedback messages for weight management or for improving dietary intake and physical activity [16].

Feedback has been recognized as a potentially essential ingredient in the behavior change technique taxonomy [17], and while self-monitoring has been the focus of many systematic reviews/meta-analyses [5, 7, 18, 19], feedback has received less attention [20, 21]. Sherrington and colleagues [20] focused on the personalization of feedback (i.e., individualization of feedback either by a human or an algorithm) in internet-based weight management studies and found that personalized feedback may confer approximately a 2 kg benefit over interventions that did not provide personalized feedback. Schembre et al. [21] concentrated on just-in-time feedback in diet and physical activity-focused interventions but were unable to conduct a meta-analysis due to the variability in targeted behaviors, study duration, and feedback types. Thus, given the widespread use of feedback in behavioral interventions for weight management, dietary change, and physical activity change, the personnel costs of human-generated feedback [22], and the various forms in which feedback may be presented (e.g., positive reinforcement messages vs. areas for change [23]; numerical displays [24, 25] vs. vibrations [26] vs. text [27,28,29]), it is essential to identify and evaluate feedback types that may optimize this intervention component.

The primary aim of the current study was to systematically review and, if possible, meta-analyze self-monitoring interventions that use feedback as a behavior change technique (BCT), to determine the impact of feedback on diet and physical activity behaviors, weight, and self-monitoring behaviors (i.e., self-monitoring of diet, physical activity, or weight). The secondary aims were to evaluate aspects of feedback (e.g., how different types of feedback are perceived by participants, how feedback impacts retention, what types of feedback are typically provided, how frequently feedback is provided, the length of feedback) to determine whether particular feedback elements are associated with superior outcomes.

Methods

The review proposal was submitted to PROSPERO prior to data extraction and was accepted on April 11, 2022 (registration number CRD42022316206). The search strategy, raw data, and analysis scripts are provided on the Open Science Framework (OSF; https://osf.io/j9duf/).

Search methods for identification of studies

A medical librarian (AD) searched PubMed/MEDLINE, Web of Science, CINAHL, PsycINFO, and Google Scholar. The search was limited to articles published in English from 1970 through March 2022. Keywords included “self monitor” OR “self monitoring” OR “self monitored” OR “self directed” OR “self evaluate” OR “self regulate” OR “self regulated” OR “self track” OR “self tracking” OR “self weighing” AND obes* OR overweight OR weigh* OR “body mass” OR bmi OR calor* OR diet OR exercise* OR “physical fitness” OR “physical activity” OR walk* OR step OR steps OR pedometer* AND feedback AND behavior* OR behaviour*. Search strategies were modified for each database, utilizing controlled vocabularies (e.g., Medical Subject Headings) as appropriate. Complete search strategies are provided on the OSF website. In addition, searches of reference lists of identified studies and forward citation tracking using Google Scholar were performed by two authors (RAK and LK) to identify further eligible publications.
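
Read with standard Boolean grouping, the keyword blocks combine as follows. This is one plausible rendering for illustration only; the exact per-database strategies, including field tags and controlled vocabulary, are available on the OSF.

```
("self monitor" OR "self monitoring" OR "self monitored" OR "self directed"
  OR "self evaluate" OR "self regulate" OR "self regulated" OR "self track"
  OR "self tracking" OR "self weighing")
AND (obes* OR overweight OR weigh* OR "body mass" OR bmi OR calor* OR diet
  OR exercise* OR "physical fitness" OR "physical activity" OR walk* OR step
  OR steps OR pedometer*)
AND feedback
AND (behavior* OR behaviour*)
```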

Screening

All potentially eligible study records generated from the search strategy were imported into Covidence systematic review software (Veritas Health Innovation, Melbourne, Australia; available at www.covidence.org). Duplicates were removed before all titles and abstracts were screened independently by two authors (RAK and LK), who categorized articles as provisionally eligible or excluded according to the pre-registered eligibility criteria (Table 1).

Table 1 Inclusion and exclusion criteria

Specifically, articles were evaluated against the following criteria (in order) and excluded at the first criterion they failed to meet (if applicable): 1) no full text, 2) not published in English, 3) not an empirical peer-reviewed paper, 4) participants were not adults, 5) not a randomized controlled trial, 6) not an intervention targeting diet, physical activity, or self-weighing, 7) BCTs did not include both self-monitoring and feedback (on behavior or on the outcome of behavior), 8) did not compare different forms of feedback or did not compare 2 or more interventions that differed only in whether feedback was provided, and 9) did not include primary outcomes of diet, physical activity, self-monitoring behavior, and/or body weight. Conflicts were resolved by discussion. Afterwards, all full texts were screened independently by the same two authors and coded as eligible or excluded. Again, conflicts were resolved by discussion. The flow of study records is documented in the PRISMA diagram (Fig. 1).

Fig. 1 PRISMA Flow Chart

Feedback on behavior and outcomes was defined based on the behavior change technique taxonomy [30]. Specifically, feedback on behavior was defined as “Monitor and provide informative or evaluative feedback on performance of the behavior (e.g., form, frequency, duration, intensity); e.g., inform the person of how many steps they walked each day (as recorded on a pedometer) or how many calories they ate each day (based on a food consumption questionnaire).” Feedback on outcomes was defined as “Monitor and provide feedback on the outcome of performance of the behavior; e.g., inform the person of how much weight they have lost following the implementation of a new exercise regime.”

Data extraction and synthesis

Two reviewers (RAK and LK) extracted data into a structured coding form. The data extracted included study characteristics (i.e., target behavior(s), country where the intervention took place, inclusion/exclusion criteria, study conditions, sample size, participant characteristics), intervention characteristics (i.e., theoretical foundation, self-monitoring modality, feedback modality, intervention components, self-monitoring duration), feedback characteristics (i.e., frequency, length, type), outcome characteristics, and study results (i.e., effect size for targeted outcomes, overall study conclusions). We also extracted information about feedback perceptions and retention. All relevant study outcomes were included in the extraction and synthesis.

In addition, a meta-analysis was conducted if at least three studies using similar manipulations and reporting on the same outcome provided data on group means and standard deviations or standard errors that could be used to calculate Cohen’s d [31]. We used metafor 3.8-1 [32] in RStudio 2021.09.2 (R version 4.1.2) to compute random-effects models for pooled effect sizes and to adjust for potential publication bias using the trim-and-fill method [33]. Heterogeneity was evaluated using I², as recommended by Higgins et al. [34]. To account for multiple comparisons (i.e., when the control group was used for more than one comparison), the N of the control group was split, as recommended by Harrer et al. [35]; this was the case for only one study [36].
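
As an illustration of this analytic pipeline, a minimal sketch using metafor in R follows. The data frame and its values are hypothetical, and note that metafor’s SMD measure yields the small-sample-corrected standardized mean difference (Hedges’ g), a close analogue of Cohen’s d.

```r
library(metafor)

# Hypothetical extracted data: one row per feedback vs. no-feedback comparison,
# with group means, SDs, and sample sizes for a shared outcome (e.g., daily steps).
dat <- data.frame(
  study = c("A", "B", "C"),
  m1i = c(8200, 8900, 9100), sd1i = c(2100, 2350, 2400), n1i = c(40, 55, 38), # feedback arms
  m2i = c(7400, 8300, 8600), sd2i = c(2300, 2200, 2500), n2i = c(41, 54, 39)  # no-feedback arms
)

# If a control arm contributes to two comparisons, split its n across the rows
# before computing effect sizes, per Harrer et al. [35], e.g.:
# dat$n2i[dat$study == "B"] <- dat$n2i[dat$study == "B"] / 2

# Standardized mean differences and their sampling variances
dat <- escalc(measure = "SMD", m1i = m1i, sd1i = sd1i, n1i = n1i,
              m2i = m2i, sd2i = sd2i, n2i = n2i, data = dat)

# Random-effects model; the summary reports the pooled effect, I^2, and tau^2
res <- rma(yi, vi, data = dat)
summary(res)

# Trim-and-fill adjustment for potential publication bias
trimfill(res)
```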

Risk of bias assessment

All studies that were eligible for inclusion were assessed for methodological quality by two reviewers using the revised Cochrane risk-of-bias tool for randomized studies (version 2) [37]. Studies were evaluated for 6 types of bias: selection bias, performance bias, detection bias, attrition bias, reporting bias, and other sources of bias. Each type of bias is rated as 1) low risk of bias, 2) some concerns, or 3) high risk of bias. For the overall rating, the rating of the individual domain with the highest risk of bias is used. In addition, Egger’s test was conducted to test for publication bias in the meta-analyses [38, 39].
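
In metafor, Egger’s test corresponds to a regression of the effect sizes on their standard errors; a minimal sketch, reusing the fitted model res from the example above:

```r
# Egger's regression test for funnel plot asymmetry (publication bias),
# applied to the random-effects model fitted earlier.
regtest(res, model = "rma", predictor = "sei")
```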

Results

The literature search yielded 1,396 records, of which 647 were duplicate citations, 21 were books or chapters, and 34 were reviews or meta-analyses, leaving 694 articles to be screened for eligibility. Of these, 544 were excluded during title and abstract screening because they did not meet the inclusion criteria. Thus, 138 full-text articles were assessed for eligibility. After 120 articles that did not meet the inclusion criteria were excluded, 18 publications remained. An additional 6 studies were identified through forward and backward citation searches. In total, 24 publications reporting on 19 studies were included in the review (see PRISMA diagram, Fig. 1), with a total of 3,261 participants.

Characteristics of included studies

Of the 19 studies, 6 focused on diet [27, 28, 40,41,42,43], 14 focused on physical activity [24,25,26, 36, 41, 43,44,45,46,47,48,49,50,51,52], 3 focused on sedentary behaviors [24, 26, 53], and 9 focused on weight management [24, 27,28,29, 40,41,42,43, 51] (with some studies focusing on more than one of these behaviors) (Table 2). Outcomes for the different behaviors, however, varied widely between studies. Reported dietary outcomes comprised energy intake [27, 41, 47, 54]; percent carbohydrates, protein, fat, and saturated fat from total daily energy intake, sodium intake, total fiber, and added sugar [54]; percent saturated, monounsaturated, and polyunsaturated fats from daily energy intake [27]; daily vegetable and fruit portions and weekly consumption of sweetened beverages and ultraprocessed foods [43]; and achievement of diet goals [40]. Regarding physical activity, studies investigated total minutes of physical activity [50], daily minutes of moderate to vigorous physical activity [43, 51], steps [25, 26, 36, 46], walking lengths [47], physical activity energy expenditure [41, 44], metabolic equivalents [24, 48], accelerometer counts [52], activity data [45], time spent standing [26], sedentary time [26, 53], and achievement of physical activity goals [50] (see raw data provided on the OSF).

Table 2 Study Characteristics by Comparison Category

The median number of study participants was 80 (range: 17–828). The majority of studies included samples with a mean age between 30 and 50 years [27, 28, 40,41,42, 44, 46, 51, 52] that were predominantly female [25, 27,28,29, 36, 40,41,42,43,44, 46, 48, 51, 52] and White [26,27,28,29, 41, 42, 44, 46, 51, 53] (although ethnicity or race was not reported in 7 publications). Studies were conducted in the United States (n = 10, 52.6% [27,28,29, 41, 42, 46, 47, 51,52,53]), Brazil (n = 1, 5.3% [43]), the Netherlands (n = 1, 5.3% [40]), Finland (n = 1, 5.3% [24]), Thailand (n = 1, 5.3% [25]), China (n = 1, 5.3% [45]), and the United Kingdom (n = 4, 21.1% [26, 36, 44, 48]). Study duration ranged from 2 days to 24 months, although most interventions (n = 11) were shorter than 12 weeks. The studies were published between 2005 and 2022.

Theoretical foundation

In total, 11 studies (57.9%) cited a theoretical foundation for the intervention. The most frequently cited theories were general Behavioral Theory (n = 2, 10.5%), Control Theory (n = 2, 10.5%), and Social Cognitive Theory (n = 4, 21.1%).

Retention

Retention rates were calculated as the proportion of participants who completed the final follow-up data collection visit (which varied between studies from 2 days to 24 months). On average, retention was 76.3% (range: 48.0%–100%).

Feedback features of included studies

The 19 studies varied in the content, frequency, and timing of the feedback, with often limited details provided about the feedback (Table 3). Feedback was often graphical, such as a history of physical activity data, or reduced to numerical indicators of activity units or steps (e.g., [25, 48]). Some studies provided feedback on goal attainment (e.g., [43, 46]) or positive reinforcement (e.g., [29, 40]). Most studies, especially those published recently, provided feedback on digital devices such as PDAs (e.g., [27]), smartphone apps (e.g., [51]), or smartwatches [45]. Frequency of feedback varied: the majority of studies provided feedback continuously (3 studies; [25, 26, 45]), daily or multiple times daily (5 studies; [28, 29, 42, 47, 48]), or weekly (3 studies; [36, 41, 53]). Other studies varied feedback frequency throughout the study (i.e., decreasing frequency) [27], randomized participants to different frequencies [46], or did not describe the frequency of feedback [24, 40, 43, 51].

Table 3 Feedback Characteristics

Across the studies, 9 compared feedback to no feedback [24, 25, 28, 36, 40, 42, 45, 48, 51, 52] and 5 compared human- versus algorithm-generated feedback [29, 41, 43, 47, 53]. The remaining 4 studies included other types of feedback comparisons, including feedback frequency (daily vs. weekly [27, 46]), richness of feedback (simple vs. visual vs. contextualized [44]), and the behavior on which feedback was provided (upright time vs. sedentary time [26]).

Impact of feedback on diet and physical activity behaviors, weight, and self-monitoring behaviors

A file containing means and standard deviations for all group comparisons can be found on the OSF (https://osf.io/j9duf/).

Impact of feedback provision

Nine studies compared participants who received and did not receive feedback, allowing us to test whether providing feedback had a positive impact on behaviors or weight. Studies yielded mixed results. Six studies reported benefits of feedback such as reaching diet goals [40], self-monitoring diet and exercise more frequently [40], losing more weight [40], and being more physically active [24, 36, 45, 48, 52]. This positive impact, however, was not universally observed; other comparisons did not report an impact of feedback provision on physical activity [25, 52] or weight [24, 28, 42].

Due to the large heterogeneity of studies in terms of feedback provided and outcomes studied (e.g., reporting weight change in various ways), we were only able to conduct a random-effects meta-analysis for differences in physical activity, based on 9 comparisons reported in 6 studies [25, 36, 45, 48, 51, 52]. The meta-analysis yielded a statistically significant pooled effect size of Cohen’s d = 0.29, 95% CI [0.16; 0.43] (test for overall effect: Z = 4.14, p < 0.001; see Fig. 2). Heterogeneity was low (I² = 9.07%, Tau² = 0.00, H² = 1.00, df = 9, p = 0.432 [56]). Results were unchanged when using trim-and-fill, indicating no evidence of publication bias (see Fig. 3).

Fig. 2 Forest plot for the random-effects meta-analysis comparing the impact of providing feedback vs. not providing feedback on physical activity behaviors

Fig. 3 Funnel plot created using the trim-and-fill method. No studies were filled, indicating that publication bias is unlikely
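
Plots like Figs. 2 and 3 can be generated directly from the fitted metafor model; a minimal sketch, reusing res from the Methods example:

```r
# Forest plot of the individual comparisons and the pooled estimate (cf. Fig. 2)
forest(res, header = TRUE)

# Funnel plot after trim-and-fill; filled studies, if any, appear as open points (cf. Fig. 3)
funnel(trimfill(res))
```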

Impact of human vs. algorithm-generated feedback

Five studies compared the impact of human- and algorithm-generated feedback on behaviors and weight, with mixed results. Some studies reported significant group differences favoring human-generated feedback, including healthier diet composition [43] and a reduction in sedentary time [53]. Conversely, West et al. [29] reported greater weight loss in participants who received algorithm-generated feedback compared to participants who received human-generated feedback. Other studies did not report differences between groups for physical activity [41, 43], self-monitoring behaviors [43], or weight loss [41]. Due to the large heterogeneity of studies in terms of feedback provided and outcomes studied, we were unable to conduct any meta-analyses.

Impact of other forms of feedback

Five studies investigated the impact of different forms of feedback on behaviors. Due to the large heterogeneity of studies in terms of feedback provided and outcomes studied, no meta-analysis could be conducted with these studies. The impact of feedback frequency was tested in two studies, including the SMART trial, which resulted in several publications (as described below). Kerrigan et al. [46] reported that providing daily feedback increased step counts more than providing weekly feedback. This finding was not mirrored by the publications stemming from the SMART trial related to weight management (i.e., the primary SMART study outcome) [27, 49] or physical activity [27]; however, the SMART trial reported a greater reduction in energy consumed for participants who received daily vs. weekly feedback messages [54] and found greater adherence to self-monitoring if daily feedback messages were provided [55]. Greater adherence to self-monitoring was, in turn, correlated with greater adherence to physical activity goals and weight loss [49, 50].

In a test of another type of feedback, Godino et al. [44] tested whether feedback richness (simple vs. visual vs. contextualized) impacted participants’ physical activity and found no significant group differences. In addition, Rabbi et al. [47] tested whether personalized feedback (personalized vs. non-personalized, both generated by an algorithm) affected participants’ diet and physical activity. The authors reported that personalized feedback led to increased physical activity, but dietary behaviors did not differ between the conditions. Finally, Martin et al. [26] investigated whether the behavior on which feedback was provided (i.e., sedentary time vs. upright time) impacted participants’ physical activity. Again, no significant group differences were found.

Impact of feedback on participants’ perception of the intervention and retention

Seven of the included studies [25, 26, 36, 44, 47, 51, 52] reported on participants’ evaluation of the provided feedback. In all seven studies, evaluations were highly positive, with participants reporting that the feedback was motivating [26] and the main reason for using the intervention device [25, 52]. Fanning et al. [51] reported that participants asked for more frequent messages, and Paschali et al. [52] noted that participants in the “no feedback” condition were disappointed that they did not receive any feedback. Feedback thus seems to be an integral intervention component that participants expect and enjoy. Somewhat unexpectedly, participants reported finding the feedback motivating and enjoyable even when it was not associated with objectively measured, or even perceived, changes in behavior (e.g., [26, 52]).

Three of the included studies explicitly compared retention rates between conditions. None of the three studies [29, 41, 42] reported differences in retention based on condition, suggesting that feedback might not prevent attrition.

Risk of bias assessment

We used the Cochrane Risk of Bias 2.0 tool to evaluate all of the studies. All studies were subject to at least some risk of bias (see Table 4 for details), with 9 studies receiving an overall rating of some concerns and 10 studies receiving an overall rating of high risk of bias. The high risk of bias largely resulted from a lack of pre-registration of the analysis plan.

Table 4 Risk of bias assessment for included studies

Discussion

Feedback is a core component of behavior change interventions [17]; however, because feedback is rarely the focus of an intervention and thus is rarely varied systematically, little is known about how feedback should ideally be formulated and presented. The current systematic review aimed to compile the existing evidence about feedback on self-monitoring behaviors, dietary intake, physical activity, and weight. Overall, evidence for the effectiveness of feedback was mixed. There was a significant effect of feedback (vs. no feedback) on physical activity, but this finding was driven by only half of the studies reporting a significant effect of including feedback (compared to no feedback). However, the effect of the presence or absence of feedback on outcomes other than physical activity has rarely been examined, and thus we were unable to conduct meta-analyses for these other outcomes.

Despite the popularity of digital interventions, which often incorporate algorithm-generated feedback [57, 58], the effect of providing human- vs. algorithm-generated feedback is understudied. Interestingly, while four out of five included studies reported either no difference or human-generated feedback to be superior, findings by West et al. [29] suggest that algorithm-generated feedback may be more effective in certain circumstances. For example, algorithms consistently provide feedback on all of the desired behaviors, which may not happen with a human, and algorithms can provide more immediate feedback, without consideration for holidays, illness, or weekends. In addition, complex algorithms may detect patterns of behavior that may be beyond the capabilities of an interventionist. More research is urgently needed to understand which forms of feedback generation are most effective under which circumstances, given that generating feedback automatically may improve the cost-effectiveness and sustainability of behavioral interventions as well as their reach [20].

Available research regarding feedback frequency was especially limited. Two studies [46, 55] focused on the frequency of providing feedback, showing that daily feedback was associated with greater self-monitoring, which was in turn associated with improved behavioral and health outcomes such as physical activity and weight loss. The link between self-monitoring and intervention effectiveness has been previously established; providing feedback frequently (but not so frequently that it annoys users, especially when paired with a notification [59]) may thus be key for intervention effectiveness. More research is needed to confirm these findings for other behaviors and to determine potential dose-response effects of feedback on engagement with intervention components.

It is important to note that there are numerous characteristics within the design of each feedback package (e.g., frequency, behavioral vs. outcome focus, length, personalization, graphical vs. numerical vs. text vs. vibration modality, achievement vs. future behavior change valence). Because each feedback characteristic appeared infrequently and some were not systematically manipulated, we were not able to evaluate the independent effects of these characteristics, which may have contributed to the mixed outcomes observed in this review. It will be important to systematically vary these feedback characteristics to determine optimal combinations, as some of them may have small but potentially additive effects.

This review only included studies that specifically compared different feedback conditions, rather than intervention packages, to isolate the effects of feedback provision and of different forms of feedback. However, different BCTs included in an intervention may interact, since they link to or build on one another. For example, feedback provision may boost the effectiveness of other BCTs such as goal-setting, since it may allow participants to identify changes that are most urgently needed or easiest to achieve [60]. Potential interactions between BCTs may also explain why Fanning et al. [51] (which also used goal-setting) reported relatively large effects of feedback on changes in physical activity, while other studies (which did not use goal-setting) produced smaller effects.

Based on evaluations of feedback provision reported in a small number of included studies, it can be concluded that feedback provision is a desired and well-received study component, which mirrors previous research [59]. Surprisingly, in some of these studies, feedback provision did not improve intervention effectiveness despite participants reporting that they found it useful, perhaps because the feedback focused on what the participant was doing well or mainly served to maintain a human connection. On the other hand, previous research has pointed out that feedback may not always be beneficial; depending on its valence, it may be seen as demotivating and so promote disengagement, rather than engagement, with the intervention [61]. In addition, the studies that examined the effect of feedback on retention did not find benefits [29, 41, 42]. These findings underline that feedback needs to be carefully crafted to achieve its desired effects of promoting intervention engagement and effectiveness.

Despite the systematic approach to this review, there are limitations that are important to note. First, details on the feedback provided in studies were often unavailable, which complicates the interpretation of the findings. Second, some of the interventions were extremely short (i.e., 2 days [25]) and most were shorter than 12 weeks, and so may not have been long enough to adequately test the effect of feedback. Third, some of the outcomes we examined were too different to include in additional meta-analyses. Fourth, many studies had to be excluded because they tested intervention packages, which makes it difficult to estimate the effects of individual intervention components. Fifth, the vast majority of included studies did not conduct sensitivity analyses to test for potential demographic differences in effects, and many included samples that were predominantly female, well educated, and White. This review thus cannot speak to the generalizability of the findings to underserved populations. Future research needs to address this issue, since engagement with and effectiveness of behavioral interventions are likely not equal for all [62, 63]. Finally, there was a high risk of bias in the majority of the studies, reflecting changing trends in the pre-registration of analyses. In the future, rigorous experimental research using appropriate study designs, such as factorial trials, is needed to examine optimal feedback components further.

However, this review also has strengths. These include the design and conduct of the literature searches by an experienced medical librarian, the inclusion of 5 literature databases, and the use of forward and backward citation searches, which together yielded a comprehensive body of literature upon which to base the review. Additionally, consistent with open science principles, we have reported the raw data on the OSF website. Finally, two reviewers independently coded all of the studies.

Conclusion

This review underlines the importance of feedback as a behavior change technique in interventions, but also clearly indicates that greater detail should be provided in scientific manuscripts regarding the feedback components (including examples and potentially screenshots) and frequency. In addition, more research is needed on how feedback is best generated (i.e., what can be generated by an algorithm and what potentially cannot) and presented to maximize intervention effectiveness.