Introduction

Sustained moderate-to-vigorous intensity physical activity (PA) is beneficial for older adults. PA improves physical functioning [1, 2], mental and physical health [3,4,5,6,7], and quality of life for this population [8]. Unfortunately, most older adults do not engage in sufficient PA [9] and their rates of PA tend to decline with age [10]. Among adults aged 65–79, activity levels decrease approximately 2% per year, compounding [11]. Women generally accumulate less PA than men [12,13,14,15]. Women also face decreased muscle quality and lean mass after menopause [16], which can exacerbate inadequate activity’s negative effects [17]. Though current interventions can produce PA initiation in older adult women [18], evidence for long-term adherence has been disappointing [19, 20]. More sustainable and scalable interventions are needed to enhance PA for older women who are not meeting recommended PA guidelines [21].

Walking has emerged as a primary target for PA promotion to older adults. It is the most commonly performed type of PA [22] and is accessible, simple, and enjoyable [23, 24]. Even adjusted for other types of PA, walking is associated with decreased mortality among older adults [25]. mHealth media, such as applications on smart devices and wearable activity monitors, have shown promise as scalable methods for delivering walking interventions [26, 27]. These media are acceptable among older adults [28, 29], and pilot studies in this age group have demonstrated short-term step count increases [30,31,32,33]. Unfortunately, use of activity monitors has not been associated with long-term PA adherence in this demographic group [34, 35]. A review found only two technology-based intervention studies of older adults with follow-ups of 6 months or greater, and neither found a significant result at follow-up [36].

It may be that the messaging and framing paradigms assumed by most activity monitor systems are not a good fit for promoting sustained PA in older women. One commonly employed perspective in mHealth media, termed the corrective technology perspective, emphasizes the potential utility of applying technology to highlight objective discrepancies between the user’s intentions and subsequent performance [37, 38]. Another common perspective, the quantified self perspective, is centered on stimulating personal insight via the presentation of detailed, objectively measured data [37, 38]. An important limitation of these perspectives is that they do not typically involve higher order factors, such as personal meaning, values, experiential outcomes, or subjective interpretations of their data [37, 38]. Perhaps due to this missing contextual relevance, the use of digital technologies alone appears to rarely translate to sustained behavior change in older adults [39].

Adopting a celebratory technology framework may be advantageous for promoting sustained PA in older women. While corrective technology focuses on “fixing” a behavior, celebratory technology focuses on positive aspects of beliefs, actions, and values related to the behavior [40]. This framework attempts to imbue personal informatics with a sense of reflection [41], context [42], and storytelling [43]; that is, it seeks to evoke the qualified self more than the quantified self. Orienting PA promotion in this way is concordant with Self-Determination Theory (SDT), which posits that autonomous regulations (i.e., motivations for PA born of inherent interest, identity, or valuing outcomes) are powerful predictors of behavior over time [44]. A growing literature supports the utility of SDT for understanding and influencing PA behaviors [45,46,47,48,49,50]. Thus, enriching digital PA technologies with a celebratory technology perspective that emphasizes enjoyment, interest, meaning, and personal values may be a useful approach to supporting adherence to health-related behaviors.

Active games are one potential method for achieving a celebratory technology perspective and targeting autonomous regulations for PA. Games and play have long been linked to SDT constructs [51,52,53]. Seminal definitions of games have explicitly stated that games are intrinsically motivating, and further that they are often characterized by the imposition of arbitrary constraints and limitations along the path to goal achievement [54,55,56]. Indeed, popular games can greatly increase perceptions of autonomy and autonomous regulation in the context of PA [51, 53, 57,58,59]. Active games created to target SDT constructs have increased autonomous regulation [51], as have active games in which PA is incidental [59].

In line with this literature, we engaged in human-centered design and theory-based program planning to develop CHALLENGE (Challenges for Healthy Aging: Leveraging Limits for Engaging Networked Game-based Exercise) [60, 61], a PA intervention that aims to supplement the corrective technology of wearable activity monitors and apps with celebratory intervention components that use the autonomy-supportive frame of a game. The focus of the intervention is to facilitate, augment, and emphasize positive aspects of walking for PA among older adult women, rather than focusing on “correcting” insufficient PA [40]. CHALLENGE was designed to use arbitrary rules to invoke game mechanics and produce playful experiences that increase autonomous regulations for PA [61].

In this paper, we present the protocol of an NIA-funded R01 study designed to evaluate the efficacy of CHALLENGE (ClinicalTrials.gov Identifier: NCT04095923; NIH Grant R01AG064092; UTMB IRB #19–0158). We hypothesize that older women who play the game (i.e., engage in the experimental CHALLENGE intervention) will exhibit increases in autonomous regulations for PA at the intervention end (12 months) and after a 6-month no intervention maintenance period (18 months). We further hypothesize that participants who experience the intervention will exhibit increases in walking (measured in steps) at these time points. We will also investigate potential mechanisms of the intervention effects; specifically, we will conduct longitudinal mediation analysis for outcomes (number of steps) measured at months 12 and 18 by autonomous regulation (or other regulation) measured at month 6. For exploratory purposes, we will investigate the potentially moderating effects of age, trait playfulness, and physical functioning on engagement and step count.

Methods

Design

The goals of the CHALLENGE study are two-fold: first, to test the efficacy of a novel intervention that uses game elements to reframe traditional mHealth intervention components; and second, to investigate potential mechanisms by which the intervention may improve PA. To pursue these aims, we will conduct a two-group randomized controlled trial. We hypothesize that the elements of game design in CHALLENGE will lead to celebratory and playful experiences that will improve SDT constructs linked to exercise and subsequently lead to sustained increases in walking (Fig. 1).

We will recruit healthy older women and randomize them into the game-based intervention group or to an activity monitor-only comparison group. This comparison will ensure that differences between conditions are due to the game content. Participants in both groups will receive a wearable activity monitor; the companion feedback app for those monitors will be downloaded to participants’ mobile devices. In the game intervention group, participants will also be added to a private Facebook group, where interventionists will post weekly challenges that require PA but have arbitrary, autonomous goals (see intervention description, below). Assessments at baseline and 6, 12, and 18 months will measure steps and psychosocial/motivational self-report variables. Throughout the study we will also collect process variables such as wear of the monitor and posts to the social media group. The primary outcome will be change in steps, which we will investigate at the intervention end (12 months) and after a 6-month maintenance period with no intervention (18 months).

Fig. 1 Conceptual model of CHALLENGE

Participants and recruitment

We will recruit 300 community-dwelling older women in the U.S. on a rolling basis with the assistance of the recruitment core of the UTMB Claude D. Pepper Older Americans Independence Center. Participants will be recruited through UTMB’s newsletters, presentations at local wellness events, and targeted Facebook advertisements. To help accomplish the latter, we will employ the services of Trialfacts (San Diego, CA), which uses digital means (e.g., Facebook advertisements) to provide targeted patient recruitment services. We anticipate that recruitment and randomization will continue for approximately 3 years.

Eligibility criteria

Inclusion criteria

Inclusion criteria include age 65–85, self-report as women, ability to read and write in English, the availability of a mobile device using either iOS or Android software with a working camera, willingness to use a private Facebook group, sufficient internet access to post photos to Facebook at least once per week, and having an existing Facebook account or the willingness to create one.

Exclusion criteria

Exclusion criteria include being unable to walk for exercise (self-report), walking less than 475 m in a 6-minute walk test during baseline assessment, having had stroke, hip fracture, hip or knee replacement, or spinal surgery in the past 6 months, answering “yes” to any question on the PAR-Q+ without providing a doctor’s note giving permission to begin a PA program, self-reporting weekly PA ≥ 150 min, having a BMI under 18 or over 40, reporting psychological issues that would interfere with study completion, planning to move away from the Galveston-Houston area, clinical judgment concerning safety, currently participating in an organized commercial or research PA program, and another member of the household being a participant or staff member on this trial.

Sample size and statistical power

Because effect size estimates taken directly from pilot studies can produce biased power estimates when used alone [62, 63] and our pilot trials could not provide the range of outcomes needed (e.g., steps at 6, 12, and 18 months), we used several sources to estimate the range of possible intervention effects. We estimated steps per day increases of approximately 1500, 1000, and 500 at the three follow-up points. These numbers were based on similar mHealth walking studies [36], preliminary research [32], and the clinical significance of 500-step-per-day increase for older adults [64,65,66]. To achieve 80% power, a sample size of approximately 252 participants would be needed to detect this difference between groups at 18 months. Our most conservative estimates would require a sample size of 275. We will recruit 300 participants in case of any missing data that cannot be dealt with via imputation (e.g., extreme levels of missingness, or if there is reason to believe that data are missing not at random).

Assuming at least 275 participants, we will have 99% power to detect that the intervention increases autonomous regulation by 0.6 units, as measured by the BREQ-3, when the standard deviation of BREQ-3 scores is 1 (as observed in our preliminary research). Furthermore, we will have 83% power to detect a mediator coefficient of 620, which was calculated assuming that the number of steps walked has a standard deviation of 3000 (as observed in our preliminary research), and adjusting for the confounding of intervention with BREQ-3 score. Therefore, we will have 81% power to detect a mediated effect of 372, which would explain 27.4% of the effect of the intervention through the indirect effect of autonomous regulation [67,68,69]. All testing will be 2-sided with α = 0.05.
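
As an illustrative check of the arithmetic behind the mediated effect, the figures above are consistent with the standard product-of-coefficients formulation. The symbols a (intervention effect on autonomous regulation) and b (effect of autonomous regulation on steps, adjusted for intervention) are notation introduced here for illustration and do not appear in the protocol:

```latex
\[
a = 0.6 \ \text{BREQ-3 units}, \qquad
b = 620 \ \text{steps per BREQ-3 unit}, \qquad
ab = 0.6 \times 620 = 372 \ \text{steps per day}.
\]
```

Under this reading, a mediated effect of 372 steps that accounts for 27.4% of the intervention effect would imply a total effect of roughly 372 / 0.274 ≈ 1,358 steps per day.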

Randomization

A blinded statistician will generate a randomization schedule to intervention or control based on a block randomized scheme with random block sizes. The block randomization approach [70] ensures the steady enrollment of game intervention participants, minimizing potential for lulls in which the social network receives fewer new members. This list will be used to number opaque envelopes that contain the words “intervention” or “comparison.” Carbon paper will be included in the envelopes for auditing purposes, and aluminum foil will also be used to ensure opacity of the envelopes. For each block, an equal number of intervention and control sheets of paper will be placed in envelopes in the order from the randomization schedule. Sealed envelopes will include study staff’s signature over the seal. When the interventionist meets a participant for orientation, s/he will open the first numbered envelope available, signing and dating the envelope to provide a paper trail.
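
The schedule generation described above can be sketched as follows. This is a minimal illustration rather than the study statistician's procedure: the candidate block sizes (2, 4, 6), the seed, and the function name are assumptions made here, since the protocol specifies only that blocks are balanced and randomly sized.

```python
import random

def block_randomization_schedule(n_participants, block_sizes=(2, 4, 6), seed=2019):
    """Build an allocation list from balanced blocks of randomly chosen size.

    block_sizes and seed are hypothetical; the protocol does not report them.
    """
    if any(size % 2 for size in block_sizes):
        raise ValueError("block sizes must be even to balance two arms")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_participants:
        size = rng.choice(block_sizes)  # random block size
        block = ["intervention"] * (size // 2) + ["comparison"] * (size // 2)
        rng.shuffle(block)              # random order within the block
        schedule.extend(block)
    # The final block is kept whole, so the list may slightly exceed the target.
    return schedule

schedule = block_randomization_schedule(300)
# Envelope number -> arm, in the order the envelopes would be opened.
envelopes = {number: arm for number, arm in enumerate(schedule, start=1)}
```

Because every block is balanced, the running difference between arms never exceeds half the largest block size, which is what keeps enrollment into the game group steady.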

Interventions

Participants in both interventions will follow the same study flow, attending a brief initial visit for consent with study staff and provision of an accelerometer for baseline testing (orientation visit). This visit will be followed by a baseline/randomization visit and three follow-up assessment visits (at 6, 12, and 18 months).

Comparison intervention

The comparison intervention will consist of the provision of an activity monitor and a goal negotiation process with study staff (i.e., setting step goals, action planning, and discussion for relevant problem solving). Participants will receive an activity monitor from the Fitbit Inspire line and download the accompanying Fitbit app. Study staff will provide the monitor and app and set up the app on the participant’s mobile device. Study staff will then go over how to use the monitor display and app. Participants will receive a written troubleshooting guide to help with common problems, such as how to update the app and how to reconnect the Bluetooth connection. Step goals will be negotiated between the interventionist and individual participants, with an initial suggestion of approximately 3,000 steps per day above baseline levels on 3 goal days per week. These numbers were based on published norms for older adults [71] as well as the acceptability of step goals in preliminary research [32, 33]. This is the template for goal suggestions, but participants will be able to modify this as they feel is appropriate.

Participants will receive weekly feedback emails from study staff that present their average daily step count for the previous week and personalized goal recommendations for the current week. Personalized goal recommendations for each week will start with participants’ initial goal and increase according to their weekly performance. If participants meet their previous week’s goal recommendation, the recommended goal for the current week will increase (1) the number of days to aim to achieve the current step count goal by one (e.g., “aim to achieve 4,500 steps on your 4 most active days” would change to “…5 most active days”; to a maximum of 5 days), then (2) the step count in 1,000 step increments (e.g., the previous example would increase to “aim to achieve 5,500 steps on your 5 most active days”; to a maximum of 8,000 steps). Participants will be encouraged to adjust their goals as desired, and any stated adjustment will be reflected in the weekly emails. The weekly feedback emails will also remind participants of their initial step count levels to highlight their progress, include supportive comments (e.g., “Every day is an opportunity to level up! You are making progress!”), and encourage participants to wear/sync their Fitbit devices.
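
The weekly goal progression can be sketched in a few lines. This is a hedged illustration: the class and function names are hypothetical, and two behaviors are assumptions not stated in the protocol, namely that an unmet goal carries over unchanged and that a goal counts as met when the step target is reached on at least the stated number of days.

```python
from dataclasses import dataclass

MAX_GOAL_DAYS = 5       # maximum number of goal days per week
MAX_STEP_GOAL = 8000    # maximum recommended daily step goal
STEP_INCREMENT = 1000   # step goal increase once goal days are at the maximum

@dataclass(frozen=True)
class WeeklyGoal:
    steps: int  # daily step target
    days: int   # number of days on which to reach the target

def goal_met(daily_steps, goal):
    """Assumption: met if the target is reached on at least goal.days days."""
    return sum(1 for s in daily_steps if s >= goal.steps) >= goal.days

def next_goal(goal, daily_steps):
    """Recommend next week's goal from last week's goal and daily step counts."""
    if not goal_met(daily_steps, goal):
        return goal  # assumption: unmet goals carry over unchanged
    if goal.days < MAX_GOAL_DAYS:
        return WeeklyGoal(goal.steps, goal.days + 1)  # (1) add a goal day first
    return WeeklyGoal(min(goal.steps + STEP_INCREMENT, MAX_STEP_GOAL), goal.days)  # (2) then add steps
```

For the example in the text, a participant who meets "4,500 steps on 4 days" moves to 5 days, and meeting that goal then raises the target to 5,500 steps on 5 days. Participant-requested adjustments would override this recommendation.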

Experimental intervention

We conducted formative human-centered design and theory-based program planning to develop the CHALLENGE intervention [60, 61]. Participants randomized to the intervention will go through the same processes and receive the same intervention content as the comparison arm, and will also be invited to participate in a PA-promoting game mediated by a social network. If a participant does not have a Facebook account, study staff will walk them through creating one. Study staff will provide an overview of the group and discuss basic privacy and good citizenship rules (e.g., no sharing of others’ posts outside the group, no insults, no offensive photo or text content, no inclusion of other identifiable people in photos). General privacy concerns will also be addressed, and the interventionist will suggest methods of dealing with any participant concerns regarding the information they post. Participants will not be required to share their step data in the group, though they may share that information if they so choose.

The Facebook group will be modeled after popular large-scale private groups, such as cooking clubs and book clubs. Thus, one large group will be created, rather than multiple smaller groups. Study staff will post weekly challenges in the Facebook group (Text Box 1). Over the course of the week, participants will reply to these challenges with images and other content obtained during walks. Participants will be encouraged to comment and react to others’ posts. Challenges will repeat each year, such that all participants will be exposed to every challenge regardless of when they join. Participants will be allowed to stay in the group even after the final follow-up, allowing “super-users” who are highly engaged to continue to motivate new participants. The sequence of challenges will remain approximately the same each year, as many challenges will be tied to specific holidays or seasons. All challenges have been designed to allow multiple responses per week and to encourage repeated photo posting.

Participants receiving the intervention will also be given a box of props/materials that can be used for some of the challenges. These will include cards, such as description cards for scavenger hunt-style challenges (see Text Box 1). We will provide cardboard cut-outs in the form of frames, masks, and other photo props that can be decorated and used in photos. Masks will allow participants to maintain anonymity as well as celebrate different holidays (e.g., Mardi Gras masks).

An interventionist (a PhD-level behavioral scientist with experience in group-based PA promotion) will serve as a moderator for the Facebook group. The moderator will post the weekly challenge at the start of each week, followed by a comment on this post providing an example response that meets the specified challenge (photo and text; Fig. 2). At the start of each week, the moderator will also post a wrap-up of the previous week. The wrap-ups will summarize the photos and discussion of the previous week and will include a badge (Fig. 3). Badges will reflect the corresponding weekly challenge content, and participants who posted that week will be tagged in this post in recognition of their contribution(s). Badges will reward challenge completion, not step goal achievement. This method of rewarding task completion rather than performance is suggested by SDT as an autonomy-supportive way of handling rewards [72].

Study staff will abstract information on the social network weekly, taking screenshots and logging participants’ posts, comments, and reactions for the week. Though we did not encounter cyberbullying in preliminary studies, we will have procedures in place to deal with any interpersonal problems that arise. Problematic content will be flagged and discussed between the moderator(s) and study staff and then with the Principal Investigator. Based on this discussion, the content may be removed and the author notified. Participants will be encouraged to direct messages to the moderator(s) with concerns. The moderator(s) will create a management plan for repeat offenders.

We will use several methods from the marketing literature to encourage continuous participant posting. In moderator posts, we will include content shown to be associated with increased comments and likes, including photos, questions, entertaining and relational content, and recognition for posting [73,74,75,76,77,78,79]. Super-users will be recognized, with monthly posts highlighting the “power users” that posted each week that month. The weekly feedback emails for participants in the experimental group will additionally include feedback on participants’ Facebook engagement and a link to a REDCap survey to rate each weekly challenge.

Text Box 1. Illustrative examples of weekly walking challenges

Fig. 2 Illustrative post and example response by the moderator

Fig. 3 Illustrative example of a weekly summary of participants’ posts and badge used to recognize the contributions of study participants

Retention strategies

Participants will be compensated for their time with $25 ClinCards that will be provided at each follow-up assessment (for a total of $75; ClinCards can be used as debit cards and will allow us to automatically add additional funds at each follow-up visit). Study staff will attempt to contact participants via phone and/or email to schedule upcoming study visits. Participants will be reminded by phone of assessment visits the business day before the scheduled visit. We will mail seasonal cards expressing appreciation for participating in the study (e.g., for winter holidays, the start of spring, etc.). The weekly emails will encourage participants to contact study staff if they encounter difficulties in using the various technologies employed in the study or have other issues they would like to discuss. Study staff will place phone calls to participants if they are not able to access their Fitbit data or note other evidence of potential disengagement.

Data collection

Assessments at baseline, 6 months, 12 months, and 18 months will include objective and self-report quantitative measures (Fig. 4). Questionnaires will be completed online using a REDCap database. The REDCap database will also feature participant information and tracking data collection instruments that will be completed by study staff. We will create a new Fitbit or Google account for each participant and use this account to obtain participants’ activity data weekly. If participants already have an active Fitbit account, they will be allowed to use this account. We will strongly encourage the use of the provided Fitbit device. If this is not workable for participants and they feel they need to use their personal device for any reason, this will be permitted to avoid loss of participation. Data will be manually abstracted from the Facebook group continuously during the study (i.e., the content of all posts, comments, and reactions). Qualitative interviews will be conducted at 6 and 18 months (see below). Baseline measurements will collect sociodemographic data (e.g., age, race, ethnicity, marital status).

Primary outcome

Step count and physical activity

Because the emphasis of the intervention is on walking, steps will be the primary outcome and central measure of PA. Step count will be measured objectively using ActiGraph wGT3X monitors. A week-long assessment will take place at each measurement period. We will set the epoch length to one minute and consider 2,020 activity counts per minute to constitute moderate-vigorous intensity PA [80]. Non-wear time will be defined as 60 or more consecutive minutes of zero activity counts. Assessments will be considered valid if the monitor is worn at least 10 h per day on at least 4 days of the week. PA minutes (total, light, moderate-vigorous) will be secondary outcomes. For exploratory purposes, we will also abstract continuous step data from the Fitbit monitors.
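
The wear-time and validity rules above can be expressed as a short sketch operating on one-minute epochs. This is an illustration only, not the accelerometer software's processing pipeline: the function names are hypothetical, each day is assumed to be a list of per-minute activity counts, and treating the 2,020 cutpoint as inclusive is an assumption.

```python
from itertools import groupby

NONWEAR_RUN_MIN = 60    # consecutive zero-count minutes treated as non-wear
MVPA_CUTPOINT = 2020    # counts per minute (assumed inclusive)
MIN_WEAR_MIN = 10 * 60  # 10 h of wear for a valid day
MIN_VALID_DAYS = 4      # valid days needed for a valid assessment

def wear_minutes(counts):
    """Minutes in the day not covered by a zero-count run of 60+ minutes."""
    nonwear = 0
    for is_zero, run in groupby(counts, key=lambda c: c == 0):
        length = sum(1 for _ in run)
        if is_zero and length >= NONWEAR_RUN_MIN:
            nonwear += length
    return len(counts) - nonwear

def mvpa_minutes(counts):
    """Minutes at or above the moderate-vigorous cutpoint."""
    return sum(1 for c in counts if c >= MVPA_CUTPOINT)

def is_valid_assessment(days):
    """True if at least 4 days each have 10 h or more of wear time."""
    valid_days = sum(1 for counts in days if wear_minutes(counts) >= MIN_WEAR_MIN)
    return valid_days >= MIN_VALID_DAYS
```

Zero-count runs shorter than 60 minutes are kept as wear time in this sketch, which matches the non-wear definition stated above.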

Secondary outcomes

Motivation-related constructs

For the primary motivation analyses, we will use the Behavioral Regulation in Exercise Questionnaire-3 (BREQ-3). The BREQ-3 [81, 82] is a 24-item questionnaire that features six subscales to operationalize amotivation (e.g., “I don’t see why I should have to exercise”), external regulation (e.g., “I exercise because other people say I should”), introjected regulation (e.g., “I feel guilty when I don’t exercise”), identified regulation (e.g., “I value the benefits of exercise”), integrated regulation (e.g., “I consider exercise part of my identity”), and intrinsic regulation (e.g., “I exercise because it’s fun”). Responses are on a unipolar scale and range from “0-Not true for me” to “4-Very true for me”. Previous/preliminary research has found this scale to have acceptable internal consistency, with Cronbach’s alpha statistics ranging from 0.73 to 0.86. We will use a composite of intrinsic, integrated, and identified regulation for an autonomous regulation score. We will also investigate differences between groups for each of the individual motivation sub-types.
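
One way the autonomous regulation composite could be computed is sketched below. The protocol does not state how the composite is formed, so averaging the three subscale means is an assumption made here for illustration. The function names and the input layout (a mapping from subscale name to its item responses on the 0 to 4 scale) are also hypothetical, and the item-to-subscale key is not reproduced.

```python
from statistics import mean

AUTONOMOUS_SUBSCALES = ("identified", "integrated", "intrinsic")

def subscale_scores(responses):
    """Mean item response (0-4) for each subscale."""
    for name, items in responses.items():
        if any(not 0 <= item <= 4 for item in items):
            raise ValueError(f"response out of range in subscale {name!r}")
    return {name: mean(items) for name, items in responses.items()}

def autonomous_regulation(responses):
    """Assumed composite: mean of the three autonomous subscale means."""
    scores = subscale_scores(responses)
    return mean(scores[name] for name in AUTONOMOUS_SUBSCALES)
```

Called with a participant's six subscales, the function ignores amotivation, external, and introjected regulation and returns a score on the same 0 to 4 metric as the items.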

We will measure autonomy, competence, and relatedness in the context of exercise using the Basic Psychological Needs in Exercise Scale (BPNES). The BPNES [83] is an 11-item scale that features three subscales to operationalize autonomy (e.g., “I feel that the way I exercise is the way I want to”), competence (e.g., “I feel I have made a lot of progress in relation to the goal I want to achieve.”), and relatedness (e.g., “My relationships with the people I exercise with are close.”). Responses are on a unipolar scale and range from “1- I don’t agree at all” to “5-I completely agree”. Previous/preliminary research has found this scale to have acceptable internal consistency, with Cronbach’s alpha statistics ranging from 0.75 to 0.86.

Identity and values

We will measure exercise identity using the Exercise Identity Scale (EIS). The EIS [84] is a 9-item scale that assesses the degree to which an individual identifies as someone who exercises (e.g., “I consider myself an exerciser”). Responses are on a unipolar scale and range from “1-Strongly disagree” to “7-Strongly agree”. The EIS has demonstrated good internal consistency, test-retest reliability, and construct validity [84,85,86].

We will measure value-based living using the Engaged Living Scale (ELS). The ELS [87] is a 16-item scale that features two subscales to operationalize valued living (e.g., “I have values that give my life more meaning”) and life fulfillment (e.g., “I am satisfied with how I live my life”). Responses are on a unipolar scale and range from “1-Completely disagree” to “5-Completely agree”. The ELS has demonstrated good internal consistency and construct validity, including in a large nonclinical sample and a clinical sample consisting of chronic pain patients [87].

Trait playfulness and playful experiences

We will use the Adult Playfulness Trait Scale (APTS) to operationalize disposition to engage in playful behavior among study participants at baseline. The APTS [88, 89] is a 19-item scale that features three subscales to operationalize fun-seeking motivation (e.g., “I am often the person who starts fun things in a situation”), uninhibitedness (e.g., “If I want to do something, I usually don’t let what other people may think stop me”), and spontaneity (e.g., “I often do things on the spur of the moment”). Responses are on a bipolar scale and range from “1-Strongly disagree” to “5-Strongly agree”. Psychometric studies have provided evidence of acceptable internal consistency for its subscales (Cronbach’s alphas ranging from 0.68 to 0.87) [89] and criterion validity [90].

We will use the Playful Experiences Questionnaire (PLEXQ) to investigate the extent to which the targeted playful experiences were felt for the participants in the intervention group. The PLEXQ [91] is a 51-item questionnaire that assesses the occurrence of 17 distinct playful experiences (e.g., “Please respond about how you felt when you were playing the weekly challenges game… I enjoyed discovering new things”). The PLEXQ has demonstrated acceptable internal consistency for its subscales (Cronbach’s alphas ranging from 0.70 to 0.88) [91].

Acceptability and engagement

We will measure usability via the System Usability Scale (SUS). The SUS [92] is a 10-item scale that measures the usability of digital artifacts (e.g., “I think I would like to use this system frequently”). Responses are on a bipolar scale and range from “Strongly disagree” to “Strongly agree”. The SUS is a widely used instrument with evidence demonstrating its construct validity and reliability [93]. We will supplement these data with items designed to probe the acceptability of the technologies involved in the intervention (e.g., “How often did you have problems with the following pieces of technology not working? The Fitbit”; “I felt comfortable using the group”) [94].

Finally, we will characterize participants’ engagement with the technological aspects of the study via systems usage data. For the comparison arm and the experimental arm, we will record the days over the course of the intervention and follow-up periods for which there is evidence of PA tracker device wear. For the experimental arm, we will characterize engagement with the social media group by recording all posts, comments, and reactions in the Facebook group.

Anthropometric and physical functioning data

Height will be measured using a wall-mounted stadiometer (Seca Corp., Hamburg, Germany). Weight will be measured using a calibrated electronic scale (Tanita, Arlington Heights, IL, USA). Two consecutive measurements of height and weight will be taken. If consecutive measurements are not within 2.5% of one another, a third measure will be taken. The two closest measures will be averaged. Height and weight will be used to calculate body mass index (kg/m²). We will administer the one-minute sit to stand test as an index of physical functioning [95]. The one-minute sit to stand test will be conducted by a blinded assessor. The blinded assessor may be present in person or by videoconference. We will employ recommended procedures for conducting this test, including using a standard, slightly padded armless chair with a seat height of 45.0–48.0 cm, having participants’ arms be crossed over chest, including a practice cycle, etc. [95].
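
The repeated-measurement rule and the BMI calculation can be sketched as follows. The function names are hypothetical, and taking the 2.5% tolerance relative to the smaller of the two readings is an assumption, since the protocol does not say which reading serves as the reference.

```python
from itertools import combinations

def needs_third_measure(first, second, tolerance=0.025):
    """True if two readings differ by more than 2.5% (assumed: of the smaller one)."""
    return abs(first - second) > tolerance * min(first, second)

def final_measure(measures):
    """Average the two closest readings out of two or three."""
    a, b = min(combinations(measures, 2), key=lambda pair: abs(pair[0] - pair[1]))
    return (a + b) / 2

def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2
```

With two readings, final_measure simply averages them. With three, it discards the reading farthest from the other two before averaging.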

Fig. 4 Study constructs and their measures by assessment time point

Qualitative interviews

To investigate participants’ experience with the intervention and identify issues related to sustainability, we will conduct brief qualitative individual interviews during the 6- and 18-month assessments. A trained research assistant will use a semi-structured guide to elicit participants’ feelings about their experiences associated with participating in the intervention. In the comparison arm, these questions will focus on using the activity monitor and their usage patterns. In the experimental arm, questions will focus on the game: how participants felt about the challenges, which challenges were most fun, and any negative experiences. All interviews will be digitally recorded and transcribed verbatim (qualitative analytic methods are described below).

Data analysis

We will use descriptive statistics to characterize the study sample and investigate process measures reflecting participants’ experiences with the wearable monitor/app and Facebook. From the database of information abstracted from Facebook and Fitbit, we will compute descriptive statistics for engagement (posts, comments, and reactions given, comments and reactions received, total days engaged, total number of engagements) and monitor activity (days the monitor was worn, daily steps). All between-group comparisons will be conducted in R (version 3.5 or later) and will follow intent-to-treat principles. Multiple imputation of missing data and transformation of non-normal data will be performed as needed.

Primary data analysis

Step count

We will use linear mixed-effects models to evaluate differences in steps from baseline between intervention conditions across 6, 12, and 18 months. We will evaluate terms for the main effects of intervention condition and time, as well as the interaction between intervention condition and time. The first set of models, to be fit after completion of the intervention by all participants, will include outcome values from the 6-month and 12-month time points (testing the effects of the intervention during the intervention period). Contrasts will be used to assess changes at each follow-up and between time points. Covariates will be included if they improve model fit as operationalized via the Bayesian information criterion. The second set of models, to be fit after completion of all data collection for all participants, will include 6-, 12-, and 18-month time points (testing changes from the intervention period to the no-intervention maintenance period). A linear trend will be modeled if fit criteria indicate that a simple trend is reasonable. We hypothesize that steps will be greater in the game intervention condition than the standard intervention at all time points.
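
One possible form of the step-count model is written out below as a sketch. The protocol does not specify the random-effects structure or the covariates, so the participant-level random intercept, the categorical coding of time, and all symbols are assumptions introduced here for illustration:

```latex
\[
Y_{ij} = \beta_0 + \beta_1\,\mathrm{Group}_i
       + \sum_{t} \gamma_t\,\mathbb{1}[\mathrm{Time}_{ij} = t]
       + \sum_{t} \delta_t\,\mathrm{Group}_i\,\mathbb{1}[\mathrm{Time}_{ij} = t]
       + b_i + \varepsilon_{ij},
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

Here Y_ij is the step outcome for participant i at assessment j, Group_i indicates assignment to the game intervention, t ranges over the non-reference time points in the model, and the δ_t terms carry the condition-by-time interaction that the contrasts would test.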

Secondary data analysis

Motivation

We will fit separate linear mixed-effects models to investigate differences between intervention conditions over time in autonomous regulation, as well as in each of its subcomponents: intrinsic, integrated, and identified regulation. We hypothesize that each of these motivation types will be greater in the game intervention than in the standard intervention at each follow-up time point.

Mediation and moderation

To explore potential motivation-related mechanisms of the intervention effects, we will conduct longitudinal mediation analyses for outcomes (number of steps) measured at months 12 and 18, with autonomous regulation and each of its subcomponents, measured at month 6, as the mediators of interest [96]. In this initial efficacy trial, we have prioritized investigation of autonomous regulation as a mediator, mirroring previous SDT-based intervention studies to allow clear comparisons [97]. We will use bootstrapping to test the indirect effect of each mediator on the outcome [98]. Note that exploring potential indirect (mediation) effects may be recommended even when no significant treatment effect is found on the primary outcome, for hypothesis-generating purposes [99]. If a significant indirect effect is found for at least two potential mediators, we will conduct further mediation analysis using multiple mediator models [100]. In these analyses, we will control for the baseline outcome, as appropriate.
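The bootstrap test of an indirect effect can be sketched as follows. This is a minimal single-mediator illustration using a percentile bootstrap on simulated data; the function names, the simulated effect sizes, and the omission of baseline adjustment are all simplifying assumptions, not the study's analysis code (which will be written in R).

```python
import random
from statistics import fmean


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """Product a*b: a from regressing M on X, b from regressing Y on X and M (OLS)."""
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample participants with replacement
        estimates.append(
            indirect_effect([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        )
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Simulated (not real) data: x = arm (0 = standard, 1 = game),
# m = month-6 mediator score, y = later step outcome.
rng = random.Random(42)
n = 200
x = [i % 2 for i in range(n)]
m = [1.0 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * mi + rng.gauss(0, 1) for mi in m]

point = indirect_effect(x, m, y)
ci_low, ci_high = bootstrap_ci(x, m, y)
```

Under this approach, an indirect effect would be considered significant when the bootstrap interval excludes zero. Resampling whole participants, rather than individual measurements, keeps each person's condition, mediator, and outcome together.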

We will also investigate whether baseline variables for age, playfulness (trait propensity towards play) [88, 89], and physical functioning (as measured by the one-minute sit to stand test) [101, 102] moderate engagement and PA. We will fit linear models to investigate the impact of age, playfulness, and physical functioning on engagement as well as the impact of engagement on steps. We hypothesize that younger participants, more playful participants, and participants with greater physical functioning will have greater engagement. We also hypothesize that greater engagement totals will be associated with greater step counts. We will use linear models to investigate playful experiences between groups and across follow-ups.

Analysis of the individual weekly challenges

As an exploratory analysis, we will compare weekly step counts, weekly post frequency, and weekly self-reported PA enjoyment to investigate differences by challenge week. For this analysis we will employ a longitudinal mixed model as a function of time, with outcomes measured per week. Since challenge and time in the study are confounded, each effect will be modeled separately for each outcome. We will use nonlinear models and assess the fit of nonparametric smoothers using a generalized additive model formulation. To assess the relative impact of weekly challenges, we will perform a post-hoc set of pairwise comparisons of the relative effectiveness of each challenge in affecting each outcome. While our default computation will guard against Type I error inflation with a Tukey adjustment, we will also use Hsu’s “comparison with the best” formulation to identify which challenge (or group of challenges) produces the highest step counts, post frequency, and PA enjoyment. As many of the outcome variables are relative frequencies or counts, we will respectively employ a logistic or Poisson model formulation.
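For a count outcome such as weekly post frequency, the two separately fitted formulations described above might take the following form. The notation, the log link with a participant-level random intercept, and the smooth term are illustrative assumptions rather than the final model specification.

```latex
% Y_{iw}: count outcome (e.g., posts) for participant i in week w
% s(w): nonparametric smooth of study week; c(w): challenge delivered in week w
% u_i: participant-level random intercept (assumed structure)

% Time model (Poisson, log link)
\log \mathbb{E}[Y_{iw} \mid u_i] = \beta_0 + s(w) + u_i

% Challenge model (Poisson, log link)
\log \mathbb{E}[Y_{iw} \mid u_i] = \beta_0 + \gamma_{c(w)} + u_i

% For an outcome expressed as a relative frequency p_{iw}, the logistic analogue is
\operatorname{logit}(p_{iw}) = \beta_0 + \gamma_{c(w)} + u_i
```

The pairwise comparisons would then be made among the challenge effects (the \gamma terms in this sketch), with the Tukey adjustment applied across all pairs.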

Qualitative analyses

Using NVivo software, two trained research assistants will conduct a thematic analysis of the interview transcripts and of the text and photos posted to Facebook [103]. They will also use Michie and colleagues’ methods for coding behavior change techniques. Analysis will be overseen by EJL and DT, who are experienced in qualitative analysis [104,105,106]. Coding will be iterative, with initial codes based on Self-Determination Theory and additional codes based on grounded theory [107]. The qualitative team will meet to discuss and resolve any differences in coding.

Fidelity

Moderators will be trained by the Principal Investigator and will post pre-approved challenges and example posts verbatim. The moderators’ communication (e.g., weekly emails) will be largely scheduled and scripted (except for communications regarding scheduling, reporting unacceptable content, etc.). Study staff will record data from Facebook and Fitbit regularly, so we will be able to track the number of posts to the group per day.

Privacy and confidentiality

Participants will be informed of potential privacy and confidentiality risks, and of the practices in place to ensure protection, during the informed consent process. The initial orientation visit will also include basic information on privacy and confidentiality on Facebook. Participants will not be required to share their step data in the group, in order to maintain privacy of personal information (though they may choose to share that information if they like). Masks and similar props will allow participants to maintain anonymity as well as celebrate different holidays as desired (e.g., Mardi Gras masks). The private Facebook group will only allow individuals participating in the study and study staff to join. The orientation process will include discussion of basic privacy and good citizenship rules (i.e., no sharing of others’ posts outside the group, no insults, no offensive photo or text content, no inclusion of other identifiable people in photos). Study staff will not reveal any sensitive information on Facebook, and participants will not be required or encouraged to share personal health information in the private group. Data that could be used to identify a specific study participant will be handled with IRB-approved procedures.

Safety monitoring

Participants will be instructed to report illnesses and injuries to study staff promptly. Study staff will contact participants who report any such issues or who mention possibly relevant issues in the course of the study (e.g., in response to the weekly feedback emails or in the Facebook group). To ensure safety, a qualified medical monitor will conduct a monthly review of participant safety and of whether it is safe to continue the study, based on data collected that month.

Ethics

All individuals involved in this study will participate in an informed consent process, and the study will be conducted in accordance with the ethical guidelines and regulations established by the University of Texas Medical Branch at Galveston Institutional Review Board (study protocol #19–0158).

Discussion

Pronounced demographic shifts and advances in healthcare have resulted in a large and growing population of older adults in the U.S. Older adult women can face unique challenges to engaging in recommended levels of PA and have poor adherence to PA guidelines [12,13,14,15]. Walking is a form of PA that is accessible and enjoyable for most older adult women, and emerging technologies can support walking interventions [23, 24]. Contemporary activity monitor systems, however, do not appear to be sufficient for bringing about lasting behavior change [36]. Part of the reason for this may be the framing of these interventions; a “celebratory technology” approach, emphasizing interest, meaning, personal values, and psychological needs, may be uniquely conducive to promoting PA adherence among older adult women.

The CHALLENGE study will evaluate if and how (i.e., proposed mechanisms of action) an intervention that supplements an activity monitoring system with elements of game design leads to lasting changes in PA patterns in older adult women. The intervention is designed to take a “celebratory technology” approach that targets SDT constructs known to predict sustained behavior change. By taking part in a year-long intervention centered on imbuing walking behaviors with experiences of autonomy and playfulness, participating older adult women may internalize health-promoting changes to their identity and relationship with PA. Study results will have implications for how we can harness powerful and increasingly ubiquitous technologies for health promotion to the vast and growing population of older adults in the U.S. and abroad.

This study has several potential limitations. First, our results will not be generalizable to all older women in the U.S. We chose to exclude adults over the age of 85 due to important differences in functional abilities between this population and older adults who are less than 85 years of age [8]. Generalizability is further limited by the convenience sampling we will employ, and the technology requirements inherent to the intervention. Participants may be relatively motivated to increase their PA and relatively interested in and/or comfortable with using the featured technologies. While we will evaluate the effects of some potentially moderating variables (i.e., age, physical functioning, and trait playfulness), additional studies will be needed to investigate (1) which contextual factors further influence intervention efficacy, (2) which specific components within CHALLENGE may be primarily responsible for driving results, and (3) how CHALLENGE might be able to adapt in real time and scale up to better serve participants.

The CHALLENGE intervention is the result of a systematic line of inquiry investigating how we can promote PA for older women in a way that is autonomy-supportive and readily scalable. We developed CHALLENGE as a theory-informed behavioral intervention that aims to simultaneously accomplish both instrumental (e.g., self-monitoring of PA) and experiential objectives (e.g., playful experiences) [61]. The present study is a randomized controlled trial that is powered to provide evidence reflecting the intervention’s efficacy and mechanisms of action. In accord with recommended guidelines [108, 109], the next steps in our research agenda will include further intervention refinement and research centered on the program’s dissemination and implementation using a hybrid effectiveness-implementation design.