Participant Characteristics
Assessed demographic factors included gender, age, employment status, highest level of education, post-tax household income, and relationship status. Body mass index (BMI; kg/m²) was calculated from self-reported height and weight. Physical activity was assessed using an adapted version of the Godin Leisure-Time Exercise Questionnaire [31, 32], with minutes of vigorous activity weighted by two to account for its additional benefits. Physical activity habit strength was assessed with the Behavioral Automaticity Index [33]. Previous exposure to physical activity messages was assessed using a purpose-built single item, “How much information or advice about building healthy physical activity habits would you say you have read or viewed in the past?”, with six response options ranging from none to a substantial amount.
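The two derived participant characteristics can be illustrated with a minimal sketch. It assumes that weekly moderate and vigorous minutes have already been extracted from the adapted questionnaire and that mild activity is not counted; both assumptions go beyond what is specified above, and the function names are hypothetical.

```python
def weighted_mvpa_minutes(moderate_min: float, vigorous_min: float) -> float:
    """Weekly MVPA minutes with vigorous minutes counted twice (assumed scoring)."""
    if moderate_min < 0 or vigorous_min < 0:
        raise ValueError("minutes must be non-negative")
    return moderate_min + 2 * vigorous_min


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index (kg/m^2) from self-reported weight and height."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


print(weighted_mvpa_minutes(60, 30))  # 120
print(round(bmi(70, 1.75), 1))  # 22.9
```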
Outcomes
Attention
While participants viewed the intervention material on the monitor, their eye movements were recorded as an indicator of attention. Total gaze duration within pre-specified areas of interest (AOIs; i.e., all graphics and text) was used as the primary measure. The proportion of total slide viewing time spent gazing within an AOI (AOI ratio) was also assessed. Gaze duration within AOIs is considered a valid and reliable measure of visual attention [26] and was chosen as the primary outcome because attention is considered a core component of message processing and may be a prerequisite for influencing user perceptions, attitudes, and intentions [34].
Eye movements were recorded with an Arrington ViewPoint EyeTracker system (Arrington Research) sampling at 220 Hz, with a spatial precision of 0.25° of visual angle; that is, gaze location was sampled approximately every 4.5 ms (1/220 s). Gaze duration within each AOI was calculated as the number of samples recorded within that AOI multiplied by the sampling interval. This yielded the raw time (in ms) spent gazing within each AOI and the total time spent viewing each slide.
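The sample-count calculation can be sketched as follows. This is a minimal illustration, assuming rectangular AOIs in screen pixels and that every sample recorded while a slide was displayed counts toward that slide's viewing time (no blink or missing-sample handling); the `AOI` class and `slide_gaze_summary` function are hypothetical and not part of the ViewPoint software.

```python
from dataclasses import dataclass

SAMPLING_RATE_HZ = 220.0
SAMPLE_INTERVAL_MS = 1000.0 / SAMPLING_RATE_HZ  # time represented by one sample (~4.55 ms)


@dataclass(frozen=True)
class AOI:
    """Hypothetical rectangular area of interest, in screen pixels."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def slide_gaze_summary(samples, aois):
    """Return (AOI gaze ms, total slide ms, AOI ratio) for one slide.

    `samples` holds every (x, y) gaze sample recorded while the slide was shown.
    A sample falling inside overlapping AOIs is counted once.
    """
    n_total = len(samples)
    if n_total == 0:
        raise ValueError("no samples recorded for this slide")
    n_in_aoi = sum(1 for x, y in samples if any(a.contains(x, y) for a in aois))
    # Duration = sample count x sampling interval (1 / sampling rate).
    return n_in_aoi * SAMPLE_INTERVAL_MS, n_total * SAMPLE_INTERVAL_MS, n_in_aoi / n_total


# One second of synthetic gaze: half inside a single AOI, half outside it.
samples = [(100, 100)] * 110 + [(900, 700)] * 110
aoi_ms, total_ms, ratio = slide_gaze_summary(samples, [AOI(0, 0, 500, 500)])
```

Because the AOI ratio is a ratio of two durations sharing the same interval, it reduces to a ratio of sample counts, which the sketch computes directly.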
In addition, to help visualize how participants attended to the materials, heatmaps were generated from gaze points, represented as x,y scatter density. Data were plotted separately for participants in the top tertile of need for cognition (NFC; highest scores) and those in the bottom tertile (lowest scores).
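A minimal sketch of the tertile split and a density grid is shown below. It uses a normalized 2-D histogram (`numpy.histogram2d`) as a simple stand-in for the scatter-density estimate, which may differ from the method actually used; the screen size, bin counts, and inclusion of ties at the cut points are assumptions.

```python
import numpy as np


def nfc_tertile_masks(nfc_scores):
    """Boolean masks for the bottom and top NFC tertiles (ties at a cut point are included)."""
    scores = np.asarray(nfc_scores, dtype=float)
    low_cut, high_cut = np.quantile(scores, [1 / 3, 2 / 3])
    return scores <= low_cut, scores >= high_cut


def gaze_density(x, y, screen_w, screen_h, bins=(48, 27)):
    """Normalized 2-D histogram of gaze points (rows = x bins); transpose before imshow."""
    density, _, _ = np.histogram2d(
        x, y, bins=bins, range=[[0, screen_w], [0, screen_h]], density=True
    )
    return density
```

In use, gaze points would be pooled across the participants selected by each mask and passed to `gaze_density`, giving one grid per tertile that can be plotted side by side.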
Physical Activity Determinants
Intentions, attitudes, and perceived behavioral control were assessed pre- and post-exposure using standard self-report items, following Ajzen’s recommendations [35, 36]. Scale scores for each determinant were created by summing the items, with higher scores indicating more positive responses. An overview of the measures, including items and response scales, is provided in Supplementary Material 2.
User Experience and Perceptions
All user experience and perception measures were assessed during the post-exposure questionnaire.
Quality of Experience
An adapted version of the enjoyment of website experiences scale [37] was used to assess participants’ experience of viewing the physical activity materials, with the wording changed from “while visiting the website” to “while viewing the materials.” The measure comprises twelve items, four for each of the engagement, positive affect, and fulfillment subscales. Item scores are summed to form subscale scores (range = 0–24) and an overall user experience score (range = 0–72), with higher scores indicating a more positive user experience. Previous research has shown this measure to be highly reliable and valid [37].
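The subscale and total scoring can be sketched as below. The item order (items 1–4 engagement, 5–8 positive affect, 9–12 fulfillment) is a hypothetical layout, and the 0–6 per-item range is inferred from the stated 0–24 subscale range rather than taken from the scale itself.

```python
# Hypothetical item order: items 1-4 engagement, 5-8 positive affect, 9-12 fulfillment.
SUBSCALE_ITEMS = {
    "engagement": range(0, 4),
    "positive_affect": range(4, 8),
    "fulfillment": range(8, 12),
}
ITEM_MIN, ITEM_MAX = 0, 6  # inferred from the 0-24 range over four items


def score_user_experience(items: list[int]) -> dict[str, int]:
    """Sum 12 item scores into subscale scores (0-24) and a total score (0-72)."""
    if len(items) != 12:
        raise ValueError("expected 12 item scores")
    if any(not ITEM_MIN <= v <= ITEM_MAX for v in items):
        raise ValueError(f"item scores must lie in {ITEM_MIN}-{ITEM_MAX}")
    scores = {name: sum(items[i] for i in idx) for name, idx in SUBSCALE_ITEMS.items()}
    scores["total"] = sum(scores[name] for name in SUBSCALE_ITEMS)
    return scores
```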
Perceived Message Effectiveness
A 3-item perceived effectiveness scale, adapted from Jensen et al. [38], was used to assess the perceived persuasiveness of the message. The three items were “Was the material convincing?”, “Would people your age who are not already active be more likely to become active after reading the information presented?”, and “Would the materials be helpful for convincing your friends and family to become more physically active?”, each answered on a 4-point scale: definitely no, no, yes, and definitely yes (scored 0–3). Higher sum scores across the three items indicate greater perceived effectiveness (range = 0–9).
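A minimal scoring sketch, assuming the response labels are coded 0–3 in the order listed (definitely no = 0 through definitely yes = 3), which matches the stated per-item range; the mapping and function name are illustrative.

```python
# Assumed coding of the 4-point response scale, consistent with the 0-3 per-item range.
RESPONSE_SCORES = {"definitely no": 0, "no": 1, "yes": 2, "definitely yes": 3}


def perceived_effectiveness(responses: list[str]) -> int:
    """Sum the three perceived-effectiveness items (range 0-9)."""
    if len(responses) != 3:
        raise ValueError("expected three responses")
    return sum(RESPONSE_SCORES[r.strip().lower()] for r in responses)


print(perceived_effectiveness(["Yes", "Definitely yes", "No"]))  # 6
```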
Perceived Message Informativeness
Participants’ perceptions of the amount of information in the message materials were measured using an adapted version of the perceived message informativeness scale [39]. The scale comprises two items, “The material was informative” and “I learned something from the material presented,” each rated on a 5-point scale ranging from strongly disagree to strongly agree. Higher scores indicate greater perceived informativeness (range = 0–8).
Moderators
Need for Cognition
Need for cognition (NFC) was assessed at baseline using the short-form need for cognition scale [40]. The scale consists of 18 statements; participants rated the degree to which each statement was characteristic of them on a 7-point Likert scale ranging from “strongly disagree” (0) to “strongly agree” (6). Responses were summed to create a total NFC score (range = 0–108), with higher scores indicating higher NFC.
Perceived Personal Relevance
Perceived personal relevance of the message was assessed post-test using three purpose-built items, which showed acceptable internal consistency (Cronbach’s alpha = 0.79). Items were rated on a 7-point Likert scale ranging from “strongly disagree” to “strongly agree.” The items were: “The information provided was very relevant to me”; “The information provided was applicable to my situation”; and “The information provided seems like it was written with someone like me in mind.” Personal relevance scores were calculated by summing the three item responses (range = 0–18), with higher scores indicating greater perceived personal relevance of the message.
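Internal consistency of a purpose-built scale such as this one is usually summarized with Cronbach’s alpha, alpha = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below is a plain implementation of that standard formula, assuming a complete participants × items score matrix (no missing responses) and sample (n − 1) variances throughout.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha from a participants x items matrix of scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using sample (n - 1) variances throughout.
    """
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need at least two items and equal-length rows")
    item_columns = list(zip(*responses))
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_var / total_var)
```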
Sample Size Calculation
We assumed that (i) the prevalence of the two NFC categories (high vs. low) would differ by no more than 40 percentage points (i.e., ranging from 3:7 to 7:3); (ii) under an appropriate transformation, gaze duration would be normally distributed within each combination of treatment group and NFC category (high vs. low), with a constant within-group standard deviation (SD); and (iii) under the alternative hypothesis of interest, high (low) NFC individuals would have a mean gaze duration of 2.5 SD (1.5 SD) when viewing matched materials and 0.5 SD (0.5 SD) when viewing mismatched materials. Simulations under these assumptions indicated that 50 individuals (25 allocated to each experimental condition) would provide at least 80% power to detect an interaction between need for cognition (high vs. low) and treatment group (central vs. peripheral) in a linear regression (two-sided alpha = 0.05).
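A simulation of this kind can be sketched as follows. This is not the authors’ code: it assumes that “matched” means central-route materials for high-NFC and peripheral-route materials for low-NFC participants, that the outcome is already on the transformed (normal) scale with within-cell SD = 1, that NFC category is drawn independently of allocation, and that the interaction is tested with an ordinary least squares t-test. NumPy and SciPy are assumed to be available; `CELL_MEANS` and `simulate_interaction_power` are hypothetical names.

```python
import numpy as np
from scipy import stats

# Assumed cell means in SD units (within-cell SD = 1), keyed by (high_nfc, central_route).
# "Matched" is taken to mean central materials for high NFC and peripheral for low NFC.
CELL_MEANS = {(1, 1): 2.5, (1, 0): 0.5, (0, 0): 1.5, (0, 1): 0.5}


def simulate_interaction_power(n_per_arm=25, p_high=0.5, n_sims=1000, alpha=0.05, seed=1):
    """Share of simulated trials in which the NFC x treatment OLS t-test rejects H0."""
    rng = np.random.default_rng(seed)
    central = np.repeat([1, 0], n_per_arm)  # fixed 1:1 allocation
    n = central.size
    rejections = 0
    for _ in range(n_sims):
        high = rng.binomial(1, p_high, size=n)  # NFC category, independent of allocation
        mu = np.array([CELL_MEANS[(int(h), int(c))] for h, c in zip(high, central)])
        y = mu + rng.standard_normal(n)
        X = np.column_stack([np.ones(n), high, central, high * central])
        if np.linalg.matrix_rank(X) < X.shape[1]:
            continue  # an empty cell leaves the interaction inestimable: counted as no rejection
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        df = n - X.shape[1]
        resid = y - X @ beta
        sigma2 = resid @ resid / df
        se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[3, 3])
        p_value = 2 * stats.t.sf(abs(beta[3] / se), df)
        rejections += int(p_value < alpha)
    return rejections / n_sims


for p in (0.3, 0.5, 0.7):  # the prevalence range stated in assumption (i)
    print(p, simulate_interaction_power(p_high=p))
```

Under these assumed means the interaction contrast is (2.5 − 0.5) − (0.5 − 1.5) = 3 SD, so the estimated power for n = 50 should sit well above the 80% target across the stated prevalence range.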
Statistical Method
Means (standard deviations) and frequencies (percentages) are reported for continuous and discrete participant demographics, respectively, unless otherwise specified. All analyses included only participants who completed data collection (n = 50).
Both the primary endpoint (gaze duration within AOIs) and the ratio of gaze duration within AOIs to total viewing duration were analyzed using mixed effects negative binomial regression with quadratic parameterization. Because gaze duration differed greatly by slide, both analyses included crossed (non-nested) random intercepts for individual and slide. Fixed effects were NFC, treatment allocation (central vs peripheral), age, gender (female vs male), moderate-to-vigorous physical activity, prior message exposure, and personal relevance. The interaction between NFC and treatment allocation was of primary interest; a post hoc interaction between NFC and personal relevance was also explored. In these analyses, continuous fixed effects were centered on the sample mean and scaled by the sample standard deviation. In the primary analysis, individuals were analyzed according to the group to which they were allocated. One individual in the peripheral route condition had a total AOI duration of 229 s, almost three times that of the next slowest individual in that group (82 s); we believe this was probably because English was their second language. One other individual did not receive the correct stimulus. A sensitivity analysis examined the impact of these protocol deviations by excluding the individual with English language difficulties and analyzing individuals according to the treatment they received rather than the one to which they were allocated.
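Two ingredients of this model can be illustrated in isolation: the centering and scaling of continuous covariates, and the quadratic negative binomial variance function. In glmmTMB, the quadratic form is the `nbinom2` family (variance = mu(1 + mu/phi)), in contrast to the linear `nbinom1` family (variance = mu(1 + phi)). The R model itself would presumably take a form along the lines of `glmmTMB(aoi_duration ~ nfc * group + age + gender + mvpa + prior_exposure + relevance + (1 | id) + (1 | slide), family = nbinom2)`, where all variable names are hypothetical. The Python sketch below reproduces only the two formulas, not the model fit.

```python
from statistics import mean, stdev


def standardize(values):
    """Center on the sample mean and scale by the sample SD (z-scores)."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant covariate")
    return [(v - m) / s for v in values]


def nbinom2_variance(mu, phi):
    """Quadratic NB variance, as in glmmTMB's nbinom2: mu * (1 + mu / phi)."""
    return mu * (1 + mu / phi)


def nbinom1_variance(mu, phi):
    """Linear NB variance, as in glmmTMB's nbinom1: mu * (1 + phi)."""
    return mu * (1 + phi)
```

The quadratic form lets the variance grow with the square of the mean, which suits a positive, right-skewed outcome such as gaze duration whose spread widens on slides with longer viewing.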
Secondary outcomes assessed post-treatment were analyzed using linear regressions adjusting for age, gender, treatment allocation, NFC, personal relevance, physical activity, and prior exposure. Models for physical activity determinants (i.e., intentions, attitudes, and perceived behavioral control) also adjusted for pre-treatment scores. Again, interactions of NFC with treatment allocation and with personal relevance were explored.
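The structure of a pre-score-adjusted model with an NFC × treatment interaction can be sketched as an ordinary least squares fit. This is a minimal illustration, not the authors’ R code: `fit_adjusted_model` is a hypothetical helper that returns coefficients only (no standard errors), and remaining covariates are passed as an optional participants × covariates matrix.

```python
import numpy as np


def fit_adjusted_model(post, pre, nfc, group, covariates=None):
    """OLS coefficients for post ~ pre + nfc + group + nfc:group (+ covariates).

    Returned order: intercept, pre, nfc, group, nfc x group, then any covariates.
    """
    post = np.asarray(post, dtype=float)
    cols = [np.ones(post.size), np.asarray(pre, dtype=float),
            np.asarray(nfc, dtype=float), np.asarray(group, dtype=float)]
    cols.append(cols[2] * cols[3])  # NFC x treatment interaction term
    if covariates is not None:
        cols.extend(np.asarray(covariates, dtype=float).T)  # one column per covariate
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, post, rcond=None)
    return beta
```

For inference (standard errors and p values), a formula interface such as `statsmodels.formula.api.ols("post ~ pre + nfc * group + age", data=df)` would be the more practical route in Python.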
Analyses were performed in R (version 3.6.3), using the glmmTMB package for the negative binomial mixed effects models. Statistical significance was set at 0.05 (two-sided).