Introduction

One of the main challenges in physical activity (PA) research remains the comparability of PA estimates derived from different measurement tools. Estimates differ substantially not only between self-reported and device-based measures but also within these categories (Nigg et al., 2020; Pulsford et al., 2023). Each method for assessing PA (i.e., accelerometry, questionnaires, or diaries) has its own challenges and advantages. Self-reported tools allow PA to be assessed in large groups with limited effort but can suffer from, for example, social desirability or recall bias (Helmerhorst, Brage, Warren, Besson, & Ekelund, 2012). Device-based measures enable researchers to track movement patterns throughout the day and to address the 24 h activity cycle (Rosenberger et al., 2019), but they are expensive and time-consuming. Moreover, choices such as the device used, the wearing position, and the data processing affect the derived PA estimates and are often poorly reported (Keadle, Lyden, Strath, Staudenmayer, & Freedson, 2019). A universally accepted standard for PA measurement has not yet been established, and ongoing discussions focus on identifying best practices (Burchartz et al., 2020; Nigg et al., 2020). This has important implications for the creation and application of PA guidelines (Bull et al., 2020; Gill et al., 2023). Important parameters for measurement tools are their reliability, the agreement between different tools (referred to as validity in this manuscript), and the stability of differences between tools over time (Patterson, 2000). The current study therefore aimed to replicate the results of our previous study, which used data from the first SMARTFAMILY trial (Fiedler, Eckert, Burchartz, Woll, & Wunsch, 2021), with data from the SMARTFAMILY2.0 trial.
The original aims were to investigate the stability of the pairwise differences between three methods of measuring PA (accelerometry, diary, and questionnaire) and to assess the impact of different epoch lengths (10 s and 60 s) on accelerometer-derived moderate and vigorous physical activity (MPA and VPA) in adults and in children and adolescents (hereafter referred to as children) across two independent measurement weeks. Additionally, the study aimed to evaluate the reliability and validity of these measurement tools.

Methods

The participants and procedure are described in detail in the study protocol of the main study (Wunsch et al., 2020). This study includes the participants of the control group of the second SMARTFAMILY trial. The methodology of this replication study is based on our previous study of the first SMARTFAMILY trial (Fiedler et al., 2021). The information most relevant to this manuscript is summarized briefly in the following paragraphs. The measurements in this study differ from those of the previous study in two respects:

  • In the present study, the Global Physical Activity Questionnaire (GPAQ) (Armstrong & Bull, 2006) was used for both adults and children, in line with the new PA recommendations of the World Health Organization (Bull et al., 2020). The previous study used the International Physical Activity Questionnaire (IPAQ) (Craig et al., 2003) for adults and the Sixty-Minute Screening Measure (Prochaska, Sallis, & Long, 2001) for children. The GPAQ has shown moderate-to-strong positive correlations with the IPAQ in previous research (e.g., Bull, Maslin, & Armstrong, 2009).

  • Because the PA diary performed poorly in our previous study, its design and description were improved, and each participant received an example of a completed diary (for an example, see https://osf.io/e8acs/).

Participants and procedure

Families were eligible for this study if at least one child and one adult lived in a common household and the family was part of the control group (43 adults aged 36–58 years and 50 children aged 4–20 years). Full ethical approval was obtained for the study. All adult participants, children, and legal guardians provided written informed consent by signing the informed consent form before commencing the study. The International Registered Report Identifier (IRRID) for the SMARTFAMILY study is RR1-10.2196/20534. The trial was conducted in accordance with the Declaration of Helsinki. Families of the control group had a baseline measurement (T0), a 3-week waiting period without any intervention or measurement, and a postmeasurement (T1). Data collection at T0 and T1 involved measuring PA with accelerometers, diaries, and questionnaires over the course of 1 week. The procedures were identical at both timepoints.

Measurements

Accelerometer

Hip-worn (right side) triaxial accelerometers (Move 3/Move 4, Movisens GmbH, Karlsruhe, Germany) were used to record PA continuously. The accelerometer has been considered accurate for estimating energy expenditure (Anastasopoulou et al., 2014). Two epoch lengths were analyzed: the most commonly used length (60 s) and a shorter length (10 s). The accelerometer outcomes used in this study were MPA (3.0–5.9 metabolic equivalents (MET)) and VPA (> 6 MET) for all participants. Accelerometer data were included if participants achieved at least 8 h of wear time per day on at least 4 of the 7 days of the measurement week. For valid measurements, the mean daily MPA and VPA across valid days were multiplied by 7 to obtain total minutes per week.
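
The processing rules above (MET cut points, valid-day criterion, and extrapolation to a full week) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Movisens or study pipeline: all function names are hypothetical, each epoch is assumed to carry one MET value, 6.0 MET is treated as the MPA/VPA boundary (the text gives 3.0–5.9 and > 6 MET), and re-aggregating 10 s epochs into 60 s epochs by averaging their MET values is a simplification used only to show why epoch length matters.

```python
from statistics import mean


def classify_epoch(met):
    """Assumed cut points: MPA for 3.0 <= MET < 6.0, VPA for MET >= 6.0."""
    if met >= 6.0:
        return "VPA"
    if met >= 3.0:
        return "MPA"
    return None  # sedentary or light activity


def daily_minutes(met_values, epoch_s):
    """Sum MPA and VPA minutes for one day from epoch-level MET values."""
    minutes = {"MPA": 0.0, "VPA": 0.0}
    for met in met_values:
        category = classify_epoch(met)
        if category is not None:
            minutes[category] += epoch_s / 60
    return minutes


def to_longer_epochs(met_values, factor):
    """Hypothetical re-aggregation: average consecutive short epochs
    (e.g., six 10 s epochs into one 60 s epoch)."""
    return [mean(met_values[i:i + factor]) for i in range(0, len(met_values), factor)]


def weekly_minutes(days, min_wear_h=8.0, min_valid_days=4):
    """days: dicts with 'wear_h', 'MPA', 'VPA' for the measured days.
    Returns mean MPA/VPA per valid day x 7, or None if the week is not valid."""
    valid = [d for d in days if d["wear_h"] >= min_wear_h]
    if len(valid) < min_valid_days:
        return None
    return {k: mean(d[k] for d in valid) * 7 for k in ("MPA", "VPA")}
```

For example, a minute containing four 10 s epochs at 2 MET followed by two at 7 MET counts as 20 s of VPA at 10 s resolution, but as 1 min of MPA once averaged into a single 60 s epoch (about 3.7 MET).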

Diary

All participants completed a daily PA diary during both measurement weeks. The diary included information such as the date, time, type of activity, duration, and perceived intensity of each activity. Participants were instructed to rate the intensity of each activity as light, moderate, or vigorous based on factors such as sweating and shortness of breath. Only activities lasting more than 10 min were reported, and the minutes of MPA and VPA were summed to total minutes per week.
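
The diary scoring described above amounts to a filtered sum. A minimal sketch, assuming a hypothetical list of (duration in minutes, rated intensity) entries per participant and week; the duration filter merely re-applies the reporting instruction, and light activities are not counted towards MPA or VPA.

```python
def diary_weekly_minutes(entries, min_duration=10):
    """Sum diary minutes per intensity for one measurement week.

    entries: hypothetical (duration_min, intensity) tuples, intensity being
    "light", "moderate", or "vigorous".
    """
    totals = {"moderate": 0, "vigorous": 0}
    for duration, intensity in entries:
        # Count only activities lasting more than min_duration minutes;
        # light activities have no key in totals and are skipped.
        if duration > min_duration and intensity in totals:
            totals[intensity] += duration
    return {"MPA": totals["moderate"], "VPA": totals["vigorous"]}
```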

Questionnaire

At the end of each measurement week, participants completed the German short version of the GPAQ (Armstrong & Bull, 2006), which asked about their activities during the previous week. The questionnaire focused on minutes spent in MPA (at work/school, during recreation, and for transport) and VPA (at work/school and during recreation) and was processed according to the GPAQ protocol, yielding total minutes per week for both MPA and VPA.
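
A sketch of how the GPAQ domains listed above combine into weekly totals. It assumes, for illustration, that each domain is answered as days in the week and minutes per day, which are multiplied; the domain keys are hypothetical, and the data-cleaning and truncation rules of the GPAQ analysis protocol are deliberately not reproduced.

```python
# Hypothetical domain keys; each response is (days in the week, minutes per day).
MPA_DOMAINS = ("work_moderate", "transport", "recreation_moderate")
VPA_DOMAINS = ("work_vigorous", "recreation_vigorous")


def domain_total(responses, domains):
    """Sum days x minutes/day over the given domains; missing domains count as 0."""
    return sum(days * minutes_per_day
               for days, minutes_per_day in (responses.get(d, (0, 0)) for d in domains))


def gpaq_weekly_minutes(responses):
    """Weekly MPA (work/school, transport, recreation) and VPA (work/school, recreation)."""
    return {"MPA": domain_total(responses, MPA_DOMAINS),
            "VPA": domain_total(responses, VPA_DOMAINS)}
```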

Statistical analysis

To assess the stability of the differences between the four PA measures (accelerometry with 10 s and 60 s epochs, diary, and questionnaire) between T0 and T1, the differences in total minutes per week of MPA and VPA were calculated for all six pairwise combinations (e.g., diary minus questionnaire) in each measurement week. These differences were treated as new parameters and ranged from −590 to 399 min/week. If either of the original parameters was missing, the corresponding difference parameter was also treated as missing for that participant. Test–retest reliability was calculated for each original parameter between T0 and T1. Validity was calculated between all original parameters at T0 and at T1. Stability was calculated for each difference parameter between T0 and T1.

The raincloud plots (Allan et al., 2019) were created using R (R Core Team, 2022), RStudio (Posit Team, 2023), and the ggplot2 package (Wickham, 2016). Statistical analyses were performed using the correlation package (Makowski, Ben-Shachar, Patil, & Lüdecke, 2020), and the degree of agreement was assessed using the Spearman correlation coefficient (rs). Calculations were performed separately for children and adults, with pairwise deletion for each calculation. The level of significance was set at p < 0.05. Significance was judged from p-values rather than confidence intervals because the correlation package applies the Fieller correction (Fieller & Pearson, 1957) when computing confidence intervals, which can lead to disagreements in the interpretation of significance between p-values and confidence intervals.
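
The construction of the difference parameters and the correlation analyses can be sketched in Python as below. This is an illustration only, not the R correlation package used in the study: the method keys are hypothetical, the Spearman coefficient is computed as the Pearson correlation of average ranks after pairwise deletion of missing values, and no p-values or confidence intervals are produced.

```python
from itertools import combinations
from math import sqrt

METHODS = ("acc10", "acc60", "diary", "gpaq")  # hypothetical keys for the four measures


def rank(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined for constant input


def spearman(x, y):
    """Spearman rs with pairwise deletion of missing (None) values.
    Returns (rs, n); needs at least two non-constant complete pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs, ys = zip(*pairs)
    return pearson(rank(list(xs)), rank(list(ys))), len(pairs)


def difference_parameters(week):
    """week: method -> per-participant weekly minutes (None = missing).
    Returns 'a-b' -> differences; a difference is missing if either value is."""
    return {
        f"{a}-{b}": [None if x is None or y is None else x - y
                     for x, y in zip(week[a], week[b])]
        for a, b in combinations(METHODS, 2)
    }
```

Under these assumptions, test–retest reliability corresponds to spearman(t0["diary"], t1["diary"]), validity to spearman(t0["acc10"], t0["gpaq"]), and stability to spearman applied to the same key of difference_parameters(t0) and difference_parameters(t1).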

Results

Participant characteristics

The data of 43 adults and 50 children were used in this study. Characteristics of the participants are presented in Table 1.

Table 1 Characteristics of the participants. Displayed are the number of participants (N), means, and standard deviations (SD) for the parameters gender (male/female), age in years, height, weight (kg), and body mass index (BMI)

Physical activity outcomes

The full descriptive results of PA measurements at T0 and T1 and the corresponding reliability, validity, and stability measures (rs) are presented in supplementary Tables S1–S6. Figure 1a, c shows the descriptive PA levels estimated by each measurement tool for adults and Fig. 1b, d for children. Overall, the GPAQ yielded the highest MPA values, followed by accelerometry with 10 s and 60 s epochs, whereas the PA diary yielded the lowest values. The same ranking holds for VPA, except that the diary yielded higher values than accelerometry.

Fig. 1

Descriptive means of moderate physical activity (MPA) in adults (a) and children (b) as well as vigorous physical activity (VPA) in adults (c) and children (d). Displayed are the results (independent measurements, distribution, and box plots) of the physical activity diary (Diary), accelerometry with 60 s epochs (Acc 60), accelerometry with 10 s epochs (Acc 10), and the Global Physical Activity Questionnaire (GPAQ) for two independent measurement weeks (T0 and T1) in minutes per week

Stability

The pairwise differences among accelerometry with 10 s epochs, accelerometry with 60 s epochs, and the PA diary were significantly associated between T0 and T1 in both adults and children for MPA and VPA (0.36 ≤ r ≤ 0.58, p ≤ 0.035), with the only exception of the difference between accelerometry with 60 s epochs and the diary for MPA in adults. Between accelerometry and the GPAQ, the differences were significantly associated only for MPA using 10 s epochs in adults (r = 0.39, p = 0.041). The only significant association of the differences between the diary and the GPAQ was found for MPA in children (r = 0.36, p = 0.027). All other comparisons yielded nonsignificant associations.

Test–retest reliability

For both MPA and VPA, accelerometry (10 s and 60 s epochs) and the PA diary showed significant associations between T0 and T1 in adults and children (0.34 ≤ r ≤ 0.82, p ≤ 0.024; see supplementary Tables S1 and S2). For the GPAQ, the only significant association between T0 and T1 was found for VPA in adults (r = 0.52, p = 0.017).

Validity

Additional analysis of the pairwise rs between all measurement methods at T0 and T1 showed significant associations between 10 s and 60 s epochs in adults and children (0.90 ≤ r ≤ 0.98, p ≤ 0.001; see supplementary Tables S3 and S4). The GPAQ was significantly associated with accelerometry (both 10 s and 60 s epochs) for MPA in adults at T0 and T1 and for VPA in children at T1 (0.36 ≤ r ≤ 0.52, p ≤ 0.048). The PA diary and accelerometry were significantly associated for MPA at T0 and T1 in adults and at T1 in children. For VPA, associations between the PA diary and accelerometry were found at T1 in adults and at T0 in children (0.37 ≤ r ≤ 0.46, p ≤ 0.028). The PA diary and the GPAQ were significantly associated in both measurement weeks for MPA and VPA, except for VPA in children at T1 (0.39 ≤ r ≤ 0.63, p ≤ 0.021).

Discussion

This study aimed to replicate, with new data, the results of a previous study on the reliability, validity, and stability of a PA questionnaire, a PA diary, and accelerometry using 10 s and 60 s epochs for MPA and VPA in adults and children over two measurement weeks. As in the previous study, the questionnaire yielded the highest descriptive estimates for MPA and VPA, and accelerometry yielded the second-highest estimates for MPA. VPA results differed from our previous work in that the diary estimates were higher than those of accelerometry. As before, only accelerometry showed preliminary evidence of reliable, valid, and stable results for both epoch lengths. Contrary to our previous findings, the roles of the diary and the questionnaire were reversed: the diary showed preliminary evidence of reliable, mostly valid, and stable results compared with accelerometry, whereas the GPAQ showed few significant associations in all three categories.

The present results for accelerometry using 10 s and 60 s epochs are comparable to those of the previous study (Fiedler et al., 2021). This was to be expected, as the two accelerometer-based estimates differed only in the choice of epoch length. Nonetheless, the 10 s epochs yielded up to 163 min more MPA and up to 30 min more VPA per week, which underlines the importance of considering and documenting such data processing choices, as pointed out by other research (Orme et al., 2014). The results for the questionnaire and the PA diary, however, differ from the previous findings. The questionnaire still yielded the highest PA values, but reliability, validity, and stability indices were higher for the PA diary than for the GPAQ, whereas in the previous work they were higher for the IPAQ than for the diary. The improvement in the diary indices is most likely due to the additional information and example on how to fill in the diary that we provided during the measurement weeks in response to the poor indices in the first trial. The lack of reliability and the limited validity and stability of the GPAQ are harder to explain, as the GPAQ correlates moderately to strongly with the previously used IPAQ (Bull et al., 2009). Both questionnaires aim to estimate total MPA and VPA, but the GPAQ includes more domain-specific estimates. The total estimated MPA was roughly the same in the previous study and in this replication study. VPA, however, was estimated to be 3 times higher by the GPAQ in this study than by the IPAQ in the previous study, while accelerometry values were comparable. This points to possible issues in estimating VPA in healthy adults using the GPAQ.

Strengths and limitations

The main strength of this study is that it provides new insights into our previous findings within a comparable study setting and extends them with results for a refined PA diary and with questionnaire-based estimates for children. One limitation that was not present in the previous study is that data were collected during the ongoing COVID-19 pandemic. However, data were collected only while schools were open, to allow comparability within the data and to limit the influence of restrictions on PA patterns.

Conclusion

Considering the results of both studies, we found important differences in the quality criteria within and between the measurement tools. This reinforces the current demand for detailed reporting of the rationale for choosing a specific tool and of the data processing steps used. Furthermore, future research should evaluate the benefit of combining the results of different measurement tools, for instance, to add contextual information to accelerometry measures.