Abstract
Comparing adherence to physical activity (PA) guidelines can be challenging due to the varying types of measurement and analysis methods used to quantify PA. Therefore, previous results of test–retest reliability, validity, and stability for self-reported (i.e., questionnaire and diary) and device-based measured (i.e., accelerometry with 10/60 s epochs) PA were replicated in 43 adults and 50 children from the SMARTFAMILY2.0 trial. Data were collected throughout two independent measurement weeks and descriptive values were reported and visualized. To illustrate the quality criteria, namely test–retest reliability, validity, and stability, the relationships among and between all variables included during both measurement weeks were analyzed for each criterion using Spearman correlations, stratified by children and adults. Descriptively, the questionnaire yielded the highest moderate and vigorous PA values. Accelerometry yielded the second-highest values for moderate PA, whereas for vigorous PA the diary estimates were higher than those of accelerometry. As before, only accelerometry demonstrated preliminary evidence for reliable, valid, and stable results for both epoch lengths. Contrary to our previous findings, the diary showed higher correlation coefficients for the quality criteria than the questionnaire. Additionally, correlation coefficients were higher for moderate than for vigorous PA, and the patterns of significance differed partially between children and adults. The present results reinforce the findings and conclusions presented in the previous study and add information about PA questionnaire results in children. Comparing both studies, discrepancies exist in estimating vigorous PA in healthy adults by the Global and the International Physical Activity Questionnaire.
Introduction
One of the main challenges in physical activity (PA) research remains the comparability of PA estimates derived from different measurement tools. Estimates differ substantially not only between self-reported and device-based measured outcomes, but also within these categories (Nigg et al., 2020; Pulsford et al., 2023). Each method (i.e., accelerometry, questionnaires, or diaries) for assessing PA presents its own set of challenges and advantages. Self-reported tools allow for assessing PA in large groups with limited effort but can, for example, suffer from social desirability or recall biases (Helmerhorst, Brage, Warren, Besson, & Ekelund, 2012). Device-based measures enable researchers to track movement patterns throughout the day and address the 24 h activity cycle (Rosenberger et al., 2019), but they are expensive and time consuming. Moreover, choices such as the device used, the wearing position, and the processing of the data affect the derived PA estimates and are often poorly reported (Keadle, Lyden, Strath, Staudenmayer, & Freedson, 2019). A universally accepted standard for PA measurement has not yet been established, and ongoing discussions focus on identifying best practices (Burchartz et al., 2020; Nigg et al., 2020). This has important implications for the creation and application of PA guidelines (Bull et al., 2020; Gill et al., 2023). Important parameters for measurement tools are measures of reliability, the agreement between different tools (referred to as validity in this manuscript), and the stability of differences between measurement tools over time (Patterson, 2000). Therefore, the current study aimed to replicate the results of our previous study using data from the first SMARTFAMILY trial (Fiedler, Eckert, Burchartz, Woll, & Wunsch, 2021) with data from the SMARTFAMILY2.0 trial.
The original aims were to investigate the stability of the pairwise differences between three methods of measuring PA (accelerometry, diary, and questionnaire) and to assess the impact of using different epoch lengths (10 s and 60 s) for accelerometer-derived moderate and vigorous physical activity (MPA and VPA) in adults, children, and adolescents (hereafter described as children) within two independent measurement weeks. Additionally, the study aimed to evaluate the reliability and validity of the aforementioned measurement tools.
Methods
The methods for the participants and procedure are thoroughly described in the study protocol of the main study (Wunsch et al., 2020). This study refers to the participants of the control group from the second SMARTFAMILY trial. The methodology for this replication study is based on our previous study on this topic referring to the first SMARTFAMILY trial (Fiedler et al., 2021). The most important information relevant to this manuscript is provided briefly in the following paragraphs. The measurements in this study differ in two points from the previous one:
- The Global Physical Activity Questionnaire (GPAQ) (Armstrong & Bull, 2006) was used for adults and children in the present examination, based on the new PA recommendations of the World Health Organization (Bull et al., 2020). In the previous study, the International Physical Activity Questionnaire (IPAQ) (Craig et al., 2003) was used for adults and the Sixty-Minute Screening Measure (Prochaska, Sallis, & Long, 2001) for children. The GPAQ has shown moderate-to-strong positive correlations with the IPAQ in previous research (e.g., Bull, Maslin, & Armstrong, 2009).
- Following the poor results of the PA diary in our previous study, the design and description of the diary were improved and one example of a filled-out diary was provided to each participant (for an example see https://osf.io/e8acs/).
Participants and procedure
Participants were eligible for this study if they represented a family with at least one child and one adult who were living in a common household, and were part of the control group (43 adults aged 36–58 years and 50 children aged 4–20 years). Full ethical approval was obtained for the study. All participants, children, and legal guardians provided written informed consent before commencing the study by signing the informed consent form (The International Registered Report Identifier (IRRID) for the SF study is RR1-10.2196/20534.). The trial was conducted in accordance with the Declaration of Helsinki. Families of the control group had a baseline measurement (T0), a 3-week waiting period without any intervention or measurement, and a postmeasurement (T1). Data collection at T0 and T1 involved measuring PA using accelerometers, diaries and questionnaires over the course of 1 week. The procedures were identical for both timepoints.
Measurements
Accelerometer
Hip-worn (right side) 3‑axial accelerometers (Move 3/Move 4, Movisens GmbH, Karlsruhe, Germany) were used to continuously record PA. The accelerometer has been considered accurate for estimating energy expenditure (Anastasopoulou et al., 2014). Epoch lengths were chosen to represent the most commonly used epoch length (60 s) and a shorter epoch length (10 s). The accelerometer outcomes used for this study were MPA (3.0–5.9 metabolic equivalents (MET)) and VPA (> 6 MET) for all participants. Accelerometer data were included if a wear time of at least 8 h per day was obtained on at least 4 of the 7 days of the measured week. For valid measurements, the average of MPA and VPA per valid day was multiplied by 7 to represent the total minutes per week.
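The inclusion rule and the weekly extrapolation described above can be illustrated with a minimal sketch. It assumes that a valid day is a day with at least 8 h of wear time and that daily MPA and VPA minutes have already been derived from the MET thresholds. The function name, the dictionary keys, and the example values are hypothetical and are not taken from the study's code.

```python
from statistics import mean

MIN_WEAR_HOURS = 8   # minimum wear time per day (from the text)
MIN_VALID_DAYS = 4   # minimum number of valid days per week (from the text)


def weekly_minutes(days):
    """Extrapolate weekly MPA/VPA minutes from the valid days of one week.

    `days` is a list of dicts with the hypothetical keys
    'wear_h', 'mpa_min', and 'vpa_min'. Returns None if the week
    does not contain enough valid days.
    """
    valid = [d for d in days if d["wear_h"] >= MIN_WEAR_HOURS]
    if len(valid) < MIN_VALID_DAYS:
        return None
    # Average per valid day, multiplied by 7 to represent a full week.
    return {
        "mpa": mean([d["mpa_min"] for d in valid]) * 7,
        "vpa": mean([d["vpa_min"] for d in valid]) * 7,
    }


# Made-up example week: the third day has too little wear time.
example_week = [
    {"wear_h": 10, "mpa_min": 30, "vpa_min": 5},
    {"wear_h": 9, "mpa_min": 40, "vpa_min": 0},
    {"wear_h": 7, "mpa_min": 60, "vpa_min": 20},
    {"wear_h": 12, "mpa_min": 20, "vpa_min": 10},
    {"wear_h": 8, "mpa_min": 50, "vpa_min": 5},
]
print(weekly_minutes(example_week))
```

In this made-up week, four days remain valid, so the weekly totals are based on the mean of those four days only.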
Diary
All participants completed a daily PA diary during the two measurement weeks. The diary included information such as the date, time, type of activity, duration, and perceived intensity of each activity. Participants were instructed to rate the intensity of each activity as light, moderate, or vigorous based on factors like perspiration and shortness of breath. Only activities with a duration of more than 10 min were reported and the minutes of MPA and VPA were summarized as total minutes per week.
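As a minimal sketch of how diary entries could be summarized into weekly totals, the example below sums the durations of moderate and vigorous activities lasting more than 10 min. The entry structure, the key names, and the example values are assumptions for illustration only.

```python
def diary_weekly_minutes(entries, min_duration=10):
    """Sum diary minutes per intensity for one measurement week.

    `entries` is a list of dicts with the hypothetical keys
    'intensity' ('light', 'moderate', or 'vigorous') and 'duration_min'.
    Only activities lasting more than `min_duration` minutes are counted.
    """
    totals = {"moderate": 0, "vigorous": 0}
    for entry in entries:
        long_enough = entry["duration_min"] > min_duration
        if long_enough and entry["intensity"] in totals:
            totals[entry["intensity"]] += entry["duration_min"]
    return totals


# Made-up diary week.
example_entries = [
    {"intensity": "moderate", "duration_min": 30},
    {"intensity": "vigorous", "duration_min": 45},
    {"intensity": "light", "duration_min": 60},     # light PA is not counted
    {"intensity": "moderate", "duration_min": 10},  # not more than 10 min
    {"intensity": "vigorous", "duration_min": 15},
]
print(diary_weekly_minutes(example_entries))
```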
Questionnaire
At the end of each measurement week, participants completed the German short version of the GPAQ (Armstrong & Bull, 2006), which asked about their activities during the previous week. The questionnaire specifically focused on minutes spent in MPA (at work/school, recreational, and transport) and VPA (at work/school and recreational) and was processed according to the GPAQ protocol. This allowed for the recording of total minutes per week for both MPA and VPA.
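The aggregation of domain-specific answers into weekly totals might look like the following sketch. It assumes that each domain is answered as days per week and minutes per day, and that transport counts towards MPA as stated above. The data structure and the example answers are hypothetical, and the cleaning steps of the GPAQ protocol are not reproduced here.

```python
# Domains contributing to each intensity, as described in the text.
MPA_DOMAINS = ("work", "transport", "recreation")
VPA_DOMAINS = ("work", "recreation")


def gpaq_weekly_minutes(answers):
    """Sum weekly minutes over domains.

    `answers` maps a hypothetical (intensity, domain) key to a
    (days_per_week, minutes_per_day) tuple. Missing domains count as zero.
    """
    def total(intensity, domains):
        minutes = 0
        for domain in domains:
            days, per_day = answers.get((intensity, domain), (0, 0))
            minutes += days * per_day
        return minutes

    return {
        "mpa": total("moderate", MPA_DOMAINS),
        "vpa": total("vigorous", VPA_DOMAINS),
    }


# Made-up answers of one participant.
example_answers = {
    ("moderate", "work"): (5, 30),
    ("moderate", "transport"): (5, 20),
    ("moderate", "recreation"): (2, 60),
    ("vigorous", "recreation"): (2, 45),
}
print(gpaq_weekly_minutes(example_answers))
```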
Statistical analysis
To compare the mean differences for the four PA measures (accelerometry with 10 s and 60 s epoch lengths, diary, and questionnaire) between T0 and T1, the differences in total minutes per week for MPA and VPA were calculated for all six pairwise combinations (e.g., the difference between diary and questionnaire) at each measurement week. These differences were defined as new parameters, ranging from −590 to 399 min/week. If any of the original parameters contained missing data, the corresponding difference parameter was also treated as missing for that participant. Test–retest reliability was calculated for each parameter between T0 and T1. Validity was calculated between all parameters at both T0 and T1. Stability was calculated for each of the new difference parameters between T0 and T1. The raincloud plots (Allan et al., 2019) were created using R (R Core Team, 2022), RStudio (Posit Team, 2023), and the ggplot2 package (Wickham, 2016). Statistical analyses were performed using the correlation package (Makowski, Ben-Shachar, Patil, & Lüdecke, 2020), and the degree of agreement was assessed using the Spearman correlation coefficient (rs). The calculations were performed separately for children and adults, and pairwise deletion was used for each calculation. The level of significance was set at p < 0.05 and was based on p-values rather than on confidence intervals, because the correlation package uses the Fieller et al. (Fieller & Pearson, 1957) correction, which can lead to disagreements in the interpretation of significance between p-values and confidence intervals.
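The three quality criteria can be sketched in a few lines. The example below is a plain-Python illustration and not the R code used in the study: it builds the six difference parameters, propagates missing values, and computes Spearman's rs with pairwise deletion. The measure names and all values are made up, and p-values and confidence intervals are not computed.

```python
from itertools import combinations
from math import sqrt

# Hypothetical names for the four PA measures (minutes per week).
MEASURES = ("acc10", "acc60", "diary", "gpaq")


def difference_parameters(row):
    """Return the six pairwise differences for one participant and week.

    A difference is None (missing) if either original value is missing.
    """
    diffs = {}
    for a, b in combinations(MEASURES, 2):
        x, y = row.get(a), row.get(b)
        diffs[f"{a}-{b}"] = None if x is None or y is None else x - y
    return diffs


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation with pairwise deletion of missing values."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    if len(pairs) < 3:
        return None
    rx = average_ranks([p[0] for p in pairs])
    ry = average_ranks([p[1] for p in pairs])
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return None  # undefined when one variable is constant
    return sxy / sqrt(sxx * syy)


# Made-up MPA values (min/week) for five participants at T0 and T1.
t0 = [
    {"acc10": 300, "acc60": 220, "diary": 120, "gpaq": 400},
    {"acc10": 250, "acc60": 180, "diary": 90, "gpaq": 600},
    {"acc10": 410, "acc60": 330, "diary": None, "gpaq": 350},
    {"acc10": 180, "acc60": 120, "diary": 60, "gpaq": 200},
    {"acc10": 350, "acc60": 260, "diary": 200, "gpaq": 500},
]
t1 = [
    {"acc10": 320, "acc60": 230, "diary": 150, "gpaq": 380},
    {"acc10": 240, "acc60": 170, "diary": 100, "gpaq": 450},
    {"acc10": 400, "acc60": 300, "diary": 180, "gpaq": None},
    {"acc10": 200, "acc60": 140, "diary": 90, "gpaq": 260},
    {"acc10": 330, "acc60": 250, "diary": 210, "gpaq": 520},
]

# Test-retest reliability: same measure at T0 and T1.
reliability = spearman([r["acc10"] for r in t0], [r["acc10"] for r in t1])
# Validity: two different measures within the same week.
validity_t0 = spearman([r["acc10"] for r in t0], [r["diary"] for r in t0])
# Stability: the same difference parameter at T0 and T1.
key = "acc10-diary"
stability = spearman([difference_parameters(r)[key] for r in t0],
                     [difference_parameters(r)[key] for r in t1])
print(reliability, validity_t0, stability)
```

In the study itself, these coefficients were computed separately for adults and children, which in this sketch would correspond to calling `spearman` on each subgroup's rows.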
Results
Participant characteristics
The data of 43 adults and 50 children were used in this study. Characteristics of the participants are presented in Table 1.
Physical activity outcomes
The full descriptive results of PA measurements at T0 and T1 and corresponding reliability, validity, and stability measures (rs) are presented in the supplement Tables S1–S6. Figure 1a, b visualize the descriptive PA level estimated by each measurement tool for adults and Fig. 1c, d for children. Overall, the descriptive values show the highest MPA values for the GPAQ, followed by accelerometry with 10 s epochs and 60 s epochs, and the lowest PA values are reported for the PA diary. These results are consistent for VPA except that the diary shows higher values than the accelerometry.
Stability
The pairwise differences in the amount of PA among accelerometry with 10 s epochs, accelerometry with 60 s epochs, and the PA diary showed significant associations between T0 and T1 for MPA and VPA in both adults and children (0.36 ≤ r ≤ 0.58, p ≤ 0.035), with the only exception of MPA using 60 s epochs and the diary in adults. Significant associations of the differences between accelerometry and the GPAQ were only found for MPA using 10 s epochs in adults (r = 0.39, p = 0.041). The only significant association of the differences between the diary and the GPAQ was found for MPA in children (r = 0.36, p = 0.027). All other comparisons yielded nonsignificant associations.
Test–retest reliability
Both MPA and VPA indicated significant associations for accelerometry (both 10 s and 60 s epochs) and the PA diary between T0 and T1 for adults and children (0.34 ≤ r ≤ 0.82, p ≤ 0.024, see supplement Tables S1 and S2). For the GPAQ, the only significant association between T0 and T1 was found for VPA in adults (r = 0.52, p = 0.017).
Validity
Additional analysis of pairwise rs between all measurement methods at both T0 and T1 showed significant associations between 10 s and 60 s epochs for adults and children (0.90 ≤ r ≤ 0.98, p ≤ 0.001, see supplement Tables S3 and S4). The GPAQ showed significant associations with accelerometry (both 10 s and 60 s epochs) for MPA in adults at T0 and T1 and for VPA in children at T1 (0.36 ≤ r ≤ 0.52, p ≤ 0.048). The PA diary and accelerometry showed significant associations for MPA at T0 and T1 in adults, and at T1 in children. For VPA, associations between the PA diary and accelerometry were found at T1 in adults and at T0 in children (0.37 ≤ r ≤ 0.46, p ≤ 0.028). The PA diary and the GPAQ showed significant associations at both measurement weeks for MPA and VPA, except for VPA in children at T1 (0.39 ≤ r ≤ 0.63, p ≤ 0.021).
Discussion
This study aimed to replicate, with new data, the results of a previous study on the reliability, validity, and stability of a PA questionnaire, a PA diary, and accelerometry using 10 and 60 s epochs for MPA and VPA in adults and children over two measurement weeks. As in the previous study, descriptive PA estimates from the questionnaire yielded the highest results for MPA and VPA, and accelerometry showed the second-highest results for MPA. VPA results differed from our previous work in that the diary estimates were higher than those of accelerometry. As before, only accelerometry showed preliminary evidence for reliable, valid, and stable results for both epoch lengths. Contrary to our previous findings, the roles of the diary and the questionnaire were reversed. The diary indicated preliminary evidence for reliable, mainly valid, and stable results compared to accelerometry in this study, while the GPAQ showed very limited significant associations in all three categories.
The present results are comparable to the previous study (Fiedler et al., 2021) for the PA estimations by accelerometry using 10 and 60 s epochs. This was to be expected, as the only difference was in the choice of epoch length. Nonetheless, up to 163 min higher MPA per week and up to 30 min higher VPA per week for 10 s epochs show the importance of considering and documenting such data processing choices, as pointed out by other research (Orme et al., 2014). Results for the questionnaire and PA diary, however, differ from the previous findings. The highest values for PA were still reported by the questionnaire, but reliability, validity, and stability indices were higher for the PA diary than for the GPAQ, while the previous work indicated them to be higher for the IPAQ than for the diary. The indices of the diary most likely improved because we provided additional information and examples on how to fill in the diary during the measurement weeks, following the poor indices during the first trial. The lack of reliability and the limited validity and stability of the GPAQ are harder to explain, as the GPAQ shows a moderate-to-strong correlation with the previously used IPAQ (Bull et al., 2009). Both questionnaires aim to estimate total MPA and VPA, but the GPAQ includes more domain-specific estimates. The total amount of estimated MPA was roughly the same in the previous study and this replication study. VPA, however, was estimated 3 times higher by the GPAQ in this study than by the IPAQ in the previous study, with comparable values for accelerometry. This points to possible issues in estimating VPA in healthy adults using the GPAQ.
Strengths and limitations
The main strength of this study is that it provides new insights into previous findings within a comparable study setting and extends them with results for a refined PA diary and questionnaire-based estimates for children. One limitation that was not present in the previous study is that data were collected during the ongoing COVID-19 pandemic. However, data were collected only when schools were open to allow comparability within the data and to limit the influence of restrictions on PA patterns.
Conclusion
Considering the results of both studies, we found important differences for the quality criteria within and between the measurement tools. This reinforces the current demand for detailed reporting of the rationale behind choosing a specific tool and the data processing steps used in studies. Furthermore, the advantage of combining the results of different measurement tools, for instance, to add contextual information to accelerometry measures, should be evaluated in the future.
Availability of data and material
Data are available on the open science framework (https://osf.io/e8acs/).
Code availability
Code is available on the open science framework (https://osf.io/e8acs/).
References
Allan, M., Poggiali, D., Whitaker, K., Marshall, T., Rhys, K., & Rogier, A. (2019). Raincloud plots: A multi-platform tool for robust data visualization. Wellcome Open Research. https://doi.org/10.12688/wellcomeopenres.15191.1.
Anastasopoulou, P., Tubic, M., Schmidt, S., Neumann, R., Woll, A., & Härtel, S. (2014). Validation and comparison of two methods to assess human energy expenditure during free-living activities. PLoS ONE, 9(2), e90606. https://doi.org/10.1371/journal.pone.0090606.
Armstrong, T., & Bull, F. (2006). Development of the world health organization global physical activity questionnaire (GPAQ). Journal of Public Health, 14(2), 66–70.
Bull, F. C., Maslin, T. S., & Armstrong, T. (2009). Global physical activity questionnaire (GPAQ): nine country reliability and validity study. Journal of Physical Activity & Health, 6(6), 790–804. https://doi.org/10.1123/jpah.6.6.790.
Bull, F. C., Al-Ansari, S. S., Biddle, S., Borodulin, K., Buman, M. P., Cardon, G., Carty, C., Chaput, J.-P., Chastin, S., Chou, R., Dempsey, P. C., DiPietro, L., Ekelund, U., Firth, J., Friedenreich, C. M., Garcia, L., Gichu, M., Jago, R., Katzmarzyk, P. T., & Willumsen, J. F. (2020). World Health Organization 2020 guidelines on physical activity and sedentary behaviour. British Journal of Sports Medicine, 54(24), 1451–1462. https://doi.org/10.1136/bjsports-2020-102955.
Burchartz, A., Anedda, B., Auerswald, T., Giurgiu, M., Hill, H., Ketelhut, S., Kolb, S., Mall, C., Manz, K., Nigg, C. R., Reichert, M., Sprengeler, O., Wunsch, K., & Matthews, C. E. (2020). Assessing physical behavior through accelerometry—State of the science, best practices and future directions. Psychology of Sport and Exercise, 49, 101703. https://doi.org/10.1016/j.psychsport.2020.101703.
Craig, C. L., Marshall, A. L., Sjöström, M., Bauman, A. E., Booth, M. L., Ainsworth, B. E., Pratt, M., Ekelund, U., Yngve, A., Sallis, J. F., & Oja, P. (2003). International physical activity questionnaire: 12-country reliability and validity. Medicine & Science in Sports & Exercise, 35(8), 1381–1395. https://doi.org/10.1249/01.MSS.0000078924.61453.FB.
Fiedler, J., Eckert, T., Burchartz, A., Woll, A., & Wunsch, K. (2021). Comparison of self-reported and device-based measured physical activity using measures of stability, reliability, and validity in adults and children. Sensors, 21(8), 2672. https://doi.org/10.3390/s21082672.
Fieller, E. C., & Pearson, E. S. (1957). Tests for rank correlation coefficients. I. Biometrika, 44(3), 470–481.
Gill, J. M., Chico, T. J., Doherty, A., Dunn, J., Ekelund, U., Katzmarzyk, P. T., Milton, K., Murphy, M. H., & Stamatakis, E. (2023). Potential impact of wearables on physical activity guidelines and interventions: opportunities and challenges. British Journal of Sports Medicine, 57(19), 1223–1225. https://doi.org/10.1136/bjsports-2023-106822.
Helmerhorst, H. H. J., Brage, S., Warren, J., Besson, H., & Ekelund, U. (2012). A systematic review of reliability and objective criterion-related validity of physical activity questionnaires. International Journal of Behavioral Nutrition and Physical Activity, 9(1), 103. https://doi.org/10.1186/1479-5868-9-103.
Keadle, S. K., Lyden, K. A., Strath, S. J., Staudenmayer, J. W., & Freedson, P. S. (2019). A framework to evaluate devices that assess physical behavior. Exercise and Sport Sciences Reviews, 47(4), 206–214. https://doi.org/10.1249/JES.0000000000000206.
Makowski, D., Ben-Shachar, M., Patil, I., & Lüdecke, D. (2020). Methods and algorithms for correlation analysis in R. Journal of Open Source Software, 5(51), 2306. https://doi.org/10.21105/joss.02306.
Nigg, C. R., Fuchs, R., Gerber, M., Jekauc, D., Koch, T., Krell-Roesch, J., Lippke, S., Mnich, C., Novak, B., Ju, Q., Sattler, M. C., Schmidt, S. C. E., Van Poppel, M., Reimers, A. K., Wagner, P., Woods, C., & Woll, A. (2020). Assessing physical activity through questionnaires—A consensus of best practices and future directions. Psychology of Sport and Exercise, 50, 101715. https://doi.org/10.1016/j.psychsport.2020.101715.
Orme, M., Wijndaele, K., Sharp, S. J., Westgate, K., Ekelund, U., & Brage, S. (2014). Combined influence of epoch length, cut-point and bout duration on accelerometry-derived physical activity. The International Journal of Behavioral Nutrition and Physical Activity, 11(1), 34. https://doi.org/10.1186/1479-5868-11-34.
Patterson, P. (2000). Reliability, validity, and methodological response to the assessment of physical activity via self-report. Research Quarterly for Exercise and Sport, 71(sup2), 15–20. https://doi.org/10.1080/02701367.2000.11082781.
Posit Team (2023). RStudio: integrated development environment for R. Computer software. Posit Software, PBC. http://www.posit.co/
Prochaska, J. J., Sallis, J. F., & Long, B. (2001). A physical activity screening measure for use with adolescents in primary care. Archives of Pediatrics & Adolescent Medicine, 155(5), 554–559.
Pulsford, R. M., Brocklebank, L., Fenton, S. A. M., Bakker, E., Mielke, G. I., Tsai, L.-T., Atkin, A. J., Harvey, D. L., Blodgett, J. M., Ahmadi, M., Wei, L., Rowlands, A., Doherty, A., Rangul, V., Koster, A., Sherar, L. B., Holtermann, A., Hamer, M., & Stamatakis, E. (2023). The impact of selected methodological factors on data collection outcomes in observational studies of device-measured physical behaviour in adults: A systematic review. International Journal of Behavioral Nutrition and Physical Activity, 20(1), 26. https://doi.org/10.1186/s12966-022-01388-9.
R Core Team (2022). R: a language and environment for statistical computing. Computer software. R Foundation for Statistical Computing. https://www.R-project.org/
Rosenberger, M. E., Fulton, J. E., Buman, M. P., Troiano, R. P., Grandner, M. A., Buchner, D. M., & Haskell, W. L. (2019). The 24-hour activity cycle: a new paradigm for physical activity. Medicine & Science in Sports & Exercise, 51(3), 454–464. https://doi.org/10.1249/MSS.0000000000001811.
Wickham, H. (2016). ggplot2: elegant graphics for data analysis. Computer software. New York: Springer. https://ggplot2.tidyverse.org
Wunsch, K., Eckert, T., Fiedler, J., Cleven, L., Niermann, C., Reiterer, H., Renner, B., & Woll, A. (2020). Effects of a collective family-based mobile health intervention called “SMARTFAMILY” on promoting physical activity and healthy eating: protocol for a randomized controlled trial. JMIR Research Protocols, 9(11), e20534. https://doi.org/10.2196/20534.
Funding
This research was supported by the Federal Ministry of Education and Research within the project SMARTACT. BMBF Grant: FKZ 01EL1820C.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Contributions
Conceptualization, Janis Fiedler; Data curation, Janis Fiedler; Formal analysis, Janis Fiedler; Funding acquisition, Alexander Woll; Investigation, Janis Fiedler and Kathrin Wunsch; Methodology, Janis Fiedler; Project administration, Alexander Woll and Kathrin Wunsch; Supervision, Alexander Woll and Kathrin Wunsch; Validation, Janis Fiedler; Writing–original draft, Janis Fiedler; Writing–review & editing, Alexander Woll and Kathrin Wunsch.
Ethics declarations
Conflict of interest
J. Fiedler, A. Woll and K. Wunsch declare that they have no competing interests. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Ethics approval: The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board (or Ethics Committee) of the Karlsruhe Institute of Technology (28.11.2019). Consent to participate: All participants, children, and legal guardians provided written informed consent prior to commencing the study by signing the informed consent form.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Trial Registration
German Clinical Trials Register DRKS00010415; Date of Registration: 15 July 2016
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Fiedler, J., Woll, A. & Wunsch, K. Comparison of self-reported and device-based measured physical activity—a replication study. Ger J Exerc Sport Res (2024). https://doi.org/10.1007/s12662-024-00979-x
DOI: https://doi.org/10.1007/s12662-024-00979-x