Physical Activity Questionnaires for Pregnancy: A Systematic Review of Measurement Properties

Sattler, Matteo C.; Jaunig, Johannes; Watson, Estelle D.; van Poppel, Mireille N. M.; Mokkink, Lidwine B.; Terwee, Caroline B.; Dietz, Pavel

doi:10.1007/s40279-018-0961-x

Systematic Review
Open access
Published: 09 August 2018

Volume 48, pages 2317–2346, (2018)

Sports Medicine

Abstract

Background

In order to assess physical activity (PA) during pregnancy, it is important to choose the instrument with the best measurement properties.

Objectives

To systematically summarize, appraise, and compare the measurement properties of all self-administered questionnaires assessing PA in pregnancy.

Methods

We searched PubMed, Embase, and SPORTDiscus with the following inclusion criteria: (i) the study reported at least one measurement property (reliability, criterion validity, construct validity, responsiveness) of a self-administered questionnaire; (ii) the questionnaire intended to measure PA; (iii) the questionnaire was evaluated in healthy pregnant women; and (iv) the study was published in English. We evaluated results, quality of individual studies, and quality of evidence using a standardized checklist (Quality Assessment of Physical Activity Questionnaires [QAPAQ]) and the GRADE (Grading of Recommendation, Assessment, Development, and Evaluation) approach.

Results

Seventeen articles, reporting 18 studies of 11 different PA questionnaires (17 versions), were included. Most questionnaire versions showed insufficient measurement properties. Only the French and Turkish versions of the Pregnancy Physical Activity Questionnaire (PPAQ) showed both sufficient reliability and construct validity. However, all versions of the PPAQ pooled together showed insufficient construct validity. The quality of individual studies was usually high for reliability but varied considerably for construct validity. Overall, the quality of evidence was very low to moderate.

Conclusions

We recommend the PPAQ to assess PA in pregnancy, although the pooled results revealed insufficient construct validity. The lack of appropriate standards in data collection and processing criteria for objective devices in measuring PA during pregnancy attenuates the quality of evidence. Therefore, research on the validity of comparison instruments in pregnancy followed by consensus on validation reference criteria and standards of PA measurement is needed.

Key Points

There was high-quality evidence that the Pregnancy Physical Activity Questionnaire (PPAQ) has sufficient reliability in assessing total physical activity (PA) and vigorous PA (VPA) in pregnancy. However, the questionnaire revealed insufficient construct validity in assessing these scores, but the evidence for this was of low-to-moderate quality.
The Australian Women’s Activity Study (AWAS), Leisure-Time Exercise Questionnaire (LTEQ), Leisure-Time Physical Activity Questionnaire (LTPAQ), and Recent Physical Activity Questionnaire (RPAQ) showed both insufficient reliability and construct validity when assessing either total PA, moderate-to-vigorous PA, or VPA in pregnancy. This assessment was based on very low-to-moderate quality evidence.
Most importantly, we need more high-quality evidence regarding the validity of objective measures of PA in pregnancy, such as accelerometers, and standards in data collection and processing criteria of these devices. Only then will we be able to guarantee adequate and comparable estimations of the validity of a PA questionnaire in pregnancy.

1 Introduction

Physical activity (PA) plays a pivotal role in the improvement and maintenance of physical and mental health [1]. In pregnancy, regular PA can have various health benefits for mother and fetus, such as reduced symptoms of depression [2] and lower risks for excessive gestational weight gain [3], gestational diabetes mellitus [4], lower birth weight [5], preterm birth [3], and pre-eclampsia [6]. There is even evidence that PA during pregnancy may improve cardiac and neurobehavioral maturation of the offspring [7], which is in harmony with the premise of fetal programming [8]. Therefore, the American College of Obstetricians and Gynecologists [9] recommends that pregnant women, in the absence of medical or obstetric complications, participate in moderate-intensity activities for at least 20–30 min per day on most or all days of the week.

Research on PA in pregnancy has grown steadily in recent years. Adequate measurement of PA in pregnancy is essential to provide solid evidence-based recommendations, to determine the health benefits of PA, the effectiveness of PA interventions, and the dose-response relationships between PA and health outcomes, and to assess global trends of PA over time. In particular, a measurement instrument should provide reliable and valid estimates of PA in this target population.

Questionnaires are a commonly used, inexpensive, and acceptable method to determine PA levels. Because of different study purposes, populations, settings, or unsatisfactory pre-existing questionnaires, many PA questionnaires have been developed, which introduces complexity when choosing the right questionnaire for one’s study purpose. Moreover, using different questionnaires hinders the comparability of PA levels across studies and countries, especially if the questionnaires vary in their measurement quality. Therefore, an overview of measurement properties of PA questionnaires for use in pregnancy is helpful to select the best qualified questionnaire. A critical appraisal of the methodological quality of these validation studies and the overall evidence is essential for drawing unbiased conclusions about measurement properties.

Although the measurement properties of PA questionnaires have been systematically reviewed for non-pregnant populations [10,11,12], there is still a lack of knowledge addressing this issue in pregnancy. The purpose of this systematic review was to critically appraise, compare, and summarize the measurement properties (reliability, criterion validity, construct validity, responsiveness) of all available self-administered questionnaires measuring PA in pregnancy, taking the methodological quality of these studies as well as the quality of evidence into account.

2 Methods

2.1 Literature Search

We performed a systematic literature search using a priori defined eligibility criteria in PubMed, Embase (using the ‘Embase only’ filter), and SPORTDiscus. The search strategy included (variations of) the terms ‘physical activity’, ‘measurement properties’ [13], ‘questionnaire’ and ‘pregnancy’ (see Electronic Supplementary Material Appendix S1 for the full search strategy). Publication types such as interviews, case reports, or biographies were excluded. This search strategy was adapted for Embase and SPORTDiscus following their individual search guidelines. Additional studies were identified by searching references of the retrieved articles. The search was performed on 17 July 2017.

2.2 Eligibility Criteria

The eligibility criteria were based on the previous series of reviews on PA questionnaires [10,11,12], and adapted to our target population. The following inclusion criteria were used:

(i) The aim of the study was to evaluate one or more of the following measurement properties of a self-administered questionnaire: reliability, criterion validity, construct validity, or responsiveness.
(ii) The aim of the questionnaire was to measure PA, which was defined as any bodily movement produced by skeletal muscles that resulted in energy expenditure (EE) above resting level [14].
(iii) The study was performed in healthy pregnant women, irrespective of the population for which the questionnaire was originally developed (e.g., pregnant women, general population, adolescents).
(iv) The article had to be published in English.

Since different modes of data collection likely cause heterogeneity in effect estimates and data quality [15], the aim of this review was to provide evidence-based recommendations only for self-administered PA questionnaires. Consequently, we excluded PA interviews (face-to-face, telephone), diaries, interview-administered questionnaires, questionnaires measuring physical functioning, and questionnaires (questions) asking about sweating. All studies performed in patients (e.g., pregnant women with gestational diabetes) were excluded. There were no limitations concerning the mean age or body mass index of the study populations.

Finally, measurement properties regarding the internal structure (structural validity, internal consistency, cross-cultural validity/measurement invariance), development, and content validity of the PA questionnaires were not assessed in this review. The evaluation of internal structure (e.g., using Cronbach’s alpha) is relevant for constructs consisting of reflective indicators [16]. These indicators are manifestations of the construct and, thus, should be highly correlated with each other. In contrast, PA is represented by causal or composite indicators, which can independently contribute to PA. The evaluation of content validity would require the inclusion of studies of the development and translations of the questionnaire as well as studies focusing on content validity and expert opinions. Therefore, a single but comprehensive evaluation of content validity of (all available) PA questionnaires should be performed in a future review.

2.3 Selection of Articles

Two researchers independently performed abstract selection, selection of full-text articles, data extraction, and quality assessment. Disagreements were discussed and resolved. Full-text articles were retrieved if the abstracts fulfilled the inclusion criteria or if the abstract did not contain measurement properties, but these were likely to be presented in the full-text article.

2.4 Data Extraction

We used a standardized extraction form, based on the QAPAQ (Quality Assessment of Physical Activity Questionnaire) checklist [17], to obtain the required information to evaluate the methodological quality and results of each individual study. The QAPAQ checklist was developed for PA questionnaires and is based on the COSMIN (COnsensus based Standards for the selection of health Measurement INstruments) checklist for assessing the methodological quality of studies of measurement properties of patient-reported outcome measures (PROMs) [18] and a list of criteria for sufficient measurement properties [19].

To provide a description of the PA questionnaire, the following information was collected: (i) target population of the questionnaire; (ii) dimension(s) of PA (e.g., habitual, EE); (iii) setting (e.g., household, sports); (iv) recall period; (v) number of questions; (vi) parameters of PA (e.g., frequency, duration, intensity); (vii) number and type of scores which can be calculated (e.g., total EE, minutes of activity per day). To assess the methodological quality and results of each individual study, we extracted information regarding study population, sample size, time intervals, data analysis, and results of the measurement properties.

2.5 Assessment of Measurement Properties

2.5.1 Content Validity

Content validity is the degree to which the questionnaire encompasses all relevant aspects and dimensions of the intended construct. Since there is no statistical criterion (e.g., numerical value) for content validity, we evaluated content validity for all included questionnaires using the extracted qualitative attributes. Based on previous systematic reviews [11], the following two criteria were assessed: (i) if the questionnaire aims to measure total PA, it should incorporate activities in all settings (home, recreation, sports, transport, work); (ii) the questionnaire should measure at least frequency and duration of PA together with a recall period of at least 1 week.
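The two criteria above can be expressed as a small check. The following is a minimal sketch, not the review's actual procedure: the `Questionnaire` record, its field names, and the two example entries are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass

# Settings named in criterion (i) of the review.
REQUIRED_SETTINGS = frozenset({"home", "recreation", "sports", "transport", "work"})


@dataclass(frozen=True)
class Questionnaire:
    """Hypothetical record of extracted attributes; field names are illustrative."""
    name: str
    measures_total_pa: bool
    settings: frozenset
    assesses_frequency: bool
    assesses_duration: bool
    recall_days: int


def content_validity_sufficient(q: Questionnaire) -> bool:
    # Criterion (i) applies only when the questionnaire aims to measure total PA.
    covers_settings = (not q.measures_total_pa) or REQUIRED_SETTINGS <= q.settings
    # Criterion (ii): frequency and duration, with a recall period of at least 1 week.
    adequate_parameters = (
        q.assesses_frequency and q.assesses_duration and q.recall_days >= 7
    )
    return covers_settings and adequate_parameters


# Two made-up questionnaires, not taken from the review.
full = Questionnaire("Example A", True, REQUIRED_SETTINGS, True, True, 7)
no_home = Questionnaire("Example B", True, REQUIRED_SETTINGS - {"home"}, True, True, 7)
print(content_validity_sufficient(full))     # True
print(content_validity_sufficient(no_home))  # False
```

A questionnaire that targets only leisure-time PA skips criterion (i) in this sketch and is judged on criterion (ii) alone.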

2.5.2 Reliability

Reliability is the extent to which the scores for participants, who did not change, are the same for repeated measurements under several conditions (free from measurement error) [20]. We considered parameters of reliability (Pearson/Spearman correlation, intraclass correlation coefficient [ICC], kappa, concordance) and measurement error (standard error of measurement [SEM], change in the mean or mean difference [\(\bar{d}\); systematic error], limits of agreement [LOA; random error], smallest detectable change [SDC], coefficient of variation [CV]) for the assessment of reliability [17].

To ensure that a measurement detects clinically important changes accurately (beyond measurement error), a definition of minimal important change (MIC) of PA is required. Currently, there is no consensus about MIC of PA in pregnancy, but a change in frequency of two sessions per week or a change in moderate PA or moderate-to-vigorous PA (MVPA) of 30 min (≥ 90 MET [metabolic equivalent of task] min) per week can be seen as important for both the individual and the clinician. According to this definition, the PA questionnaire should be able to reliably measure changes of ± 20% of currently recommended PA guidelines (i.e., 150 min of MVPA per week). Only when the LOA or SDC are smaller than the MIC can one be confident that changes as large as the MIC reflect true changes (e.g., statistically significant) in individual people that cannot be attributed to measurement error. Consequently, measurement error was rated using \(\text{MIC}_{\text{frequency}} = 2\) sessions and \(\text{MIC}_{\text{duration/intensity}} = 30\) min (90 MET min) per week. It is important to note that these considerations about MIC were made irrespective of individual differences such as fitness, physical capacity, and body composition. Furthermore, for a CV (i.e., standard deviation in relation to the mean), a maximum value of 15% was considered acceptable, which indicates that every observed PA score could vary on average ± 15% of the mean score (or 95% of the observed PA scores were between ± 1.96 × 15% of the mean). Finally, we considered ICC, kappa, and concordance coefficients of ≥ 0.70 or Pearson/Spearman correlation coefficients of ≥ 0.80 as sufficient [17].
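The figures in this paragraph are linked by simple arithmetic, worked out here for reference. The 3 MET intensity is not stated explicitly in the text; it is only what the ratio of 90 MET min to 30 min implies.

\[
\frac{30\ \text{min}}{150\ \text{min}} = 0.20 = 20\%,
\qquad
\frac{90\ \text{MET min}}{30\ \text{min}} = 3\ \text{MET},
\qquad
1.96 \times 15\% = 29.4\%
\]

The last value means that, with a CV at the 15% limit, about 95% of observed PA scores would be expected to fall within roughly ± 29.4% of the mean score.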

Based on QAPAQ [17], each result received either a positive (sufficient), negative (insufficient), or indeterminate rating. The result was sufficient (+) if ICC/kappa/concordance was ≥ 0.70 or Pearson/Spearman ≥ 0.80 or MIC > LOA/SDC or CV ≤ 15%, and otherwise insufficient (–). If no such coefficient was reported, the rating of the result was indeterminate (?).
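The rating rule can be sketched as a small function. This is an illustrative reading of the rule, not code from the review: the function name, parameter names, and the literal "or" interpretation (any one satisfied threshold yields a sufficient rating) are assumptions.

```python
from typing import Optional


def rate_reliability(
    icc_like: Optional[float] = None,     # ICC, kappa, or concordance
    correlation: Optional[float] = None,  # Pearson or Spearman coefficient
    error_limit: Optional[float] = None,  # LOA or SDC, in the same units as mic
    mic: Optional[float] = None,          # minimal important change
    cv_percent: Optional[float] = None,   # coefficient of variation in percent
) -> str:
    """Return '+', '-', or '?' for one reported reliability result."""
    checks = []
    if icc_like is not None:
        checks.append(icc_like >= 0.70)
    if correlation is not None:
        checks.append(correlation >= 0.80)
    if error_limit is not None and mic is not None:
        checks.append(mic > error_limit)
    if cv_percent is not None:
        checks.append(cv_percent <= 15.0)
    if not checks:
        return "?"  # no usable coefficient reported: indeterminate
    return "+" if any(checks) else "-"


print(rate_reliability(icc_like=0.72))              # +
print(rate_reliability(correlation=0.75))           # -
print(rate_reliability(error_limit=45.0, mic=30.0)) # -
print(rate_reliability())                           # ?
```

In practice each PA score in a study would be rated separately, so the function would be called once per score.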

2.5.3 Construct and Criterion Validity

Construct validity is the degree of agreement between the questionnaire and comparable measures of PA, whereas criterion validity is the degree of agreement between the questionnaire and the gold standard of measuring PA. Although doubly-labeled water (DLW) and the respiratory chamber can be considered as the gold standard for measuring EE, there is no gold standard for the assessment of PA. Consequently, all comparisons to other instruments were considered as evidence for construct validity in our review.

Based on QAPAQ [17] and the series of previous systematic reviews [10,11,12], a priori defined correlations were considered as sufficient (Table 1). The result was sufficient (+) if the correlation was equal to or above the defined cut points, and otherwise insufficient (–). If no correlation coefficient or comparable measure was reported, the rating of the result was indeterminate (?).

Table 1 Cut points for sufficient correlations per dimension of PA measured by the questionnaire and level of quality

2.5.4 Responsiveness

Responsiveness can be considered as an aspect of validity and is the degree to which an instrument detects changes over time in the construct [21, 22]. In this case, it is the ability of the questionnaire to detect changes in PA in a longitudinal setting (validity of change score rather than single score). We applied the same approach as for construct validity to rate responsiveness, except that the change in scores of the questionnaire was compared with the change in scores of other instruments such as accelerometers.

2.6 Quality of Individual Studies

Evaluation of the methodological quality of the included studies was based on the QAPAQ checklist [17], the series of previous reviews [10,11,12], as well as the recently updated COSMIN checklist [23]. For the assessment of the quality of all individual studies, we assigned one of three different levels of quality (1: very good, 2: adequate, 3: doubtful) for each outcome (PA score) and measurement property. If an individual study had any substantial flaws in the design or analysis, the quality was inadequate (level 4).

To evaluate the methodological quality of studies of reliability and measurement error, we considered ICC, kappa, and concordance as adequate measures of reliability, and LOA, SDC, and CV as adequate measures of measurement error. We considered Pearson and Spearman correlation coefficients as less adequate since they neglect systematic errors between measurements [24]. However, Pearson and Spearman correlations are widely used in validation studies and, thus, were not omitted from our review. To ensure that the measured construct did not change over time, an adequate time interval between test and retest should be defined. For pregnancy, we considered a time interval from 2 days to 2 weeks as adequate to ensure that PA did not change over time (e.g., between the second and third trimesters) [2]. Provided there were no substantial flaws in the design or analysis (which would result in level 4), we assigned one of the following levels of quality for each PA score reported in an individual study for the assessment of reliability and measurement error:

Level 1: an adequate time interval between test and retest (2 days–2 weeks) and reporting of ICC, LOA, SDC, SEM, CV, kappa, or concordance.
Level 2: an inadequate time interval between test and retest (> 2 weeks) and reporting of ICC, LOA, SDC, SEM, CV, kappa, or concordance; or an adequate time interval between test and retest (2 days–2 weeks) and reporting of Pearson/Spearman correlation.
Level 3: an inadequate time interval between test and retest (> 2 weeks) and reporting of Pearson/Spearman correlation.
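The three levels can be written as a short decision function. This is a sketch under stated assumptions: the function and constant names are hypothetical, level 4 (substantial design flaws) is left to the caller, and intervals shorter than 2 days are treated as inadequate even though the levels above only name intervals longer than 2 weeks.

```python
# Statistics the review treats as adequate for reliability or measurement error.
ADEQUATE_STATS = {"icc", "loa", "sdc", "sem", "cv", "kappa", "concordance"}
# Statistics treated as less adequate because they ignore systematic error.
CORRELATIONS = {"pearson", "spearman"}


def reliability_quality_level(interval_days: float, statistic: str) -> int:
    """Return quality level 1, 2, or 3 for one reported reliability result."""
    stat = statistic.lower()
    if stat not in ADEQUATE_STATS | CORRELATIONS:
        raise ValueError(f"unknown statistic: {statistic}")
    # Assumption: anything outside 2 days to 2 weeks counts as inadequate.
    adequate_interval = 2 <= interval_days <= 14
    adequate_stat = stat in ADEQUATE_STATS
    if adequate_interval and adequate_stat:
        return 1
    if adequate_interval or adequate_stat:
        return 2
    return 3


print(reliability_quality_level(7, "ICC"))        # 1
print(reliability_quality_level(21, "ICC"))       # 2
print(reliability_quality_level(7, "Pearson"))    # 2
print(reliability_quality_level(21, "Spearman"))  # 3
```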

To evaluate the methodological quality of studies of construct validity and responsiveness, it is important to formulate a priori hypotheses about the expected direction and magnitude of the results, which guarantees unbiased conclusions. Since this criterion was rarely met previously [10,11,12] and a study may still provide unbiased coefficients without these hypotheses, we did not rate the quality of these studies as inadequate but stated how many studies formulated such an a priori hypothesis. We further applied our own criteria in order to compare all results with the same set of hypotheses. Depending on the type of comparison, we assigned three different levels of quality for the assessment of construct validity and responsiveness (Table 1). Higher levels of quality (level 1 or 2) were provided if the questionnaire was evaluated against objective measures of PA (e.g., accelerometer) depending on the use of the objective data. More specifically, a higher level of quality was given the more similar the constructs were. For example, the comparison of moderate PA from the questionnaire with moderate PA from the accelerometer is currently the optimal approach (level 1), whereas a comparison with total counts (including light, moderate, and vigorous PA [VPA]) is less optimal (level 2). We assigned level 3 of quality when the questionnaire was compared with measures less similar to the construct, such as pedometers, questionnaires, diaries, and interviews, or if different intensity levels were compared against each other (e.g., light PA estimated from the questionnaire compared with MVPA estimated from the accelerometer).

2.7 Quality of Evidence

We evaluated the quality of the body of evidence using the state-of-the-art GRADE (Grading of Recommendation, Assessment, Development, and Evaluation) approach [25]. Since this assessment should be outcome-specific, we evaluated the quality of evidence for each questionnaire version (including different language versions) and measurement property (reliability, measurement error, construct validity, responsiveness) for three outcomes (total PA, MVPA, and VPA) separately. In addition, we pooled the evidence from individual studies when there was more than one study of the same questionnaire available. In particular, we applied a modified GRADE approach to grade the body of evidence [26]. For each outcome (PA score), the quality of evidence could be high, moderate, low, or very low depending on the assessment of four factors (risk of bias [methodological quality of the individual study], imprecision, inconsistency, indirectness). At the beginning, the quality of evidence for each outcome was high, but could be downgraded if there were any serious shortcomings in these factors. Currently, there are no guidelines for upgrading due to very good measurement properties.
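The downgrading logic described here (start at high, step down for each serious shortcoming, never upgrade) can be illustrated in a few lines. This is a minimal sketch: the function name and the dictionary keys are hypothetical, and summing the downgrades across factors is an assumption about how they combine.

```python
GRADE_LEVELS = ("high", "moderate", "low", "very low")


def grade_quality(downgrades: dict) -> str:
    """Start at 'high' and step down once per downgraded level, stopping at 'very low'."""
    if any(levels < 0 for levels in downgrades.values()):
        raise ValueError("downgrades must be non-negative; evidence is never upgraded here")
    total = sum(downgrades.values())
    return GRADE_LEVELS[min(total, len(GRADE_LEVELS) - 1)]


# Hypothetical example: one level for risk of bias, one for imprecision.
print(grade_quality({"risk_of_bias": 1, "imprecision": 1}))  # low
print(grade_quality({}))                                     # high
```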

Regarding risk of bias, high-quality evidence (no downgrading) was available when most individual studies had very good quality (level 1). When most individual studies were of doubtful quality (level 3) or only one study of adequate (level 2) or very good quality was available, we downgraded the quality of evidence by one level (e.g., from moderate to low). When only one individual study of doubtful quality or multiple studies of inadequate quality (level 4) were available, we downgraded by two levels. Moreover, we downgraded by three levels if there was only one individual study of inadequate quality available. To evaluate imprecision, we determined the optimal information size (OIS) to ensure a sufficient precision in the estimation of adequate effect sizes. Assuming that ICC = 0.7, a sample size of n ≥ 45 would be required to obtain a 95% confidence interval (CI) with a maximum width of 0.30 (i.e., ± 0.15; calculated using STATA 12.1, Statacorp, College Station, TX, USA) [27]. Likewise, assuming r = 0.40, a sample size of n ≥ 123 would be required to obtain a 95% CI with the same width [28]. Serious imprecision was present if the total sample size did not meet these criteria (i.e., 45 for reliability and 123 for construct validity and responsiveness), and we downgraded the quality of evidence by one level. We downgraded the quality of evidence by two levels (very serious imprecision) when the total sample size was n < 12 for reliability or n < 32 for construct validity and responsiveness (95% CI width of ± 0.30). Because publication bias is difficult to assess in studies of measurement properties (e.g., lack of registries), we did not downgrade due to this factor. Finally, we downgraded by one or two levels in the presence of unexplained inconsistency (differences in results [i.e., sufficient, insufficient]) or indirectness (differences in populations, interventions, outcomes, indirect comparisons).
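The sample-size thresholds for correlations can be approximated with a short calculation. The sketch below assumes a confidence interval based on the Fisher z-transformation, which may differ from the method in the cited source [28]; all function names are hypothetical. Under that assumption the search returns 123 for a total width of 0.30 and 32 for a total width of 0.60, matching the thresholds stated above. The ICC threshold of 45 depends on a different interval formula and is not reproduced here.

```python
import math
from statistics import NormalDist


def ci_width_for_r(r: float, n: int, confidence: float = 0.95) -> float:
    """Total width of the Fisher-z confidence interval for a correlation r."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    z = math.atanh(r)
    margin = z_crit / math.sqrt(n - 3)
    return math.tanh(z + margin) - math.tanh(z - margin)


def min_n_for_width(r: float, max_width: float, confidence: float = 0.95) -> int:
    """Smallest n whose interval is no wider than max_width (linear search)."""
    n = 4  # the Fisher-z standard error needs n > 3
    while ci_width_for_r(r, n, confidence) > max_width:
        n += 1
    return n


def imprecision_downgrade(total_n: int, ois: int, very_serious_below: int) -> int:
    """Levels to downgrade: 2 if very serious, 1 if serious, 0 otherwise."""
    if total_n < very_serious_below:
        return 2
    if total_n < ois:
        return 1
    return 0


print(min_n_for_width(0.40, 0.30))  # 123
print(min_n_for_width(0.40, 0.60))  # 32
# Construct validity example with a hypothetical pooled sample of 80 women.
print(imprecision_downgrade(80, ois=123, very_serious_below=32))  # 1
```

For reliability, the same `imprecision_downgrade` helper would be called with the thresholds given in the text (`ois=45`, `very_serious_below=12`).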

3 Results

3.1 Literature Search

The literature search resulted in 1,719 hits. Of these, 27 articles were selected based on titles and abstracts. After reading the full texts, ten articles were excluded because of the absence of measurement properties (n = 5) [29,30,31,32,33] or the use of a diary/record (n = 3) [34,35,36] or an interview (n = 2) [37, 38]. Finally, 17 articles [39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55] on 11 different PA questionnaires (17 versions) [39, 44, 56,57,58,59,60,61,62,63] were included (Fig. 1). Overall, these 17 articles reported 18 studies of measurement properties. To avoid any misconceptions, it should be noted that the studies describing the development of the short and long forms of the International Physical Activity Questionnaire (IPAQ) [59] share the same reference. Results are presented separately for questionnaires developed for the pregnant and non-pregnant populations solely to improve readability.

Table 2 shows a summary of all included articles and questionnaires in combination with evaluated measurement properties and study populations. Construct validity was assessed for all questionnaires, whereas reliability (parameters of reliability and measurement error) was assessed for six questionnaires (11 versions) and responsiveness for two questionnaires. In most studies, an accelerometer was used as a comparison measure. Eight studies [42,43,44,45,46, 49, 51, 55] assessed the measurement properties of the Pregnancy Physical Activity Questionnaire (PPAQ) [44] or adaptations of this questionnaire (e.g., Japanese version). Another study [48] evaluated the long form of the IPAQ, whereas two studies (of reliability and construct validity), reported in one article [52], evaluated the short form of the IPAQ (IPAQ-SF). One study [39] used a strongly modified version of the IPAQ measuring leisure time (LT) PA (LTPA) in pregnancy. One article [40] reported one study evaluating two questionnaires, namely the Australian Women’s Activity Study (AWAS) [60] and the Recent Physical Activity Questionnaire (RPAQ) [57].

Table 2 Explanation of acronyms or abbreviated names of questionnaires, studies on measurement properties and sample characteristics

3.2 Description of Questionnaires

A detailed description of the questionnaires is shown in Table 3. Of the 11 questionnaires, four were developed to assess PA in pregnant women [39, 44, 62, 63], whereas five were developed for adults [56, 57, 59, 61], one for adults and adolescents [58], and one for women with young children [60].

Table 3 Description of PA questionnaires

Of the seven questionnaires that were developed for the non-pregnant population, six (Activity Questionnaire for Adolescents and Adults [AQuAA], AWAS, Global Physical Activity Questionnaire [GPAQ], IPAQ, IPAQ-SF, RPAQ) aim to measure the construct PA and one (Leisure-Time Exercise Questionnaire [LTEQ]) measures LT exercise. When assessing (total) PA, the AQuAA, AWAS, GPAQ, IPAQ, and RPAQ cover all relevant settings of PA (home, recreation, sports, transport, work). The GPAQ assesses sport-related PA within discretionary time (leisure, recreation, sports). Likewise, the RPAQ assesses sport-related PA such as competitive running and swimming in its section on recreation. The AWAS assesses planned activities (including sports, leisure, recreation) and was developed to measure PA in women with young children, and therefore focuses particularly on childcare activities and domestic responsibilities. The IPAQ-SF aims to cover all settings of PA without discriminating between them. Most of the questionnaires use a typical week or the last week as a recall period and the number of questions varies from seven (IPAQ-SF) to 68 (AWAS). Duration, frequency, and intensity of PA are obtained by all questionnaires except LTEQ, which only collects frequency and intensity. Usually, both a total PA score and separate scores for time spent in different intensity levels (e.g., light PA, VPA) as well as sedentary behavior (SB) can be calculated using minutes per day/week, MET min per week or frequency per week as units of measurement. In addition, GPAQ, IPAQ, and RPAQ provide separate PA scores for different settings.

Of the four questionnaires developed for the pregnant population, PA is measured with reference to the specific trimester (Physical Activity and Pregnancy Questionnaire [PAPQ], PPAQ), the last 2 weeks (Leisure-Time Physical Activity Questionnaire [LTPAQ]) [39] or since becoming pregnant (Questionnaire of recreational exercise from Norwegian Mother and Child Cohort Study [Q1 of MoBa]) [63]. PAPQ and PPAQ aim to measure the construct (total) PA, whereas LTPAQ and Q1 of MoBa aim to measure LTPA or recreational exercise during pregnancy. The LTPAQ was based on the IPAQ but was strongly modified to provide a better discrimination between the structured (LT excluding household) and unstructured (household) features of PA. Parameters of duration, frequency, and intensity of PA are assessed by all questionnaires except Q1 of MoBa. Scores for total PA, time spent in light PA, moderate PA, VPA, and SB can be calculated for the PAPQ, PPAQ, and LTPAQ. For Q1 of MoBa, only a total PA score can be calculated. All four questionnaires use minutes per week or MET min/week to calculate PA scores.

Finally, all questionnaires that assigned MET intensities for activities use compendium-based information about intensities for different activities [64]. These MET intensities are based on the general population, including men and non-pregnant women. In contrast, the PPAQ uses pregnancy-specific MET intensities whenever possible, such as for walking and light-to-moderate intense household activities [44].

3.3 Assessment of Measurement Properties

3.3.1 Content Validity

A comprehensive evaluation of the content validity of PA questionnaires during pregnancy was not part of this review. None of the included studies assessed content validity with a dedicated methodological approach, but some provided information on content validity. During the development of the PPAQ, one study [44] used 24-h recalls to select both prevalent and discriminatory activities of pregnant women. The findings of the study showed that watching television, standing or slowly walking at work while carrying light/moderate loads, and childcare were the most relevant activities. Another study [54] discussed the content validity of the GPAQ theoretically in the context of previous research and expert opinions. Their conclusion was that the GPAQ includes important settings (e.g., work, transport, leisure) and scores (frequency, duration, intensity) of PA, but that including pregnancy-specific activities (and settings) such as caregiving might result in better content validity. Furthermore, one study [39] of the LTPAQ strongly modified the IPAQ to provide a better discrimination between the structured (LT excluding household) and unstructured (household) features of PA. They excluded occupational PA and used the degree of breathlessness (none, some, strong) instead of light, moderate, and vigorous to describe the intensity of activities, which may result in a better understanding for some women. Finally, studies of adaptations of the PPAQ [43, 45, 49, 51, 55] included expert opinions and pilot studies to assess content validity and, consequently, items were modified and/or deleted during their cross-cultural validation process.

According to our criterion (i) (see Sect. 2.5.1), of those questionnaires that aim to measure total PA, AQuAA, AWAS, GPAQ, IPAQ, IPAQ-SF, PAPQ, and PPAQ cover all relevant settings of PA. The RPAQ does not collect information on household-related activities [57] since the authors showed in a previous study [65] that these activities were inversely correlated with objectively measured PA. Therefore, they only included a few activities such as stair-climbing at home, mowing the lawn, watering the lawn or garden, or home maintenance. The IPAQ-SF aims to cover all settings of PA, but domain-specific scores cannot be obtained. The LTEQ, LTPAQ, and Q1 of MoBa were developed to collect specific information about LT/recreational exercise and LTPA rather than total PA. According to criterion (ii) (see Sect. 2.5.1), all included questionnaires assess frequency and duration of PA except LTEQ and Q1 of MoBa and no questionnaire uses a recall period of less than 1 week. In sum, the AQuAA, AWAS, GPAQ, IPAQ, IPAQ-SF, LTPAQ, PAPQ, and PPAQ provided sufficient content validity for the assessment of PA during pregnancy, whereas LTEQ, Q1 of MoBa, and RPAQ did not.

3.3.2 Reliability

The results for reliability (parameters of reliability and measurement error) of ten studies of six questionnaires (11 versions) are summarized in Table 4. Of the questionnaires developed for the non-pregnant population, the IPAQ-SF [52] showed sufficient reliability for all estimates of PA, the LTEQ [53] for strenuous LT exercise but not for total, mild, and moderate LT exercise, and the RPAQ [40] showed sufficient reliability for moderate PA but insufficient reliability for all other estimates of PA. The AWAS [40] showed insufficient reliability (ICC < 0.70).

Table 4 Parameters of reliability and measurement error of PA questionnaires during pregnancy


Of the questionnaires developed for the pregnant population, parameters of reliability and measurement error were only assessed for (versions of) the PPAQ and LTPAQ. In sum, studies of the English [44], Turkish [45], and Vietnamese versions [51] of the PPAQ showed sufficient reliability. The Chinese version [55] showed sufficient reliability for all PA scores except moderate PA, VPA, and sports/exercise. The French version of the PPAQ [43] showed sufficient reliability for all scores except for transportational PA and, likewise, the Japanese version [49] for all scores except for transportational PA, sports/exercise, and occupational PA (1-week interval only). Although three studies [39, 49, 51] assessed measurement error, only one study reported LOA or CV for repeated measurements. In particular, the results for the LTPAQ [39] were insufficient because of large LOA (MIC_{frequency/duration} < LOA/SDC) and CV. These values indicate large measurement errors and hamper a reliable detection of MIC of PA (e.g., two sessions or 30 min of MVPA per week) [17].
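
The comparison between the limits of agreement (LOA) and the MIC mentioned above can be illustrated with a short Python sketch. The test-retest values below are hypothetical and do not come from any included study; the MIC of 30 min of MVPA per week is the example given in the text, and the function name is an illustrative choice.

```python
import statistics

def limits_of_agreement(test, retest):
    """Return (mean difference, lower LOA, upper LOA) for paired measurements."""
    diffs = [b - a for a, b in zip(test, retest)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    half_width = 1.96 * sd_diff
    return mean_diff, mean_diff - half_width, mean_diff + half_width

# Hypothetical MVPA minutes per week at test and retest (illustrative only)
test = [120, 90, 150, 60, 200, 30, 110, 75]
retest = [100, 130, 140, 95, 160, 70, 85, 120]

mean_diff, lower, upper = limits_of_agreement(test, retest)
mic = 30  # example MIC from the text: 30 min of MVPA per week

# A change smaller than the LOA half-width cannot be separated from measurement error
half_width = (upper - lower) / 2
mic_detectable = mic > half_width

print(f"mean difference = {mean_diff:.1f}, LOA = [{lower:.1f}, {upper:.1f}]")
print("MIC detectable beyond measurement error:", mic_detectable)
```

With these hypothetical data the LOA are much wider than the MIC, which corresponds to the situation rated as insufficient (MIC < LOA).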

3.3.3 Construct and Criterion Validity

The results for construct validity are summarized in Table 5. Of the 11 different questionnaires, construct validity was mostly assessed by validation against accelerometers and less often against pedometers, logbooks, or other PA questionnaires.

Table 5 Construct validity and responsiveness of PA questionnaires during pregnancy


Of the seven questionnaires developed for the non-pregnant population, the AQuAA [50], AWAS [40], GPAQ [54], IPAQ [48], IPAQ-SF [52], and LTEQ [53] showed insufficient construct validity because of low coefficients or large disagreements (e.g., wide LOA). The RPAQ [40] showed a sufficient correlation with PA estimates from the accelerometer for total active time (r ≥ 0.50) but not for total physical activity energy expenditure (PAEE) and other estimates of PA.

Of the four questionnaires developed for the pregnant population, the LTPAQ [39] showed insufficient construct validity. The ratings for the PAPQ [47] were insufficient for light and moderate PA but sufficient for VPA. However, the LOA indicated large disagreement between PAPQ and accelerometry in assessing VPA. The results of studies of the construct validity of (versions of) the PPAQ were predominantly insufficient, such as for the Vietnamese [51], Japanese [49], English [44, 46], Chinese [55], and bilingual [46] versions of the questionnaire. Likewise, the second study [42] of the English version revealed insufficient construct validity for all scores except for LT-MVPA. The Turkish version of the PPAQ [45] showed sufficient validity for the assessment of total PA due to a high correlation with the pedometer but insufficient ratings for all other estimates. The French version of the PPAQ [43] received sufficient ratings for total, light, and moderate PA, household/caregiving, and occupational PA but insufficient ratings for sports/exercise, vigorous, and transportational PA. Finally, Q1 of MoBa [41] showed insufficient construct validity. There was a low correlation (r < 0.50) between the sum of weekly exercise estimated from the questionnaire and VPA estimated from the accelerometer.
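
As a rough illustration of how a correlation-based rating of this kind can be derived, the following Python sketch computes a Spearman rank correlation between questionnaire and accelerometer estimates and applies the r ≥ 0.50 threshold used in this section. The data and function names are hypothetical, and the choice of Spearman rather than Pearson correlation is an assumption made for the example.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def rate_construct_validity(r, threshold=0.50):
    """Rate a coefficient against the r >= 0.50 criterion."""
    return "sufficient" if r >= threshold else "insufficient"

# Hypothetical total PA estimates (illustrative only)
questionnaire = [10, 20, 30, 40, 50, 60]
accelerometer = [15, 12, 35, 30, 70, 55]

rho = spearman(questionnaire, accelerometer)
rating = rate_construct_validity(rho)
print(f"rho = {rho:.2f} -> {rating}")
```

A correlation only reflects relative validity; absolute agreement would additionally require statistics such as the LOA.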

3.3.4 Responsiveness

Only two studies examined responsiveness for two questionnaires (see Table 5). The AQuAA [50] showed insufficient responsiveness. Similarly, the GPAQ [54] showed insufficient responsiveness because of large disagreements (large LOA) between the questionnaire and accelerometer. Moreover, the GPAQ showed both systematic (difference in intercepts) and proportional differences (difference in slopes) regarding the change in MVPA between 14–18 and 29–33 weeks of gestation as indicated by Passing Bablok regression [54].

3.4 Quality of Individual Studies

Regarding the assessment of reliability of each PA score, nine studies [39, 40, 43–45, 49, 51, 52, 55] of AWAS, IPAQ-SF, LTPAQ, PPAQ, and RPAQ were at the highest level of quality (level 1) and one study [53] of the LTEQ at level 3 because of use of Pearson correlations and an inadequate time interval between test and retest. Regarding construct validity, six studies [40, 41, 47, 50, 52, 54] of AQuAA, AWAS, GPAQ, IPAQ-SF, PAPQ, and Q1 of MoBa were at the highest level of quality (level 1), four studies [40, 43, 44, 55] of PPAQ and RPAQ at level 1 and 2, one study [42] of PPAQ at level 1 and 3, and six studies [39, 45, 46, 49, 51, 53] of LTEQ, LTPAQ, and PPAQ at level 3 (see Table 5). The quality of one study of the IPAQ was either of level 1, level 2, or level 3 depending on the evaluated PA score [48]. Different levels of quality were assigned due to comparisons with either objective (e.g., accelerometer, pedometer) or subjective (e.g., logbook, questionnaire) measures of PA or comparisons between different intensity levels. For example, a lower level of quality was assigned if light PA measured by the questionnaire was compared with MVPA measured by the accelerometer (e.g., Japanese version of the PPAQ) [49] or if PA measured by the questionnaire was compared with pedometer-measured daily steps (e.g., LTEQ) [53]. Furthermore, the quality for the assessment of total PA was often of level 2 because total PAEE estimated from the questionnaires was compared against accelerometer-estimated total counts. Responsiveness was evaluated in two studies [50, 54] for two questionnaires (AQuAA, GPAQ). The quality of these studies was rated as level 1.

Finally, almost none of the studies formulated a priori hypotheses about expected results for construct validity or responsiveness. Only two studies [50, 52] of the AQuAA and IPAQ-SF considered a minimum correlation of r = 0.5 as an adequate agreement between PA questionnaire and accelerometer.

3.5 Quality of Evidence

Table 6 summarizes the overall results (i.e., sufficient/insufficient measurement properties) and quality of evidence (GRADE) for three PA scores: total PA, MVPA, and VPA (per questionnaire and measurement property). None of the questionnaires provided evidence for all the relevant measurement properties (i.e., reliability [parameters of reliability or measurement error], construct validity, responsiveness). Only for the AWAS, IPAQ-SF, LTEQ, LTPAQ, PPAQ (i.e., Chinese, English, French, Japanese, Turkish, Vietnamese versions), and RPAQ were both reliability and construct validity assessed. Because there was usually only one study per questionnaire version and PA score available (except PPAQ), inconsistency could not be evaluated for these studies. With reference to the eligibility criteria and the checklist for methodological quality, we identified no serious indirectness and therefore did not downgrade the quality of evidence for any of the PA scores due to this factor.

Table 6 GRADE evidence profile: measurement properties of PA questionnaires for the assessment of total PA, MVPA and VPA during pregnancy


Overall and irrespective of the reported results (i.e., sufficient/insufficient measurement properties), the quality of the body of evidence was limited and ranged from very low to moderate. There was no high-quality evidence indicating that any of the included questionnaires had sufficient measurement properties in assessing total PA, MVPA, or VPA. Only the Turkish and French versions of the PPAQ showed both sufficient reliability and construct validity when assessing total PA (but not MVPA and VPA), but these results were based on low-to-moderate quality evidence.

Although different language versions of questionnaires should be treated initially separately [26], one may consider pooling the results (i.e., body of evidence) of the different versions of the PPAQ. When doing so, there was high-quality evidence (no serious risk of bias, no serious imprecision, no serious inconsistency, no serious indirectness) that the PPAQ had sufficient reliability in assessing total PA and VPA. We did not consider downgrading the quality of evidence for VPA as most of the results were sufficient (four of five studies), except the Chinese version, which may have occurred because most women did not engage in these activities, as suggested by the authors [55].

The results for construct validity of the PPAQ were inconsistent for total PA (i.e., two studies showed sufficient and five studies insufficient results) and consistently insufficient for VPA (see Table 6). When pooling these results, the PPAQ showed insufficient validity in assessing total PA, which was based on low-quality evidence (serious risk of bias, serious inconsistency, no serious imprecision, no serious indirectness). Similarly, there was moderate-quality evidence that the PPAQ has insufficient validity in assessing VPA (serious risk of bias, no serious inconsistency, no serious imprecision, no serious indirectness). We could not pool the results for MVPA and other measurement properties such as measurement error and responsiveness of the PPAQ due to a lack of multiple studies.
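
The pooled quality ratings reported above are consistent with a simple downgrading rule, sketched here in Python. The sketch assumes that the evidence starts at high quality and is downgraded by one level for each factor judged serious; this rule, the function name, and the omission of two-level downgrades for very serious concerns are simplifying assumptions and not a full account of the GRADE approach.

```python
LEVELS = ["very low", "low", "moderate", "high"]

def grade_quality(risk_of_bias, inconsistency, imprecision, indirectness):
    """Each argument is True if the corresponding concern was judged serious."""
    serious = sum([risk_of_bias, inconsistency, imprecision, indirectness])
    # Start at "high" and step down one level per serious concern (floor: "very low")
    index = max(len(LEVELS) - 1 - serious, 0)
    return LEVELS[index]

# Pooled PPAQ results as described in the text
reliability_total_pa = grade_quality(False, False, False, False)
validity_total_pa = grade_quality(True, True, False, False)
validity_vpa = grade_quality(True, False, False, False)

print(reliability_total_pa, validity_total_pa, validity_vpa)
```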

4 Discussion

In contrast to the considerable evidence concerning measurement properties of PA questionnaires in adults [11], youth [10], and elderly people [12], little information is available about the quality of PA questionnaires in pregnancy. This article provides an overview of the measurement properties of all self-administered questionnaires assessing PA in pregnancy. In contrast to other reviews [66], the quality of individual studies as well as the overall quality of evidence was evaluated.

The findings show that the quality of evidence of measurement properties for self-administered PA questionnaires assessing PA in pregnancy is currently low to moderate. Most PA questionnaires showed insufficient measurement properties. Only two studies assessed responsiveness for two questionnaires (AQuAA, GPAQ) and, thus, no questionnaire demonstrated sufficiency for all relevant measurement properties (i.e., content validity, reliability, construct validity, responsiveness). Of those questionnaires for which evidence for both reliability and construct validity was available, only few showed consistent results. Based on low-to-moderate quality evidence, only the Turkish and French versions of the PPAQ showed sufficient reliability and construct validity in assessing total PA. When considering all versions together, the PPAQ showed sufficient reliability in assessing total PA and VPA, based on high-quality evidence. However, based on low-to-moderate quality evidence, the questionnaire showed insufficient construct validity in assessing these PA scores. Furthermore, the pooled results of the PPAQ were consistently sufficient for reliability, but inconsistent for construct validity (i.e., sufficient or insufficient). Although there was limited high-quality evidence, we currently recommend the PPAQ, irrespective of language, to assess PA during pregnancy. The PPAQ showed sufficient content validity and was the only included questionnaire with versions showing both sufficient reliability and validity.

Construct validity was assessed for all (versions of) questionnaires and most of them were compared with objective measures of PA such as accelerometers or pedometers. However, the methodological quality of these individual studies varied substantially. No study used DLW. Although this technique can be applied safely in pregnancy [67], it does not represent maternal EE because DLW crosses the placenta. For many PA scores, comparisons were made with a different level of intensity in accelerometer data, which led to a lower quality of the individual study. For example, time spent in light activities does not necessarily correlate with time spent in moderate or vigorous activities. Furthermore, sometimes (total) PA was compared with pedometer-estimated daily steps. Because pedometers are not able to capture duration, frequency, and intensity of PA [68], the quality of these individual studies was considered doubtful. Only a few studies reported statistics such as LOA to assess absolute validity, rather than relative validity evaluated with Spearman or Pearson correlations. Reliability was assessed for six questionnaires (11 versions) and the methodological quality of these individual studies was usually high. Most studies used ICC or LOA and adequate time intervals between test and retest. Finally, only two studies of very good quality assessed responsiveness, the ability of a questionnaire to detect changes in PA over time. Especially in pregnancy, a period in which PA usually changes profoundly [2], a questionnaire with sufficient responsiveness is needed to capture these changes.

During pregnancy, a precise focus on content validity such as the choice of recall periods, activities or relevant settings of PA is needed. First, the intensity, type, and duration of PA can change with the ongoing pregnancy [2]. For example, light activities become more frequent, especially during the second and third trimesters. Activities can become more intense throughout pregnancy because of increased fatigue [2] and energy requirements [69]. For example, carrying loads can be experienced as more exhausting in late compared to early pregnancy, and walking up the stairs will objectively require more energy with increasing body weight. Furthermore, work-related PA might be more important in early pregnancy compared to the second and/or third trimester due to maternity leave. Similarly, household and caregiving activities become more important, especially when assessing PA in combination with parity. These pregnancy-related changes should be considered when assessing PA during pregnancy. Questionnaires with sufficient content validity (AQuAA, AWAS, GPAQ, IPAQ, IPAQ-SF, LTPAQ, PAPQ, PPAQ), based on our elementary criteria, may need to be further appraised with respect to these considerations.

In pregnancy EE needed for some activities increases, especially in the second and third trimesters [69, 70], and the intensity of activities may be different [2, 71]. Many PA questionnaires use compendium-based information about MET intensities of different activities [64], which are based on the adult non-pregnant population. Pregnancy-specific MET intensities are scarce and may only be available for light and moderate household PA [72]. Such intensities are applied in, for example, the PPAQ. The lack of pregnancy-specific MET intensities together with the application of intensities from the non-pregnant population can be a source of bias when assessing total PA or PAEE. This could be the reason that for the RPAQ, a low correlation was shown for total PAEE, but a high correlation for total active time. However, more studies would be needed to test this hypothesis.

The present findings also revealed heterogeneity in the study design and analysis. This could result in a serious bias (e.g., risk of bias, inconsistency) and hampers the comparability of findings across (included) studies and countries. For example, accelerometers have been widely used to assess construct validity in this review. Although these devices can provide accurate information about duration, frequency, and intensity of PA under free-living conditions [73], there are currently no standards for accelerometer data collection and processing [74–76], including during pregnancy. Consequently, we observed large heterogeneity in data collection and processing criteria (Table 5). In contrast to the placement of the accelerometer (most women wore the device on their waist or hip), the included studies differed considerably in epoch length (i.e., 5 s to 10 min), registration period (3–14 days), and the definition of a valid week (e.g., 3 of 4 days, 4 of 8 days, 10 of 14 days). Furthermore, not all studies reported processing criteria, including the definition of filters and sampling frequency, which were reported least often. Since different decision rules for accelerometer data could impact PA outcomes [76], the reporting of these would increase transparency, comparability between studies and countries, and allow assessment of potential risks of bias.
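
To make the role of such decision rules concrete, the following Python sketch applies a "valid days out of registered days" rule to one hypothetical participant. The minimum wear time of 10 h per day, the wear-time values, and the function name are assumptions for illustration; the included studies used differing (and sometimes unreported) criteria.

```python
def is_valid_registration(daily_wear_hours, min_valid_days, min_hours_per_day=10.0):
    """True if enough days reach the (assumed) minimum wear time."""
    valid_days = sum(1 for hours in daily_wear_hours if hours >= min_hours_per_day)
    return valid_days >= min_valid_days

# Hypothetical wear time (hours per day) over an 8-day registration period
wear_8_days = [12.0, 8.0, 11.0, 9.0, 10.5, 7.0, 6.0, 13.0]

passes_4_of_8 = is_valid_registration(wear_8_days, min_valid_days=4)
passes_6_of_8 = is_valid_registration(wear_8_days, min_valid_days=6)

print("4 of 8 days rule:", passes_4_of_8)
print("6 of 8 days rule:", passes_6_of_8)
```

The same recording is retained under one rule and discarded under the other, which is one way in which unreported decision rules can affect comparability.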

Most importantly, we observed large heterogeneity in applied cut points [77–81] used to classify the intensity of PA into light, moderate, and vigorous. These cut points were usually developed for non-pregnant populations. For example, cut points for moderate PA in this review varied substantially between 191 [79] and 1952 [78] counts per minute, which will affect estimates of both PA and construct validity [82]. The influence of using different cut points on construct validity was demonstrated by two studies included in this review [49, 50]. Because there are currently no validated cut points available for pregnant women, it is unclear which cut points provide the best comparison for assessing construct validity. Not only are pregnancy-specific cut points lacking, but little is known in general about the reliability and validity of accelerometers in pregnancy [83]. Changes in body girth, gait, and monitor tilt can affect the accuracy and the ability to detect certain movements [84].
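
The effect of the cut point on the resulting PA estimate can be shown with a minimal Python sketch. The two moderate-intensity cut points (191 and 1952 counts per minute) are those cited above; the minute-by-minute counts and the function name are hypothetical.

```python
def minutes_at_or_above(counts_per_minute, moderate_cut_point):
    """Number of 1-min epochs classified as at least moderate intensity."""
    return sum(1 for count in counts_per_minute if count >= moderate_cut_point)

# Hypothetical accelerometer counts for ten 1-min epochs (illustrative only)
counts = [50, 150, 250, 400, 800, 1200, 1800, 2100, 2500, 90]

mvpa_low_cut = minutes_at_or_above(counts, 191)
mvpa_high_cut = minutes_at_or_above(counts, 1952)

print("Minutes >= moderate with cut point 191:", mvpa_low_cut)
print("Minutes >= moderate with cut point 1952:", mvpa_high_cut)
```

Identical raw data yield 7 versus 2 min of at least moderate PA here, so any correlation with questionnaire-based MVPA depends on which cut point is chosen.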

All things considered, objective devices such as accelerometers and pedometers are likely to provide sufficient reliability, whilst construct validity may be limited due to technical shortcomings, non-wearing time, participant interference with the results, and application of (different) cut points [85]. Lower construct validity of comparison measures clearly limits the quality of evidence for the validity of PA questionnaires. This is one of the greatest challenges for reviews on measurement properties of PA questionnaires, such as for the present review. Because of these shortcomings, future (validation) studies should report their decision rules in detail and attempt to develop guidelines for the optimal use of accelerometer data in the target population (e.g., pregnancy). To this end, two recent reviews emphasized the importance of such standards, as well as critically scrutinizing the validity of accelerometers and attempting to provide age-specific practical considerations for choosing the most appropriate method [85, 86].

4.1 Recommendations for Choosing a Questionnaire

The choice of the right questionnaire depends on the study purpose. According to this, different settings (e.g., work, recreation), dimensions of PA (e.g., PAEE, total PA), or recall periods (e.g., last week, typical week) might become more important. In addition to previous recommendations for the selection of PA questionnaires [17], we recommend the following criteria for use in pregnancy:

(i)
When assessing total PA, the questionnaire should cover all relevant settings of PA (work, home, transport, recreation, sports), but should especially focus on household/caregiving.
(ii)
The questionnaire should measure at least duration and frequency of PA and should include a large range of light and moderate activities. Lower intensity activities become more prevalent during pregnancy, especially in the second and third trimesters. This will ensure sufficient content validity as well as discrimination of pregnant women regarding the level (e.g., time) engaged in these activities. For example, during the development of the PPAQ, light activities such as slowly walking at work while carrying light/moderate loads and childcare were among the most discriminatory activities [44]. In general, identifying relevant activities for the target population should precede the selection of questions used.
(iii)
The recall period of the questionnaire should be the last week (or last seven days), a typical week in a specific trimester, or the current trimester but should not expand over more than one trimester as PA during pregnancy varies [2].
(iv)
Because pregnancy-specific MET intensities for different activities are lacking and energy cost changes during pregnancy, we further recommend using total time when assessing total PA instead of assigning activities different MET intensities from the non-pregnant population.
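
Recommendation (iv) can be illustrated with a small Python sketch that contrasts total time with a MET-weighted score. All activities, durations, and MET values below are hypothetical; in particular, the second set of MET values merely stands in for unknown pregnancy-specific intensities.

```python
# Hypothetical one-week record: activity -> minutes per week
minutes = {"childcare": 300, "slow walking": 210, "light housework": 420}

# Assumed MET intensities (illustrative, not taken from any compendium)
met_non_pregnant = {"childcare": 3.0, "slow walking": 2.5, "light housework": 2.3}
met_alternative = {"childcare": 3.3, "slow walking": 2.8, "light housework": 2.5}

def met_minutes(minutes_by_activity, met_by_activity):
    """MET-weighted score: sum of minutes x MET over all activities."""
    return sum(m * met_by_activity[a] for a, m in minutes_by_activity.items())

total_time = sum(minutes.values())
score_non_pregnant = met_minutes(minutes, met_non_pregnant)
score_alternative = met_minutes(minutes, met_alternative)

print("Total time (min/week):", total_time)
print("MET-min/week with first MET set:", round(score_non_pregnant, 1))
print("MET-min/week with second MET set:", round(score_alternative, 1))
```

Total time is unaffected by the choice of MET values, whereas the MET-weighted score changes with every assumption about intensity.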

In general, we recommend using a questionnaire that has been evaluated in the target population and provides (consistent) results with sufficient content validity, reliability, construct validity, and responsiveness, based on high-quality evidence. If a questionnaire does not provide sufficient content validity, evaluation of further measurement properties is irrelevant. In our opinion, (versions of) the PPAQ may currently be the best choice to assess self-reported PA during pregnancy. However, some language versions of the PPAQ showed insufficient measurement properties, and, in fact, sufficient measurement properties for one language version do not guarantee the same quality for other language versions and target populations. We cautiously recommend against using the AWAS, LTEQ, LTPAQ, RPAQ, and Q1 of MoBa (at least for some PA scores) because of insufficient content validity and/or both insufficient reliability and validity. However, our findings concerning the measurement properties of all included questionnaires were based on very low-to-moderate quality evidence.

4.2 Limitations and Strengths of this Review

Whenever a study presented multiple PA scores for construct validity and responsiveness, we tried to integrate all of them into our tables. However, if an individual study used both different cut points and average counts, we integrated coefficients with higher quality (Table 1), usually average counts. Furthermore, we did not apply any restrictions concerning certain pregnancy characteristics such as parity or pregnancy body mass index (BMI). For example, study populations in this review consisted of both normal-weight and overweight/obese pregnant women. Whether this heterogeneity influenced the results is unclear and difficult to assess because of the low number of studies. However, in our review, this may have been a problem for only inter- and not intra-questionnaire comparisons.

Another problem was the observed heterogeneity in data collection and processing criteria of objective measures such as accelerometers and pedometers. Unfortunately, these criteria likely impact both PA and validation outcomes. We were unable to define particular criteria and comparison measures as a preferable ‘gold standard’. Although we tried to incorporate the use of accelerometer data and the similarity between constructs into our quality assessment, we did not evaluate the application of different decision rules such as registration period, epoch length, filter, valid wear time, and cut points. In theory, VPA estimated from the questionnaire should be compared with VPA measured by accelerometry but the use of different cut points influences this association. These limitations are of major concern for this systematic review. Since the results of the validity of a questionnaire strongly depend on the validity of the comparison measure, we recommend that all readers bear in mind the importance of standards when using objective measures of PA during pregnancy and interpret the presented results carefully.

Lastly, we tried to use state-of-the-art methodology for our quality and result rating. The assessment was based on our experience, a series of previously published systematic reviews [10–12], a standardized quality checklist for PA questionnaires [17], as well as the COSMIN [23, 26] and GRADE [25] guidelines. Researchers in the field are invited to discuss these findings in the light of their own expertise, possibly assigning different criteria (e.g., MIC of PA during pregnancy), levels of quality, and result ratings.

4.3 Recommendations for Further Research

We recommend further studies assessing the quality of those questionnaires that provide sufficient content validity but limited high-quality evidence of sufficient measurement properties. Furthermore, future studies should include responsiveness in their assessment. In this review, most questionnaires were in the English language but a questionnaire should always be evaluated in the target population and language. We observed large heterogeneity in data collection and processing criteria. We strongly recommend that future studies be designed to develop standards for accelerometer use and analysis, in particular during pregnancy. Although only little is known about the validity of accelerometers in our target population, we currently recommend the use of omniaxial devices that capture all directions of movements and the use of total (or averaged) counts, which are independent from any cut points. Finally, since lower validity of (objective) comparison measures hinders the accurate estimation of the validity of a PA questionnaire, we strongly recommend research on the validity of accelerometers during pregnancy before evaluating measurement properties of PA questionnaires.

5 Conclusions

Evidence concerning the measurement properties of self-administered PA questionnaires in pregnancy is at the moment limited and mostly of lower quality (i.e., very low to moderate). No questionnaire showed sufficient content validity, construct validity, reliability, and responsiveness. Some versions of the PPAQ showed sufficient measurement properties, based on low-to-moderate quality evidence. Overall (i.e., when pooling the results of all versions), the PPAQ showed sufficient reliability in assessing total PA and VPA, based on high-quality evidence. However, based on low-to-moderate quality evidence, the questionnaire revealed insufficient construct validity in assessing these PA scores. Only after the development of guidelines for the most appropriate use of accelerometer data during pregnancy will we be able to provide recommendations for PA questionnaires based on high-quality evidence.

References

Lee I-M, Shiroma EJ, Lobelo F, Puska P, Blair SN, Katzmarzyk PT. Effect of physical inactivity on major non-communicable diseases worldwide: an analysis of burden of disease and life expectancy. Lancet. 2012;380:219–29. https://doi.org/10.1016/S0140-6736(12)61031-9.
Poudevigne MS, O’Connor PJ. A review of physical activity patterns in pregnant women and their relationship to psychological health. Sports Med. 2006;36:19–38.
da Silva SG, Ricardo LI, Evenson KR, Hallal PC. Leisure-time physical activity in pregnancy and maternal-child health: a systematic review and meta-analysis of randomized controlled trials and cohort studies. Sports Med. 2017;47:295–317. https://doi.org/10.1007/s40279-016-0565-2.
Tobias DK, Zhang C, van Dam RM, Bowers K, Hu FB. Physical activity before and during pregnancy and risk of gestational diabetes mellitus: a meta-analysis. Diabetes Care. 2011;34:223–9. https://doi.org/10.2337/dc10-1368.
Melzer K, Schutz Y, Boulvain M, Kayser B. Physical activity and pregnancy: cardiovascular adaptations, recommendations and pregnancy outcomes. Sports Med. 2010;40:493–507. https://doi.org/10.2165/11532290-000000000-00000.
Aune D, Saugstad OD, Henriksen T, Tonstad S. Physical activity and the risk of preeclampsia: a systematic review and meta-analysis. Epidemiology. 2014;25:331–43. https://doi.org/10.1097/EDE.0000000000000036.
Moyer C, Reoyo OR, May L. The influence of prenatal exercise on offspring health: a review. Clin Med Insights Womens Health. 2016;9:37–42. https://doi.org/10.4137/CMWH.S34670.
Barker DJP. The origins of the developmental origins theory. J Intern Med. 2007;261:412–7. https://doi.org/10.1111/j.1365-2796.2007.01809.x.
American College of Obstetrics and Gynecology. Committee opinion no. 650: physical activity and exercise during pregnancy and the postpartum period. Obstet Gynecol. 2015;126:e135–42. https://doi.org/10.1097/AOG.0000000000001214.
Chinapaw MJM, Mokkink LB, van Poppel MNM, van Mechelen W, Terwee CB. Physical activity questionnaires for youth: a systematic review of measurement properties. Sports Med. 2010;40:539–63. https://doi.org/10.2165/11530770-000000000-00000.
van Poppel MNM, Chinapaw MJM, Mokkink LB, van Mechelen W, Terwee CB. Physical activity questionnaires for adults: a systematic review of measurement properties. Sports Med. 2010;40:565–600. https://doi.org/10.2165/11531930-000000000-00000.
Forsen L, Loland NW, Vuillemin A, Chinapaw MJM, van Poppel MNM, Mokkink LB, et al. Self-administered physical activity questionnaires for the elderly: a systematic review of measurement properties. Sports Med. 2010;40:601–23. https://doi.org/10.2165/11531350-000000000-00000.
Terwee CB, Jansma EP, Riphagen II, de Vet HCW. Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments. Qual Life Res. 2009;18:1115–23. https://doi.org/10.1007/s11136-009-9528-5.
Caspersen CJ, Powell KE, Christenson GM. Physical activity, exercise, and physical fitness: definitions and distinctions for health-related research. Public Health Rep. 1985;100:126–31.
Bowling A. Mode of questionnaire administration can have serious effects on data quality. J Public Health. 2005;27:281–91. https://doi.org/10.1093/pubmed/fdi031.
Costa DSJ. Reflective, causal, and composite indicators of quality of life: a conceptual or an empirical distinction? Qual Life Res. 2015;24:2057–65. https://doi.org/10.1007/s11136-015-0954-2.
Terwee CB, Mokkink LB, van Poppel MNM, Chinapaw MJM, van Mechelen W, de Vet HCW. Qualitative attributes and measurement properties of physical activity questionnaires: a checklist. Sports Med. 2010;40:525–37. https://doi.org/10.2165/11531370-000000000-00000.
Mokkink LB, Terwee CB, Knol DL, Stratford PW, Alonso J, Patrick DL, et al. Protocol of the COSMIN study: COnsensus-based Standards for the selection of health Measurement INstruments. BMC Med Res Methodol. 2006;6:2. https://doi.org/10.1186/1471-2288-6-2.
Terwee CB, Bot SDM, de Boer MR, van der Windt DAWM, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60:34–42. https://doi.org/10.1016/j.jclinepi.2006.03.012.
Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63:737–45. https://doi.org/10.1016/j.jclinepi.2010.02.006.
Deyo RA, Centor RM. Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chronic Dis. 1986;39:897–906. https://doi.org/10.1016/0021-9681(86)90038-X.
Terwee CB, Dekker FW, Wiersinga WM, Prummel MF, Bossuyt PMM. On assessing responsiveness of health-related quality of life instruments: guidelines for instrument evaluation. Qual Life Res. 2003;12:349–62.
Mokkink LB, de Vet HCW, Prinsen CAC, Patrick DL, Alonso J, Bouter LM, Terwee CB. COSMIN risk of bias checklist for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27:1171–9. https://doi.org/10.1007/s11136-017-1765-4.
Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. 2005;19:231–40. https://doi.org/10.1519/15184.1.
Guyatt GH, Oxman AD, Schünemann HJ, Tugwell P, Knottnerus A. GRADE guidelines: a new series of articles in the Journal of Clinical Epidemiology. J Clin Epidemiol. 2011;64:380–2. https://doi.org/10.1016/j.jclinepi.2010.09.011.
Prinsen CAC, Mokkink LB, Bouter LM, Alonso J, Patrick DL, de Vet HCW, Terwee CB. COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27:1147–57. https://doi.org/10.1007/s11136-018-1798-3.
de Vet HCW, Terwee CB, Mokkink LB, Knol DL. Measurement in medicine: a practical guide. Cambridge: Cambridge University Press; 2011.
Moinester M, Gottfried R. Sample size estimation for correlations with pre-specified confidence interval. TQMP. 2014;10:124–30. https://doi.org/10.20982/tqmp.10.2.p0124.
Wildschut HI, Harker LM, Riddoch CJ. The potential value of a short self-completion questionnaire for the assessment of habitual physical activity in pregnancy. J Psychosom Obstet Gynaecol. 1993;14:17–29.
McParlin C, Robson SC, Tennant PWG, Besson H, Rankin J, Adamson AJ, et al. Objectively measured physical activity during pregnancy: a study in obese and overweight women. BMC Pregnancy Childbirth. 2010;10:76. https://doi.org/10.1186/1471-2393-10-76.
Liu J, Blair SN, Teng Y, Ness AR, Lawlor DA, Riddoch C. Physical activity during pregnancy in a prospective cohort of British women: results from the Avon longitudinal study of parents and children. Eur J Epidemiol. 2011;26:237–47. https://doi.org/10.1007/s10654-010-9538-1.
Ko Y-L, Chen C-P, Lin P-C. Physical activities during pregnancy and type of delivery in nulliparae. Eur J Sport Sci. 2016;16:374–80. https://doi.org/10.1080/17461391.2015.1028468.
Santos PC, Abreu S, Moreira C, Santos R, Ferreira M, Alves O, et al. Physical activity patterns during pregnancy in a sample of Portuguese women: a longitudinal prospective study. Iran Red Crescent Med J. 2016;18:e22455. https://doi.org/10.5812/ircmj.22455.
Lindseth G, Vari P. Measuring physical activity during pregnancy. West J Nurs Res. 2005;27:722–34. https://doi.org/10.1177/0193945905276523.
Smith KM, Foster RC, Campbell CG. Accuracy of physical activity assessment during pregnancy: an observational study. BMC Pregnancy Childbirth. 2011;11:86. https://doi.org/10.1186/1471-2393-11-86.
Stein AD, Rivera JM, Pivarnik JM. Measuring energy expenditure in habitually active and sedentary pregnant women. Med Sci Sports Exerc. 2003;35:1441–6. https://doi.org/10.1249/01.MSS.0000079107.04349.9A.
Rousham EK, Clarke PE, Gross H. Significant changes in physical activity among pregnant women in the UK as assessed by accelerometry and self-reported activity. Eur J Clin Nutr. 2006;60:393–400. https://doi.org/10.1038/sj.ejcn.1602329.
Schmidt MD, Freedson PS, Pekow P, Roberts D, Sternfeld B, Chasan-Taber L. Validation of the Kaiser Physical Activity Survey in pregnant women. Med Sci Sports Exerc. 2006;38:42–50.
Aittasalo M, Pasanen M, Fogelholm M, Ojala K. Validity and repeatability of a short pregnancy leisure time physical activity questionnaire. J Phys Act Health. 2010;7:109–18.
Bell R, Tennant PWG, McParlin C, Pearce MS, Adamson AJ, Rankin J, Robson SC. Measuring physical activity in pregnancy: a comparison of accelerometry and self-completion questionnaires in overweight and obese women. Eur J Obstet Gynecol Reprod Biol. 2013;170:90–5. https://doi.org/10.1016/j.ejogrb.2013.05.018.
Brantsaeter AL, Owe KM, Haugen M, Alexander J, Meltzer HM, Longnecker MP. Validation of self-reported recreational exercise in pregnant women in the Norwegian Mother and Child Cohort Study. Scand J Med Sci Sports. 2010;20:e48–55. https://doi.org/10.1111/j.1600-0838.2009.00896.x.
Brett KE, Wilson S, Ferraro ZM, Adamo KB. Self-report Pregnancy Physical Activity Questionnaire overestimates physical activity. Can J Public Health. 2015;106:e297–302. https://doi.org/10.17269/cjph.106.4938.
Chandonnet N, Saey D, Almeras N, Marc I. French Pregnancy Physical Activity Questionnaire compared with an accelerometer cut point to classify physical activity among pregnant obese women. PLoS One. 2012;7:e38818. https://doi.org/10.1371/journal.pone.0038818.
Chasan-Taber L, Schmidt MD, Roberts DE, Hosmer D, Markenson G, Freedson PS. Development and validation of a Pregnancy Physical Activity Questionnaire. Med Sci Sports Exerc. 2004;36:1750–60.
Cirak Y, Yilmaz GD, Demir YP, Dalkilinc M, Yaman S. Pregnancy physical activity questionnaire (PPAQ): reliability and validity of Turkish version. J Phys Ther Sci. 2015;27:3703–9. https://doi.org/10.1589/jpts.27.3703.
Cohen TR, Plourde H, Koski KG. Use of the Pregnancy Physical Activity Questionnaire (PPAQ) to identify behaviours associated with appropriate gestational weight gain during pregnancy. J Phys Act Health. 2013;10:1000–7.
Haakstad LAH, Gundersen I, Bo K. Self-reporting compared to motion monitor in the measurement of physical activity during pregnancy. Acta Obstet Gynecol Scand. 2010;89:749–56. https://doi.org/10.3109/00016349.2010.484482.
Harrison CL, Thompson RG, Teede HJ, Lombard CB. Measuring physical activity during pregnancy. Int J Behav Nutr Phys Act. 2011;8:19. https://doi.org/10.1186/1479-5868-8-19.
Matsuzaki M, Haruna M, Nakayama K, Shiraishi M, Ota E, Murayama R, et al. Adapting the Pregnancy Physical Activity Questionnaire for Japanese pregnant women. J Obstet Gynecol Neonatal Nurs. 2014;43:107–16. https://doi.org/10.1111/1552-6909.12267.
Oostdam N, van Mechelen W, van Poppel M. Validation and responsiveness of the AQuAA for measuring physical activity in overweight and obese pregnant women. J Sci Med Sport. 2013;16:412–6. https://doi.org/10.1016/j.jsams.2012.09.001.
Ota E, Haruna M, Yanai H, Suzuki M, Anh DD, Matsuzaki M, et al. Reliability and validity of the Vietnamese version of the Pregnancy Physical Activity Questionnaire (PPAQ). Southeast Asian J Trop Med Public Health. 2008;39:562–70.
Sanda B, Vistad I, Haakstad LAH, Berntsen S, Sagedal LR, Lohne-Seiler H, Torstveit MK. Reliability and concurrent validity of the International Physical Activity Questionnaire short form among pregnant women. BMC Sports Sci Med Rehabil. 2017;9:7. https://doi.org/10.1186/s13102-017-0070-4.
Symons Downs D, LeMasurier GC, DiNallo JM. Baby steps: pedometer-determined and self-reported leisure-time exercise behaviors of pregnant women. J Phys Act Health. 2009;6:63–72.
Watson ED, Micklesfield LK, van Poppel MNM, Norris SA, Sattler MC, Dietz P. Validity and responsiveness of the Global Physical Activity Questionnaire (GPAQ) in assessing physical activity during pregnancy. PLoS One. 2017;12:e0177996. https://doi.org/10.1371/journal.pone.0177996.
Xiang M, Konishi M, Hu H, Takahashi M, Fan W, Nishimaki M, et al. Reliability and validity of a Chinese-translated version of a Pregnancy Physical Activity Questionnaire. Matern Child Health J. 2016;20:1940–7. https://doi.org/10.1007/s10995-016-2008-y.
Armstrong T, Bull F. Development of the World Health Organization Global Physical Activity Questionnaire (GPAQ). J Public Health. 2006;14:66–70. https://doi.org/10.1007/s10389-006-0024-x.
Besson H, Brage S, Jakes RW, Ekelund U, Wareham NJ. Estimating physical activity energy expenditure, sedentary time, and physical activity intensity by self-report in adults. Am J Clin Nutr. 2010;91:106–14. https://doi.org/10.3945/ajcn.2009.28432.
Chinapaw MJM, Slootmaker SM, Schuit AJ, van Zuidam M, van Mechelen W. Reliability and validity of the Activity Questionnaire for Adults and Adolescents (AQuAA). BMC Med Res Methodol. 2009;9:58. https://doi.org/10.1186/1471-2288-9-58.
Craig CL, Marshall AL, Sjostrom M, Bauman AE, Booth ML, Ainsworth BE, et al. International physical activity questionnaire: 12-country reliability and validity. Med Sci Sports Exerc. 2003;35:1381–95. https://doi.org/10.1249/01.MSS.0000078924.61453.FB.
Fjeldsoe BS, Marshall AL, Miller YD. Measurement properties of the Australian Women’s Activity Survey. Med Sci Sports Exerc. 2009;41:1020–33. https://doi.org/10.1249/MSS.0b013e31819461c2.
Godin G, Shephard RJ. A simple method to assess exercise behavior in the community. Can J Appl Sport Sci. 1985;10:141–6.
Haakstad LAH, Voldner N, Henriksen T, Bo K. Physical activity level and weight gain in a cohort of pregnant Norwegian women. Acta Obstet Gynecol Scand. 2007;86:559–64. https://doi.org/10.1080/00016340601185301.
Magnus P, Trogstad L, Owe KM, Olsen SF, Nystad W. Recreational physical activity and the risk of preeclampsia: a prospective cohort of Norwegian women. Am J Epidemiol. 2008;168:952–7. https://doi.org/10.1093/aje/kwn189.
Ainsworth BE, Haskell WL, Whitt MC, Irwin ML, Swartz AM, Strath SJ, et al. Compendium of physical activities: an update of activity codes and MET intensities. Med Sci Sports Exerc. 2000;32:S498–504.
Wareham NJ, Jakes RW, Rennie KL, Mitchell J, Hennings S, Day NE. Validity and repeatability of the EPIC-Norfolk Physical Activity Questionnaire. Int J Epidemiol. 2002;31:168–74. https://doi.org/10.1093/ije/31.1.168.
Evenson KR, Chasan-Taber L, Symons Downs D, Pearce EE. Review of self-reported physical activity assessments for pregnancy: summary of the evidence for validity and reliability. Paediatr Perinat Epidemiol. 2012;26:479–94. https://doi.org/10.1111/j.1365-3016.2012.01311.x.
Goldberg GR, Prentice AM, Coward WA, Davies HL, Murgatroyd PR, Wensing C, et al. Longitudinal assessment of energy expenditure in pregnancy by the doubly labeled water method. Am J Clin Nutr. 1993;57:494–505.
Corder K, Brage S, Ekelund U. Accelerometers and pedometers: methodology and clinical application. Curr Opin Clin Nutr Metab Care. 2007;10:597–603. https://doi.org/10.1097/MCO.0b013e328285d883.
Butte NF, Wong WW, Treuth MS, Ellis KJ, O’Brian Smith E. Energy requirements during pregnancy based on total energy expenditure and energy deposition. Am J Clin Nutr. 2004;79:1078–87.
Lof M, Forsum E. Activity pattern and energy expenditure due to physical activity before and during pregnancy in healthy Swedish women. Br J Nutr. 2006;95:296–302.
Symons Downs D, Chasan-Taber L, Evenson KR, Leiferman J, Yeo S. Physical activity and pregnancy: past and present evidence and future recommendations. Res Q Exerc Sport. 2012;83:485–502. https://doi.org/10.1080/02701367.2012.10599138.
Roberts DE, Fragala MS, Pober D, Chasan-Taber L, Freedson PS. Energy cost of physical activities during pregnancy. Med Sci Sports Exerc. 2002;34:S124.
Plasqui G, Westerterp KR. Physical activity assessment with accelerometers: an evaluation against doubly labeled water. Obesity (Silver Spring). 2007;15:2371–9. https://doi.org/10.1038/oby.2007.281.
Matthews CE, Hagströmer M, Pober DM, Bowles HR. Best practices for using physical activity monitors in population-based research. Med Sci Sports Exerc. 2012;44:S68–76. https://doi.org/10.1249/MSS.0b013e3182399e5b.
Rosenberger ME, Haskell WL, Albinali F, Mota S, Nawyn J, Intille S. Estimating activity and sedentary behavior from an accelerometer on the hip or wrist. Med Sci Sports Exerc. 2013;45:964–75. https://doi.org/10.1249/MSS.0b013e31827f0d9c.
Mâsse LC, Fuemmeler BF, Anderson CB, Matthews CE, Trost SG, Catellier DJ, Treuth M. Accelerometer data reduction: a comparison of four reduction algorithms on select outcome variables. Med Sci Sports Exerc. 2005;37:S544–54.
Swartz AM, Strath SJ, Bassett DR, O’Brien WL, King GA, Ainsworth BE. Estimation of energy expenditure using CSA accelerometers at hip and wrist sites. Med Sci Sports Exerc. 2000;32:S450–6.
Freedson PS, Melanson E, Sirard J. Calibration of the Computer Science and Applications, Inc. accelerometer. Med Sci Sports Exerc. 1998;30:777–81.
Hendelman D, Miller K, Baggett C, Debold E, Freedson P. Validity of accelerometry for the assessment of moderate intensity physical activity in the field. Med Sci Sports Exerc. 2000;32:S442–9.
Matthews CE. Calibration of accelerometer output for adults. Med Sci Sports Exerc. 2005;37:S512–22.
Colley RC, Tremblay MS. Moderate and vigorous physical activity intensity cut-points for the Actical accelerometer. J Sports Sci. 2011;29:783–9. https://doi.org/10.1080/02640414.2011.557744.
Watson KB, Carlson SA, Carroll DD, Fulton JE. Comparison of accelerometer cut points to estimate physical activity in US adults. J Sports Sci. 2014;32:660–9. https://doi.org/10.1080/02640414.2013.847278.
Connolly CP, Coe DP, Kendrick JM, Bassett DR, Thompson DL. Accuracy of physical activity monitors in pregnant women. Med Sci Sports Exerc. 2011;43:1100–5. https://doi.org/10.1249/MSS.0b013e3182058883.
DiNallo JM, Downs DS, Le Masurier G. Objectively assessing treadmill walking during the second and third pregnancy trimesters. J Phys Act Health. 2012;9:21–8.
Pedišić Ž, Bauman A. Accelerometer-based measures in physical activity surveillance: current practices and issues. Br J Sports Med. 2015;49:219–23. https://doi.org/10.1136/bjsports-2013-093407.
Migueles JH, Cadenas-Sanchez C, Ekelund U, Delisle Nyström C, Mora-Gonzalez J, Löf M, et al. Accelerometer data collection and processing criteria to assess physical activity and other outcomes: a systematic review and practical considerations. Sports Med. 2017;47:1821–45. https://doi.org/10.1007/s40279-017-0716-0.
The IPAQ Group. International Physical Activity Questionnaires (IPAQ). 2002. http://www.ipaq.ki.se. Accessed 10 Aug 2017.
Hopkins WG. Measures of reliability in sports medicine and science. Sports Med. 2000;30:1–15.
Passing H, Bablok W. A new biometrical procedure for testing the equality of measurements from two different analytical methods. Application of linear regression procedures for method comparison studies in clinical chemistry, Part I. J Clin Chem Clin Biochem. 1983;21:709–20.
Hustvedt B-E, Christophersen A, Johnsen LR, Tomten H, McNeill G, Haggarty P, Løvø A. Description and validation of the ActiReg: a novel instrument to measure physical activity and energy expenditure. Br J Nutr. 2004;92:1001–8.
Kumahara H, Schutz Y, Ayabe M, Yoshioka M, Yoshitake Y, Shindo M, et al. The use of uniaxial accelerometry for the assessment of physical-activity-related energy expenditure: a validation study against whole-body indirect calorimetry. Br J Nutr. 2004;91:235–43. https://doi.org/10.1079/BJN20031033.

Acknowledgements

Open access funding provided by University of Graz.

Author information

Authors and Affiliations

Institute of Sport Science, University of Graz, Graz, Austria
Matteo C. Sattler, Johannes Jaunig, Mireille N. M. van Poppel & Pavel Dietz
Centre for Exercise Science and Sports Medicine, Faculty of Health Sciences, School of Therapeutic Sciences, University of Witwatersrand, Private Bag 3, Johannesburg, 2050, South Africa
Estelle D. Watson
MRC/Wits Developmental Pathways for Health Research Unit, Department of Paediatrics, Faculty of Health Sciences, School of Clinical Medicine, University of Witwatersrand, Private Bag 3, Johannesburg, 2050, South Africa
Estelle D. Watson
Department of Public and Occupational Health, Amsterdam Public Health Research Institute, VU University Medical Center, Amsterdam, The Netherlands
Mireille N. M. van Poppel
Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands
Lidwine B. Mokkink & Caroline B. Terwee
Institute of Occupational, Social and Environmental Medicine, University Medical Centre, University of Mainz, Mainz, Germany
Pavel Dietz

Corresponding author

Correspondence to Matteo C. Sattler.

Ethics declarations

Funding

No sources of funding were used to assist in the preparation of this article.

Conflict of interest

Matteo Sattler, Johannes Jaunig, Estelle Watson, Mireille van Poppel, Lidwine Mokkink, Caroline Terwee, and Pavel Dietz declare that they have no conflicts of interest relevant to the content of this review. Caroline Terwee and Lidwine Mokkink are the developers of the Quality Assessment of Physical Activity Questionnaire (QAPAQ) and of the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist and methodology.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

Not applicable.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 16 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Sattler, M.C., Jaunig, J., Watson, E.D. et al. Physical Activity Questionnaires for Pregnancy: A Systematic Review of Measurement Properties. Sports Med 48, 2317–2346 (2018). https://doi.org/10.1007/s40279-018-0961-x

Published: 09 August 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s40279-018-0961-x
