Key Points

In children and adolescents, no self-report or proxy-report sedentary behavior questionnaires are available that are both valid and reliable.

To improve the methodological quality of future studies, researchers need to adopt standardized tools, such as the COSMIN checklist, for evaluating measurement properties. In addition, reviewers and journal editors should consider whether such tools have been used when evaluating research articles.

Content validity needs more attention to ensure that questionnaires measure what they intend to measure.

1 Introduction

Sedentary behavior is defined as activities performed in a seated or lying posture with very low energy expenditure (<1.5 metabolic equivalents [METs]) [1]. It comprises a wide variety of activities, e.g. watching television, quiet play, passive transport, and studying. Excessive engagement in sedentary activities is seen in countries all over the world; for example, 68 % of girls and 66 % of boys from 40 countries in North America and Europe watch television for 2 or more hours per day [2]. Moreover, screen time seems to account for only a small part of total sedentary time [3].

The relationship between sedentary behavior and health risks in children and adolescents is therefore of great interest. A recent review of reviews found strong evidence for an association between sedentary behavior and obesity in children, as well as moderate evidence for associations between sedentary behavior and blood pressure, physical fitness, total cholesterol, academic achievement, social behavioral problems, and self-esteem [4]. However, much of the existing evidence is based on cross-sectional studies, from which no conclusions about causality can be drawn. Furthermore, sedentary behavior is often assessed using measurement instruments with inadequate or unknown measurement properties, and in some cases only screen time is assessed as an indicator of total sedentary time. Reviews examining the prospective relationship between sedentary behavior and different health outcomes found no convincing evidence for such a relationship [5]. In addition, the evidence varied by type of measurement instrument and type of sedentary behavior [6].

Accelerometers and inclinometers are acknowledged as both valid and reliable instruments for measuring sedentary behavior in children and adolescents [7–9]; however, these measures are costly and labor-intensive for researchers [10], and they cannot provide information on the type and setting of sedentary behavior. Additionally, accelerometers cannot properly distinguish standing from sitting [11]. Self- or proxy-report questionnaires, on the other hand, are relatively inexpensive and easy to administer [10, 12], and they can provide information on the type and setting of sedentary behavior. However, questionnaires are not without limitations, as social desirability and inaccurate recall are sources of bias [12, 13].

A combination of objective measures, such as inclinometers providing information on duration and interruptions, and self-report providing information on the type and setting of sedentary behavior, would be optimal for measuring sedentary behavior. Different questionnaires have been developed for specific target populations, using different recall periods and formats, measuring different types and settings of sedentary behavior, and with different outcomes for measurement properties. The large variety of available questionnaires makes it difficult to choose the best instrument when conducting research; therefore, an overview of the measurement properties and characteristics of existing sedentary behavior questionnaires is highly warranted.

In 2011, Lubans et al. [7] reviewed studies examining the validity and reliability of questionnaires measuring sedentary behavior and reported mixed results for both. As the number of studies assessing the measurement properties of sedentary behavior questionnaires in children and adolescents has more than doubled since then, an update is required. Furthermore, the review by Lubans et al. did not include an overview of the characteristics (e.g. target population, setting measured, recall period) of the included questionnaires, and it excluded studies in children under the age of 3 years [7]. Therefore, the aim of this review was to summarize studies that focused on assessing the measurement properties (e.g. validity, reliability, responsiveness) of self- or proxy-report questionnaires assessing (constructs of) sedentary behavior in children and adolescents under the age of 18 years, including a methodological quality assessment. Moreover, a summary of the questionnaire characteristics is provided.

2 Methods

This review was registered at PROSPERO, the international prospective register of systematic reviews (registration number CRD42016035963), and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guidelines were followed.

2.1 Literature Search

Systematic literature searches were carried out in the PubMed, SPORTDiscus (complete database up until December 2015), and EMBASE (complete database up until November 2015) databases. In PubMed, search terms were combined with ‘AND’ and related to the following topics: ‘sedentary behavior’ (e.g. sedentary time, prolonged sitting), ‘children’ (e.g. child, childhood), and ‘measurement properties’ (e.g. reliability, reproducibility, validity, responsiveness). The search was limited to humans, and a variety of publication types (e.g. case reports, biography) was excluded using ‘NOT’. Free-text, Medical Subject Heading (MeSH), and Title/Abstract (TIAB) search terms were used. In SPORTDiscus, search terms regarding ‘children’ and ‘sedentary behavior’ were combined with ‘AND’ and searched as title and abstract words. In EMBASE, TIAB and EMTREE search terms for ‘sedentary behavior’ and ‘measurement properties’ were combined with ‘AND’, and the EMBASE limits for children (e.g. infant, child) were applied, also with ‘AND’. In addition, reference lists and author databases were screened for additional studies. The full search strategies can be found in electronic supplementary material Appendix S1.

2.2 Inclusion and Exclusion Criteria

Studies were included if they met the following criteria: (i) the study evaluated one or more of the measurement properties of a self- or proxy-report questionnaire, including sedentary behavior items; (ii) the aim of the questionnaire was to measure one or more of the constructs and dimensions of sedentary behavior; (iii) the average age of the study population was <18 years; and (iv) the study was published in the English language. Exclusion criteria were (i) studies examining questionnaires including physical activity and sedentary behavior items that had no separate score for sedentary behavior items; (ii) studies only reporting correlations between sedentary behavior constructs and non-sedentary constructs (e.g. correlation of self-reported or proxy-reported sedentary behavior with total activity counts measured by accelerometry); and (iii) studies evaluating the measurement properties of the questionnaire in a clinical sample.

2.3 Selection Procedures

Two reviewers (TA and LH) independently selected studies of potential relevance based on titles and abstracts. Thereafter, both reviewers checked whether the full texts met the inclusion criteria. A third reviewer (MC) was consulted when inconsistencies arose.

2.4 Data Extraction

Two independent reviewers (TA and LH) extracted data regarding the characteristics of the questionnaire under study, as well as the methods and results of the assessed measurement properties of the questionnaire, using structured forms. Disagreement between reviewers with respect to data extraction was discussed until consensus was reached.

Data regarding the questionnaire characteristics were extracted using Part 1 of the Quality Assessment of Physical Activity Questionnaire (QAPAQ) checklist, which appraises the qualitative attributes of physical activity questionnaires [14]. Although originally developed for physical activity questionnaires, the QAPAQ checklist was considered appropriate for sedentary behavior questionnaires as well, because both types of questionnaire have similar structures and formats. Five of the nine checklist items were considered necessary to provide an informative summary of sedentary behavior questionnaires: (i) the constructs measured by the questionnaire, e.g. watching television, passive transport, quiet play, total sedentary behavior; (ii) the setting, e.g. at home, at school, leisure time; (iii) the recall period; (iv) the target population for whom the questionnaire was developed; and (v) the format, including the dimensions (i.e. duration, frequency), the number of questions, and the number and type of response categories. In addition, the following data regarding the methods and results of the assessed measurement properties were extracted: study sample, comparison measure, time interval, statistical methods, and results for each measurement property.

2.5 Methodological Quality Assessment

Methodological quality of the studies was assessed using a slightly modified version of the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist with a 4-point scale (i.e. excellent, good, fair, or poor) [15–17]. Two independent reviewers (LH and either MC, CT, or LM) assessed the methodological quality of each study, and disagreements were discussed until consensus was reached. The final methodological quality score was determined separately for each study by applying the ‘worst score counts’ method (i.e. if any item was scored ‘poor’, the overall methodological quality was scored ‘poor’).
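The ‘worst score counts’ rule amounts to taking the minimum over an ordered rating scale. The Python sketch below is illustrative only; the function name and the representation of ratings as lowercase strings are assumptions, not part of the COSMIN materials.

```python
# 4-point COSMIN scale, ordered from lowest to highest quality
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def worst_score_counts(item_ratings):
    """Return the overall methodological quality for one study:
    the lowest rating given to any checklist item."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # min() compares items by their position on the ordered scale
    return min(item_ratings, key=RATING_ORDER.index)


print(worst_score_counts(["excellent", "good", "fair", "good"]))  # fair
print(worst_score_counts(["good", "poor", "excellent"]))          # poor
```

A single weak item therefore caps the overall score, which is why, for example, an unreported handling of missing items can pull an otherwise good study down to ‘fair’.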

Reliability, measurement error, internal consistency, and structural validity were rated using the designated COSMIN boxes, while convergent, criterion, and construct validity were all rated as construct validity. None of the studies examined criterion validity, although some studies that actually assessed construct validity used this term. Content validity was not rated because too little information was available on the methods used to develop the questionnaires; instead, a description of the questionnaires is included in the results section. None of the included studies examined the responsiveness of sedentary behavior questionnaires in children or adolescents.

One slight modification was applied to the original COSMIN checklist: percentage agreement was added as an excellent statistical method in the measurement error box, as it is considered a parameter of measurement error rather than of reliability [18]. For the reliability box, the standards previously described by Chinapaw et al. [19] were used to assess the appropriateness of the time interval in a test–retest reliability study, i.e. (i) questionnaires recalling a usual week should have a time interval between >1 day and <3 months; (ii) questionnaires recalling the previous week should have a time interval between >1 day and <2 weeks; and (iii) questionnaires recalling the previous day should have a time interval between >1 day and <1 week.
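The time-interval standards above reduce to an upper bound per recall period plus a common lower bound of 1 day. The following Python sketch assumes that ‘3 months’ can be approximated as 90 days and that both bounds are strict, as the ‘>’ and ‘<’ signs suggest; the names are hypothetical.

```python
from datetime import timedelta

# Exclusive upper bounds of the test-retest interval per recall period.
# Assumption: '3 months' is approximated as 90 days.
MAX_INTERVAL = {
    "usual week": timedelta(days=90),
    "previous week": timedelta(weeks=2),
    "previous day": timedelta(weeks=1),
}
MIN_INTERVAL = timedelta(days=1)  # every recall period: interval must exceed 1 day


def interval_is_appropriate(recall_period: str, interval: timedelta) -> bool:
    """True if the interval lies strictly between 1 day and the upper
    bound for the questionnaire's recall period."""
    upper = MAX_INTERVAL[recall_period]  # KeyError for recall periods not covered
    return MIN_INTERVAL < interval < upper


print(interval_is_appropriate("previous week", timedelta(days=7)))  # True
print(interval_is_appropriate("previous day", timedelta(days=10)))  # False
```

The longer allowance for a usual week reflects that habitual behavior is assumed to be stable over weeks, whereas a previous-day recall refers to a specific day that becomes harder to remember.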

2.6 Questionnaire Quality Assessment

2.6.1 Reliability

Reliability refers to the extent to which scores for persons who have not changed are the same for repeated measurements under several conditions [20]. The reliability outcomes of the included questionnaires were considered acceptable when (i) intraclass correlation coefficients (ICCs) or kappa values were >0.70 [21]; or (ii) Pearson or Spearman correlations were >0.80, a stricter threshold because these coefficients do not take systematic errors into account [22]. Measurement error was considered adequate when the smallest detectable change (SDC) was smaller than the minimal important change (MIC) [21]. Internal consistency was considered acceptable when Cronbach’s alphas were calculated on unidimensional scales and were between 0.70 and 0.95 [21].
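The acceptability criteria above can be summarized as a few comparisons. In this hypothetical Python sketch, the statistic labels are assumed strings, the 0.70–0.95 range for Cronbach’s alpha is treated as inclusive (the criterion does not specify), and an unknown MIC yields no rating rather than a pass or fail.

```python
from typing import Optional


def correlation_is_acceptable(statistic: str, value: float) -> bool:
    """Apply the reliability thresholds described above to one coefficient."""
    statistic = statistic.lower()
    if statistic in {"icc", "kappa"}:
        return value > 0.70
    if statistic in {"pearson", "spearman"}:
        # Stricter threshold: these coefficients ignore systematic error
        return value > 0.80
    raise ValueError(f"no criterion defined for {statistic!r}")


def alpha_is_acceptable(alpha: float, unidimensional: bool) -> bool:
    # Alpha is only interpretable on a unidimensional scale;
    # assumption: the 0.70-0.95 range is inclusive at both ends
    return unidimensional and 0.70 <= alpha <= 0.95


def measurement_error_is_adequate(sdc: float, mic: Optional[float]) -> Optional[bool]:
    # Without an MIC no judgement is possible, so None is returned
    if mic is None:
        return None
    return sdc < mic


print(correlation_is_acceptable("ICC", 0.75))       # True
print(correlation_is_acceptable("Spearman", 0.75))  # False
```

The same coefficient of 0.75 thus passes as an ICC but fails as a Spearman correlation, reflecting the stricter threshold for coefficients that ignore systematic error.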

The majority of studies provided separate correlations for the different constructs of sedentary behavior presented in the questionnaire, e.g. separate correlations for watching television, passive transport, and reading. Therefore, to obtain a final reliability rating, an overall evidence rating was applied in the present review, incorporating all available correlations for each questionnaire per study. A questionnaire received a positive evidence rating (+) when ≥80 % of the correlations were acceptable, a mixed evidence rating (+/−) when ≥50 % and <80 % were acceptable, and a negative rating (−) when <50 % were acceptable. Measurement error could not be given an evidence rating because information on the MIC, which is needed to interpret the findings, is currently lacking for all included questionnaires; therefore, only a description of the results is given.
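The overall evidence rating maps the proportion of acceptable correlations onto three categories. A minimal sketch, assuming each correlation has already been judged acceptable or not (e.g. ICC >0.70); the ASCII strings '+', '+/-', and '-' stand in for +, +/−, and −, and the example ICC values are invented.

```python
def evidence_rating(acceptable_flags):
    """Overall rating for one questionnaire in one study.

    acceptable_flags: one bool per reported correlation (True = acceptable).
    """
    if not acceptable_flags:
        raise ValueError("no correlations reported")
    share = sum(acceptable_flags) / len(acceptable_flags)
    if share >= 0.80:
        return "+"    # positive
    if share >= 0.50:
        return "+/-"  # mixed
    return "-"        # negative


# Hypothetical ICCs for watching television, passive transport, and reading
iccs = [0.82, 0.65, 0.74]
flags = [v > 0.70 for v in iccs]  # [True, False, True]
print(evidence_rating(flags))     # +/-
```

Because every construct counts equally, a questionnaire with many items can receive a mixed rating even when its main construct (e.g. total sedentary time) is measured reliably.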

2.6.2 Validity

Validity refers to the degree to which a measurement instrument measures what it is supposed to measure [20]. Validity concerns three measurement properties, i.e. content validity, structural validity, and construct validity. Content validity refers to the degree to which the content of a questionnaire adequately reflects the constructs to be measured [20]; structural validity refers to the degree to which the scores of a questionnaire are an adequate reflection of the dimensionality of the construct to be measured [20]; and construct validity refers to the degree to which the scores of a measurement instrument agree with hypotheses, e.g. agreement with scores of another measurement instrument [20]. For structural validity, a factor analysis was considered appropriate if the factors extracted explained at least 50 % of the variance or if the comparative fit index (CFI) was >0.95 [21, 22]. For construct validity, however, most of the included studies lacked a priori formulated hypotheses, so it was unclear what was expected, which made these results difficult to interpret. Table 1 presents the criteria for judging the results of construct validity studies. Level 1 indicates strong evidence, level 2 indicates moderate evidence, and level 3 indicates weak evidence that is nevertheless worth investigating further. Similar to the reliability rating, an overall evidence rating for construct validity was applied, incorporating all available correlations provided for each questionnaire per study. As no hypotheses for validity were available in relation to mean differences and limits of agreement, only a description of these results is included in Sect. 3.
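The structural validity criterion is a disjunction: either the explained variance or the CFI suffices. A hypothetical Python sketch, assuming explained variance is given as a proportion between 0 and 1 and that a study may report only one of the two statistics; the example values are invented.

```python
def factor_analysis_is_adequate(explained_variance=None, cfi=None):
    """Adequate if the extracted factors explain at least 50 % of the
    variance, or if the comparative fit index (CFI) exceeds 0.95."""
    if explained_variance is None and cfi is None:
        raise ValueError("provide explained_variance and/or cfi")
    variance_ok = explained_variance is not None and explained_variance >= 0.50
    cfi_ok = cfi is not None and cfi > 0.95
    return variance_ok or cfi_ok


print(factor_analysis_is_adequate(explained_variance=0.62))  # True
print(factor_analysis_is_adequate(cfi=0.93))                 # False
```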

Table 1 Constructs of sedentary behavior measured by the questionnaires evaluating construct validity, subcategorized by level of evidence and criteria for acceptable correlations

3 Results

A total of 3049, 4384, and 2016 studies were identified in the PubMed, EMBASE, and SPORTDiscus databases, respectively. After removing duplicates, 7904 studies remained. After screening titles and abstracts, 72 full-text papers were assessed for eligibility, of which 30 met the inclusion criteria. Another 16 studies were found through cross-reference searches. In total, 46 studies on 46 questionnaires were included (Fig. 1). Of these, 33 assessed test–retest reliability, nine measurement error, two internal consistency, 22 construct validity, eight content validity, and two structural validity (several studies assessed more than one measurement property). Two of the included questionnaires were assessed by two studies, i.e. the Patient-Reported Outcome Measurement Information System [23, 24] and the Girls Health Enrichment Multi-site Studies Activity Questionnaire [25, 26]. In addition, multiple modified versions of questionnaires were examined: two versions each of the Canadian Health Measures Survey [27, 28], the Adolescent Sedentary Activity Questionnaire [29, 30], the International Physical Activity Questionnaire–Short Form [31, 32], and the Youth Risk Behavior Survey [34, 35], and three versions each of the Self-Administered Physical Activity Checklist [36–38] and the Health Behavior in School-aged Children survey [39–41]. The remaining questionnaires were examined by a single study.
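The record counts above can be reconciled with simple arithmetic. In the Python sketch below, the inputs are the numbers reported in this section; the number of duplicates and the numbers of exclusions are derived by subtraction here rather than reported in the text, and should be checked against Fig. 1.

```python
# Record counts reported in this section
identified = {"PubMed": 3049, "EMBASE": 4384, "SPORTDiscus": 2016}
after_deduplication = 7904
full_text_assessed = 72
included_from_search = 30
included_from_cross_reference = 16

# Derived counts (computed, not reported)
total_identified = sum(identified.values())
duplicates_removed = total_identified - after_deduplication
excluded_on_title_abstract = after_deduplication - full_text_assessed
excluded_at_full_text = full_text_assessed - included_from_search
total_included = included_from_search + included_from_cross_reference

print(total_identified, duplicates_removed, excluded_on_title_abstract,
      excluded_at_full_text, total_included)  # 9449 1545 7832 42 46
```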

Fig. 1

PRISMA flow diagram of study inclusion process. PRISMA preferred reporting items for systematic reviews and meta-analyses

3.1 Description of Questionnaires

Electronic supplementary material Table S1 describes the included questionnaires, stratified by age group: preschoolers (younger than 6 years), children (6–12 years), and adolescents (12 years and older). Of the included questionnaires, 8 were designed for preschoolers, 24 for children, and 14 for adolescents. Nineteen questionnaires focused solely on screen time, while 27 covered a variety of sedentary behavior constructs. Response categories were mostly categorical (e.g. Likert scale) or continuous (e.g. time spent, in hours and/or minutes). Recall periods varied across questionnaires and included the past few months, the last week, the previous day, and a usual/habitual/typical day or week.

3.2 Test–Retest Reliability

Table 2 summarizes the test–retest reliability studies, of which four were in preschoolers, 18 in children, and 11 in adolescents and older children. None of the studies received an excellent methodological quality rating; 9 had a good rating, 17 a fair rating, and 6 a poor rating, and 1 study received both a fair and a poor rating because it used multiple time intervals. A small sample size and no description of how missing items were handled were the major reasons for the low methodological quality ratings. In preschoolers, the Energy Balance-Related Behaviors self-administered primary caregiver questionnaire [42] appeared to be the most reliable currently available questionnaire for assessing sedentary behavior, although the methodological quality of this study was only rated as fair and the evidence was mixed. For children and adolescents, the most reliable currently available questionnaires were the Sedentary Behavior and Sleep Scale [43] (good methodological quality, mixed evidence rating) and the Brazilian version of the Adolescent Sedentary Activity Questionnaire [30] (fair methodological quality, positive evidence rating), respectively.

Table 2 Test–retest reliability of sedentary behavior questionnaires for youth sorted by age category, methodological quality, and evidence rating

3.3 Measurement Error

Table 3 gives an overview of the nine studies that assessed the measurement error of questionnaires. One of these studies received a good methodological quality rating, while eight received a fair rating, predominantly because they did not describe how missing items were handled. The questionnaires showing the highest percentage agreement between two measurements were the ‘Questionnaire for measuring length of sleep, television habits and computer habits’ [44] for children and the ‘Measures of out-of-school sedentary and travel behaviors of the iHealt(H) study’ [45] for adolescents.
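Percentage agreement, which was added to the measurement error box, is the share of respondents whose categorical answer is identical at test and retest. A minimal Python sketch with invented response categories; the function name is hypothetical.

```python
def percentage_agreement(test, retest):
    """Percentage of respondents giving the same response category
    on both measurement occasions."""
    if not test or len(test) != len(retest):
        raise ValueError("test and retest must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(test, retest))
    return 100.0 * same / len(test)


# Hypothetical daily television-time categories for five children
test = ["<1 h", "1-2 h", "2-3 h", ">3 h", "1-2 h"]
retest = ["<1 h", "2-3 h", "2-3 h", ">3 h", "1-2 h"]
print(percentage_agreement(test, retest))  # 80.0
```

Unlike a correlation, this statistic expresses error on the scale of the response categories themselves, which is why it is treated as a parameter of measurement error rather than reliability.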

Table 3 Measurement error of sedentary behavior questionnaires for youth, sorted by age category and methodological quality

3.4 Internal Consistency

Internal consistency was analyzed in two of the included studies, demonstrating acceptable Cronbach’s alphas (i.e. 0.75 for the unidimensional sedentary lifestyle subscale [35], and 0.78 for the unidimensional sedentary behavior subscale [46]). The methodological quality was rated as good and excellent, respectively.
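For reference, Cronbach’s alpha for a scale of k items is α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The Python sketch below computes it from raw item scores using sample variances; the data layout (one row per respondent) is an assumption, and the result should only be interpreted for a unidimensional subscale such as those above.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per respondent, each holding k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    items = list(zip(*item_scores))              # transpose to one column per item
    totals = [sum(row) for row in item_scores]   # total scale score per respondent
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical 3-item subscale answered by four respondents
scores = [[1, 2, 2], [2, 2, 3], [3, 4, 3], [4, 4, 5]]
print(round(cronbach_alpha(scores), 2))
```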

3.5 Construct Validity

Of the included construct validity studies, 3 included preschoolers, 13 included children, and 6 included adolescents and older children. Table 4 summarizes the construct validity studies (n = 21) examining the relationship between the questionnaire and other measurement instruments. None of these studies received an excellent or good methodological quality rating; 5 received a fair rating and 16 were rated as poor. The main reasons for the low methodological quality scores were the lack of a priori formulated hypotheses and the use of comparison measures with unknown measurement properties. In preschoolers, the Direct Estimate [47] appeared to be the most valid currently available sedentary behavior questionnaire, as it received a positive level 2 evidence rating and a fair methodological quality rating. In children, the Youth Activity Profile [52] appeared to be the most valid questionnaire, likewise receiving a positive level 2 evidence rating and a fair methodological quality rating. Studies in adolescents only received negative evidence ratings; thus, no conclusion can be drawn regarding the most valid sedentary behavior questionnaire for this age group. One of the construct validity studies [46] was not included in Table 4, as it examined construct validity by testing a hypothesis about differences in scores between known groups: on the Energy Retention Behavior Scale, overweight or obese children had statistically significantly higher scores than underweight or normal-weight children, in line with the a priori hypothesis.

Table 4 Validity of sedentary behavior questionnaires for youth, sorted by age category, methodological quality, and level of evidence and evidence rating

3.6 Structural Validity

Two of the included studies analyzed the structural validity of the questionnaire, i.e. the Korean Youth Risk Behavior Survey (KYRBS) [35] and the Energy Retention Behavior Scale for Children (ERB–C scale) [46], both by performing confirmatory factor analysis. The KYRBS comprises five subscales, one of which is a sedentary lifestyle subscale, while the ERB–C scale comprises two subscales, one of which is sedentary behavior. Both studies showed acceptable fit of the expected factor structures, i.e. Normed Fit Index (NFI) 0.960, Tucker–Lewis Index (TLI) 0.956, CFI 0.969, and root mean square error of approximation (RMSEA) 0.034 for the KYRBS [35], and NFI 0.91, Non-Normed Fit Index (NNFI) 0.92, CFI 0.95, and RMSEA 0.08 for the ERB–C scale [46]. The methodological quality was rated as good and excellent, respectively.

3.7 Content Validity

Eight studies evaluated the content validity of the questionnaire, of which four focused predominantly on comprehensibility, asking children or parents about, for example, terminology, appropriateness of reading level, ambiguity, and other difficulties [29, 44, 46, 48]. The other four studies focused on the content of the questionnaire by consulting experts (e.g. researchers active in the field of physical activity) about, for example, the relevance of items [30, 44, 46, 48]. Because most of the included studies provided minimal information about their procedures, it was impossible to assess the quality of the content validity studies and thus to interpret the results. In addition, seven of the included studies incorporated pilot testing of the questionnaire for comprehensibility; unfortunately, too little information was provided to assess the methodology of these content validity examinations [33, 38, 45, 49–52]. Additionally, translation processes were mentioned in six of the included studies [24, 30, 42, 45, 53, 54]; because minimal information was provided about the methods used, the quality of most of these studies was unclear.

4 Discussion

The aim of this review was to summarize existing evidence on the measurement properties of self-report or proxy-report questionnaires assessing sedentary behavior in children and adolescents under the age of 18 years. Additionally, we summarized the characteristics of the included self-report and proxy-report questionnaires. Our summary yielded a wide variety of questionnaires, designed for different target populations and assessing different constructs and dimensions of sedentary behavior. Test–retest reliability correlations of the included questionnaires ranged from 0.06 to 0.97, and construct validity correlations ranged from −0.16 to 0.84. Although a number of studies received a positive evidence rating for test–retest reliability or construct validity, the methodological quality of the studies was mostly rated as fair or poor. Unfortunately, no questionnaire assessing total sedentary behavior or other constructs of sedentary behavior received a positive evidence rating for both reliability and validity. Hence, we cannot make a conclusive recommendation about the best available self-report or proxy-report sedentary behavior questionnaire for children and adolescents.

4.1 Reliability and Measurement Error

As the methodological quality of the included studies assessing test–retest reliability and/or measurement error was mainly rated as fair or poor, no definite conclusion can be drawn about the reliability of most of the examined sedentary behavior questionnaires. Moreover, the lack of multiple studies assessing the same questionnaire in the same target population further limited our ability to draw firm conclusions. To achieve higher methodological quality for both reliability and measurement error, we recommend that future studies describe the methods used in detail, e.g. how missing items were handled, and include an appropriate sample size [15, 17]. Because correlations varied across recall periods (e.g. usually or yesterday), time frames (e.g. weekdays and weekend days), and constructs of sedentary behavior (e.g. overall sedentary behavior and watching television), no conclusion can be drawn about whether specific time frames or constructs of sedentary behavior are more reliable than others. Furthermore, measurement error can only be interpreted when information on the MIC is available [21]; to the best of our knowledge, no such information is available as yet.

4.2 Construct Validity

Due to the low methodological quality of the included validity studies and the lack of multiple studies assessing the same questionnaire, no firm conclusion can be drawn about the validity of the examined questionnaires. We specifically recommend that future validity studies describe a priori hypotheses and choose comparison measures with known and acceptable measurement properties. The low methodological quality of all included validity studies might partly explain the high prevalence of negative evidence ratings, i.e. <50 % acceptable correlations.

Studies demonstrating acceptable correlations often used comparison measures providing weaker levels of evidence, i.e. other questionnaires or cognitive interviews (level 3 evidence). In general, higher correlations were found when comparison measures providing lower levels of evidence were used. A possible explanation is that both the questionnaire under study and these comparison measures depend on recall, unlike objective comparison measures providing higher levels of evidence, e.g. inclinometers and accelerometers. Other factors that may explain the low correlations are inadequate content validity, the lack of a gold standard, and a mismatch in time frames between the questionnaire and the comparison measure. As the studies lack information about the development of the questionnaires (e.g. a justification of the constructs included and the dimensions measured), and lack appropriate testing of the relevance, comprehensiveness, and comprehensibility of the questionnaire content, it remains unclear whether the content validity of the included questionnaires is acceptable. Evaluating content validity is essential to gain insight into the comprehensibility of the questionnaire for the target population, and to ensure that all relevant aspects of the construct are measured and no irrelevant aspects are included [20]. Without evaluating these aspects of validity, there is no certainty that the questionnaire measures what it is supposed to measure. The limited attention to content validity is also reflected in the wide variety of constructs (e.g. watching television, quiet play, studying) and dimensions (e.g. duration and frequency) measured by the included questionnaires, for which a justification is lacking. Only two studies, by Tucker et al. [23, 24], provided sufficient description and support for the development of their questionnaire; e.g. experts in the field and the target population were consulted and contributed to its content.

Furthermore, studies using a translated version of an existing questionnaire often did not report sufficient information about the translation processes. Only the studies by de Fátima Guimarães et al. [30] and Tucker et al. [24] included adequate descriptions of the translation process, e.g. translations by language experts, and review by experts in the field. Moreover, cross-cultural validation of the translated questionnaires was often not conducted, making it impossible to examine whether the questionnaire truly measured the same constructs as the original questionnaire [22].

Additionally, the available objective measures of sedentary behavior, e.g. inclinometers or accelerometers, still involve subjective decisions, such as the definition of non-wear time, the minimum number of valid hours per day and of valid days, and the selection of a cut point for sedentary behavior. The accelerometer cut points for sedentary behavior in the included studies varied from <100 to <699 counts per minute (cpm), leading to different estimates of sedentary time. Importantly, the constructs measured by questionnaire and accelerometer may not correspond when cut points other than <100 cpm are applied [55]: lower cut points may exclude part of sedentary time, whereas higher cut points may include light physical activity. Mismatched constructs also occur in some cases because the measurement instrument and the comparison measure address different time frames, e.g. leisure time versus the whole day.
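The effect of the cut point can be illustrated by classifying the same accelerometer trace twice. In this hypothetical Python sketch, the trace is invented, epochs are assumed to be 60 s long (so counts per epoch equal cpm), and non-wear time is assumed to have been removed beforehand.

```python
def sedentary_minutes(counts_per_minute, cut_point):
    """Number of 60-s epochs classified as sedentary (counts below cut point)."""
    return sum(1 for c in counts_per_minute if c < cut_point)


# Hypothetical 10-minute trace (counts per 60-s epoch)
trace = [0, 12, 85, 150, 320, 560, 700, 40, 98, 1200]
print(sedentary_minutes(trace, 100))  # 5
print(sedentary_minutes(trace, 699))  # 8
```

The three extra minutes counted under the higher cut point (150, 320, and 560 cpm) are the kind of epochs that may reflect light physical activity rather than sitting, which is the construct mismatch described above.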

4.3 Strengths and Limitations

A major strength of our review is that the methodological quality rating was performed separately from the interpretation of the findings. This makes the final evidence rating more transparent, e.g. it shows whether negative evidence ratings reflect low-quality questionnaires (when the studies were of good or excellent methodological quality) or may be biased (when the studies were of poor methodological quality). Additionally, through structured cross-reference searches, we also included studies that were not primarily aimed at examining measurement properties. Another strength is that at least two independent authors conducted the literature search and data extraction, as well as the quality rating. However, our review also has limitations. As most included studies did not report all details needed for an adequate quality rating, their quality ratings may have been underestimated. We did not contact authors for additional information, as this would have favored recent studies over older ones, thereby inflating the quality ratings of recent papers. Furthermore, only English-language papers were included; as a result, we might have missed relevant studies. Moreover, in some studies found through cross-reference searches, examining the measurement properties was not the primary aim. Not all such studies may have been found, yet finding them through systematic literature searches seems impossible, as their titles and abstracts lack information on the assessment of measurement properties or on the sedentary behavior assessed by the questionnaires.

4.4 Recommendations for Future Studies

Studies focusing on the development of questionnaires need to pay more attention to content validity. Moreover, the content validity of currently available questionnaires needs to be examined by testing the relevance, comprehensiveness, and comprehensibility of their content, using appropriate qualitative methods [22]. The COSMIN group is currently developing detailed standards for assessing the content validity of health status questionnaires, which may also be useful for assessing the content validity of sedentary behavior questionnaires (see for more information). In our opinion, the criteria that need to be considered are (i) a clear description and adequate reflection of the construct to be measured; (ii) comprehensibility of questions; (iii) appropriate response options; (iv) an appropriate recall period; (v) an appropriate mode of administration; and (vi) an appropriate scoring algorithm. The choices made should be justified, for example based on input from experts in the field and from the target population.

More high-quality research on the construct validity, reliability, measurement error, and responsiveness of the questionnaires is also needed, as well as studies on internal consistency and structural validity for questionnaires where these properties are applicable. To achieve high methodological quality, we recommend using a standardized tool, e.g. the COSMIN checklist [16, 56]. This tool can guide the design of a study and provides an overview of what should be reported. Additionally, we recommend that reviewers and journal editors, when evaluating studies, consider whether the investigators used such a standardized tool, to prevent the publication of studies with inadequate reporting and low methodological quality. This need for a standardized tool for the assessment of measurement properties is consistent with recommendations by Kelly et al. [57].

In addition, when assessing the construct validity of questionnaires measuring total sedentary time, we recommend using objective comparison measures for which there is high-level evidence of acceptable measurement properties, e.g. inclinometers or accelerometers, rather than measurement instruments with unknown or unacceptable measurement properties. Furthermore, appropriate accelerometer cut points for sedentary behavior need to be applied, e.g. <100 cpm [55, 58]. However, as the accuracy of accelerometers for measuring sedentary behavior remains questionable, and distinguishing sitting from quiet standing remains problematic [11], we recommend using the activPAL as an objective comparison measure for total sedentary time [9]. Importantly, the questionnaire and the comparison measure need to measure corresponding constructs and/or cover corresponding time frames. Additionally, hypotheses should always be stated a priori to ensure unbiased interpretation of the results.
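Stating hypotheses a priori means fixing the expected direction and magnitude of the association before the data are analyzed. The sketch below records two hypotheses first and only then evaluates them against paired questionnaire and activPAL sitting times. The hypotheses, the r >= 0.50 threshold, the paired values and the helper names (`pearson_r`, `Hypothesis`, `evaluate`) are all hypothetical assumptions for illustration; they are not criteria or data from this review.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


@dataclass(frozen=True)
class Hypothesis:
    """A hypothesis fixed before the data are analyzed (hypothetical structure)."""
    description: str
    test: Callable[[Sequence[float], Sequence[float]], bool]


# Step 1: state the hypotheses a priori (thresholds are illustrative assumptions).
HYPOTHESES: List[Hypothesis] = [
    Hypothesis("questionnaire vs activPAL sitting time: r >= 0.50",
               lambda q, ref: pearson_r(q, ref) >= 0.50),
    Hypothesis("questionnaire reports less sitting than activPAL on average",
               lambda q, ref: sum(q) / len(q) < sum(ref) / len(ref)),
]


def evaluate(q: Sequence[float], ref: Sequence[float],
             hypotheses: List[Hypothesis] = HYPOTHESES) -> List[Tuple[str, bool]]:
    """Step 2: after data collection, check each pre-stated hypothesis unchanged."""
    return [(h.description, h.test(q, ref)) for h in hypotheses]


# Hypothetical paired data (minutes of sitting per day) for six children.
questionnaire = [300, 360, 420, 480, 540, 600]
activpal = [480, 400, 420, 510, 440, 630]

for description, confirmed in evaluate(questionnaire, activpal):
    print(f"{'confirmed' if confirmed else 'not confirmed'}: {description}")
```

The design choice is that the hypothesis list is fixed in the code before any data are loaded, so the thresholds cannot be adjusted after seeing the correlation.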

Finally, as a wide variety of questionnaires are available, we recommend that researchers critically review whether existing or slightly modified questionnaires are adequate for use in new studies before developing new questionnaires. Moreover, we recommend that authors of papers on measurement properties include the questionnaire under study and provide more details about its characteristics, e.g. questions and response options, so that other researchers can assess whether existing questionnaires are adequate for their research.

5 Conclusions

None of the self- or proxy-report sedentary behavior questionnaires for children and adolescents included in this review were considered both valid and reliable. Whether this is due to the low methodological quality of the included studies or to poorly developed questionnaires is unclear. In addition, the lack of multiple studies assessing both the validity and the reliability of a questionnaire in the same study population hampered our ability to draw definitive conclusions on the best available instruments. Therefore, we recommend more high-quality studies examining the measurement properties of the most promising sedentary behavior questionnaires. High methodological quality can be achieved by using standardized tools such as the COSMIN checklist [16].