Background

Sleep, sedentary behaviour (SB), and physical activity (i.e., 24-h movement behaviours) are important determinants of health and well-being [1, 2]. Research shows that sufficient sleep duration, less SB, and greater physical activity are associated with a decreased risk of numerous chronic non-communicable diseases including cardiovascular disease, type 2 diabetes, cancer, mental disorders, and all-cause mortality [2,3,4,5].

Movement behaviours have been traditionally examined and promoted in isolation from each other. However, a recent recognition that time spent in sleep, SB, and physical activity are exhaustive and mutually exclusive parts of any time period (e.g., 24-h day) has shifted the paradigm towards examining movement behaviours in combination [6,7,8]. Moreover, particular concern has been drawn to the methodological shortcomings of most previous epidemiological studies on time spent in movement behaviours, which examined a specific movement behaviour in isolation while violating the assumptions of the statistical methods used [9, 10]. Data quantifying time spent in movement behaviours are a specific type of data (i.e., compositional data), and their specific mathematical properties need to be respected by using sound statistical methods (i.e., compositional data analysis). In compositional data, relevant information is in the relative distribution of the components, which indicates that the components need to be examined in combination [9,10,11]. Therefore, there is a need for research tools that simultaneously assess movement behaviours across the whole 24-h day.

This novel paradigm has already been adopted by some public health authorities who also recognised the importance of promoting healthy movement behaviours in an integrated way and developed 24-h movement guidelines [12,13,14,15,16,17,18,19]. According to such guidelines, it is recommended to engage in moderate- to vigorous-intensity physical activity (MVPA) for at least 150 min per week, to engage in light-intensity physical activity (LPA) for several hours per day, to limit SB so that its total daily duration does not exceed eight hours, and to get between seven and nine hours of sleep. To monitor population prevalence and trends of adherence to the novel 24-h movement guidelines, surveillance systems need to be adapted accordingly [20].

The assessment of 24-h movement behaviours is also needed for individual level counselling, prescription, and referral, and for guiding discussions regarding behavioural change. Such treatments could be conducted in clinical care settings and community programs for healthy lifestyle promotion, disease prevention and management as well as in occupational and school settings. It has been advocated that the integrated 24-h movement paradigm caters to individual differences (e.g., physical abilities, preferences) and offers a wide variety of counselling options on behavioural change (e.g., trading SB for LPA and/or MVPA only, while keeping sleep unchanged) that can bring health benefits [18]. However, a recent scoping review on features, perceptions, and effectiveness of tools to guide discussions on physical activity, SB, and/or sleep between health care providers and patients showed that tools to guide discussions on integrated 24-h movement behaviours are lacking [21].

Therefore, simultaneous assessment of sleep, SB, and physical activity is needed for research, policy, and practice. Such assessment can be conducted using device-based methods (e.g., accelerometers, inclinometers), self-reported methods (e.g., questionnaires, diaries), or a combination of both methods (e.g., using a sleep time diary to inform sleep detection algorithms for accelerometer data [22]). Both groups of measurement methods show certain strengths and weaknesses, and the choice of the measurement tool is usually guided by the level of reliability and validity required for the specific purpose of use, resources available, feasibility, practicality, acceptability, sustainability, and the need to provide immediate feedback [23,24,25,26,27]. While device-based methods have the advantage of providing more valid estimates, self-reported methods present lower costs, lower burden, and higher compliance. Self-reported methods can also provide contextual information on movement behaviours (e.g., where, with whom) and estimates of movement behaviours from the more distant past. Self-reported tools are therefore indispensable in large-scale epidemiological studies, population surveillance, and practice [21, 26, 28].

Most of the self-reported tools were developed for assessment of only one or two movement behaviours [29,30,31,32,33,34,35,36,37,38], and to the best of our knowledge, self-reported tools for assessment of overall 24-h movement behaviours are scarce. While sleep, SB, and physical activity can be assessed using a combination of different self-reports, such an approach might be compromised and/or inconvenient, as different self-reports may have different recall periods, administration guidelines, or instructions to complete the items. It is also less likely that the sum of all movement behaviours assessed using different tools would equal 24 h (or other finite total). This might be of particular concern when using compositional data analysis, which closes the composition to the finite total [11] and proportionally rescales the components that do not add to the finite total (e.g., 24 h). Rescaling the data is likely to change the measurement properties, which then need re-evaluation. Therefore, the aim of this study was to identify validated self-reported tools for assessment of movement behaviours across the whole 24-h day, and to review their attributes (movement behaviours being assessed including temporal and contextual information, accounting for a 24-h day, recall period, number of questions) and quantitative measurement properties (construct validity, test-retest reliability, responsiveness).
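The closure operation mentioned above can be illustrated with a minimal sketch. The function name `close_to_day` and the example durations are hypothetical and serve only to show how proportional rescaling changes every component when separately reported behaviours do not add to 24 h.

```python
MINUTES_PER_DAY = 24 * 60


def close_to_day(parts, total=MINUTES_PER_DAY):
    """Proportionally rescale reported durations so that they sum to `total`."""
    reported_sum = sum(parts.values())
    if reported_sum <= 0:
        raise ValueError("reported durations must sum to a positive value")
    factor = total / reported_sum
    return {name: minutes * factor for name, minutes in parts.items()}


# Hypothetical durations (min/day) taken from separate self-reports;
# they sum to 1500 min rather than 1440 min.
reported = {"sleep": 480, "sb": 600, "lpa": 360, "mvpa": 60}
closed = close_to_day(reported)
```

Closure preserves the ratios between components but alters every absolute duration, which is why the measurement properties of rescaled estimates may need re-evaluation.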

Methods

This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [39], and it was registered in the International Prospective Register of Systematic Reviews (PROSPERO) with a registration number CRD42022330868. The review protocol can be accessed on the PROSPERO website (https://www.crd.york.ac.uk/prospero/).

Eligibility criteria

We included studies that met the following criteria: (i) published in English language, (ii) published in a peer-reviewed journal, (iii) reported assessment of time spent in sleep, SB, and physical activity using a single self-reported tool (without any restriction regarding the mode of administration), (iv) reported construct validity (i.e., the extent to which an instrument provides comparable measures to other validated instruments that measure the construct of interest [40]), test-retest reliability (i.e., the extent to which an instrument provides measures that are consistent from one test administration to the next [40]), or responsiveness (i.e., the ability of an instrument to detect change over time in the construct to be measured [40]) of self-reported estimates of movement behaviours across the full 24-h day, and (v) included adolescents (aged 12 to 17 years), adults (aged 18 to 64 years), or older adults (aged 65 years and older). No limitations regarding the sample size and health status of participants were applied. We excluded studies that reported validity by comparing measures of different constructs (e.g., comparing self-reported MVPA with physical fitness test score), secondary data analysis studies, reviews, and meta-analyses.

Literature search and study selection

A literature search was performed in databases of PubMed, Scopus, and SPORTDiscus. The primary search query combined terms: movement behaviours, self-reported method, and validity/reliability (Supplementary Table 1). The search with no publication time limits was performed in May 2022, and updated in September 2023.

All hits from the databases were transferred to the Mendeley Desktop Reference Management Program. After removing the duplicates, three authors (AŠ, LE, and KK) independently screened the titles and abstracts for eligibility. Afterwards, two authors (AŠ or LE, and KK) independently screened the full texts of potentially relevant articles for the final decision on study inclusion. Disagreements between authors were resolved through discussion and consensus. If there were any uncertainties, the fourth author (NŠ) was consulted. Additionally, to identify any relevant articles that might have been missed by our primary search query, we performed backward and forward citation searching, and screened relevant reviews and meta-analyses that were identified through the primary search query as well as the authors’ archive of references. We also conducted a secondary search that combined the terms: title of the tool (for tools that were identified during the primary search) and validity/reliability.

Data extraction

Data from the included studies were extracted by two authors (AŠ or LE) and checked by the third author (KK). Disagreements between authors were resolved through discussion and consensus. If there were any uncertainties, the fourth author (NŠ) was consulted. The following information was extracted: (i) first author, (ii) year of publication, (iii) title of the self-reported tool, (iv) type of the self-reported tool, (v) movement behaviours assessed using the self-reported tool, (vi) whether and how the self-reported tool accounted for the finite sum of daily time spent in sleep, SB, and physical activity, (vii) number of questions, (viii) recall period, (ix) sample characteristics (i.e., sample size, proportion of females, mean age), (x) language of the evaluated self-reported tool, (xi) reference tool used, (xii) time interval between two administrations, (xiii) construct validity indicators, (xiv) test-retest reliability indicators, and (xv) responsiveness indicators.

Quality assessment

The methodological quality of included studies was assessed using the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) checklist [41]. This checklist assesses the appropriateness of study design and statistical methods used in individual studies on measurement properties. The quality of the studies for evaluating validity and reliability was assessed using a 4-point scale (i.e., excellent, good, fair, poor) for each of the checklist items, while the final quality score was determined using the “worst score counts” principle [42]. The COSMIN checklist used is available in Supplementary Tables 2 and 3. The quality assessment was done independently by two authors (AŠ or LE, and KK). Disagreements between authors were resolved through discussion and consensus. If there were any uncertainties, the fourth author (NŠ) was consulted.
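As an illustration of the “worst score counts” principle, the following minimal sketch takes the lowest item rating as the overall study rating. The function name and the example ratings are hypothetical; the actual checklist items are those listed in Supplementary Tables 2 and 3.

```python
# Ratings ordered from worst to best on the 4-point scale.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def worst_score_counts(item_ratings):
    """Return the lowest rating among the checklist items of one study."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # list.index raises ValueError for a rating outside the 4-point scale.
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for a single study.
overall = worst_score_counts(["excellent", "good", "fair", "excellent"])
```

Under this principle, a single low-rated item determines the overall quality score, regardless of how the remaining items were rated.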

Data presentation and interpretation

The data were narratively presented in tables and arranged according to the type of self-reported method (i.e., questionnaires, time-use recalls, time-use diary). Self-reported tools were listed alphabetically. The first table contains data on attributes of self-reported tools, the second table contains data on construct validity, and the third table contains data on test-retest reliability of self-reported tools.

Criteria on interpreting construct validity and test-retest reliability correlation coefficients were set a priori and were based on the findings from previous systematic reviews on measurement properties of physical activity and SB self-reports (Spearman/Pearson correlation coefficients for construct validity usually range from approximately 0.30 to 0.50 [29, 32, 34]; and Intraclass correlation coefficients (ICC) for test-retest reliability usually range from approximately 0.50 to 0.80 [29, 32, 34]). Convergent validity correlation coefficients were interpreted as: 0 to 0.20 as poor; 0.21 to 0.40 as fair; 0.41 to 0.60 as moderate; 0.61 to 0.80 as substantial; 0.81 to 1.00 as nearly perfect [43]. Test-retest reliability correlation coefficients were interpreted as: 0 to 0.49 as poor; 0.50 to 0.74 as moderate; 0.75 to 0.89 as substantial; 0.90 to 1.00 as nearly perfect [44].
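The interpretation bands above can be expressed as a simple lookup. This is a minimal sketch: the function and constant names are hypothetical, and rounding to two decimals before classification is an assumption made here to handle values that fall between adjacent bands (e.g., 0.205).

```python
# Upper bounds of the interpretation bands taken from the text above.
VALIDITY_BANDS = [
    (0.20, "poor"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "nearly perfect"),
]
RELIABILITY_BANDS = [
    (0.49, "poor"),
    (0.74, "moderate"),
    (0.89, "substantial"),
    (1.00, "nearly perfect"),
]


def interpret(coefficient, bands):
    """Map a correlation coefficient in [0, 1] to its interpretation label."""
    value = round(coefficient, 2)  # assumption: classify on two decimals
    if not 0.0 <= value <= 1.0:
        raise ValueError("coefficient must lie between 0 and 1")
    for upper_bound, label in bands:
        if value <= upper_bound:
            return label
    raise ValueError("coefficient not covered by the bands")
```

For example, `interpret(0.57, VALIDITY_BANDS)` falls in the moderate band, while `interpret(0.92, RELIABILITY_BANDS)` falls in the nearly perfect band.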

Results

Our search query returned 2064 records (Fig. 1). After removing the duplicates, we screened titles and abstracts of 1507 records. We identified 56 potentially relevant articles and assessed their full texts for eligibility. Of these, we excluded 2 articles that included only children aged 11 years or less [45, 46], 17 articles on tools that do not assess all components across the full 24-h day [47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63], 12 articles that reported measurement properties of some but not all components across the full 24-h day [64,65,66,67,68,69,70,71,72,73,74,75], 7 articles that reported only measurement properties of total daily energy expenditure estimates [76,77,78,79,80,81,82] and 1 article that reported only measurement properties of an integrated movement behaviours score [83], 1 article that reported secondary data analysis [84], and 7 review articles [21, 38, 85,86,87,88,89]. An additional seven records were identified through other sources. Finally, 16 articles that reported measurement properties of 12 unique self-reported tools were included in our review [90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105]. There was 100% agreement on the selection of all 16 included articles between the two reviewers (AŠ or LE, and KK).

Fig. 1 PRISMA flow diagram on the systematic search process

Attributes of self-reported methods

We identified eight unique self-reported questionnaires, three time-use recalls, and one time-use diary (Table 1). Most questionnaires ask about sleep, and domain-specific SB and physical activity, and account for the requirement that their sum should be 24 h. The latter is achieved by subtracting non-SB estimates from 24 h to obtain SB (Simple Physical Activity Questionnaire [SIMPAQ]) or by assigning the “remaining time to 24 hours” to SB (Japan Public Health Center-based prospective study- physical activity questionnaire [JPHC-PAQ], Physical activity questionnaire [PAQ], Sedentary Time and Activity Reporting Questionnaire [STAR-Q]) or LPA (Daily Activity Behaviours Questionnaire [DABQ], Physical Activity Scale 2 [PAS 2]). The same method of accounting for the 24-h day is also used in one time-use recall (7-Day Physical Activity Recall [7D PAR]), while the other two time-use recalls (Computer-Based 24-Hour Physical Activity Recall [cpar24], Multimedia Activity Recall for Children and Adults [MARCA]) are computerised, and features of the program ensure/facilitate complete data entry (i.e., 24 h of activities). In computerised time-use recalls, a responder is asked to report activities in the order that they were performed during the day by choosing from a custom compendium of activities. Similarly, in the time-use diary (Time-use diary from the Harmonised European Time Use Study [TUD HETUS]), a responder is asked to record activities (in their own words) in the order that they were performed during the day. In computerised time-use recalls and in the time-use diary, reported daily activities are converted into sleep, SB, and physical activity by using a compendium of physical activities.
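The “remaining time to 24 hours” approach can be sketched as follows. This is an illustration only: the function name, the component labels, and the example durations are hypothetical, and the scoring protocols of the individual tools may differ in detail.

```python
MINUTES_PER_DAY = 24 * 60


def assign_remaining_time(reported_minutes, remainder_to="sb",
                          total=MINUTES_PER_DAY):
    """Add the unreported part of the day to one chosen behaviour."""
    used = sum(reported_minutes.values())
    if used > total:
        raise ValueError("reported durations exceed the 24-h day")
    result = dict(reported_minutes)
    # Whatever is left of the day is credited to the chosen component.
    result[remainder_to] = result.get(remainder_to, 0) + (total - used)
    return result


# Hypothetical report (min/day) with SB not asked about directly.
day = assign_remaining_time({"sleep": 450, "lpa": 240, "mvpa": 30})
```

With this approach, any under- or over-reporting of the directly assessed behaviours is transferred to the component that receives the remainder, so the choice between SB and LPA as the residual category affects which estimate absorbs that error.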

Table 1 Attributes of self-reported measurement tools

Self-reported measurement tools differ substantially regarding recall period (ranging from the past day to the past year) and comprehensiveness (number of questions for questionnaires ranging from 4 to 88, while time-use recalls and time-use diary record activities over the past day to the past week). Most questionnaires assess total sleep time, domain-specific SB, and domain- and intensity-specific physical activity (DABQ, JPHC-PAQ, PAQ, PAQ SCCS, PAS 2, STAR-Q, 24HMBQ). Most questionnaires also assess at least some specific types of SB and physical activity (Table 1), while only some questionnaires assess sleep timing (DABQ, SIMPAQ, 24HMBQ), movement behaviours on weekdays/weekend days separately (DABQ, 24HMBQ), and social and physical context for some activities (STAR-Q). Time-use recalls (cpar24, MARCA) and time-use diary (TUD HETUS) provide detailed data on specific types of activities and the timing of activities performed during the 24-h day. The TUD HETUS also provides a social and physical context for all reported activities and the level of enjoyment while engaging in activities.

Validity of self-reported methods

A total of 11 studies evaluated validity of 10 self-reported tools for assessment of 24-h movement behaviours among adults (Table 2). Two studies were ranked with an excellent quality [92, 94], three with a good quality [93, 101, 103], three with a fair quality [91, 95, 96], and three with a poor quality [90, 97, 98]. A device-based method was used as the reference method in seven studies, while four studies used another self-reported method to evaluate validity. All but one study used Pearson/Spearman’s correlation coefficient between the self-reported method and the reference method. Some studies also reported the Intraclass correlation coefficient and/or Bland-Altman statistics (e.g., mean difference, limits of agreement).

Table 2 Validity of self-reported measurement tools

Before validation, studies aggregated self-reported movement behaviours into a diversity of 24-h time-use compositions. Most studies validated daily time spent in sleep, SB, and physical activity of different intensities, one study validated domain-specific movement behaviours [90], and one study validated time spent in “super domains” [92]. All studies validated each component (i.e., aggregated self-reported movement behaviours) of the 24-h time-use composition in isolation. For example, validity correlation coefficients for sleep time ranged between 0.22 and 0.69, for SB between 0.06 and 0.57, for LPA between 0.18 and 0.46, and for MVPA between 0.38 and 0.56.

Reliability of self-reported methods

A total of 11 studies evaluated reliability of 10 unique self-reported tools for assessment of 24-h movement behaviours among adults (Table 3). Three studies were ranked with a good quality [91, 94, 98], seven with a fair quality [95, 96, 99, 100, 102, 104, 105], and one with a poor quality [90]. Studies differed substantially regarding the time interval between test and retest administrations (ranging from > 4 h to 15 months). Four studies used the Intraclass correlation coefficient to evaluate test-retest reliability, while seven studies reported only Pearson/Spearman’s correlation coefficient. Some studies also reported Bland-Altman statistics (e.g., mean difference, limits of agreement). All studies evaluated test-retest reliability of each component of the 24-h time-use composition in isolation. For example, reliability correlation coefficients for sleep time ranged between 0.41 and 0.92, for SB between 0.33 and 0.91, for LPA between 0.55 and 0.94, and for MVPA between 0.59 and 0.94.

Table 3 Reliability of self-reported measurement tools

Discussion

This systematic review identified 12 validated tools – eight questionnaires, three time-use recalls, and one time-use diary – for assessment of movement behaviours across the whole 24-h day. Most self-reported tools were designed for assessment of sleep, and domain-specific SB and physical activity, and generally showed adequate validity and/or reliability to be used in large-scale epidemiological studies and population surveillance.

Most self-reported tools included in our review showed validity (DABQ, JPHC-PAQ, PAS 2, STAR-Q, 24HMBQ, 7D PAR, cpar24, TUD HETUS) and/or reliability (DABQ, JPHC-PAQ, PAQ, PAS 2.1, STAR-Q, SIMPAQ, 24HMBQ, cpar24, MARCA) correlation coefficients comparable with those of most self-reported tools for assessment of a single movement behaviour [29,30,31,32,33,34,35,36,37]. The highest validity was observed for the time-use diary TUD HETUS (r range: 0.55 to 0.92), and the highest test-retest reliability for the short questionnaire SIMPAQ (rho range: 0.78 to 0.95) and the computerised time-use recall MARCA (ICC range: 0.89 to 0.99). Higher validity of time-use diaries compared to other self-reports that rely on recalling more distant activities has been reported previously [106], and it was suggested that higher validity is associated with diminished recall bias. The highest reliability was observed in two studies that administered the self-reported tool twice within the same day [100, 105], and the lowest reliability in a study where the time interval between two administrations was more than one year [90]. However, it has been proposed that an adequate time interval between two administrations is more than one day (to avoid recalling answers from the first administration), but less than three months for most tools (to guarantee sufficient stability of a behaviour per se) [107]. Therefore, reliability findings in these studies might have differed had adequate time intervals been used.

Self-reported time spent in sleep, SB, and physical activity is usually under- or over-estimated [29,30,31,32,33,34,35,36,37], which leads to a sum of behaviours that does not equal 24 h. However, most of the self-reported tools included in our review used different approaches to account for the requirement that the sum of behaviours should add up to 24 h. Some tools assigned the “remaining time to 24 hours” to either SB or LPA (DABQ, JPHC-PAQ, PAQ, PAS 2, STAR-Q, 7D PAR), one questionnaire (SIMPAQ) provided an alternative method for calculating SB by subtracting non-SB estimates from 24 h, two computerised recalls (cpar24, MARCA) ensured complete data entry by specific features of the program, and the time-use diary (TUD HETUS) encourages the responder to report activities during the 24-h period by providing pre-defined recording fields. Most tools that used such approaches showed at least fair validity (DABQ, JPHC-PAQ, STAR-Q, 7D PAR, cpar24, TUD HETUS) and/or reliability (DABQ, JPHC-PAQ, PAQ, PAS 2.1, STAR-Q, SIMPAQ, cpar24, MARCA) for all movement behaviours examined. This is of great importance especially for studies that use compositional data analysis, since all components of a time-use composition (e.g., 24-h movement behaviours composition consisting of time spent in sleep, SB, and physical activity) need to have adequate validity and/or reliability so that the time-use composition can be considered valid and/or reliable.

Using the COSMIN checklist, we found that only five validity studies (Table 2) and three reliability studies (Table 3) were ranked with at least good quality. The most frequent reasons for compromised quality of validity studies were poor selection of a reference measure and insufficient sample size. The most frequent reasons for compromised quality of reliability studies were insufficient description of test and retest conditions, less appropriate use of statistical methods, a less optimal time interval between two administrations, and a lack of description of how missing data were handled. The COSMIN checklist (Supplementary Tables 2 and 3) can be used in future validation studies to guide methodological decisions in order to achieve high quality of the study. However, careful consideration should be given to the choice of a reference measure, since it is currently largely unknown which tools for assessment of 24-h movement behaviours could be considered the best reference measure [108,109,110,111]. Accelerometers were frequently proposed to be a “reasonable gold standard” for assessment of free-living movement behaviours; however, hip placement is preferred for accurate assessment of physical activity [112], thigh placement for SB [113], and wrist placement for sleep duration [114]. To avoid usage of multiple accelerometers, it was proposed that the best compromise might be using the same accelerometer at the hip during wake time and at the wrist during bedtime [115], or to combine a thigh-worn accelerometer with a sleep time diary [116]. The latter method was used in two studies included in our review [93, 94], while other studies used an accelerometer placed on the chest, waist, or wrist [96, 97, 101, 103], a wearable camera [92], or another self-reported method [90, 91, 95, 98].

Another important consideration is the choice of statistical analysis. Since data quantifying time spent in movement behaviours are compositional data, a recent study questioned the appropriateness of using Pearson/Spearman’s correlation coefficient and Intraclass correlation coefficients in studies examining validity and reliability of movement behaviours estimates [94]. Although those methods are recommended by the COSMIN checklist, they are not intended for compositional data [7]. It has been warned that using traditional statistical methods (that were developed for data in real space) when dealing with compositional data (that lie in a constrained simplex space) may produce misleading results [9, 117]. To the best of our knowledge, compositional data analysis methods for evaluating validity and reliability of movement behaviours estimates are lacking. Therefore, to further support the development of 24-h movement behaviours research, there is a need to develop statistical analysis suitable for evaluation of validity and reliability of compositional data.
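To illustrate the difference between the simplex and real space, the following minimal sketch applies the centred log-ratio (clr) transformation, one commonly used way of expressing a composition in log-ratio coordinates. It is shown here only as an example of a standard compositional technique and is not a validation method proposed in the reviewed studies; the function name and the example durations are hypothetical.

```python
import math


def clr(parts):
    """Centred log-ratio transform of a composition with positive parts."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    # Each coordinate is the log of a part relative to the geometric mean.
    return [value - mean_log for value in logs]


# Hypothetical composition in min/day: sleep, SB, LPA, MVPA.
minutes = [480, 600, 300, 60]
proportions = [m / sum(minutes) for m in minutes]

coords_from_minutes = clr(minutes)
coords_from_proportions = clr(proportions)
```

The coordinates are identical whether the composition is expressed in minutes or in proportions of the day, which reflects the point that the relevant information lies in the relative distribution of the components.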

Considerations for research, policy, and practice

The choice of the measurement tool depends on the objective of the study, measurement characteristics of a tool, and resources available. Several decision matrix guides to selecting physical activity or SB measurement tools have been described previously [23, 24, 27]. When selecting the tool, the first step is usually to identify which domains and dimensions of movement behaviours are of interest, and for what purpose data are collected (e.g., study design, individual level counselling). Then, a careful consideration regarding measurement characteristics of the tools (e.g., validity, reliability) and resources available (e.g., cost, time available for administration) is needed.

If the purpose is epidemiological research on the relationship between 24-h movement behaviours and health outcomes, then the important measurement characteristics are strong validity correlations and low random error [23]. However, if the purpose is to assess movement behaviours in longitudinal or intervention studies, then responsiveness to detect change is of great importance [24]. In our review, the strongest validity correlation coefficients were observed for the time-use diary TUD HETUS, while some other tools showed fair-to-substantial correlation coefficients (DABQ, JPHC-PAQ, STAR-Q, 24HMBQ, cpar24), which can also be deemed sufficient for epidemiological research. However, TUD HETUS and cpar24 assess behaviours during a single day, indicating that more than one day of assessment is needed to get a representative estimate of an individual’s movement behaviours [25], which presents an additional burden. Therefore, DABQ, JPHC-PAQ, and STAR-Q may be a better choice for the adult population, and 24HMBQ for a specific population of dormitory students. Those four self-reports also showed fair-to-good reliability coefficients, while quality ratings for their validation studies were fair-to-excellent. None of the studies included in our review reported responsiveness, and therefore, no recommendations for longitudinal or intervention studies could be made.

If the purpose is population surveillance, then low systematic error and high responsiveness to detect change in behaviour of a population are important characteristics [23, 24]. Low systematic error is important for accurate assessment of the proportion of the population that has an (un)healthy pattern of movement behaviours, while responsiveness is needed to follow population trends. In population health surveys, there is usually limited space available for questions on movement behaviours, indicating that shorter questionnaires (JPHC-PAQ, PAQ, PAS 2, SIMPAQ) may be more appropriate for most surveys. However, PAS 2 and SIMPAQ showed poor validity (r < 0.21 for some estimates), while only reliability has been evaluated for PAQ. Therefore, JPHC-PAQ might be a preferred choice. Among shorter questionnaires, a Bland-Altman plot has been reported only for PAS 2; physical activity estimates were systematically higher, while the sum of sleep and SB was systematically lower when compared with the reference measure [101]. As the reference measure was not a reasonable gold standard, those findings could not be interpreted as measurement errors. Future studies should carefully consider choosing a highly trusted reference measure and exploring systematic and random error. Also, responsiveness of such tools to detect trends in population behaviour is yet to be explored.

In practice, assessment of 24-h movement behaviours is usually needed for individual level estimates. If the purpose is to assess whether an individual meets recommended levels of 24-h movement behaviours, then important measurement properties are high sensitivity and specificity for such classification [118]. If the purpose is to assess change in an individual’s behaviour, then responsiveness to detect change on an individual level needs to be high [24]. As clinicians are usually interested in clinically meaningful change, minimal detectable change (i.e., change that is beyond normal within-individual variability in behaviour and the measurement error and can be interpreted as real change for an individual [24]) should be lower than minimal important change (i.e., minimal within-individual change above which individuals/patients perceive themselves importantly changed [119]). However, none of the studies included in our review explored sensitivity and specificity or responsiveness to detect change on an individual level. Two studies on test-retest reliability [96, 100] and four studies on construct validity [93, 94, 96, 101] reported substantially large random error (e.g., 95% limits of agreement for MVPA estimate ranged from −109 to +102 min/day [96]), indicating that minimal detectable change for these self-reports is likely to be too large to detect minimal important change. In clinical care settings, there is usually only limited time available for counselling on movement behaviours, and it has been recommended that tools need to be quick to administer (up to three minutes) and to provide immediate feedback [21]. Therefore, only short questionnaires (JPHC-PAQ, PAQ, PAS 2, SIMPAQ) may be potential candidate tools. As mentioned above, PAQ, PAS 2, and SIMPAQ could not be recommended due to poor or unknown construct validity.
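The 95% limits of agreement referred to above follow the standard Bland-Altman calculation (mean difference ± 1.96 standard deviations of the paired differences). A minimal sketch is given below; the function name and the paired MVPA values are hypothetical and do not come from any of the reviewed studies.

```python
from statistics import mean, stdev


def limits_of_agreement(first, second):
    """Return mean difference and 95% limits of agreement for paired data."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two paired series with at least two values")
    differences = [a - b for a, b in zip(first, second)]
    bias = mean(differences)                    # systematic difference
    half_width = 1.96 * stdev(differences)      # random error component
    return bias, bias - half_width, bias + half_width


# Hypothetical MVPA estimates (min/day) from two administrations.
test_values = [30, 45, 60, 20, 50]
retest_values = [40, 35, 70, 25, 40]
bias, lower, upper = limits_of_agreement(test_values, retest_values)
```

The wider the interval between the lower and upper limits, the larger an observed change in an individual must be before it can be distinguished from measurement noise.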

Limitations

This review has some limitations that should be highlighted. First, the review was limited to studies that validated self-reported estimates of movement behaviours across the full 24-h day. This reduced the number of included studies, since studies that did not validate all components of the 24-h day were not included. According to the 24-h movement paradigm, movement behaviours are components of a finite total, and therefore, all components of the total need to be validated simultaneously. Therefore, self-reported tools evaluated in some excluded studies [64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83] need further validation of the whole 24-h time-use composition. Our review also did not include some important time-use tools used in the Multinational Time Use Study (MTUS), which harmonised over 100 national time-use surveys and therefore presents a key resource for time-use research [120]. It might be that validation studies of most national time-use surveys are lacking. However, our review included TUD HETUS, which showed high validity, and it might be that other similar time-use surveys have comparable validity. Second, the literature search was conducted in only three databases, and therefore, we might have missed some of the relevant studies. However, we used a comprehensive search query and complemented it with citation searching, screening of the authors’ archive of references, and a secondary database search on the titles of identified self-reported tools. Third, most studies were conducted on convenience samples; therefore, findings might not be directly generalizable to the general population. As most tools were validated only in one language, future translation and cross-cultural validation studies might be needed for some self-reports.

Conclusions

This systematic review identified 12 validated self-reported tools for assessment of 24-h movement behaviours, indicating that only a limited number of tools are currently available. Validation studies generally showed adequate construct validity and test-retest reliability for use in epidemiological studies and population surveillance, while little is known about adequacy for individual level assessments and responsiveness to behavioural change. To better support research, policy, and practice on 24-h movement behaviours, there is a need for further developments in measurement methods. There is a need to develop new tools for assessment of 24-h movement behaviours for specific purposes and/or to adapt the existing physical activity and SB self-reports so that they align with the emerging 24-h movement paradigm. Future studies should examine measurement properties of 24-h movement behaviours estimates simultaneously, using statistical methods that respect the compositional nature of movement behaviours data.