Energy balance-related behaviours (EBRBs), i.e. lack of physical activity, excess sedentary behaviour and unhealthy dietary patterns, are considered to be important contributors to the obesity epidemic [1]. In order to adequately inform prevention and intervention research on lifestyle behaviours, the assessment of EBRBs and their personal and environmental correlates and potential determinants is of utmost importance.

Large-scale observational and intervention studies most often have to rely on questionnaires to assess lifestyle behaviours and their potential determinants: questionnaire assessments are inexpensive, easy to administer and are widely accepted by study participants [2, 3]. However, questionnaire assessments rely on self-report that may be prone to recall and social desirability bias [4].

In a recent review, Lubans et al. [5] concluded that self-report measures can provide reliable estimates of screen time in children and adolescents. However, the validity of these questionnaires remains largely untested. A review of physical activity questionnaires in young people by Chinapaw et al. [6] concluded that no physical activity questionnaire had both acceptable validity and acceptable reliability. Thus, more high-quality research is required into the measurement properties of instruments assessing sedentary behaviour and physical activity in young people [6].

No gold standard exists for the assessment of dietary intake in large research populations. The methods commonly used in larger populations include food records, food frequency questionnaires, and 24-hour recalls, all of which rely on self-report. All of these suffer from bias due to over- or underreporting, and little is known about the extent to which factors such as age, cognition, social background and complexity of questions influence the outcomes of dietary assessment in children [7, 8].

Even less research has been conducted on the psychometric characteristics of measures of determinants of EBRBs [9]. Moreover, most questionnaires regarding energy balance-related behaviours and potential behavioural determinants have been developed for administration in specific countries, while, especially in Europe, cross-country studies and comparisons are now common and supported by the European Commission's framework programs.

It can be concluded that reliable and valid questionnaires in the area of potential drivers of childhood overweight and obesity are scarce, especially those covering a range of energy balance-related behaviours that can be used in large-scale studies across countries.

The ENERGY-project is a European Commission funded cross-European project to gain more insight into EBRBs and their potential behavioural determinants, and to inform and test a school-based and family-involved obesity prevention intervention scheme [10]. As part of the ENERGY-project, a cross-sectional survey among more than 7000 children, their parents, and schools was conducted in seven countries representing different regions of Europe. This survey used questionnaires among children, parents, and school staff, as well as observations in the school and school environments [11].

However, for the survey no established valid and reliable measures that could be administered in large populations in different countries across Europe were available. Therefore, we developed a child and parent questionnaire to assess a range of EBRBs and potential individual and environmental behavioural determinants, and examined the test-retest reliability and construct validity of these two main questionnaires used in the ENERGY cross-sectional survey. The results of the parent questionnaire reliability and validity study are published in a separate paper [Singh et al: Test-retest reliability and construct validity of the ENERGY-parent questionnaire on parenting practices, energy balance-related behaviours and their potential behavioural determinants: the ENERGY-project. submitted for publication]. In the current paper, the methods and results of the child questionnaire test-retest reliability and construct validity study are presented and discussed.


ENERGY-child questionnaire

The ENERGY-child questionnaire was developed in order to assess EBRBs of the child as well as personal, family, and school-environmental determinants related to these EBRBs. The questionnaire was divided into eight sections, i.e. (A) Demographic characteristics; (B) Soft drinks and spending pocket money on soft drinks; (C) Fruit juices; (D) Breakfast behaviour; (E) Physical activity behaviour; (F) Screen viewing behaviour; and (G) Dieting behaviour. In the current study we assessed the test-retest reliability and construct validity of all sections (150 items), except 'demographic characteristics'.

Most concepts were measured by only one or two items due to practical constraints on the length of the questionnaire. The questionnaire items were taken or adapted from existing measures for the behaviours included in the ENERGY-child questionnaire [12-14]. More details on the development of the questionnaire, the pre-testing, and translation procedures are described elsewhere [11]. The ENERGY-child questionnaire is available via the ENERGY-website in English and all languages in which the questionnaire was administered:

Study population: recruitment and data collection

In the current paper, the data of the test-retest reliability and construct validity study from six out of seven countries that participated in the cross-sectional study of the ENERGY-project [10] (i.e. Belgium, Greece, Hungary, the Netherlands, Norway, and Spain) are presented. Due to deviations from the study protocol Slovenian data were excluded from the current study. Data collection, data cleaning, and data analyses were performed according to a standardized protocol and are described hereafter.

We recruited children aged 10-12 years. The recruitment and data collection took place from March to July 2010. Children were recruited in five phases: (1) We called schools and, after a short explanation of the study, asked whether the school was interested in participating. (2) If the school showed interest, a letter with more information on the background, goals, and methods of the study was sent. (3) A second phone call followed after one week. During this phone call, the dates on which the measurements would take place were agreed upon. Schools were asked to select one class of children aged 10-12 years to participate in the study. (4) A second letter or email was sent to the school to confirm the dates. The letter also contained practical information on the measurements. (5) We provided schools with an information letter, which was sent to the parents of the children of the selected class. This letter contained an active or passive informed consent form and detailed information on the background, goals, and methods of the study.

In countries where ethical approval was necessary for such non-intervention studies this was obtained from the relevant ethical committee and informed consent of the child and/or parents was obtained prior to the study; in the other countries a declaration of 'No objection' was obtained from the ethical committees. In Greece, both the Ministry of Education and the ethical committee approved the study protocol.

Test-retest reliability study

We visited the school and children were asked to fill in the ENERGY-child questionnaire in the classroom under the supervision of the researcher/research assistant. Exactly one week later, the researcher/research assistant returned and the children were asked to fill in the questionnaire for a second time. We planned the second measurement at the same part of the day as the first measurement (e.g. morning or afternoon). We collected data by ID number to be able to merge the questionnaires from the test and re-test.
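The merge of test and retest questionnaires by ID number can be illustrated with a minimal Python sketch. The function names and the dictionary layout (ID number mapped to a questionnaire record) are assumptions made for illustration; the study itself handled its data in SPSS. Children with only one of the two measurements are left out of the paired set.

```python
def pair_by_id(test, retest):
    """Pair first and second questionnaires by child ID.

    `test` and `retest` are hypothetical dicts mapping an ID number to a
    questionnaire record. Only IDs present at both measurements are kept.
    """
    shared_ids = sorted(set(test) & set(retest))
    return {child_id: (test[child_id], retest[child_id]) for child_id in shared_ids}


def dropout_rate(n_first, n_completed):
    """Percentage of children with a first measurement but no second one."""
    if n_first <= 0:
        raise ValueError("n_first must be positive")
    return 100.0 * (n_first - n_completed) / n_first


# Illustrative (made-up) records: child 2 missed the retest and is dropped.
first = {1: {"tv_hours": 2}, 2: {"tv_hours": 1}, 3: {"tv_hours": 3}}
second = {1: {"tv_hours": 2}, 3: {"tv_hours": 4}}
paired = pair_by_id(first, second)
print(sorted(paired))                       # [1, 3]
print(round(dropout_rate(793, 730), 1))     # 7.9
```

The last line uses the counts reported for the test-retest reliability study (793 children at the first measurement, 730 at both).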

Construct validity study

For the construct validity study a cognitive interview was conducted among approximately three children of each participating class. Before the study started, we asked the teacher to select three children representative for the class. These children were asked to volunteer for a cognitive interview with the researcher/research assistant about the same subjects as the questionnaire.

Children who participated in the construct validity study were asked to fill in the ENERGY-child questionnaire together with the other children in the class (first measurement of the test-retest reliability study) and were subsequently interviewed by a researcher/research assistant. The interview was performed using a standard question route - considering the course of the child's day from getting up until going to sleep. The interviews were sound-recorded and transcribed. Based on the transcribed interview, a second researcher/research assistant (i.e. other than the one doing the interview) filled in a second identical child questionnaire without knowledge of the answers to the first questionnaire of the children. Data of children that participated in the construct validity study were excluded from the test-retest reliability study.

Data management

A standard data management protocol was developed to ensure missing and ambiguous values were handled consistently.

Double data entry

For both the test-retest reliability study and the validity study a randomly selected 5% of the questionnaires were re-entered in SPSS (double data entry) to check for typing errors and misinterpretation. A difference of less than 3% was accepted. In case there was a difference of more than 3%, the cases had to be re-entered in the original data set and the procedure was repeated. Across the countries, the rate of disagreement in the test-retest reliability and construct validity studies ranged from 0.0% - 1.7% and 0.0% - 2.3%, respectively.
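As an illustration of the double data entry check, the following Python sketch computes the percentage of differing values between two independent entries of the same questionnaires and applies the 3% criterion. Two points are assumptions not stated in the protocol: disagreement is counted per entered value (cell), and a rate of exactly 3% is treated as acceptable.

```python
def disagreement_rate(first_entry, second_entry):
    """Percentage of values that differ between two entries.

    Each argument is a list of rows (one row per questionnaire), and each
    row is a list of entered values in the same item order.
    """
    if len(first_entry) != len(second_entry):
        raise ValueError("both entries must contain the same questionnaires")
    total = 0
    differing = 0
    for row_a, row_b in zip(first_entry, second_entry):
        if len(row_a) != len(row_b):
            raise ValueError("rows must contain the same number of items")
        for value_a, value_b in zip(row_a, row_b):
            total += 1
            if value_a != value_b:
                differing += 1
    if total == 0:
        raise ValueError("no values to compare")
    return 100.0 * differing / total


def needs_reentry(rate_percent, threshold_percent=3.0):
    """True when the disagreement exceeds the accepted threshold."""
    return rate_percent > threshold_percent


# Made-up example: one differing value out of eight.
entry_1 = [[1, 2, 3, 4], [2, 2, 1, 0]]
entry_2 = [[1, 2, 3, 4], [2, 3, 1, 0]]
rate = disagreement_rate(entry_1, entry_2)
print(rate, needs_reentry(rate))  # 12.5 True
```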

Data definition

The data definition process consisted of adding variable labels, value labels and missing value definitions to the original data files.

Data cleaning

During the data cleaning original data was checked for duplicate records, system-missing values, out-of-range values and logical inconsistencies.
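Three of the cleaning checks named above (duplicate records, missing values, out-of-range values) can be sketched in Python as follows. The record layout, field names and valid ranges are hypothetical; checks for logical inconsistencies depend on the specific items and are not shown.

```python
from collections import Counter


def find_duplicate_ids(records):
    """Return the IDs that occur in more than one record."""
    counts = Counter(record["id"] for record in records)
    return sorted(child_id for child_id, n in counts.items() if n > 1)


def find_missing(records, fields):
    """Return (id, field) pairs where a value is absent (None)."""
    return [
        (record["id"], field)
        for record in records
        for field in fields
        if record.get(field) is None
    ]


def find_out_of_range(records, valid_ranges):
    """Return (id, field, value) triples outside the inclusive valid range."""
    problems = []
    for record in records:
        for field, (low, high) in valid_ranges.items():
            value = record.get(field)
            if value is not None and not (low <= value <= high):
                problems.append((record["id"], field, value))
    return problems


# Made-up records with one duplicate ID, one missing age, one impossible value.
records = [
    {"id": 1, "age": 11, "tv_hours": 2},
    {"id": 2, "age": None, "tv_hours": 30},
    {"id": 1, "age": 12, "tv_hours": 1},
]
print(find_duplicate_ids(records))                        # [1]
print(find_missing(records, ["age", "tv_hours"]))         # [(2, 'age')]
print(find_out_of_range(records, {"tv_hours": (0, 24)}))  # [(2, 'tv_hours', 30)]
```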

Statistical analyses


We calculated means and standard deviations for the participant characteristics, and medians and 25th and 75th percentile values for the EBRBs.

Test-retest reliability and construct validity

For both test-retest reliability and construct validity we assessed agreement at the individual item level. The agreement of categorical items (mostly Likert-type scales), continuous, and dichotomous items was analysed with a two-way random effects single measure intraclass correlation coefficient (ICC(2,1)); ICCs were classified as follows: 'excellent' (≥ .81), 'good' (.61 - .80), 'moderate' (.41 - .60), 'poor' (≤ .40) [3, 15-17].
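For readers who want to reproduce this type of analysis outside SPSS, the following Python sketch computes a two-way random effects, single measure ICC using the standard mean-squares formula, ICC(2,1) = (MSR - MSE) / (MSR + (k - 1)MSE + k(MSC - MSE)/n), with n children and k measurements per child. It is an illustration, not the code used in the study. The classification function rounds the ICC to two decimals before applying the cut-offs, which is an assumption about how values between the stated bands (e.g. .805) would be handled.

```python
def icc_2_1(scores):
    """Two-way random effects, single measure ICC.

    `scores` is a list of rows, one per child, each holding the k repeated
    measurements of one item (k = 2 for a test-retest design).
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 rows and 2 columns of equal length")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # between-children mean square
    msc = ss_cols / (k - 1)                 # between-measurements mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    denominator = msr + (k - 1) * mse + k * (msc - mse) / n
    if denominator == 0:
        raise ValueError("ICC undefined: no variability in the answers")
    return (msr - mse) / denominator


def classify_icc(icc):
    """Apply the cut-offs used in this paper to an ICC value."""
    value = round(icc, 2)  # assumption: classify after rounding to 2 decimals
    if value >= 0.81:
        return "excellent"
    if value >= 0.61:
        return "good"
    if value >= 0.41:
        return "moderate"
    return "poor"


# Made-up test-retest answers of five children on one item.
item = [[1, 1], [2, 3], [4, 4], [3, 3], [5, 4]]
print(classify_icc(icc_2_1(item)))
```

When all children give nearly the same answer, the between-children mean square approaches zero and the ICC becomes small or undefined even if test and retest answers match, which motivates the additional use of percentage agreement.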

Because the calculation of the ICC depends on the existence of the variability in answering categories, we also calculated percentage agreement, with criteria established as 'excellent' (90% - 100%), 'good' (75% - 89%), 'moderate' (60%-74%), or 'poor' (< 60%). If ICC values were lower than .40/.60/.80 but the percentage agreement was higher than 60%/75%/90%, we reported the percentage agreement [18].
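The reporting rule described above can be sketched in Python as follows. The sketch interprets the rule as: report the percentage agreement whenever it falls in a higher category than the ICC. This reading, the function names, and the treatment of values between the stated bands (e.g. 89.5% is assigned to 'good') are assumptions for illustration.

```python
CATEGORIES = ["poor", "moderate", "good", "excellent"]


def percentage_agreement(first, second):
    """Percentage of children giving the identical answer at both measurements."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty answer lists of equal length")
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return 100.0 * matches / len(first)


def classify_agreement(percent):
    """Apply the percentage agreement criteria used in this paper."""
    if percent >= 90:
        return "excellent"
    if percent >= 75:
        return "good"
    if percent >= 60:
        return "moderate"
    return "poor"


def classify_icc(icc):
    """Apply the ICC cut-offs used in this paper (after rounding to 2 decimals)."""
    value = round(icc, 2)
    if value >= 0.81:
        return "excellent"
    if value >= 0.61:
        return "good"
    if value >= 0.41:
        return "moderate"
    return "poor"


def reported_rating(icc, percent):
    """Return which statistic is reported for an item, and its category."""
    icc_label = classify_icc(icc)
    agreement_label = classify_agreement(percent)
    if CATEGORIES.index(agreement_label) > CATEGORIES.index(icc_label):
        return ("percentage agreement", agreement_label)
    return ("ICC", icc_label)


# Made-up item with little variability: low ICC, high percentage agreement.
print(reported_rating(0.35, 92.0))  # ('percentage agreement', 'excellent')
print(reported_rating(0.70, 80.0))  # ('ICC', 'good')
```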

Gender-specific analyses did not show meaningful differences between boys and girls, both in the test-retest reliability and the construct validity study. Therefore, results are presented for both boys and girls combined.

All statistical tests were performed using SPSS version 15.0 (SPSS Inc., Chicago, IL).



The characteristics of the children that participated in the test-retest reliability and construct validity study are shown in table 1.

Table 1 Descriptive statistics of the children that participated in the test-retest reliability and construct validity study.

Completion of the 157-item questionnaire took about 30-60 minutes. The cognitive interviews took 35-60 minutes.

Test-retest reliability

There were 793 children who filled in the questionnaire for the first time. At the retest, 63 did not fill in the questionnaire and were therefore excluded from the current analysis (dropout rate: 7.9%).

In this study, we included test-retest reliability data from 730 children across the six countries. The number of participants ranged from 86 (Spain) to 155 (Greece). The mean age (standard deviation (sd)) of the children participating in the test-retest reliability study ranged from 11.3 (.5) years (Spain) to 12.5 (.6) years (Hungary). The majority of the children reported speaking the native language of the country at home.

Construct validity

There were 98 children who filled in the questionnaire. Two children did not show up at the interview and were therefore excluded from the current analysis (dropout rate: 2.0%).

In this study, we included construct validity data from 96 children across the six countries. All but two countries included 15 children; Greece included 16, and the Netherlands 20 children. The mean age (standard deviation (sd)) of the children participating in the construct validity study ranged from 11.4 (.6) years (Belgium) to 12.0 (.6) years (Hungary). In Belgium, the majority of the children participating in the construct validity study were girls (67%), whereas in Greece the majority (69%) of the children were boys. In all countries, most children reported speaking the native language of the country at home.

Energy balance-related behaviours (EBRBs)

Table 2 presents the descriptives of the EBRBs, as assessed by the first completion of the questionnaire.

Table 2 Energy balance-related behaviours of children participating in the test-retest reliability and construct validity study. All values are medians (25th - 75th percentile).

General findings test-retest reliability and construct validity study

Table 3 shows the questionnaire items, their ICC values, and percentage agreement for all countries combined, for both the test-retest reliability and construct validity study. Table 4 summarises these findings per category of the ENERGY-child questionnaire.

Table 3 Agreement (per questionnaire item) between questionnaires (test-retest reliability) and questionnaire and interview responses (construct validity) as indicated by intraclass correlation coefficients (ICC) and percentage agreement (agree).
Table 4 Overview of the results per section of the ENERGY-child questionnaire for both test-retest reliability and construct validity study, combined for all countries (test-retest reliability study: n = 730; construct validity study: n = 96).

Test-retest reliability study

For the total sample across all countries, the test-retest reliability was good to excellent in 115 (76.7%) items and moderate in 34 (22.7%) items. For one item ('How many hours of sports did you do yesterday?') we found an ICC-value of .22, indicating poor test-retest reliability. Eleven response items did not show enough variability, resulting in ICCs ≤ .60, but a high (≥ 90%) percentage agreement (table 3). The test-retest reliability was comparable across all countries. Country-specific values can be found in additional file 1.

Construct validity study

Construct validity appeared to be good to excellent for 70 out of 150 items (46.7%), as indicated by ICCs > .60 or percentage agreement ≥ 75%. For the remaining part, the ICCs of 39 items (26.0%) indicated moderate construct validity and 41 items (27.3%) indicated poor construct validity.

Constructs that showed consistently poor values across most of the EBRBs were

  • general attitude (e.g. 'I think watching television is....')

  • automaticity (e.g. 'Drinking fizzy drinks or fruit squash is something I do without even really thinking about')

  • parental and peer subjective norm ('If I watch television my parents/care givers think it is...' or 'If I do physical activity/sports, most of my friends think it is...')

Nine response items did not show enough variability, resulting in ICCs ≤ .40, but high (≥ 90%) percentage agreement (table 3).

The construct validity was comparable across all countries, except for Greece and the Netherlands. Greek data showed higher ICCs and percentages agreement (see additional file 2). The construct validity in Greece was excellent in about two thirds (68.0%) of the items, good in 19.3%, moderate in 10.7%, and poor in 2.0% of the items. Dutch data showed lower ICCs and percentages agreement. The construct validity in the Netherlands was excellent to good in 46.7% of the items, moderate in 26.0%, and poor in 27.3% of the items.


The current study assessed the test-retest reliability and construct validity of the ENERGY-child questionnaire in 10-12 year old children from six countries in Europe. The ENERGY-child questionnaire, assessing EBRBs of the child as well as potential personal, family, and school-environmental correlates of these EBRBs, showed good test-retest reliability and moderate to good construct validity.

In light of the scarcity of published reliable and valid instruments that simultaneously assess both sides of the energy balance, the results of the current study should be helpful for future research.

Test-retest reliability

More than three quarters of all items (n = 115 out of 150) of the ENERGY-child questionnaire showed good to excellent test-retest reliability.

Exceptions to these findings are the questions in which children were asked about 'yesterday' (e.g. 'About how many hours did you watch television yesterday?'). Here we found lower values, especially on items such as consumption of soft drinks, television watching and sports participation. Lower ICCs or percentage agreement are to be expected for such questions, because children's activities on a specific day ('yesterday') vary more than their activities on a usual day.

With comparable results across all countries, our results show that the ENERGY-child questionnaire has good test-retest reliability in the six European countries that participated in the current study.

Construct validity

Values for the construct validity were somewhat lower than those for the test-retest reliability. A closer examination of the questions showed that across all EBRBs, several constructs had consistently lower scores (i.e. general attitude, habit strength, parental and peer subjective norm). The lower validity for the habit strength questions is consistent with the findings of the ENERGY-parent questionnaire, where we also found lower values for the habit strength questions [Singh et al: Test-retest reliability and construct validity of the ENERGY-parent questionnaire on parenting practices, energy balance-related behaviours and their potential behavioural determinants: the ENERGY-project. submitted for publication]. These findings indicate that the use of single questions out of the original habit strength index [19] is not to be advised, and other habit strength questionnaire items should be considered in future research.

All items that have poor values should be reconsidered, and when interpreting research results based on these items the lack of construct validity should be noted.

Some differences between countries were observed, i.e. for Greece and the Netherlands, but because the number of cases per country was relatively small (Greece: n = 16 and the Netherlands: n = 20), we believe that more value should be attached to the combined data set.

However, for future interpretation of results of the ENERGY-study, these country-specific values might be helpful in explaining cross-country differences.

Comparison with other studies

Only a few studies have reported on the psychometric properties of child questionnaires assessing a range of EBRBs. The psychometric properties of the Health Behaviour in School-aged Children (HBSC) questionnaire have been reported in two different papers [20, 21]. Vereecken et al. [20] reported on the reliability and validity of questionnaire items aiming to assess a number of food items from the HBSC-questionnaire (HBSC FFQ), among which soft drinks. Test-retest reliability for soft drink consumption was comparable to the values we found: in 11-12 year olds, Vereecken et al. [20] report a weighted kappa of .66 (percentage agreement: 53%). Similar to our results, the score for validity was somewhat lower. It is noteworthy that Vereecken et al. [20] mention that overestimation is very likely when measuring food items such as soft drinks.

Booth et al. [21] assessed the reliability and validity of the physical activity questions of the HBSC-questionnaire among Australian adolescents. Reliability was assessed in a comparable way to our study (i.e. administered twice, two weeks apart). The concept of the questions in the HBSC-questionnaire assessing participation in (vigorous) physical activity by examining the frequency ('How often') and duration ('How long') of vigorous physical activity was clearly different from the way the ENERGY-child questionnaire assessed sports participation, i.e. 'How many hours...?'. Booth et al. [21] concluded that the HBSC-questions on participation in vigorous intensity physical activity had acceptable reliability, with values for children with a mean age of 13.7 years ranging between .36 - .44 (frequency) and .22 - .26 (duration).

There are at least two other studies that are worth comparing our results to, i.e. a study among children of the same age range, focusing on energy intake [22], and another cross-European study focussing on fruit and vegetable consumption [14]. Wilson et al. [22] examined the psychometric properties of a questionnaire among 10-12 year olds. The authors conclude that this 54-item questionnaire is a reliable and valid tool to assess dietary patterns and food behaviours, attitudes and environments in Australian school children [22]. Similar to the ENERGY-child questionnaire, Wilson et al. [22] assessed intake of sweetened beverages and found similar values for both reliability (.59) and validity (.34). The range of ICC values of the ENERGY-child questionnaire was somewhat broader than that reported by Wilson et al. [22]. This might be due to the fact that Wilson et al. present the ICCs for sum scores, whereas we present ICCs for single items. The fact that their study population for the test-retest reliability was much smaller (n = 134 versus 730) and that they examined a questionnaire focusing on energy intake and its determinants prohibits further comparison.

Comparing our study to other studies that assessed validity, it should be considered that both Vereecken et al. [20] and Wilson et al. [22] compared their questionnaires to 7-day food diaries, whereas in our study we conducted an interview assessing construct validity. Comparison to food records assesses the relative validity of the questionnaire and may be regarded as a more rigorous test of validity.

De Bourdeaudhuij et al. [14] examined the reliability and validity of a questionnaire assessing personal, social and environmental correlates of fruit and vegetable intake in schoolchildren in five European countries. The authors conclude that the questionnaire is a reliable and valid tool for 10-11 year-olds. Comparable to our study, they report good to very good test-retest reliability for the majority of the items. Again, detailed comparison with the results of the present validity study is not possible, because de Bourdeaudhuij et al. [14] examined predictive validity instead of construct validity.

Strengths and limitations

The test-retest study has several strengths, covering both the data collection and handling phase (i.e. large sample size, standardised protocol, centralised data management) and a questionnaire covering a large variety of children's EBRBs as well as potential personal, family, and school-environmental determinants, available for administration in nine languages.

However, some limitations should also be mentioned when interpreting our results. The study sample of the construct validity study was relatively small and therefore not fully representative of the total population of children across the countries represented, limiting the generalizability of the findings of the construct validity study.

The lack of a 'gold standard' in the validation study must be considered a major limitation; such gold standards are simply not available for assessment of most EBRBs or potential behavioural determinants. We chose to investigate construct validity of the questionnaire by comparing the answers of the questionnaire to the answers given in a face-to-face interview. The method of comparing questionnaires to interviews has been previously used to validate parent questionnaires on children's physical activity correlates [23]. Using interviews also enabled us to learn whether the respondents interpreted the questions as we intended. We therefore think that the use of face-to-face interviews was a strength of the current study, adding important feedback and providing more insight into the participants' answers. Three shortcomings of this method for the validation of the questionnaire should be mentioned. First, the interpretation of the responses in the interview might lead to bias. We attempted to minimise this bias by following a strict data entry protocol, i.e. the face-to-face interviewer was a different person from the one who filled in a second questionnaire based on the interview results. A second shortcoming is that both sets of data, i.e. from the questionnaire and the interview, were based on self-report, making it likely that there is correlated error between both measures. A third and general shortcoming of subjective reporting is that answers are more likely to be given in a socially desirable direction, and a face-to-face interview is likely to increase this bias. We aimed to minimise this form of bias by clearly indicating, before the interview, the importance of honest answers rather than socially desirable answers.

When interpreting the results, it should be considered that in the current study protocol, data were not collected on Mondays, to make sure that the questions referring to 'yesterday' did not cover a weekend day. Most probably, recalling activities on weekend days is more difficult for children, when compared to weekdays, since the latter tend to be more structured [5].

The current study examined the test-retest reliability and construct validity of the ENERGY-child questionnaire. Internal consistency was not assessed because most constructs were assessed by only one or two items. Future studies need to establish other aspects of validity and reliability such as content validity and responsiveness.


Our results demonstrate that the ENERGY-child questionnaire, assessing EBRBs of the child as well as personal, family, and school-environmental determinants related to these EBRBs, has good test-retest reliability and moderate to good construct validity.

Being able to validly and reliably assess EBRBs and several potential determinants of those EBRBs in different languages and countries will enable future observational and intervention research regarding childhood overweight and its drivers.