Data for the current study were collected as part of a study aimed at developing an assessment battery of language and executive function measures for two- and three-year-olds for use in a cohort study (see Mulder et al. 2014). From this sample, we selected only children for whom a video of at least one of the two delay of gratification tasks (see below) was available. This resulted in a sample of 62 children with a mean age of 34.8 months (SD = 5.7; range 23.5 to 42.7) and 29 (46.8%) girls. The majority of children (38 out of 55 with available parent questionnaire data, 69.1%) were from families in which one or both parent(s) had completed at least higher education (i.e., college or university). For 7 children, this information was missing. For all children for whom information on home language was available, Dutch was spoken at home (56 available reports, 6 missing).
The delay of gratification tasks were intermixed with a series of language and executive function tasks not reported on in this paper, and were administered in a fixed order. First, children completed laptop-based tasks assessing selective attention and language abilities, which are not reported on in this study. Next, the snack delay task was administered. The gift delay task was given approximately 20 min later, after a series of working memory tasks which are not part of the current report. All children also took part in a second assessment session about 2 weeks later, which included further executive function and language measures, none of which are reported on in the current study. All tasks were administered by extensively trained research assistants (RAs), who were enrolled in a master’s degree programme in Clinical Child, Family, and Education Studies. Testing took place in a quiet room with as few stimuli as possible in the children’s daily environment (i.e., home [n = 8] or day-care centre [n = 54]). Parents were informed about the study through an information letter and brochure, and gave signed consent for their child’s participation.
The current study aimed to develop measures for a large field study with children of varying ages. Therefore, we used a version of the snack and gift delay tasks with a 60-s delay in one part of the sample of three-year-olds, and a 150-s delay in another part, to see whether a longer delay would yield more variance in pass/fail scores in this age group. For consistency, however, only the first 60 s of both versions were coded for the purpose of the current study, and data from all children were pooled in the analyses. In addition, the children who took the longer version of the task (14/40, 35% of three-year-olds) were given another task, not reported on in this article, before the gift delay task; in this task the assessor pretended to wrap the gift while the child was instructed not to peek (Kochanska et al. 1996). As the tests were administered at children’s homes or day-care centres, the size and height of the test table and chair were not standardized (24/62, 39%, were seated at a low child-height table on a small child’s seat; all others were seated at an adult-size table), and a caregiver was occasionally present in the room during test administration (17/60, 28%, for snack delay; 13/55, 24%, for gift delay). The potential influence of these variables (i.e., test version [short version without the additional gift-wrapping part versus longer version with the additional part], table/chair height, and presence of a caregiver) was examined in a series of preliminary analyses, as further described in the Analysis section.
Measures and coding
Two delay of gratification tasks, a snack and a gift delay task, were used to measure self-control. These were adapted for field research from the effortful control task battery of Kochanska and colleagues (Kochanska et al. 1996; Kochanska et al. 2000). In the snack delay task, children were shown an open box of raisins and asked to try not to touch the box of raisins until the RA had finished another task. Similarly, in the gift delay task children were shown an attractively wrapped gift with a bow and asked to try not to touch the gift until the RA had completed another task. RAs were trained to give the instructions for both tasks as a friendly request (e.g., by saying ‘try not to touch’ in a friendly voice). Directly following the instructions, the reward was placed on the table at a distance of 25 cm from the child. In both tasks, the assessor pretended to be writing while sitting a short distance behind the child during the 60-s delay time. At the end of the delay time, children always received positive feedback; they were either praised for waiting or, in case of ‘failure’, asked whether the raisins tasted good or complimented on the nice gift they got. Before the task was administered, the purpose of the task was explained to caregivers (i.e., parents or teachers, depending on the setting in which the test was administered), who were asked either to leave the room for the short duration of the task, or to sit down quietly elsewhere in the same room and pretend to be reading, so as not to interact with the child during the delay time. Children’s performance on each of the delay of gratification tasks was scored as fail if they touched the reward at least once before the 60 s had passed and as pass if they completed the task without touching the reward for 60 s.
Behaviours shown during delay
Children’s behaviours during the 60-s delay time were coded from video by the second author for both tasks. In all five domains (visual attention, verbal, and hand, head, and body behaviours), directing attention towards the reward was coded as focusing, whereas directing attention away from the reward (i.e., self-distraction) and active avoidance were coded as distracting and withholding, respectively. Withholding can be seen as a special way of exerting self-control, which includes behaviours that children can use to actively stop themselves from touching the reward, such as placing their hands under the table. When a child showed different behaviours with their right and left hand, the hand regarded as closest to touching the reward was coded. Examples of codes within each of these categories are presented in Table 1; the full coding scheme is shown in Appendix A. The shared first author (HvR) coded all domains separately in one-second intervals using Mediacoder (Bos and Steenbeek 2009). That is, the video was paused every second and codes were based on the video still for all behavioural domains except the verbal domain. For the verbal domain, transcripts were generated first, and each utterance was coded separately with start and stop times. This approach resulted in five codes per second and 300 codes in total per video (5 domains × 60 s). Note that we coded focusing, distracting, and withholding within each of the domains to ensure comprehensive coding of the observed behaviours. However, only specific categories representing distracting and avoidance behaviours that were hypothesized to support self-control and that passed the inter-coder reliability test, discussed below, were used in further analyses; these specific variables are marked with an asterisk (*) in the Appendix and in Table 1.
In addition to the raw data, two measures were derived for each of the five domains for analytic purposes. First, the percentage of intervals in which each behaviour occurred was computed for each child, across all time intervals until the child touched the object for the first time (or until the maximum of 60 s was reached for children who passed the task). For children who did not show a particular behaviour at all, the percentage occurrence variable was zero. Second, a latency variable was constructed, which refers to the time it took children to show each type of delay behaviour for the first time. This variable was constructed only for children who showed a particular behaviour at least once. As such, children who did not show a particular behaviour had a missing value on the latency variable for that behaviour.
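As an illustration only, the two derived measures could be computed from a per-second code sequence roughly as follows. This is a minimal sketch, not the authors' actual procedure: the function name, the string labels, the 0-based interval index, and the convention that latency equals the index of the first interval showing the behaviour are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple


def derive_measures(
    codes: Sequence[str],
    behaviour: str,
    first_touch: Optional[int] = None,
    max_seconds: int = 60,
) -> Tuple[float, Optional[int]]:
    """Return (percentage occurrence, latency in seconds) for one behaviour.

    codes       : one code per one-second interval within a single domain
    first_touch : 0-based index of the interval in which the child first
                  touched the reward, or None if the child passed the task
    """
    # Only intervals before the first touch count (or the first 60 s on a pass).
    end = max_seconds if first_touch is None else min(first_touch, max_seconds)
    window: List[str] = list(codes[:end])
    hits = [i for i, code in enumerate(window) if code == behaviour]
    # Assumption: a child with no scorable interval gets a percentage of zero.
    percentage = 100.0 * len(hits) / len(window) if window else 0.0
    # Latency is missing (None) when the behaviour never occurred.
    latency = hits[0] if hits else None
    return percentage, latency
```

For a child who passed and showed distraction in 20 of the 60 intervals, the percentage would be 33.3; for a child who never showed the behaviour, the function returns a percentage of zero and a missing latency, in line with the description above.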
Fifteen percent of the available videos were randomly selected and coded by another coder to establish inter-rater reliability. We computed Kappa to evaluate reliability, for each domain as a whole (visual attention, verbal, etc.), and for each specific behaviour within each domain (presence vs. absence of this behaviour, only for domains with at least two behavioural codes) (see Table 1). Based on Landis and Koch (1977), a Kappa of .61 or higher was considered acceptable.
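For readers unfamiliar with the statistic, the sketch below shows how an unweighted Cohen's kappa for two coders can be computed from paired interval codes. It assumes that the Kappa reported here is the standard two-coder Cohen's kappa; the function name and the example labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two coders rating the same intervals."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must rate the same, non-empty set of intervals")
    n = len(coder_a)
    # Observed agreement: proportion of intervals given identical codes.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions per code.
    count_a, count_b = Counter(coder_a), Counter(coder_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)
```

Because chance agreement depends on the marginal distributions, a domain in which one code dominates almost all intervals can yield a low kappa even when raw agreement is high.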
Inter-rater reliability was acceptable for the visual attention, hands, and body domains, somewhat lower for the head domain (95% CI just overlapped with .60) and very low for the verbal domain. As it was somewhat challenging to obtain agreement between coders on the direction of the head, and direction of the head was relatively strongly related to direction of the eyes (Kendall’s tau = .43; p < .001 for snack delay; Kendall’s tau = .57; p < .001 for gift delay), which we considered the more important variable, the former was dropped from the analyses.
Closer inspection of the reliability data for the verbal domain revealed that the low reliability was likely due to a combination of factors: an extremely unequal distribution across cells, with many empty cells (e.g., Kottner and Dassen 2008), and difficulty distinguishing distracting from ‘no speech’. The latter issue was probably due to the fact that we also coded ‘mumbling’ as distraction, and establishing whether a child was mumbling quietly or making mouth movements without speaking was sometimes difficult. For these reasons, the verbal domain was dropped from the analyses.
Inter-rater reliability at the level of each specific behaviour showed that, within the visual attention domain, withholding was not coded reliably. As this behaviour occurred very infrequently (1.3% and 1.5% of intervals on average on snack and gift delay, respectively), it was combined with visual attention distraction in the analyses. Finally, as some of our analyses involved concomitant behaviours (see the Analysis section below), inter-rater reliability of the co-occurrence of behaviours was investigated. Co-occurrence of the visual attention, hands, and body domains was reliable between coders (Kappa = .65; 95% CI = .61–.69).
Nine videos (i.e., two of the snack and seven of the gift delay task) had to be excluded due to low technical video quality or task administration errors (e.g., caregivers interacting with the child during the task). In total, video data of 60/62 (96.8%) children were available for the snack delay task, and video data of 55/62 (88.7%) children were available for the gift delay task.
Parent- and teacher-rated self-control
Parents and teachers were asked to rate children’s self-control on a number of selected items from the inhibitory control subscale of the Early Childhood Behaviour Questionnaire (ECBQ; Putnam et al. 2006). This subscale assesses children’s ability to moderate behaviour or refrain from acting in situations where this is called for. The ECBQ is suited to measure temperament in children aged 18 to 36 months, and an age-adjusted version is available for children aged 3 years and older, the Child Behaviour Questionnaire (CBQ; Rothbart et al. 2001). However, our study included both two- and three-year-olds and the same scale needed to be given to all children in the sample for analytic purposes. Therefore, we decided to use the ECBQ for all children. In addition, for the purpose of the large field study, to limit the time required from parents and teachers to fill out the questionnaire, we used only a very limited set of ECBQ items that we had previously selected based on an earlier pilot, as described in Mulder et al. (2014). Five items were given to parents, and three items were given to teachers. An example item given to parents is: ‘When asked not to, how often did your child touch an attractive item anyway?’. An example item given to both parents and teachers is: ‘When told no, how often did this child ignore your warning?’. For each item, parents and day-care teachers were asked to indicate how often the child displays the respective behaviour on a scale from 1 (never) to 7 (always). A total of 55 parents (55/62, 89%) and 50 day-care teachers (50/62, 81%) completed the items.
Internal consistency (Cronbach’s alpha) of our short version was .68 for parents and .94 for teachers in the current study. Moreover, in the current study, we administered the standard version of the inhibitory control scale of the ECBQ to parents and teachers of two-year-olds (12 items), and the standard short version of the inhibitory control scale of the CBQ to parents and teachers of three-year-olds (6 items). The association between inhibitory control on the standard ECBQ form and our short version was .84 (p < .001) for parents and .87 (p < .001) for teachers of two-year-olds. The correlation between inhibitory control on the standard short CBQ form and our short version was lower at .42 (p = .013) for parents and .61 (p < .001) for teachers, likely due to the broader scope of the CBQ. Specifically, unlike the ECBQ, the CBQ includes items related to the ability to follow instructions and plan ahead (such as ‘Prepares for trips and outings by planning things s/he will need’).
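For reference, Cronbach's alpha for a short scale can be computed from item-level responses as sketched below. This is a generic illustration of the standard formula, assuming complete data and sample (n − 1) variances; the function name and the example ratings are hypothetical and are not taken from the study data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; `responses` holds one row of item scores per respondent."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sum of the variances of the individual items (columns).
    item_variance_sum = sum(variance(column) for column in zip(*responses))
    # Variance of each respondent's total score across items.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1.0 - item_variance_sum / total_variance)
```

With only three to five items, as in the short version used here, alpha tends to be lower than for a longer scale with comparable inter-item correlations, which is worth keeping in mind when reading the parent value.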
To investigate which behaviours were predictive of task success (pass/fail on each task), a series of analyses was conducted. First, in a number of preliminary analyses, we investigated whether a set of variables (i.e., age, gender, chair height, the presence of a caregiver in the room, and test version) influenced the results and, hence, should be controlled. These variables were studied in relation to task success using logistic regression. Subsequently, we studied whether these variables predicted the percentage of time intervals in which children showed each behaviour during delay, using linear regression. Variables that were related to task performance and/or to behaviours during delay were included in the main analyses. Second, in the main analyses, we entered the behaviours children showed during delay in a logistic regression analysis with task performance (pass/fail) as the outcome variable to investigate which behaviours were predictive of task success. Bootstrapping was used in these analyses, because of the non-normal distribution of some of the variables (e.g., the percentage of time a behaviour occurred) and the small sample (Efron and Tibshirani 1993).
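The resampling idea behind bootstrapping can be sketched as follows. This is a generic case-resampling percentile bootstrap, not the specific bootstrapped logistic regression used in the study: the statistic is passed in as a callable (in the study it would be a logistic regression coefficient), and the function name, the number of resamples, and the fixed seed are assumptions made for illustration.

```python
import random
from typing import Callable, Sequence, Tuple


def bootstrap_percentile_ci(
    cases: Sequence,
    statistic: Callable[[Sequence], float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for `statistic`, resampling whole cases."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(cases)
    # Draw n cases with replacement, recompute the statistic, and repeat.
    estimates = sorted(
        statistic([cases[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Read the interval limits off the empirical distribution of estimates.
    lower = estimates[round((alpha / 2) * (n_resamples - 1))]
    upper = estimates[round((1 - alpha / 2) * (n_resamples - 1))]
    return lower, upper
```

Because the interval is taken from the empirical distribution of the resampled estimates, it does not rely on the predictor (for instance, a skewed percentage variable) being normally distributed.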
To investigate the effects of time and caregiver ratings of self-control on occurrence of behaviours during delay, a series of multilevel multinomial regression analyses were run in HLM (version 7.03). Multinomial regression is an extension of binary logistic regression, and allows the dependent variable to have more than two unordered categories (Hedeker 2003). We opted for this type of analysis because two behaviours were significantly related to delay task performance: visual attention distraction and hands withholding. To gain comprehensive insight into the unfolding of these behaviours over time, their joint occurrence was modelled as a single dependent variable with 2 × 2 categories. Thus, the dependent variable had the following four categories: 1) visual attention focusing and hands not withholding (this is the least controlled combination of behaviours), 2) visual attention focusing and hands withholding, 3) visual attention distracting and hands not withholding, and 4) visual attention distracting and hands withholding (this is the most controlled combination of behaviours). The first category was set as the reference category in the analyses. The logit of the probability of occurrence of each of the other three categories was modelled relative to the reference category.
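To make the construction of the dependent variable concrete, the sketch below maps the two per-second codes onto the four categories and shows how three logits relative to the reference category translate into four probabilities. The function names and string labels are hypothetical, and treating visual withholding as distraction follows the recoding described in the reliability section; this is an illustration, not the HLM specification itself.

```python
import math
from typing import List, Sequence


def joint_category(visual: str, hands: str) -> int:
    """Return the joint category (1-4) for one one-second interval."""
    # Assumption: visual withholding counts as distraction; any other visual
    # code is treated as focusing for the purpose of this sketch.
    distracting = visual in ("distracting", "withholding")
    withholding = hands == "withholding"
    # 1 = focusing/not withholding, 2 = focusing/withholding,
    # 3 = distracting/not withholding, 4 = distracting/withholding.
    return 1 + 2 * int(distracting) + int(withholding)


def category_probabilities(logits: Sequence[float]) -> List[float]:
    """Convert logits for categories 2-4 (relative to 1) into probabilities."""
    # The reference category has a logit of zero by construction.
    exponentials = [1.0] + [math.exp(value) for value in logits]
    total = sum(exponentials)
    return [value / total for value in exponentials]
```

A positive logit for category 4, for example, means that the most controlled combination is more likely than the least controlled combination at that point in time.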
In the multilevel model, time was the first level predictor and caregiver-rated self-control was the second (child-) level predictor. The cross-level interactions between time and caregiver ratings were also included. The following models were run: 1) linear and quadratic time effects models to investigate both linear and nonlinear time effects; 2) time and caregiver ratings effects models with cross-level interactions and age as covariate. The models were fitted with the Penalized Quasi-Likelihood estimator that is available in HLM. As no relative fit indices are available for model comparison with this estimation method, we investigated statistical significance of each of the polynomial time effects in step 1 to determine whether they should be included in the model or not. All independent variables were added to the model grand mean centred.
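Grand mean centring and the construction of the polynomial time terms can be sketched as follows. The helper names are hypothetical, and building the quadratic term from the centred time variable is an assumption; the text does not state how the quadratic term was constructed.

```python
from statistics import fmean
from typing import Iterable, List, Tuple


def grand_mean_centre(values: Iterable[float]) -> List[float]:
    """Subtract the mean across all observations from every value."""
    values = list(values)
    mean = fmean(values)
    return [value - mean for value in values]


def polynomial_time_terms(seconds: Iterable[float]) -> Tuple[List[float], List[float]]:
    """Return centred linear time and its square as the quadratic term."""
    linear = grand_mean_centre(seconds)
    # Assumption: squaring centred time, which reduces the correlation
    # between the linear and quadratic predictors.
    quadratic = [t * t for t in linear]
    return linear, quadratic
```

After centring, the intercept of each logit refers to a child with average caregiver ratings and age at the average time point, rather than to a value of zero on those predictors.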
Note that, although the number of observation points at the first level can be regarded as rather substantial for an observational multilevel study (Hox et al. 2018), the number of cases at the second level was relatively small. We therefore closely inspected our models for the stability of findings. All models ran without difficulty and the maximum number of iterations required was 27. Bonferroni correction was applied to adjust for a potential inflation of the Type I error rate. There were three tests of the statistical significance of each independent variable (time, and parent- and teacher-rated self-control) for each task, that is, one test for the occurrence of each behavioural category in comparison to the reference category. Therefore, alpha was set to .05 / 3 = .017. Results with robust standard errors are presented.
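The Bonferroni adjustment amounts to the small calculation below; the function names are hypothetical and the sketch only restates the arithmetic given in the text.

```python
def bonferroni_alpha(familywise_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return familywise_alpha / n_tests


def is_significant(p_value: float, familywise_alpha: float = 0.05, n_tests: int = 3) -> bool:
    """Compare a p value against the corrected threshold (.05 / 3 by default)."""
    return p_value < bonferroni_alpha(familywise_alpha, n_tests)
```

With three tests per predictor, an effect with p = .02 would therefore not count as statistically significant, although it would at the conventional .05 level.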