Introduction

Self-regulation (SR) skills—a broad and sometimes poorly defined constellation of abilities at the heart of children’s development of many basic life skills—are a popular area of research. In recent years, the field has seen significant interest not only from developmental and education researchers, but also from practitioners, educators, and education policymakers. And for good reason: SR skills in early life have been linked to a wide array of important academic, social, and behavioral outcomes across the lifespan (e.g., Moffitt et al., 2011; Woodward et al., 2017). Specific terms and definitions vary depending on the research field (e.g., self-management, self-control, willpower, effortful control, hot and cool executive functioning), but nearly all refer to a set of skills that enable individuals to monitor and regulate prepotent or automatic behaviors, emotions, and attention in pursuit of a goal (Burman et al., 2015). Over the last several decades of expansion and growth in this area of research, the field has manifested considerable variability in how SR related skills are conceptually defined and operationally measured (e.g., Jones et al., 2019). For example, single terms like “self-control” are commonly used to refer to multiple, distinct concepts depending on the context (e.g., delay of gratification, inhibitory control; Jones et al., 2016a; Thorndike, 1904). Similarly, multiple distinct terms are often used to describe concepts that likely share considerable conceptual overlap (Jones et al., 2019; Kelley, 1927). The academic field of SR research is plagued by conceptual clutter and definitional discrepancies, which have given rise to substantial challenges in the measurement of such skills and constructs for both academic research and practical applications in schools (e.g., Jones et al., 2016c, 2019; McKown, 2019). This lack of consensus and the poorly defined taxonomy of self-regulation and related skills in the academic literature present significant challenges to the effective dissemination of findings across disciplines and may lead to wasteful spending of public education dollars on ineffective programs adopted to target students’ self-regulation skill development in schools. With this systematic review of school-based self-regulation research, we aim to take a first step towards clarifying these conceptual definitions by cataloging and analyzing how scholars measure self-regulation related skills and concepts in their research.

Conceptual Clutter in Self-Regulation Research

The definitional vagueness and conceptual overlap among self-regulation related variables and constructs have been described many times by several scholars in the field (e.g., Booth et al., 2018; Duckworth & Kern, 2011; Eisenberg et al., 2010; Hofmann et al., 2012; Morrison & Grammer, 2016; Nigg, 2017; Zhou et al., 2012). Cognitive neuroscience researchers focus on what is termed cognitive control, which is defined generally as one’s ability to flexibly adapt behaviors and cognitive information processing in service of an internal goal (e.g., Hutchison & Morton, 2016). Social psychologists and education researchers view self-regulation and self-control, terms often used interchangeably within the field, as the ability to modify and regulate emotions and behaviors in service of goals (e.g., Blair, 2002; Hagger et al., 2010). However, others suggest that self-control and self-regulation refer to related, but distinct, concepts (e.g., Hofmann et al., 2012). Although the monitoring and regulation of emotions and one’s expressions of them is included in many conceptual definitions of self-regulation related concepts, some scholars also draw definitional distinctions between emotion-related self-regulation and emotion regulation (e.g., Eisenberg et al., 2010). Cognitive and developmental psychologists have focused on executive function skills, which are considered to be a set of multidimensional cognitive abilities used to control thoughts, emotions, and actions (e.g., Diamond, 2014). Although executive function research itself is rife with conceptual clutter and definitional overlap (Jones et al., 2016b; Morrison & Grammer, 2016), these cognitive abilities are often thought to subserve the behavioral manifestations of self-regulation and self-control typically studied in social psychology (e.g., Heatherton, 2011; Hofmann et al., 2012). Other scholars suggest that most conceptual differences between executive function and effortful control—another construct often associated with self-regulation—reflect differences in measurement traditions across the two siloed fields rather than a substantive difference between the two constructs (Zhou et al., 2012).

Despite differing opinions about the nature of these constructs and their relationships with one another, these skills are typically approached from a strengths-building perspective, and the importance of these skills for future success, regardless of what they are called, is well-documented. Across each of these research subfields, a reliable pattern exists: stronger performance on tasks measuring self-regulation constructs is associated with a wide range of positive academic, social, and health outcomes, while poor performance on these tasks is associated with negative outcomes and has been linked to some pathological developments in childhood and/or adulthood (e.g., Jones et al., 2015; Moffitt et al., 2011; Robson et al., 2020). Some constructs have been more closely linked with specific outcomes than others. For example, executive function skills in preschool have been linked with performance in mathematics in later childhood (e.g., Verdine et al., 2014). Conversely, some areas of research approach similar phenomena from a deficit perspective, focusing on the measurement of behavior problems, usually framed as symptoms of clinical or subclinical psychopathology, as evidence or manifestation of a lack of self-regulatory functioning (e.g., Lonigan et al., 2017). For example, effortful control and impulsivity in the early elementary grades have been linked with symptoms of emotional and behavioral disorders in adolescence (e.g., Wang et al., 2015). Self-regulation skills, measured in a variety of ways, have been consistently and inversely linked to certain externalizing and internalizing behavior problems, to varying extents, across early development in both clinical and non-clinical populations (e.g., Eisenberg et al., 2010; Lonigan et al., 2017; Willoughby et al., 2011). Although these conceptualizations tend to frame behavior problems as a correlate of self-regulation skills rather than a component of them, we include this variable in our review due to the frequency with which behavior problems are measured in educational research.

Justification for this Review

Within the field of self-regulation research, scholars have worked to align and clarify conceptual definitions across related subdomains that are often considered part and parcel of self-regulation. Most often, these reviews of the literature and calls for integration across models are based on analyses of construct definitions and patterns of findings across subfields (e.g., Booth et al., 2018; Hofmann et al., 2012; Zhou et al., 2012). While these approaches are critical to untangling the web of terminology that clutters the field, a catalog of the measurement tools and measurement approaches used to capture these variables is also needed. The specific methodologies selected to operationalize variables, however consistently those variables are defined, ultimately drive the conclusions drawn from the patterns discovered in the data. Efforts that scholars have made on this front have already begun to underscore the challenges inherent to a field characterized by vague conceptual definitions and many measurement tools to choose from. For example, a meta-analysis of a wide range of self-control tasks and questionnaires found only modest convergence across them, highlighting that even when definitions align across studies, the specific measurement tool selected to capture the variable might contribute significantly to the study outcome and to scholars’ interpretations of the relations among these constructs (Duckworth & Kern, 2011). Similarly, a meta-analysis of 150 studies investigating the associations between early childhood self-regulation and various outcomes, including academics, showed that the informant used in the measurement of self-regulation (e.g., parent-report, teacher-report, direct assessment) was a significant moderator of pooled mean effects (Robson et al., 2020).

With this systematic review of recent school-based research on children’s self-regulation skills, we contribute to efforts to declutter this field by answering the following research question: How are researchers measuring self-regulation skills among young children in educational contexts? To address this question, we aimed to develop a catalog of the self-regulation related variables researchers study, the specific measurement tools used to measure these variables, and the informants recruited to provide data. To our knowledge, this effort represents the first attempt to systematically review the recent educational literature regarding the specific measurements used to capture self-regulation related skills among young children.

Methods

A systematic literature review was conducted following the PRISMA protocol (Page et al., 2021); see Fig. 1 for the PRISMA flow chart.

Fig. 1 PRISMA flow chart

Search Strategy

The first and second authors conducted independent searches of PsycINFO and the Educational Resources Information Center (ERIC) via EBSCO for empirical English-language articles published in peer-reviewed journals from 2010 through 2020. We chose this publication period in light of the widespread disruptions to daily school functioning and in-school data collection resulting from the COVID-19 pandemic. Because school-based data collection methods were dramatically altered or paused altogether in 2020 and 2021, we limited our search to research conducted and published prior to these disruptions, as it is likely to reflect researchers’ data collection procedures in the future as schools return to pre-pandemic functioning. Our aim was to capture all empirical articles published in this period that included a measurement of a self-regulation related variable (SR variable) captured within an educational setting among young children. For the initial database searches, the following search term combinations were used: (a) “emotion* regulation” AND “early childhood” + school OR preschool* + school; (b) “challenging behavior*” AND “early childhood” + school OR preschool* + school; (c) “behavior problem*” AND “early childhood” + school OR preschool* + school; (d) “self-regulation” AND “early childhood” + school OR preschool* + school; (e) “self-management” AND “early childhood” + school OR preschool* + school; (f) “executive function” AND “early childhood” + school OR preschool* + school. Many of these variables overlap in their content and definitions (e.g., behavior problems and challenging behaviors); however, scholars often use such terms interchangeably and inconsistently across studies. Our aim was to capture as many articles as possible that included variables related to self-regulation in order to build a catalog of studies.

Eligibility Criteria

Studies were included based on several criteria. First, studies measured at least one of the SR variables listed in our search terms. This was determined based on the article authors’ descriptions of the variables and measurement tools included in the methods section of the paper. In cases where variable labels were vague (e.g., social-emotional skills), coders determined eligibility based on descriptions of these variables and included them when they were characterized as similar in nature to one of the search terms (e.g., regulating emotions, controlling behavior, aggression). Second, studies included child participants who were 3 to 8 years old at the time of data collection. In cases of longitudinal designs that included data collection at ages outside of this range, studies were included if the child participants were in the 3- to 8-year-old range at the time that at least one relevant variable was measured. Third, data collection for the relevant SR variable took place in a school setting (e.g., Head Start, childcare center, public 4K, early intervention classroom, second grade classroom). Finally, to allow comparisons across similar educational settings, only studies with a U.S. sample were included in the review. For the sake of clarity, throughout this literature review we use the term variable to refer to the constructs that article authors describe measuring (e.g., executive function) and the term measurement tool to refer to the specific assessment, questionnaire, or task authors used to measure that variable.

Data Extraction and Coding

The first and second authors’ searches of the PsycINFO and ERIC databases yielded 5,225 studies. After removing 1,659 duplicates, the remaining 3,566 articles were screened in two phases. First, using a Qualtrics survey, the titles and abstracts of the articles were screened. If an article was coded as “exclude,” the coder(s) selected an option indicating the reason for exclusion from a drop-down menu in the Qualtrics survey. See Fig. 1 for the specific reasons articles were removed during screening. The initial screening process yielded 583 included articles. Following this initial screening, inter-rater agreement, or the extent to which both coders agreed that an article should be included or excluded, was calculated across 30% (n = 1,070) of the screened articles. Percent agreement for inclusion and exclusion decisions across coders was 89%. In the second phase of screening, eligibility for inclusion was determined by conducting a full read of the 583 articles, yielding 319 included articles. Percent agreement across coders in this phase was 90%.
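For clarity, percent agreement here is simply the share of double-coded articles on which the two coders reached the same include/exclude decision:

$$\text{Percent agreement} = \frac{\text{number of matching decisions}}{\text{number of double-coded articles}} \times 100$$

At 89% agreement across the n = 1,070 double-screened articles, this corresponds to roughly 950 matching decisions.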

Descriptive Coding

Descriptive data were extracted from the remaining 319 articles. An Excel data sheet was created by the coders for data collection purposes. For training, 30 included articles were randomly selected for each coder to fully code, and disagreements were discussed until reliability was established. First, coders recorded the specific measurement tool used to capture the relevant variable. It was common for multiple relevant SR variables to be included within an eligible article and for multiple measurement tools to be used to measure the same variable in a study.

Next, each coder recorded the reported informant(s) for each measurement tool. Informants were coded as teacher, parent, other school official, researcher, peer, or child. Due to the young ages of the child participants in this literature review, none of the included articles featured child self-report measures. Measurements in which the target child completed a specific task or test in real time were coded as direct child assessments with a child informant. When a child’s behaviors in a naturalistic setting (i.e., classroom, playground) were observed or video-recorded and then coded by a researcher, these measurements were coded as direct observations with a researcher informant. Thus, measurements with either a researcher or child informant represent what we consider direct child-level measurements. These are contrasted with adult-report measures, such as parent- or teacher-completed questionnaires about a child’s characteristics. In the case of an adult-reported questionnaire, the informant was considered the person who completed it. Some measurement tools feature direct assessments of children as well as an assessor’s report of the child’s overall behavior patterns during the tasks, irrespective of the child’s actual scores on the assessment (e.g., Preschool Self-Regulation Assessment, PSRA; Smith-Donald et al., 2007). When children’s direct assessment scores were considered separately from the assessor’s report, these were coded as distinct measurements: the direct assessment score was considered a direct child assessment with a child informant, while the assessor’s report was considered an observation with a researcher informant.

Next, coders recorded the SR variable (i.e., emotion regulation, challenging behavior, behavior problems, self-regulation, self-management, executive function, other, or unclear) that the specific measurement tool captured, according to the article author(s). Given that a large portion of the variables (44.5%) were initially coded as other or unclear, the coders conferred and decided to revise the coding procedures to better categorize the variables that our search yielded. A new category was created if the variable was reportedly measured more than five times across all included articles (e.g., effortful control, approaches to learning). Therefore, if a variable was measured five times or fewer (i.e., at most 5 of the 701 measurements, or roughly 0.7%), it was categorized as “other” (e.g., proneness to fear, engagement, mental flexibility). This secondary coding resulted in the final SR variable categories presented in the Results: (a) emotion regulation, (b) behavior problems, (c) self-regulation, (d) executive function, (e) approaches to learning, (f) attention, (g) effortful control, (h) behavior/social/emotional competence, (i) behavioral regulation, (j) social-emotional skills, and (k) other.

Inter-rater agreement (IRA), or the extent to which both coders extracted the same data from eligible articles, was calculated for 30% of all included articles via percent agreement; IRA was 87.8%. Disagreements between the coders were resolved through discussion. In addition, Cohen’s kappa was calculated to assess the agreement between the two coders (κ = 0.74; McHugh, 2012).
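For reference, Cohen’s kappa adjusts raw percent agreement for the level of agreement expected by chance:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the observed proportion of agreement and $p_e$ is the proportion of agreement expected by chance given each coder’s marginal category frequencies. By the benchmarks summarized in McHugh (2012), a kappa of 0.74 reflects moderate agreement.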

Results

Our findings are based on data extracted directly from the methods sections of 319 articles. To address our research question, our analyses focused primarily on the specific measurement tools researchers used to capture various SR related variables in their studies. Across the included articles, authors reported 701 unique measurements of 11 different SR related variables (see Table 1). By far, the most frequently measured variable was behavior problems (38% of all measurements), followed by self-regulation (14%) and executive function (12%). Authors used 74 different measurement tools to assess their reported SR variables (see Table 2 for the most frequently used measurement tools). Finally, in addition to analyzing the SR variables and measurement tools used across articles, we also cataloged the informants who provided data for each measurement tool. Other school officials and peers each served as an informant only once and are thus excluded from most of the analyses regarding informants. The remaining measurement tools used in the included articles were either questionnaires that adults (parents or teachers) completed on behalf of a target child or direct measurements of the target child, via behavioral observation in a specific context (e.g., classroom, playground) or via direct assessment of the child. Table 3 provides a breakdown of the 11 SR variables and the associated frequency of each type of informant. Adult-reports represented a considerable majority (69%) of the measurements of all SR variables, and teachers completed a substantial portion (75%) of these, making teachers the most frequently used informant among the studies included in this analysis.

Table 1 Frequency of reported self-regulation variables
Table 2 Most reported measurement tools and frequency of use for corresponding variables across articles
Table 3 Frequency and percentage of informant by self-regulation variable

Overview of Measurement Tools

Across the 319 included articles, we cataloged 74 distinct measurement tools capturing 11 different SR variables; however, the 14 most frequently used measurement tools—those reportedly used 10 times or more (see Table 2)—accounted for 564 (80%) of the 701 total instances of an SR variable measurement. In cases where revised or modified versions of similar measurement tools were reported, these were collapsed and grouped together. For example, uses of the Teacher Observation of Classroom Adaptation—Revised (TOCA-R; Werthamer-Larsson et al., 1991) and the Teacher Observation of Classroom Adaptation—Checklist (TOCA-C; Leaf et al., 2002) were grouped together as the same measurement tool (TOCA).

In addition to calculating the number of different variables each measurement tool was used to capture, we considered the consistency with which each tool was used to capture the same variable. Table 2 shows a breakdown of the 14 most frequently used measurement tools and the frequency with which each tool was used to measure each of the included SR variables. In the following sections, we report our findings regarding the five most frequently used measurement tools. Although direct assessments were the most commonly used individual measurement tool, the overwhelming majority of measurements of children’s skills came from adults’ reports.

Direct Child Assessments

The most commonly used measurement tool was a direct child assessment. We coded any measure that required the target child to complete a specific task or test in real time, which was then scored by a researcher, as a direct child assessment. This resulted in a measurement tool category that includes several different specific assessments or tasks, such as the PSRA (Smith-Donald et al., 2007), the Head-Toes-Knees-Shoulders task (HTKS; McClelland et al., 2014), and the Dimensional Change Card Sort (DCCS; Zelazo, 2006). This kind of tool was used 164 times, representing about 23% of all measurements included in the review. While direct child assessments include a variety of different behavioral and cognitive tasks and assessments, this methodological approach was most often used to measure executive function (n = 78) and self-regulation (n = 50). It was also used to measure behavioral regulation (n = 9), effortful control (n = 8), attention (n = 7), other variables (e.g., learning engagement; n = 4), emotion regulation (n = 3), and approaches to learning (n = 2). By far, the most frequently used tool within this category was the PSRA (n = 69), used either as its full battery of ten behavioral tasks (e.g., pencil tap, snack delay, balance beam), as one or more individual tasks selected from the battery, or as tasks very similar to these (e.g., gift delay; Kochanska et al., 1996). According to the developers, the PSRA is designed to capture young children’s behavioral, emotional, and attentional self-regulation (Smith-Donald et al., 2007). Among the articles included in this review, researchers reported using the PSRA or similar behavioral tasks primarily to measure executive function (n = 29) or self-regulation (n = 24). For example, McLear et al. (2016) used four tasks from the PSRA battery (the balance beam, pencil tap, toy sort, and gift wrap tasks) to measure self-regulation in kindergarteners in their study investigating how parents’ representations of attachment styles and children’s self-regulation skills predict academic achievement. The HTKS, which its developers describe as a measure of behavioral self-regulation and executive function, was used 23 times to capture self-regulation (n = 12), behavioral regulation (n = 6), and executive function (n = 5; McClelland et al., 2014).

Social Skills Rating System

The Social Skills Rating System (SSRS; Gresham & Elliott, 1990) and its subsequent revisions—the Social Skills Improvement System (SSIS; Gresham & Elliott, 2008) and the Social Rating Scale (SRS; Meisels & Atkins-Burnett, 1999)—were collapsed and coded as a single measurement tool group (i.e., SSRS). The SSRS is described as an assessment tool used to measure child and adolescent social domains (i.e., social skills, problem behaviors, and academic competence). It is a multi-informant tool that includes rating forms for teachers, parents, and children. We found that the SSRS tool group was selected 82 times. In their methods sections, study authors reported using the SSRS tool group to measure seven distinct SR variables: (a) behavior problems (n = 33); (b) social-emotional skills (n = 15); (c) approaches to learning (n = 10); (d) other (e.g., adjustment problems, learning related problems; n = 8); (e) behavioral, social, and emotional competence (n = 7); (f) self-regulation (n = 7); and (g) behavioral regulation (n = 2). For example, Lin et al. (2016) examined peer relations in a rural U.S. setting and included the SSRS as a measure of problem behaviors to examine children’s externalizing and internalizing behaviors as a possible mediating factor in their learning-related behavior. Lin et al. (2016) found that children with higher SSRS problem-behavior scores in the fall had lower learning-related behavior scores, which in turn mediated their relationships with peers.

Child Behavior Checklist

The Child Behavior Checklist (CBCL; Achenbach, 1999) is described by its developers as an assessment tool for use by teachers and parents to screen for behavioral, social, and emotional problems. The CBCL covers eight domain categories: aggressive behavior, attention problems, anxious/depressed, rule-breaking behavior, social problems, somatic complaints, thought problems, and withdrawn/depressed. We coded 78 unique uses of the CBCL across the 319 included articles. Authors used the CBCL to measure seven SR variables, with 50 teacher reports and 28 parent reports. Sixty-seven of these 78 uses (86%) were to measure behavior problems. For example, Bagner et al. (2010) examined the psychometric properties of an assessment tool (the Revised Edition of the School Observation Coding System; REDSOCS) with 64 children diagnosed with Oppositional Defiant Disorder (ODD). The authors reported in their methods section that they used the CBCL, specifically the Aggressive Behavior scale, as part of their diagnostic screening process to identify children ages 4 to 6 years with ODD; parents completed the CBCL as part of the recruitment and inclusion process for the study. The CBCL was also infrequently used to measure attention, social-emotional skills, and self-, emotion, and behavioral regulation (see Table 2).

Direct Observation

Researchers’ observations of children’s real-time behaviors in natural settings were used to measure relevant variables 66 times across the included articles. According to authors’ descriptions, this measurement tool was most frequently used to capture behavior problems (n = 27, 41%). For example, in their investigation of a teacher-implemented classroom intervention, Conroy et al. (2014) had trained researchers observe preschool students identified as at risk for developing emotional or behavioral disorders (EBD), along with their teachers, during typical classroom activities. The observers coded teacher behaviors, student–teacher interactions, and child behaviors including disruption, aggression, and defiance. The authors found that these observed behavior problems decreased following the implementation of the intervention (Conroy et al., 2014). Our catalog of the use of this measurement tool indicates that researchers also used direct observation to measure a wide range of other variables, including self-regulation (n = 12), attention (n = 6), emotion regulation (n = 5), approaches to learning (n = 5), other variables (e.g., academic engagement; n = 4), executive function (n = 3), effortful control (n = 3), and behavioral regulation (n = 1).

Children’s Behavior Questionnaire

The Children’s Behavior Questionnaire (CBQ; Rothbart et al., 2001) was developed to be completed by caregivers to assess the temperament of children ages 3 to 7 years. Rothbart et al. (2001) designed the CBQ to capture individual differences in temperament across 15 characteristics (i.e., Activity Level, Anger/Frustration, Attentional Focusing, Discomfort, Fear, High Intensity Pleasure, Impulsivity, Inhibitory Control, Low Intensity Pleasure, Perceptual Sensitivity, Positive Anticipation, Sadness, Shyness, Smiling/Laughter, and Soothability); factor analysis indicated three broad temperament dimensions: Extraversion/Surgency, Negative Affectivity, and Effortful Control. Based on our coding process, the CBQ was used 42 times across seven SR variables, most often to measure effortful control (n = 18; 43%). For example, Valiente et al. (2011) examined the relation between child effortful control and academic achievement over a six-year period using a multi-measure, multi-reporter approach: effortful control was captured via parent and teacher reports on the CBQ at pretest and via observations of children in a laboratory. Valiente et al. (2011) found that children who had high levels of effortful control at Time 1 and persisted at difficult tasks achieved higher academic success years later. The CBQ was also used to capture self-, emotion, and behavioral regulation variables, executive function, and attention (see Table 2).

Discussion

As a first step towards decluttering the field of self-regulation research, our aim for this project was to document how self-regulation research is being conducted in educational contexts by cataloging the most frequently used measurement tools, the variables that researchers used them to measure, and the informants who provided the data. We searched the psychology and educational research literatures for studies that included measurements of variables related to the monitoring and regulation of behaviors, cognitions, and/or emotions.

Tools and Informants Used to Measure SR Variables

An important first step in rectifying the problem of conceptual confusion and clutter in self-regulation research is to take stock of precisely how researchers are measuring the myriad variables that are often included in the constellation of terms related to self-regulation. Our findings revealed that some measurement tools were used almost exclusively to measure a specific variable, while others were used more diversely to capture multiple different variables. For example, in most of the instances in which researchers used the CBCL to capture an SR variable, it was for the measurement of behavior problems. The same pattern was true for several other tools, including the Behavior Assessment System for Children (Reynolds & Kamphaus, 1992), the Strengths and Difficulties Questionnaire (Goodman, 2006), the Adjustment Scales for Preschool Intervention (Lutz et al., 2002), the TOCA (Werthamer-Larsson et al., 1991), and the MacArthur Health and Behavior Questionnaire (Armstrong & Goldstein, 2003). Each of these measurement tools was used to measure several different SR variables, but usage was highly concentrated in behavior problems, suggesting a fair amount of agreement across scholars regarding how best to measure this variable.

Direct child assessments were the most frequently used measurements across all variables, and executive function (n = 78) and self-regulation (n = 50) together accounted for the large majority of their uses. Among the direct child assessments, the PSRA or similar tasks were used most frequently, and nearly all of these uses were to measure self-regulation or executive function, further highlighting the conceptual and measurement overlap across these two variables. While our catalog shows that the PSRA battery and similar tasks were the most common specific tasks within this measurement tool category, this review did not further investigate which specific tasks (e.g., pencil tap, snack delay) were used for each of these variables. Analyzing these differences will be important for further clarifying the ways in which researchers measure self-regulation and executive function directly from children and for determining the extent of empirical overlap across these two variables (Jones et al., 2016a; Morrison & Grammer, 2016). It is clear, however, that the literature on self-regulation and executive function captured in this review leans heavily on direct child assessments of these skills.

This analysis also revealed several measurement tools that were used quite diversely across the literature. For example, while 43% of CBQ uses were documented to capture effortful control, the tool was also used broadly to measure emotion regulation, self-regulation, executive function, attention, behavioral regulation, and other variables. Each of these diversely used tools contains several subscales and factors that can be dissociated from the larger scales, which might contribute to their broad uses (Gresham & Elliott, 1990, 2008; Meisels & Atkins-Burnett, 1999; Putnam & Rothbart, 2006; Rothbart et al., 2001). The coding protocol for this review did not consider uses of subscales or factor scores separately from a measurement’s full scale, so we cannot conclude whether the wide range of variables these measures were used to capture reflects uses of the full scale or of component factors or subscales. Future work will attempt to clarify which subscales were used and the variables they were used to measure.

The methodological choices that researchers make extend beyond the selection of measurement tools to the selection of the informants who will provide the data. Informants offer different pictures of a target child’s behaviors depending on a number of factors, including whether the data are collected via observation of the child in a naturalistic context, through a direct assessment with a researcher, by questionnaire completed by a teacher or parent, or through children’s self-reports (e.g., Achenbach et al., 1987; Duckworth & Kern, 2011). Across all articles included in this review, children’s SR variables were primarily measured by having adults complete questionnaires about a target child’s skills and behaviors, and 75% of these adult-reports were completed by teachers. However, specific methodologies and informant use varied across SR variables. For example, behavior problems were most frequently measured by teacher-report, while self-regulation and executive function were much more likely to be measured via direct assessments of the child. While the use of adult-reports to measure child characteristics has considerable practical and some empirical advantages over child-level measurements—including the time, effort, and money needed to administer them and the relative convergent validity across them—such questionnaires are also limited by threats to validity through bias and differential reference contexts (Duckworth & Yeager, 2015). Additionally, correspondence between parent- and teacher-reports of children’s social, emotional, and behavioral functioning is consistently low to moderate (Achenbach et al., 1987; Duckworth & Kern, 2011; Meyer et al., 2001), most likely reflecting a combination of differential bias for parents and teachers and real differences in behavior patterns across settings (Gresham et al., 2018). Relying exclusively on teachers as informants may be especially practical for data collection in school settings, which may contribute to the overrepresentation of teachers and underrepresentation of parents as informants across all variables included in this review. However, failing to gather reports from parents in addition to teachers systematically excludes a valuable source of contextualized information about children’s behaviors and may lead to broad misunderstandings about the nature of such variables (Achenbach et al., 1987; De Los Reyes et al., 2015).

Direct measures of child performance and behavior, although considerably more resource intensive, have certain advantages over adult-reported questionnaires (e.g., reduced effects of bias, increased ecological validity); however, meta-analyses have suggested that children’s performance on behavioral tasks tends to show even weaker convergent validity across measures than adult-reports do and may require a lengthy battery of multiple tasks to approach adequate validity (Duckworth & Kern, 2011; Duckworth & Yeager, 2015; Meyer et al., 2001). Without a single gold-standard measurement available to capture aspects of children’s regulatory functioning and behaviors, a multi-informant approach is generally agreed upon as the best strategy for improving overall reliability and validity (e.g., Achenbach, 2006; De Los Reyes et al., 2015; Duckworth & Yeager, 2015; Kraemer et al., 2003), with some scholars specifically advising the use of adult-reports in conjunction with direct assessments or observations of children’s performance (Duckworth & Kern, 2011; Duckworth & Yeager, 2015). We found differences across variable categories in the use of multiple informants to measure SR related skills and behaviors, as well as in the practice of collecting adult-reports in addition to child-level measurements. While 27% of studies measuring behavior problems used multiple-informant methods, only 1% included a combination of both child-level measurements and adult-reports. Self-regulation and executive function, by contrast, were commonly measured using multiple informants and frequently included both adult-reports and child-level measures; these were also the variables most likely to include child-level measures at all. Together, these patterns suggest a lack of agreement across scholars regarding best practices for measuring some of these variables and may point to vague definitions in the literature and an overall poor understanding of the nature of these variables or how they might meaningfully differ from one another.

Implications for Research and Practice

Our review of the past decade of research on children’s self-regulation skills in educational settings cataloged many specific variable terms that are used frequently in the academic literature to describe such skills. Our analysis of the measurement methods used to capture these variables led to several specific recommendations for future research efforts in these areas. First, our review revealed that studies of behavior problems relied heavily on teachers’ reports of students’ behavior patterns, only occasionally included additional informants in these measurements, and very rarely measured children directly. Teachers are well-positioned to provide informed and detailed accounts of student behavior patterns. However, investigations of the problematic and disproportionate rates at which students of color are referred for behavioral problems or subjected to exclusionary discipline practices (Department of Education, 2018) have suggested that factors related to teachers’ interpretations of behaviors, such as teacher bias (e.g., Gilliam et al., 2016) and teachers’ own social-emotional competence (e.g., Jennings & Greenberg, 2009), might play a role in these referral rates (Ura & d’Abreu, 2022). To balance such potential issues with teacher reports of student behaviors, scholars studying behavior problems could benefit from using a multi-informant approach, the currently agreed-upon best practice for such measurements (e.g., Kraemer et al., 2003). Although collecting data from multiple informants—including the child—is especially important in cases where high-stakes decisions are being made about individual students (i.e., screening for diagnoses, classroom placement, treatment plans), such practices will also offer nuanced information for researchers studying the nature and etiology of behavior problems (De Los Reyes et al., 2015). Additionally, including direct measures of self-regulation or related variables along with measures of behavior problems will contribute to the understanding of the specific role that regulatory functioning plays in children’s behavior problems and can begin to address the question of whether behavior problems represent a deficit of regulatory skills (Campbell, 2016). These methods, particularly direct child assessments, are resource intensive and may not be feasible or appropriate for some study designs, but investing in a multi-informant approach where possible might return a deeper understanding of how behavior problems emerge and of best practices for prevention and support.

Self-regulation, executive function, and behavioral regulation were most often measured with child-level measurements, and effortful control was also frequently measured this way, suggesting that much of our recent empirical understanding of these variables is determined by observations of children in natural settings or by specific tasks administered to children by an experimenter. Our understanding of these variables might also benefit from a multi-informant approach that includes teacher- or parent-reports of such skills. While these variables were the most likely of all the variables included in the review to be measured with multiple informants, and for those measurements to include both child-level measurements and adult-reports, such cases were still in the minority. Considering the resources needed to collect behavioral data from individual children via observations or direct assessments, the addition of an adult-report from either parents or teachers in such circumstances might not be prohibitively costly and could considerably improve the validity of the overall measurement of these variables (Duckworth & Kern, 2011; Duckworth & Yeager, 2015).

The remaining variables included in this review were all primarily measured using teacher-reports, with very few instances of multiple informants providing data and very little consistency in the measurement tools used to capture each variable. Although this review did not compile the specific variable definitions used in the literature, this lack of measurement consensus may point to an overall lack of clarity or precision at the definition level for some variables (i.e., behavior/social/emotional competence, social-emotional skills, other), which will naturally give rise to a hodgepodge of strategies for operationalizing them. However, complete and concise definitions for these variables do exist in their respective literatures (e.g., Darling-Churchill & Lippman, 2016), and scholars have carefully considered best practices for measuring them (Jones et al., 2016c).

Limitations and Future Directions

Although we aimed to conduct an exhaustive search of the literature on self-regulation related concepts in educational settings, this scope necessarily precluded the inclusion of all research featuring the measurement of self-regulation. This topic is of interest to scholars across many academic domains that extend well beyond what we covered in this review, and rich bodies of literature exist for each of them (e.g., Booth et al., 2018). While including all of these domains in the search would not have been feasible, the theories, definitions, and measurement traditions within each of them may meaningfully influence the research being conducted in school contexts, and consideration of each of these domains should be included in future attempts to declutter and clarify the field. Additionally, our search of the literature was limited to the ERIC and PsycINFO databases, which are high-quality sources of educational and psychological peer-reviewed literature. The use of other comprehensive databases such as Scopus or Web of Science might have introduced additional disciplines into our search results and reduced the possibility of selection bias in our included list of articles. Future catalogs of the SR literature should include searches of these databases.

We found that the most commonly used measurement tool across all SR variables included in this review was a direct assessment of child performance. Our data extraction procedures did not include documentation of the specific tasks researchers used in these assessments or of the variables each task was used to measure, so we are limited in the conclusions we can draw regarding the use of direct assessments beyond the broad category. Similarly, direct observations of child behavior were also commonly used across many variables in the review, but we did not record precisely which observation tools, coding schemes, or observation contexts were used. Future analyses of the literature, particularly efforts aimed at disentangling the measurement methods for self-regulation and executive function, will include a comparison of the specific direct assessment tasks or observation protocols used for each of these variables to identify any empirical overlap.

Many scholars have put forth substantial efforts towards clarifying the specific cognitive and behavioral skills involved in the regulation of the self in service of goals by aligning definitions and conceptual frameworks across subfields (e.g., Booth et al., 2018; Hofmann et al., 2012; Jones et al., 2016a; Zhou et al., 2012), and these efforts should continue. This review aimed to contribute to this body of work by providing a catalog of measurement patterns for the multiple SR variables studied in educational contexts in recent years. Although we aligned measurement tools and practices with the specific SR variables that study authors used them to measure, we did not document how authors specifically defined their variables of interest or the broader conceptual frameworks in which their research may be situated. This is an important aspect of the operationalization of variables and should be included in efforts to clarify the field. A logical next step might be to connect the conceptual frameworks and variable definitions that researchers work from to the specific measurement tools and methodological practices they use to measure those variables. This work would provide a comprehensive view of what scholars think various SR variable terms represent and how they go about measuring them, and it would bridge the existing work on conceptual definitions to the current work on measurement practices.

Conclusion

Our aim for this review was to provide a catalog and analysis of how scholars have approached the study of the skills and behaviors related to young children’s self-regulatory functioning in educational settings and to identify patterns among the variables described and the measurement methods used to capture them. Our review revealed a broad range of variable terms included in this literature, and the patterns of measurement of these variables suggest there may be more clarity and empirical agreement across the field for some variables than for others. More work on this front is needed to ensure that the academic work that drives educational and policy initiatives around supporting these skills in schools is accurate and methodologically sound. Honing the operational definitions and measurement tools most appropriate for these concepts will contribute to more effective and efficient communication among researchers and education stakeholders. As early childhood research becomes more interdisciplinary, it is critical that researchers share a common conceptual understanding, allowing us to be more targeted in our work.