Contrasting executive function development among primary school children from Hong Kong and Germany

Previous research findings indicate that young children from East Asia outperform their counterparts from Europe and North America on executive function (EF) tasks. However, very few cross-national studies have focused on EF development during middle childhood. The current study assessed the EF performance of 170 children in grades 2 and 4 from Hong Kong (n = 80) and Germany (n = 90) in a cross-sectional design. Children completed tasks assessing the main components of EF, namely inhibition (child-friendly Stroop task), updating (Object Span task), and shifting (Contingency Naming task). Results of multilevel models showed that all three EF measures differentiated well between younger and older children across the full sample. However, contrary to our hypothesis and previous research, we did not find any significant differences in EF performance between children from Hong Kong and Germany at primary school age. Our findings highlight the possibility that issues related to the measurement of EF and features specific to Hong Kong and Germany underlie our results.


Introduction
An emerging body of research indicates that basic cognitive processes, such as executive functions (EF), are influenced by cultural contexts. Cross-national differences in children's EF development have been previously demonstrated (Imada et al. 2013; Legare et al. 2018; Tran et al. 2019). In particular, preschool children from East Asia have been shown to outperform their counterparts in Europe and North America on EF tasks (Lan et al. 2011; Oh and Lewis 2008; Sabbagh et al. 2006; Schmitt et al. 2019). Understanding cultural influences on children's EF development has been a subject of scholarly interest for over a decade. However, research assessing differences in EF across multiple countries in children beyond the preschool years is still scarce, with only a handful of cross-national studies focusing on middle childhood and adolescence (Ellefson et al. 2017; Imada et al. 2013; Wang et al. 2016). While early childhood is considered a sensitive period for EF development (Zelazo et al. 2016), EF performance is of high relevance throughout childhood and adolescence, predicting life outcomes such as academic achievement well beyond the preschool years (Bull et al. 2008; Mazzocco and Kover 2007; McClelland et al. 2013; Vitaro et al. 2005).
Research on brain maturation further emphasizes the importance of middle childhood as a significant developmental period for EF. When children start primary school, on average at around 6 years of age, the prefrontal cortex (PFC) is typically adequately developed and facilitates the top-down control processes needed to navigate formal schooling environments (Blair and Raver 2015). Hence, our understanding of cross-national variations in EF development beyond the preschool years needs to advance accordingly. The present study aims to narrow this research gap by examining the EF development of primary school age children from Hong Kong and Germany, in a cross-sectional study.

Executive functions in children
EF refer to a set of top-down cognitive processes which are essential to goal-directed behavior, such as problem solving, staying focused and on-task, avoiding temptations, planning, and anticipating possible courses of action (Diamond 2013, 2016; Huizinga et al. 2006; Zelazo et al. 2016). Although views regarding the factor structure of EF vary among theorists (Ackerman and Friedman-Krauss 2017; Nelson et al. 2016), evidence suggests that there are three core EF, namely inhibition, updating, and shifting (Diamond 2013; Lehto et al. 2003; Miyake et al. 2000; Zelazo et al. 2016).
Inhibition or inhibitory control characterizes the ability to suppress distracting stimuli as well as automated responses for the purpose of controlling one's attention, behavior, thoughts, and/or emotions (Diamond 2013; Zelazo et al. 2016). Different aspects of inhibition have been identified (Diamond 2013; Nigg 2000). Inhibition of attention and thoughts are generally regarded as the cognitive facet of inhibition, whereas processes that aim to enhance control over our emotions and behavior are grouped and termed, inter alia, effortful control, hot EF, self-control, emotional self-regulation, and behavioral self-regulation (Brock et al. 2009; Edossa et al. 2018; Jones et al. 2016; McClelland and Cameron 2012). Updating or working memory refers to the process of temporary mental storage and manipulation of information (Baddeley 2000; St Clair-Thompson and Gathercole 2006). Updating is distinguishable from short-term memory, which solely encompasses holding information in mind (Diamond 2013). Shifting or cognitive flexibility or switching is regarded as the ability to switch between alternative tasks, strategies, sets of rules, mental operations, and/or perspectives (Lee et al. 2013; St Clair-Thompson and Gathercole 2006). Shifting builds on the previous two core EF, as it generally requires keeping relevant information in mind and inhibiting the previously exercised response (Diamond 2013).
EF is consistently identified as a strong predictor of academic achievement across childhood, in particular of mathematics (Fuhs et al. 2014; Gestsdottir et al. 2014; McClelland et al. 2006, 2007; Morgan et al. 2017; Roethlisberger et al. 2013), and, further, is associated with social-emotional development (Poland et al. 2016; Riggs et al. 2006). The link between EF and academic achievement has been identified in samples of children from Europe, North America, and East Asia alike (Ellefson et al. 2020; Georgiou et al. 2020; Lan et al. 2011).

Cross-national variation in children's EF development
Scholars have investigated specific context factors that influence EF development, as these can inform the development of effective interventions (Raver and Blair 2020). The positive results of studies assessing the effectiveness of EF-enhancing training programs suggest that EF development is plastic and can be modified by contextual factors (Diamond and Lee 2011; Espinet et al. 2013; Holmes et al. 2009; Lakes and Hoyt 2004; Razza et al. 2015). Cross-national comparisons allow us to examine specifically the influence of variations in social contexts across different countries (e.g., differences in social norms and values, educational settings, curriculum, governmental organization, and family structure) on children's development. The culture-specific social context in which children's cognitive development is embedded has received increased attention as cognitive processes are "tied with social goals and with individuals learning to function as participants in cultural communities" (Correa-Chávez and Rogoff 2005, p. 8). Furthermore, social differences across cultures are argued to influence the nature of the cognitive processes of their members (Nisbett et al. 2001).
Sabbagh et al. (2006) were among the first researchers to examine cross-national differences in children's EF development. They found that Chinese preschoolers significantly outperformed their age-matched US-American counterparts on several EF tasks. The reported advantage amounted to an equivalent of around 6 months of development (Sabbagh et al. 2006). Similar results have been reported in other studies that have compared EF development in children from East Asia to those from the USA and the UK (Ellefson et al. 2017; Imada et al. 2013; Lan et al. 2011; Oh and Lewis 2008; Wang et al. 2016).
Some scholars have ascribed differences in EF development between Euro-American and East Asian children to cultural values. In particular, there has been some discussion (Ellefson et al. 2017; Moriguchi et al. 2012) about the strong historic influence on East Asian culture of Confucianism, which emphasizes the importance of self-control for the purpose of social harmony (Rao et al. 2014; Yum 1988). Furthermore, it has been argued that differences in parenting styles can affect the child's EF development via parent-child interactions (Ellefson et al. 2017; Moriguchi et al. 2012; Oh and Lewis 2008). To illustrate, East Asian parents emphasize family interdependence, parental strictness, and academic achievement in their children's upbringing (Sun and Rao 2017; Chao and Tseng 2002). Another highly relevant social context in which cross-cultural differences in everyday routines manifest (Velez-Agosto et al. 2017), and in turn potentially affect children's cognitive development, is the educational setting (Rao et al. 2014; Oh and Lewis 2008). In fact, variations in kindergarten teachers' cultural beliefs have been shown to translate into differences in the structuring of play and learning, in both Germany and Hong Kong (Wu and Rao 2011).

EF development in middle childhood across countries
The vast majority of cross-national studies of children's EF development to date have focused solely on preschoolers. Yet, an examination of EF development in children across countries during the primary school years may offer pertinent new perspectives for understanding cross-national differences in EF development. From a theoretical point of view, three opposing working hypotheses seem plausible. (1) The gap between East Asian and Euro-American children might increase in middle childhood, potentially due to social norms that manifest continuously as children mature. (2) The gap may remain consistent, suggesting that the reported East Asian advantage is possibly linked to specific learning and socialization experiences in early childhood. (3) The Euro-American children might catch up to their East Asian counterparts over time. One possible explanation for the latter could be that the high self-regulatory demands of formal education advance EF development in both cultures, yet are potentially even stronger in Europe and North America as preschool education in Western countries is typically more strongly characterized by free-play when compared to East Asian pedagogy.
However, to the best of our knowledge, to date, only a handful of studies have assessed such developmental patterns in children across multiple countries beyond the preschool time frame. Initial evidence suggests that the reported developmental EF advantage of East Asian children over their Euro-American counterparts persists in middle childhood and adolescence (Ellefson et al. 2017; Imada et al. 2013; Wang et al. 2016). To illustrate, Imada et al. (2013) found that 4- to 9-year-olds from Japan display greater levels of shifting performance than children of the same age from the USA. Wang et al. (2016) compared EF performance of Hong Kong-Chinese children attending public schools and ethnically diverse children attending English-speaking private schools in Hong Kong with children from the UK. The results showed that the 9- to 16-year-olds in Hong Kong, regardless of type of school, performed better on the EF tasks than the British children of the same age group (Wang et al. 2016). Ellefson and colleagues found Hong Kong-Chinese adolescents to be 2 years ahead of British counterparts in EF development (Ellefson et al. 2017).

Focusing on middle childhood in Hong Kong and Germany
The current study broadens the scope of previous research by comparing EF development in children from Hong Kong and Germany in the second and fourth grades of primary school. Contrasting EF development in children from Hong Kong and Germany across this age range provides unique opportunities. In both contexts, children typically start primary school at the age of six, after having completed 3 years of preschool education. The German and Hong Kong education systems demonstrate similarities in terms of participation, teacher qualifications, and curriculum goals, but also marked differences (Faas et al. 2017). The Hong Kong education system has traditionally been characterized as strongly teacher-orientated, yet in the past decade, preschool education has been transitioning towards a notion of child-centeredness, which is an approach also common in German pedagogy (Wong and Rao 2015; Faas et al. 2017). This alignment of educational approaches in Hong Kong and Germany might encompass similar development of children's EF.
Yet, Hong Kong-Chinese children perform above the level of their German counterparts on measures of reading, mathematics, and science according to the results of the Programme for International Student Assessment (PISA) study (OECD 2019). Notwithstanding, samples from both Hong Kong and Germany score above the average of the Organisation for Economic Cooperation and Development (OECD) member countries on all three subjects assessed within the scope of PISA (OECD 2019). As EF performance is highly predictive of academic achievement (Zelazo et al. 2016), EF performance might vary correspondingly between children from Hong Kong and Germany.

Aim of the current study
In sum, the current study addresses specific gaps in the existing literature. Our study moves beyond the preschool years and focuses on middle childhood. Furthermore, targeting samples from Hong Kong and Germany holds great potential for expanding our understanding of the relationship between educational context and EF development. Hence, our research design, assessing EF performance in primary school children from Hong Kong and Germany, enables us to test the current study's main research question: Is the previously reported East Asian advantage in EF development manifest in comparisons between primary school age children from Hong Kong and Germany?

Participants
Participants included children from Hong Kong (n = 80) and Germany (n = 90), equally distributed across the second and fourth grades of primary school (for a detailed overview of the sample's age and gender distribution, see Table 1).
All participating children were recruited via local primary schools. The final sample included children from 17 classrooms across three schools (Hong Kong, one school, eight classrooms; Germany, two schools, nine classrooms). Project staff contacted the primary school directors to explain the purpose of the study and to outline what participation in the study would entail for the children and the school. After receiving consent from the school management, parental consent forms were handed out via the schoolteachers to all parents whose children were enrolled in the participating classrooms. Only children for whom parents provided written consent were included in the study. In both contexts, members of the research team contacted government-funded schools located in middle-class neighborhoods. Schools within communities with a strong prevalence of families with either a particularly high or low socio-economic background were not included in the current study in order to obtain comparable and representative samples for both cultural regions. The Hong Kong children lived in urban neighborhoods, while the German sample included children from both urban and suburban neighborhoods.

Study design and procedure
The current study implemented a cross-sectional design across the second and fourth grades of primary school. The children were tested individually in 25- to 35-min sessions in a quiet area of the school. Assessments took place towards the end of the school year, between the months of May and July. All tasks were administered in a paper-pencil format. Instructions to the children were given in either German or traditional Chinese. The instructions were initially drafted in English and then translated by native speakers into German and traditional Chinese. Standard back translation procedures were followed to ensure the equivalence of the English and Chinese versions. Graduate and undergraduate research assistants conducted the testing. Prior to data collection, all research assistants completed a 3- to 4-h training session, which included a detailed introduction to each task, practice testing with a partner, and guidance on interaction with the participating children. The research assistants coded the children's answers and reaction times during testing. Thus, in preparation, special emphasis was placed on training the research assistants on coding reaction times. In particular, start and stop signals were established. Timing started as soon as the research assistant instructed the child to "go" and stopped right after the child completed the final item of a trial.

Measures
Task selection
While a large number of EF measures for children are targeted at preschoolers (Carlson 2005), we sought measures that differentiate well in middle childhood and that were appropriate for both cultural groups. Selecting tasks that are familiar and of equal difficulty for both cultural groups is critical in cross-cultural research, as inappropriate comparisons can lead to erroneous conclusions (Chen 2008). Hence, the measures administered in the current study were selected based on two important requirements: (a) the task needed to perform well across the selected age range; and (b) the task needed to be fair across the two cultures.

Inhibition - child-friendly Stroop task
Inhibitory performance was assessed using an adapted version of the Fruit and Vegetable Stroop (Roethlisberger et al. 2010), which is based on the work of Archibald and Kerns (1999). Stroop tasks aim to assess the cognitive facet of inhibition (Diamond 2013), whereas other commonly administered inhibition measures such as the Head-Toes-Knees-Shoulders task (HTKS; Ponitz et al. 2008) focus more strongly on behavioral aspects of inhibition (Wanless et al. 2011). Furthermore, the HTKS task may be too easy for primary school children, so we chose an age-appropriate Stroop task. We modified the task for the cross-national nature of the current study after conducting a pilot study in Hong Kong. Roethlisberger et al. (2010) used blue plums as one of the depicted fruits in their study. However, children in Hong Kong are not typically familiar with plums, and therefore, this item was replaced with an orange carrot. The children were presented with four sets of stimulus cards. During the set 1 presentation, they were shown colored squares (yellow, red, orange, green) and were asked to name each color as fast as possible. The stimulus cards of sets 2, 3, and 4 depicted fruit and vegetables. The set 2 fruit and vegetables were shown in their natural colors (banana yellow, tomato red, carrot orange, lettuce green) and the children were instructed to name the colors (not the name of the fruit or vegetable) as quickly as possible. The colors used in sets 1 and 2 were identical. Only black and white outlines of the fruit and vegetables were presented in set 3 and the children needed to name the color each fruit or vegetable would naturally have. In set 4, the fruit and vegetables were colored in the original four colors, but these colors were mismatched (e.g., carrot green, lettuce red). During this final set, the children were asked to name the normal colors of the fruit and vegetables as quickly as possible.
Each set included one practice card and one test card. The practice cards showed three rows of four items each. The first practice row was used by the experimenter to explain the task and the child was then asked to practice once using the next two rows. Mistakes were only corrected during the practice trials. Each set of test cards consisted of 20 items organized in five rows with four items each. For each set, the numbers of correct answers, mistakes, and the time, in seconds, were recorded for the test trial. The dependent variable to measure inhibition was based on the child's reaction times and calculated as follows: Interference = time set 4 − [(time set 1 × time set 3)/(time set 1 + time set 3)]. As the dependent variable represents the degree of interference in speed due to the Stroop stimuli, higher scores indicate lower inhibition performance.
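The interference score described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name, the argument names, and the example times are hypothetical and are not taken from the study's materials or data.

```python
def stroop_interference(time_set1: float, time_set3: float, time_set4: float) -> float:
    """Interference score in seconds: time set 4 minus a baseline derived
    from sets 1 and 3. Higher values indicate lower inhibition performance."""
    if time_set1 <= 0 or time_set3 <= 0:
        raise ValueError("completion times for sets 1 and 3 must be positive")
    # Baseline term: (time set 1 x time set 3) / (time set 1 + time set 3)
    baseline = (time_set1 * time_set3) / (time_set1 + time_set3)
    return time_set4 - baseline


# Hypothetical completion times in seconds (illustrative only, not study data)
example_score = stroop_interference(time_set1=12.0, time_set3=20.0, time_set4=30.0)
print(example_score)  # 22.5
```

With these made-up times the baseline is 240/32 = 7.5 s, so the score is 30 − 7.5 = 22.5 s.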

Updating - Object Span task
Updating capacity was assessed using a modified version of the Object Span task of the German Working Memory Test Battery for Children Aged 5 to 12 years (Hasselhorn et al. 2012). We deliberately did not implement the slightly more common backward digit span task. The PISA study results indicate that children from Hong Kong perform above the level of age-matched children from Germany on mathematical tasks (OECD 2019). Hence, we believe that selecting a task which involves operating with numbers might pose an unfair disadvantage for the German children. Therefore, we opted for an object span task as an indicator of updating performance. The original test is administered in a computerized format but for the current study, it was modified to a paper-pencil format. The child was presented with cards (9 cm by 9 cm) picturing objects (e.g., apple, basket, flower, cake, book, strawberry) on the front and a blue question mark symbol on the back. The experimenter first showed the child the object for 3 s. Afterwards, the experimenter turned the card around for the child to see the question mark. The child was instructed to not name the object but rather, when shown the question mark symbol, to state whether it was edible or not, by saying "yes" (edible) or "no" (not edible). The child was further instructed to memorize the objects in order of presentation and was asked to repeat them at the end of each trial. The simultaneity of demands distinguishes this kind of complex span task from simple span tasks as the children need to engage in the processing activity while holding the relevant information in mind (Unsworth and Engle 2007). The task was composed of one practice trial and five test trials, each trial entailing two spans. The spans presented during the practice trial and the first test trial comprised two objects each.
The spans' lengths increased by one object per trial, with the longest spans (fifth test trial) including six objects. The child had to recite both spans of a trial in the correct order to advance to the next trial. If not, the task was discontinued, and the child received zero points for the remaining spans. The number of correct spans across the five test trials was coded as the dependent variable. The updating dependent variable therefore presents an accuracy measure. Each child could receive a possible maximum score of 10 for correct recalls of all spans within the five test trials.
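The scoring and discontinuation rule described above might be sketched as follows. The function and variable names are hypothetical. The sketch also assumes that a correctly recalled span within the trial that triggers discontinuation still earns its point, which the text does not state explicitly.

```python
from typing import Sequence, Tuple

N_TEST_TRIALS = 5  # five test trials with two spans each, as described in the text


def object_span_score(trials: Sequence[Tuple[bool, bool]]) -> int:
    """Count correctly recalled spans (maximum 10).

    `trials` holds one (first_span_correct, second_span_correct) pair per
    test trial, in order of administration. Testing stops after the first
    trial in which either span is recalled incorrectly; all later spans
    score zero.
    """
    if len(trials) > N_TEST_TRIALS:
        raise ValueError("at most five test trials are expected")
    score = 0
    for first_correct, second_correct in trials:
        # Assumption: a correct span in the failing trial still counts.
        score += int(first_correct) + int(second_correct)
        if not (first_correct and second_correct):
            break  # discontinue: remaining spans receive zero points
    return score


# Hypothetical child who fails the second span of the third test trial
responses = [(True, True), (True, True), (True, False), (True, True), (True, True)]
print(object_span_score(responses))  # 5
```

In this made-up example, the entries after the third trial are ignored because the task would have been discontinued at that point.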
To ensure testing was fair in both contexts, the names of the objects depicted in the original task were first translated into English and then into traditional Chinese. Only those items with a maximum of two syllables in both German and traditional Chinese were used. In an additional step, the Hong Kong-based authors of the current study rated all items on the degree to which children in Hong Kong would be accustomed to the objects. As a result, an additional eight items were excluded, leading to a total of 40 objects included in the task.

Shifting - Contingency Naming task
The Contingency Naming task (CNT; Anderson et al. 2000; Taylor et al. 1987) was implemented as a measure of shifting capacity. The CNT is designed to assess shifting capacities in school-aged children and includes baseline naming measurement as well as one-dimensional and two-dimensional shifting (Anderson et al. 2000). During the CNT, children were asked to name objects according to either their colors or their shapes. The children were presented with an A4 stimulus card composed of three rows of nine shapes (squares, triangles, circles). These shapes were each printed in a color (yellow, red, green), and within each shape, the outline of a smaller shape (square, triangle, circle) was depicted. During the first set, the child was asked to name only the color of the object. In the second set, a new naming rule was introduced, and the child was asked to name the large outside shape. In the third set, the two naming rules were implemented simultaneously, as the child was asked to name the larger outside shape of the object if the large outside shape and the small inside shape were different from one another, and to name the color of the object if the large outside shape and the small inside shape were identical. The children were given the opportunity to practice each naming rule before completing the actual test trials using a practice card, composed of three rows with three shapes each. This practice card was used further by the experimenter to explain each naming rule to the child. The number of correct items, self-corrections, mistakes, and time in seconds were coded for each test trial. Efficiency scores, with high values indicating strong shifting performance, were calculated as the dependent variable according to Anderson et al. (2000): Efficiency = [(1/time)/SQRT(errors + 1)] × 100.
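As an illustration of the efficiency formula above, the following Python sketch computes the score for a single test trial. The function name and the example values are hypothetical. Which responses count towards `errors` (for instance, whether self-corrections are included) is not specified in the text and is left to the caller.

```python
import math


def cnt_efficiency(time_seconds: float, errors: int) -> float:
    """Efficiency = [(1/time)/SQRT(errors + 1)] x 100.

    Faster completion and fewer errors both raise the score.
    """
    if time_seconds <= 0:
        raise ValueError("time must be positive")
    if errors < 0:
        raise ValueError("errors cannot be negative")
    return (1.0 / time_seconds) / math.sqrt(errors + 1) * 100.0


# Hypothetical trial (illustrative only): 25 s with 3 errors versus 25 s with none
with_errors = cnt_efficiency(time_seconds=25.0, errors=3)
without_errors = cnt_efficiency(time_seconds=25.0, errors=0)
print(round(with_errors, 2), round(without_errors, 2))
```

Because the error count enters through a square root, three errors halve the efficiency of an otherwise identical trial, since SQRT(3 + 1) = 2.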

Missing data
Overall, less than 1% of the data were missing as the children were assessed in individual sessions. Hence, no imputation procedure for missing data was conducted.

Descriptive statistics and bivariate correlations
Descriptive statistics for the dependent variables are presented in Table 2. The dependent variable for the Object Span task was accuracy, with a possible maximum score of 10 (one point for each correct span). The dependent variables for the CNT and Stroop tasks were accuracy and speed, respectively. Thus, for the latter tasks, the statistical minimum and maximum were not equivalent to the actual range of correct items. All EF measures were significantly correlated across the full sample. The bivariate correlations among all EF variables for the Hong Kong and Germany samples separately are presented in Table 3. Scores on the Object Span task and the CNT were moderately correlated. The correlation between the Stroop task and the Object Span task was negative and of small magnitude. Scores on the Stroop task and the CNT were negatively correlated with moderate magnitude. It is important to note that higher test scores on the Stroop task indicated lower levels of inhibition, whereas higher scores on the other indicators corresponded to better performance; hence, the negative correlations were to be expected.

Multilevel analyses
To test for differences in EF performance between Hong Kong and Germany, linear mixed effects models were calculated on the basis of maximum likelihood estimation (Twisk 2006). All data analyses were run using the IBM Statistical Package for the Social Sciences (SPSS) version 26. We opted for multilevel models to account for children being nested within 17 classrooms in three schools (Hong Kong, one school, eight classrooms; Germany, two schools, nine classrooms). t tests assessing differences in age between the two regions within each grade level yielded significant results (2nd grade, t(84) = −3.39, p < 0.01, r = 0.12; 4th grade, t(83) = −3.01, p < 0.01, r = 0.10). In the second grade, the German children (mean age, 98.64 months) were, on average, approximately 3 months older than their Hong Kong counterparts (mean age, 95.75 months). In the fourth grade, the age difference between the German (mean age, 122.49 months) and the Hong Kong children (mean age, 119.88 months) amounted on average to approximately two and a half months. In addition, the German sample comprised a significantly higher percentage of boys compared to the Hong Kong sample (t(168) = −3.34, p < 0.01, r = 0.25). Thus, we controlled for both age and gender in all the multilevel analyses and adjusted for clustering within classrooms by including classrooms as a level-two variable in all analyses. Separate linear mixed effects models were calculated for each of the three EF measures across both grade levels. The inhibition, updating, and shifting variables each served as the dependent variable in one of the three models. In each model, the independent variables were region (Hong Kong, Germany) and an interaction term between region and age. The continuous age variable, indicated by age in months, was conceptualized both as a control variable, assessing the main effect of age, and as an independent variable in order to test the interaction term with cultural region.
The results from the linear mixed effects models are displayed in Table 4.
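One way to write down the model described above is given below. This is a sketch under stated assumptions: the notation is ours, and it assumes a random intercept for classrooms only, since the text names classrooms as the level-two variable but does not report the exact random-effects specification or variable coding.

```latex
% Child i nested in classroom j; y is one of the three EF scores.
% Region is constant within a classroom; u_j is the assumed classroom random intercept.
\begin{aligned}
y_{ij} &= \beta_0
        + \beta_1\,\mathrm{Region}_{j}
        + \beta_2\,\mathrm{Age}_{ij}
        + \beta_3\,\mathrm{Gender}_{ij}
        + \beta_4\,(\mathrm{Region}_{j} \times \mathrm{Age}_{ij})
        + u_{j} + e_{ij}, \\
u_{j} &\sim \mathcal{N}(0, \sigma_u^2), \qquad
e_{ij} \sim \mathcal{N}(0, \sigma_e^2).
\end{aligned}
```

In this notation, the reported main effect of region corresponds to a test of β1, the main effect of age to β2, the gender effect to β3, and the Region × Age interaction to β4.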

Inhibition
The results of the multilevel analysis assessing performance on the child-friendly Stroop task showed a significant main effect of age; inhibition performance on the Stroop task improved with age across both regions combined (F(1, 169.00) = 12.59, p < 0.001). However, contrary to our hypothesis, we did not find a significant main effect of region after controlling for age and gender (F(1, 169.00) = 0.60, p = 0.44). The Region × Age interaction, while controlling for gender, was also not significant; variation in EF performance with age therefore was similar in Hong Kong and Germany (F(1, 169.00) = 0.41, p = 0.52). Furthermore, no significant effect of gender was apparent, while controlling for age and region (F(1, 169.00) = 0.68, p = 0.41).

Updating
With regard to updating performance on the Object Span task, the multilevel analysis yielded a pattern of results similar to that for inhibition. The main effect of age was significant: across regions, and controlling for gender, older children performed better than their younger peers (F(1, 19.28) = 24.04, p < 0.001). Neither the main effect of region, controlling for age and gender (F(1, 18.46) = 1.87, p = 0.19), nor the Region × Age interaction, controlling for gender (F(1, 18.67) = 1.28, p = 0.27), was significant. Thus, on average, the Hong Kong children did not perform above the level of the German children on the Object Span task and the increase in EF performance with age did not differ between Hong Kong and Germany. Furthermore, there were no apparent significant effects of gender on updating performance, controlling for age and region (F(1, 163.05) = 2.04, p = 0.16).

Shifting
Shifting performance on the CNT improved with age across both regions combined, controlling for gender (F(1, 19.24) = 6.37, p < 0.05). However, the results of the multilevel analysis further indicated that efficiency on the CNT did not differ between children from Hong Kong and Germany, controlling for age and gender (F(1, 19.46) = 2.19, p = 0.16). The Region × Age interaction, controlling for gender, was not statistically significant either (F(1, 19.73) = 2.20, p = 0.15); shifting performance thus improved similarly with age in Hong Kong and Germany. Girls did not perform significantly differently from boys on the shifting measure, while controlling for age and region (F(1, 158.17) = 1.03, p = 0.31).

Discussion
The current study assessed EF performance of 170 children from Hong Kong (n = 80) and Germany (n = 90) cross-sectionally at primary school age. The children completed tasks assessing the main components of EF, namely inhibition (child-friendly Stroop task), updating (Object Span task), and shifting (Contingency Naming task). Previous research findings have indicated that East Asian children outperform their Euro-American counterparts on EF tasks, but very few cross-national studies have focused on EF development during middle childhood (Imada et al. 2013; Wang et al. 2016). Our motivation for conducting the current study was to test whether the previously reported developmental lead of East Asian children is also evident in middle childhood in comparisons between children from Hong Kong and Germany. To test for differences in EF performance between these two cultural regions, we calculated multilevel models, adjusting for children being nested within different classrooms as well as controlling for age and gender. The results of the linear mixed effects models clearly indicate that all three EF measures differentiate well between younger and older children, with older children performing above the level of their younger peers, in Hong Kong and Germany alike. However, contrary to our hypothesis and previous research, we do not find any significant cross-national differences in EF performance between children from Hong Kong and Germany at primary school age. Furthermore, no significant interaction effects between age and region are apparent in our study.

Task characteristics in cross-national EF measurement
Our results contradict the findings of previous cross-national studies, which typically report a developmental lead in EF for children from East Asian countries compared to Euro-American counterparts in middle childhood (Imada et al. 2013;Wang et al. 2016) as well as at preschool age and in adolescence (Ellefson et al. 2017;Grabell et al. 2015;Lan et al. 2011;Oh and Lewis 2008;Sabbagh et al. 2006;Schmitt et al. 2019). These cross-national differences in children's EF development have been recorded most consistently for the EF facet of inhibition. Against this background, our null result for inhibition is particularly surprising. This inconsistency might be linked to divergence between the measures used to assess inhibition. The child-friendly Stroop task we used in the current study required considerable cognitive inhibition, whereas many measures of inhibition previously implemented in cross-national research have required more behavioral inhibition. For instance, the Head-Toes-Knees-Shoulders task (HTKS; Ponitz et al. 2008) and Luria's Hand Game (Luria et al. 1964), which both primarily assess behavioral inhibition, were administered as measures of inhibition in many previous cross-national EF studies. Therefore, the East Asian advantage in inhibition might be a function of behavioral inhibition. In East Asian culture, behavioral inhibition in children is strongly encouraged by parents, caregivers, and teachers, which can be traced back to the strong influence of Confucianism, with its emphasis on the practice of self-control (Rao et al. 2014;Sun and Rao 2017;Chao and Tseng 2002). The development of purely cognitive inhibition, on the other hand, might be more similar in children across Western and East Asian cultures than suggested by previous research findings. In order to test this hypothesis, further cross-national research combining multiple measures of both cognitive and behavioral facets of inhibition is necessary.
With regard to the updating facet of EF, our null results concur with the findings of Lan et al. (2011), who reported similar levels of updating development for 3- to 5-year-old Chinese and US-American preschool children. Similar to the current study, Lan and colleagues administered a complex span task to measure updating. They argue that updating might be less sensitive than other EF facets to variation in contextual factors in children's daily lives across cultures (Lan et al. 2011). However, other studies have documented developmental advantages on updating measures for preschool, primary school, and secondary school children from East Asia compared to counterparts from Europe and North America (Ellefson et al. 2017;Oh and Lewis 2008;Sabbagh et al. 2006;Wang et al. 2016;Weixler 2012). In some of these studies, however, aggregated EF scores or a latent EF factor were created by combining performance on different EF tasks, which does not allow updating performance to be compared individually. Taken together, these results do not yet give a clear picture, and further research is needed to understand whether children's updating skills vary cross-nationally.
We did not find any differences between Hong Kong and Germany in primary school children's shifting performance. In another recent cross-national study, Tran et al. (2019) also reported no differences in shifting performance between preschool children from Vietnam, the USA, and Argentina. In contrast, other studies have reported an advantage of East Asian children over Euro-American peers on shifting measures in middle childhood (Imada et al. 2013;Wang et al. 2016). Some scholars have documented diverging results for different shifting tasks (Moriguchi et al. 2012;Oh and Lewis 2008;Yang et al. 2011). Research by Moriguchi et al. (2012) specifically highlights the implications of task characteristics in different cultural contexts. Their findings indicate that, after having watched an adult model operate solely on the basis of the previous dimension, Japanese children had greater difficulty shifting to a new dimension than their US-American counterparts. Task performance without this social manipulation, however, did not differ between the cultural groups (Moriguchi et al. 2012).
Taken together, the divergence between the findings of the current study and previous research underlines that the characteristics of EF measurement are highly relevant for understanding the influence of contextual factors on EF development during childhood. The large variation in tasks used to measure EF in cross-national studies poses particular challenges. First, findings derived from heterogeneous task formats are more difficult to integrate. Second, as outlined above, EF testing is socially situated. The social context takes on particular significance in cross-national research, as task implementation is inherently embedded in the cultural context, which in turn is intertwined with the social setting of the task. Therefore, special emphasis needs to be placed on task selection and task diversity in future cross-national EF research.

Similarities of the education systems in Hong Kong and Germany
While potential reasons for previously reported cross-national differences in early EF are not fully understood, it has been speculated that variations across educational systems underlie these differences (Oh and Lewis 2008;Wang et al. 2016). To the best of our knowledge, the current study is the first to compare direct assessments of EF in children from Germany and Hong Kong. It therefore seems plausible that characteristics specific to these two education systems contribute to the similarity in EF development observed here.
East Asian pedagogy, during preschool and early primary school years, has historically been described as more controlling and teacher-centered, as well as emphasizing obedience more than educational approaches common in Europe and North America, which traditionally focus more strongly on free-play and child-centeredness (Wu and Rao 2011;Faas et al. 2017;Tobin et al. 2009;Tobin et al. 1989). Previously, scholars have argued that the structured and regulated educational setting common in many East Asian educational contexts fosters EF (Oh and Lewis 2008). However, the postulated relationship between the degree of structure in children's daily lives and educational settings and EF development could also be argued from the opposite point of view. Research by Barker et al. (2014) on the relationship between children's contextual experiences in their daily lives and EF development showed that more time spent in less-structured activities (such as free-play) is beneficial to children's self-directed EF. Time spent in structured activities (such as soccer practice, piano lessons, tutoring), on the other hand, negatively predicted self-directed EF. Thus, there have been some arguments in favor of contextual opportunities for children to mind-wander and play to foster EF development (Barker and Munakata 2015).
The Hong Kong early childhood education system has also traditionally been characterized as strongly teacher-orientated, yet in recent years, it has undergone a phase of transition towards a notion of child-centeredness, which is an approach also common in German pedagogy (Wong and Rao 2015;Faas et al. 2017). Furthermore, the German and Hong Kong education systems share similarities in terms of participation, teacher qualifications, and curriculum goals (Faas et al. 2017). It might seem plausible that the alignment of educational approaches in Germany and Hong Kong results in equally beneficial developmental conditions for children's EF development in the two contexts.

Limitations
The current study was conceptualized as a pilot study and was therefore subject to limited resources. As a result, a number of limitations need to be addressed. In particular, our study deployed a cross-sectional design. Hence, no information regarding children's development over the course of the primary school years can be derived. Furthermore, we cannot exclude the possibility of cohort effects. In addition, we were unable to obtain specific information on socio-demographic variables from the families themselves. We aimed to achieve comparable samples by restricting recruitment: in both regions, schools from neighborhoods with a strong prevalence of very high or very low SES were not included in the current study. The current study, like the majority of previously published cross-national studies on EF, therefore focused on children from middle-class communities. This strategy, however, limits the generalizability of the reported results. Furthermore, our approach did not allow us to determine the extent to which potential differences in SES between the samples from the two regions contributed to our findings. Having said that, although SES has been identified as a significant predictor of EF development (Calvo and Bialystok 2014;Chung et al. 2017;Hackman et al. 2015;Hartanto et al. 2019), initial evidence indicates that cross-cultural differences in EF growth in preschool children do not vary by SES (Schmitt et al. 2019).
Regarding the measurement of EF, we would further like to address specific constraints within the current study. First, two of the administered EF tasks, namely the CNT and the Stroop task, entailed a measurement of speed. Given the paper-pencil format of these tasks, such time measures were potentially subject to human error. Recent research has offered new insights into the benefits of computerized EF testing, which presents a promising approach for further research in the field (Ellefson et al. 2017;Legare et al. 2018;Willoughby and Blair 2016). Second, we used an adapted version of the Object Span task from the German Working Memory Test Battery for Children aged 5 to 12 years (Hasselhorn et al. 2012) as the measure of updating capacity. As the PISA study results show that Hong Kong children outperform German counterparts on mathematics (OECD 2019), we opted for an Object Span task instead of a digit span task, because we believed that a task involving numbers might disadvantage the German children. However, the Object Span task was adapted from a computerized format to paper-pencil testing. This adaptation potentially affected task validity. Furthermore, the depicted objects were originally designed to target children growing up in Germany. Although we excluded items unsuitable for cross-national comparison, we cannot fully guarantee that the measure was not culturally biased. Having said that, recent findings demonstrating metrical and functional measurement equality for a number of German complex span tasks and translated versions implemented in a North American sample seem promising in that regard (Rummel et al. 2019).

Implications and future directions
In light of the inconsistent findings of the current and previous studies, the need for further research in the field, particularly in middle childhood, is evident. EF performance is identified consistently as a strong predictor of academic achievement (Fuhs et al. 2014;Gestsdottir et al. 2014) in samples of children from Europe, North America, and East Asia alike (Georgiou et al. 2020;Lan et al. 2011). Hence, cross-national variations in the developmental trajectories of EF at primary school age, as well as the influence of educational contexts in different nations on EF development, have highly relevant practical implications for promoting EF and academic achievement in all students.
Having said that, understanding the underlying mechanisms of cross-national similarities and differences in EF development remains a scholarly challenge. We share the view of Sarma and Thomas (2020) that cross-national EF research needs to focus more strongly on the characteristics of the cultural context in which the children's development is embedded. Investigating everyday practices of social groups, such as interactions between children, parents, and teachers, has been proposed as a promising approach to understand the influences of culture on human development (Velez-Agosto et al. 2017). The current study aimed to follow this line of thought by focusing on cross-national variation within the context of primary school education and corresponding patterns in EF development. However, the current study can only be seen as a starting point and further research is needed. We believe that the systematic comparison of different teaching styles at different ages across countries, and their effects on children's EF development, poses a particularly interesting approach for future research to understand variations in EF development across countries. In addition, longitudinal assessment is needed in order to test causally for a relationship between educational settings and EF. In our view, both the systematic assessment of variations in pedagogic approaches and longitudinal research are critical to isolate potential effects of education contexts from other contextual influences on children's EF development across cultures, such as cultural norms and family factors. Ultimately, we might be able to answer the intriguing question of which pedagogical approach seems most promising in fostering the development of children's self-regulation.
With regard to future research, we would further advocate the inclusion of multiple indicators for each EF sub-facet in order to attain a broader assessment of the constructs. Within the current study, the EF facets of inhibition, updating, and shifting were assessed by one task each. The combination of multiple measures of cognitive and behavioral EF within one study is needed in order to test for distinct differences with regard to measurement approaches. Furthermore, implementing tasks assessing "hot" EF alongside measures of "cold" EF within cross-national research presents an opportunity. Hot and cold EF have both been linked to life outcomes (Duckworth and Seligman 2005;Peterson and Welsh 2014;Zelazo and Carlson 2012); however, little is known about how children's development varies across cultures within these two domains of EF. Taken together, examining unique developmental patterns for specific formats of EF assessment across countries would significantly advance our understanding of cross-cultural variation in children's development of EF.

Conclusion
The aim of the current study was to assess EF development in primary school children in Hong Kong and Germany in a cross-sectional pilot study. We were interested in examining whether the previously reported developmental lead of East Asian children compared to children from the USA, Canada, and the UK is also evident in middle childhood and in comparison with another Western country, in this case, Germany. Contrary to our hypothesis, we did not find any significant cross-national differences in EF performance between primary school children from Hong Kong and Germany, nor any significant interaction effects between age and region. Our findings highlight the possibility that characteristics specific to Hong Kong and Germany underlie our results. They require independent replication and underline the need for further research in the field.
Authors' contributions All authors contributed to the study conception and design. Material preparation and data collection were performed by K. Schirmbeck, R. Wang, and S. Chan. Data analysis was performed by K. Schirmbeck and supervised by B. Richards. This research was conducted under the supervision of N. Rao and C. Maehler who contributed to the research process and the different versions of the manuscript. The first draft of the manuscript was written by K. Schirmbeck and all authors read and approved the manuscript.
Funding Open Access funding enabled and organized by Projekt DEAL.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Code availability Not applicable
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.