Identifying early predictors of later mathematics achievement is important to theoretically understand mathematical development and implement targeted intervention to prevent at-risk children from falling further behind (Gersten et al., 2005; Jordan et al., 2009). To this end, early arithmetic ability and strategy use seem promising, as they have been found to be related to later mathematics achievement (Bailey et al., 2012; Geary, 2011; Geary et al., 2017).

Although children tend to demonstrate a repertoire of strategies for single-digit arithmetic—from counting to fact-based strategies—irrespective of their general mathematical ability (Torbeyns et al., 2004), there are substantial differences on the individual and group levels in the frequency of use of these strategies (Gray & Tall, 1994; Siegler, 1996; Sunde et al., 2020; Torbeyns et al., 2004). In general, high-achieving children tend to use counting strategies less often than fact-based strategies, that is, direct retrieval and using known facts to derive the answer, whereas the opposite is the case for low-achieving children (Dowker, 2014; Torbeyns et al., 2004).

Strategies for single-digit addition

Strategies for single-digit addition are categorised into counting strategies and fact-based strategies. Counting strategies are considered the first strategies children are introduced to and even use before formal schooling (e.g. Baroody & Wilkins, 1999; Clements & Sarama, 2007). Counting as an addition strategy requires an understanding of one-to-one correspondence and cardinality (Fuson, 1992) and is considered less advanced compared to fact-based strategies. Counting strategies can be silent or verbal, and manipulatives or fingers can be used either to represent the addends or to keep track of when to stop counting (Baroody, 1989; Carpenter & Moser, 1984; De Corte & Verschaffel, 1987). Both material and verbal counting can be divided into several subcategories, but the most common is to distinguish between count all, in which both addends are counted, count on from first, in which the child counts on from the first addend, and count on from larger, in which the child counts on from the larger of the addends, also known as the min-count strategy. Count all is considered to be the least advanced, and min-counting is the most advanced counting strategy. Furthermore, material counting (counting using physical objects) is less advanced than verbal counting.
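As a rough illustration of the three counting strategies, the following Python sketch lists the number words a child would say under each one. The function names are hypothetical, and the sketch ignores how the addends are represented (fingers, objects, or mentally); for count all, only the final count of the combined set is shown.

```python
def count_all(a, b):
    """Count every unit of the combined set, starting from 1."""
    return list(range(1, a + b + 1))


def count_on_from_first(a, b):
    """Start at the first addend and count on by the second addend."""
    return list(range(a + 1, a + b + 1))


def count_on_from_larger(a, b):
    """Start at the larger addend and count on by the smaller (min-count)."""
    larger, smaller = max(a, b), min(a, b)
    return list(range(larger + 1, larger + smaller + 1))
```

For 3 + 6, count all runs through nine number words (1 to 9), count on from first through six (4 to 9), and count on from larger through three (7, 8, 9).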

Fact-based strategies comprise direct retrieval, in which the sum is retrieved directly from memory, and derived fact strategies, also known as decomposition or regrouping, in which the sum is calculated by decomposing the addends into other known sums (which are subsequently retrieved). This is a step-wise process in which known addition facts are used to derive the result as, for example, the near-ties strategy (e.g. 7 + 8 = 7 + 7 + 1 = 15 or 8 + 8 − 1 = 15) and the bridging ten strategy or base-ten decomposition (e.g. 8 + 5 = 8 + 2 + 3 = 13). Thus, fact-based strategies build on prior knowledge of number facts (e.g. Geary, 2011; Threlfall, 2002) as well as the understanding of part-whole relations and partitioning of numbers (e.g. Ambrose et al., 2003; Gray & Tall, 1994).
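The two derived fact examples can be written out as small decomposition rules. This is a minimal sketch with hypothetical function names; it covers only the "double plus one" form of near-ties and the bridging ten form shown in the examples, not every decomposition a child might use.

```python
def near_tie(a, b):
    """Rewrite a + b as a double plus 1 when the addends differ by one."""
    if abs(a - b) != 1:
        raise ValueError("near-ties applies only when the addends differ by 1")
    smaller = min(a, b)
    return (smaller, smaller, 1)  # e.g. 7 + 8 -> 7 + 7 + 1


def bridge_ten(a, b):
    """Split b so that a is first completed to 10 and the rest is added."""
    to_ten = 10 - a
    if not 0 < to_ten < b:
        raise ValueError("bridging ten needs a < 10 and a + b > 10")
    return (a, to_ten, b - to_ten)  # e.g. 8 + 5 -> 8 + 2 + 3
```

In both cases the parts sum to the original total, so the child only has to retrieve facts that are already known (a double, or a complement to ten).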

In general, fact-based strategies are construed as more advanced than counting strategies, not only because they are generally faster (Cowan, 2003; Cowan & Powell, 2014) but also because they build on more complex number understanding. The use of derived fact strategies has, for example, been linked to a conceptual understanding of addition principles and the decomposing of addends (Canobi et al., 1998, 2003). Direct retrieval and derived fact strategies have been associated with symbolic numerical magnitude processing skills early in grade one (Vanbinst et al., 2015) and knowledge of base-10 number structures in kindergarten (Laski et al., 2014). Furthermore, better conceptual subitising skills (that is, the ability to recognise and combine small sets without counting; Clements, 1999) at the beginning of grade one have been found to significantly predict the use of fact-based strategies by the end of grade one (Gaidoschik, 2012). For children in grades two and three, the use of decomposition strategies has been found to be related to the understanding of mathematical equivalence (Chesney et al., 2014). Thus, children who understand that a number can be represented by many combinations of addends (e.g. 6 = 2 + 4, 6 = 1 + 5, 6 = 3 + 1 + 1 + 1) and can organise their addition knowledge based on these equivalent combinations have a better understanding of mathematical equivalence, as measured by a three-component assessment of the ability and strategies for (a) solving equations (e.g. 4 + 5 = 3 + _), (b) encoding equations (e.g. correctly rewriting an equation such as 4 + 5 = 3 + _ after viewing it for 5 s), and (c) defining the equal sign.

Early predictors of later achievement

Learning difficulty screenings are often conducted using standardised norm-referenced achievement tests and curriculum-based measurements (CBM) (Lockwood et al., 2021). However, the use of total scores in these broader measures does not reveal the child’s specific areas of difficulty. Even when testing within a single topic like arithmetic, as many CBM in mathematics do, this can be an issue. As Dowker (2005) pointed out, ‘arithmetic is not a single entity’ (p. 324). Arithmetic is made up of many different components of both procedural and conceptual knowledge, including knowledge of facts and procedures and understanding and using arithmetical principles. It is not uncommon for children to be proficient in one component but struggle in another (Dowker, 2005). Furthermore, achievement tests are generally scored based on correct or incorrect answers to different kinds of items. Children with poor arithmetic understanding may perform quite well in untimed achievement tests on simple problems in arithmetic if they have developed fluency in less advanced strategies, such as counting. Jordan and Montani (1997) for example found that children with mathematical difficulties performed worse than their non-impaired peers in timed but not in untimed conditions, and this could be related to the skilful, but slow, use of counting strategies by the children with mathematical difficulties. Thus, the frequency with which children use immature versus more advanced strategies may possibly be more informative about their arithmetic understanding than the number of problems they solve correctly. It could therefore be expected that information on early strategy use would provide additional information on later achievement that is not available from the overall results of a standardised achievement test. Such information is important not only for detecting children at risk but also for designing targeted teaching.

Longitudinal studies have further shown that strategy use predicts later mathematics achievement: children who mainly use fact-based strategies early in school tend to have higher overall mathematics performance in later years compared to children who mainly rely on counting strategies (Carr & Alexeev, 2011; Carr et al., 2008; Fennema et al., 1998; Geary, 2011; Geary et al., 2004; Gersten et al., 2005; Nguyen et al., 2016; Ostad, 1997; Price et al., 2013). Geary (2011), for example, found that early (grade one) arithmetic strategies, including the skilled use of counting on, derived fact, and direct retrieval strategies, positively predicted later mathematics achievement in grade five. Furthermore, Bailey et al. (2014) found that children’s whole-number arithmetic knowledge (number of correct arithmetic tasks and direct retrieval of single-digit additions) in grade one predicted knowledge of fraction arithmetic in grade seven and knowledge of fraction magnitude in grade eight. Bailey et al. (2014), however, focused on the direct retrieval strategy and did not investigate the relative contribution of other arithmetic strategies, such as counting and derived fact strategies. The apparent differences in the knowledge base underlying the use of distinct strategies, as outlined above, together with the previous findings (e.g. Bailey et al., 2014; Geary, 2011), suggest the need for more detailed studies on the relationship between the early use of specific strategies and later achievement.

Understanding these connections in the development of mathematical understanding and achievement is of interest for establishing early predictors of later achievement and for developing mathematics curricula and teaching (e.g. Bailey et al., 2014). Expanding our knowledge of the relationships between early strategy use and later mathematical development could provide valuable information for developing targeted teaching and early intervention. The current study contributes to our understanding of these relationships by investigating the predictive contribution of different arithmetic strategies on later achievement in mathematics.

Present study

In this study, we investigated whether grade one (6-year-old) children’s strategy use in single-digit addition provided predictive information about their mathematics achievement 3 years later that is not already available from a standardised mathematics achievement test. Using a multilevel statistical approach, we analysed the relationship between the use of different strategies for single-digit addition in grade one and mathematics achievement in grade four, controlling for grade one mathematics achievement, non-verbal reasoning skills, and sex, with children nested within classes. When analysing data that are influenced by teaching and classroom culture, it is important to adjust for between-class variation, which we did by accounting for this multilevel structure in our analyses.

In general, girls seem to use counting strategies more often than boys, while boys use fact-based strategies more often than girls (Bailey et al., 2012; Carr & Alexeev, 2011; Carr & Davis, 2001; Shen et al., 2016; Sunde et al., 2020). In the analyses, we controlled for possible sex differences in these relationships, as we previously reported extensive sex differences in strategy use in grade one for a larger sample, including the present sample (Sunde et al., 2020). Furthermore, children’s non-verbal ability is known to play a role in mathematics achievement (e.g. Taub et al., 2008), and this was controlled for by measuring children’s non-verbal reasoning skills.

Methods

Participants

The data consisted of assessment results from grade one and grade four of children from six classes in three different Danish public schools. Informed written parental consent was obtained for all children. The children were informed orally about the study and that participation was voluntary. In November 2015, all children in the individual grade one classes (109 children in total) participated in the assessments of mathematics achievement and non-verbal reasoning skills. For the strategy interview, 11 to 15 children from each class were sampled at random but balanced with respect to gender (79 children in total). In August of grade four, the children’s mathematics achievement was assessed. Due to dropout and missing assessment results (that is, the child was not present on the day of assessment), the final sample consisted of 61 children (34 girls, 27 boys). This sample size gives an 80% probability that a true (medium) effect size of |r| ≥ 0.35 (r² ≥ 0.12) would come out as statistically significant (p < 0.05 in a two-tailed test) in a correlation analysis (if split by sex; girls: r² ≥ 0.22, boys: r² ≥ 0.27). The mean age at the start of grade one (15 August) was 6.3 years (SD = 0.34; range, 5.6–7.1; no difference between boys and girls: t(59) = 0.29, p = 0.77).
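The detectable effect sizes quoted for the sample can be approximated with the standard Fisher z approximation for a correlation test. The text does not state which method was used, so the approach below is an assumption, and the function name is hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def min_detectable_r(n, alpha=0.05, power=0.80):
    """Smallest |r| detectable with the given power in a two-tailed test.

    Uses the Fisher z approximation (an assumption, not necessarily the
    method used in the study): atanh(r) has standard error 1 / sqrt(n - 3).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, two-tailed
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return tanh((z_alpha + z_power) / sqrt(n - 3))
```

Under this approximation, the squared values for n = 61, n = 34, and n = 27 round to 0.12, 0.22, and 0.27, which agrees with the figures reported for the full sample, girls, and boys.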

Assessment of strategy use grade one

Strategy use in grade one was assessed in one-to-one interviews in which the child was presented with flashcards with single-digit addition tasks. The set of tasks comprised 36 addition tasks with addends 2 to 9, including the doubles (Table 1). One of each pair of commutative tasks was selected to ensure an equal number of tasks with a larger addend first. Flashcards with sums less than 10 were presented first to avoid less confident children giving up or only using counting because they were presented with difficult tasks (e.g. 6 + 8) at the beginning of the interview. The sequence of the flashcards was the same for all the children.
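The size of the task set follows from the design: with addends 2 to 9 and one task per commutative pair, doubles included, there are 36 tasks. The sketch below enumerates them and places sums below 10 first. The function name is hypothetical, and the sketch does not reproduce which addend was printed first on each flashcard or the exact sequence used in the study.

```python
from itertools import combinations_with_replacement


def addition_items(low=2, high=9):
    """One task per unordered addend pair, doubles included, small sums first."""
    pairs = list(combinations_with_replacement(range(low, high + 1), 2))
    # sorted() is stable and False sorts before True,
    # so tasks with sums below 10 are presented first.
    return sorted(pairs, key=lambda pair: sum(pair) >= 10)
```

With the default range, 12 of the 36 tasks have sums below 10 and 8 are doubles.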

Table 1 List of items used for assessing strategy use and sequence of presentation

The interviews took place in the child’s school in a familiar, quiet setting. During the interview, the child did not have access to manipulatives or paper and pencil, but the use of fingers and gestures was allowed. The child was asked to find the answer to the given task and to self-report the strategy used on a trial-by-trial basis. The researcher stated to each child, ‘I will show you some single-digit addition tasks. First, I would like you to find the answer to the task, and then we will talk about how you found the answer. There are many ways to find the answer to an addition task. Sometimes you might know the answer or count, or perhaps you use other tasks to find the answer. I am interested in knowing how you find the answer’. Then the child was presented a flashcard, for example, 4 + 5, and the interviewer asked, ‘What is the answer to four plus five?’ If the child did not give an explanation following the answer, the interviewer asked, ‘How did you find the answer?’ and if further prompting was needed, ‘Did you count, or did you just know the answer, or did you use some other tasks you know to find the answer?’ For each item, accuracy and strategy use were recorded. Each interview lasted 10–30 min.

The children’s answers were categorised during the interview into five predetermined categories, four of which represented strategy use and one of which represented failure to solve the problem (discharged). Because, for the present purpose, we were only interested in strategy use resulting in success (i.e. the skilled use of strategies), incorrect answers or instances in which the child gave up were coded ‘error’ and excluded from the analyses. Correct answers were categorised based on the child’s self-report of strategy use and observations by the interviewer (e.g. visible signs of finger counting or lip movements). In the case of disagreement between the child’s self-report and the observed overt behaviour, the strategy was categorised based on the latter. Correct answers were categorised in the following four categories: ‘counting all’ (C-all, counting both addends and then all together), ‘counting on’ (C-on, counting on from one of the addends; we do not distinguish between counting on from first and larger addend as it is not possible to separate the two in items of the form a + b where a is larger or in a + a items), ‘direct retrieval’ (DiR, reported just knowing the answer), and ‘derived fact’ (DeF, decomposing addends and calculating answers using automatised sums with subsequent use of addition, e.g. 4 + 5 = 4 + 4 + 1, or subtraction, e.g. 4 + 5 = 5 + 5 − 1). We did not distinguish between self-reports of mental counting and visible (e.g. on fingers) or audible counting, nor did we distinguish between material and verbal counting.

The child’s use of each strategy was expressed as the proportion of correctly solved tasks that were scored as solved with that strategy (C-all, C-on, DiR, or DeF). These proportions were logit-transformed prior to the analysis (logit p = ln[p/(1 − p)]). We substituted zero values (not defined on the logit scale) with 0.01, which is 2.8 times lower than the minimum achievable value larger than 0 (1 of 36 items = 0.028).
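A minimal Python sketch of this transformation is given below. The function names are hypothetical. The text does not say how a proportion of exactly 1 (also undefined on the logit scale) was handled, so the sketch raises an error in that case rather than guess.

```python
import math

ZERO_SUBSTITUTE = 0.01  # value the text substitutes for a proportion of 0


def strategy_proportions(counts):
    """Proportion of correctly solved tasks attributed to each strategy."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no correctly solved tasks")
    return {strategy: n / total for strategy, n in counts.items()}


def logit(p, zero_substitute=ZERO_SUBSTITUTE):
    """ln(p / (1 - p)), with zero replaced by the substitute value."""
    if p == 0:
        p = zero_substitute
    if not 0 < p < 1:
        # Handling of p = 1 is not described in the text (assumption: reject).
        raise ValueError("logit is undefined for this proportion")
    return math.log(p / (1 - p))
```

For example, a child with 34 correct answers, none of them solved by derived facts, would receive logit(0.01) for DeF, which lies below the value for a single DeF answer out of 36.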

Assessment of mathematics achievement grade one, MAT

Children’s mathematics achievement in grade one was assessed with a curriculum-based Danish standardised test, MAT (Jensen & Jørgensen, 2007). MAT is a paper-and-pencil test with 105 items (a mix of open answer and multiple-choice) on number knowledge (36), addition and subtraction in context (4), addition with maximum sum of 20 (23), subtraction items with maximum minuend of 20 (23), geometry and measurement (14), and data (5). For a detailed description of the MAT test, see Sunde and Pind (2016). The number of correctly answered items was the score used in our analyses.

Assessment of non-verbal reasoning skills grade one, CHIPS

Children’s non-verbal reasoning skills in grade one were assessed with Children’s Problem Solving (CHIPS), a Danish test for 6- to 12-year-old children (Hansen et al., 1993; Kreiner et al., 2006). It is a paper-and-pencil test and consists of 40 Raven’s matrices items. Each task consists of a picture or a matrix with a missing piece. The child has to choose by crossing out the correct piece among six options displayed beneath the picture or matrix. For the present purpose, the total number of correct answers was used.

Assessment of mathematical achievement grade four, T-Mat

Mathematical achievement in grade four was assessed by a computerised test, T-Mat, consisting of the following four sub-tests: (1) number and arithmetic (AR), comprising 40 items on number knowledge and calculation in the four operations with whole numbers and decimal numbers; (2) fractions (FR), comprising 36 items on comparing fractions and on addition and subtraction with fractions; (3) equations (EQ), comprising 27 items on missing number equations (e.g. 31 − ___ = 23) and simple equations (e.g. 2x = 16); and (4) word problems (WP), comprising 20 word problems in the four operations and fractions. AR, FR, and EQ were time-restricted (10 min). In the WP sub-test, children were allowed to use a calculator. Using calculators is standard procedure in Danish classroom practice from grade one onwards. The score for each sub-test was the number of correct items.

The children were tested at the beginning of grade four in the classroom using either tablets or computers. All children were familiar with using computers or tablets for testing as well as in everyday lessons. The children’s mathematics teacher administered and timed the tests.

In the subsequent analyses, we used the total score (AR + FR + EQ + WP) as the dependent variable. To calculate the total score, we compensated for incomplete sub-test results by imputing missing sub-score values with the average sub-score for the other children of the same sex. Even though the replacement of missing values with group averages may introduce a small bias towards underestimation of the underlying variation in grade four achievement scores, any such bias will be conservative in relation to the estimate of the relationship between grade one predictors and the grade four response variable.
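The imputation rule can be sketched as follows. The data layout (a list of dictionaries with a 'sex' key and one key per sub-test, with None marking a missing sub-score) and the function name are assumptions made for illustration.

```python
from statistics import mean

SUBTESTS = ("AR", "FR", "EQ", "WP")


def total_scores(children):
    """Total T-Mat score per child, imputing missing sub-scores.

    A missing sub-score (None) is replaced by the mean of that sub-score
    among the other children of the same sex with an observed value.
    """
    totals = []
    for child in children:
        total = 0.0
        for sub in SUBTESTS:
            score = child[sub]
            if score is None:
                observed = [
                    other[sub]
                    for other in children
                    if other["sex"] == child["sex"] and other[sub] is not None
                ]
                score = mean(observed)
            total += score
        totals.append(total)
    return totals
```

Because every imputed value sits at the group mean, imputed children are pulled towards the centre of the distribution, which is why the text describes any resulting bias as conservative.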

Statistical analyses

All statistical analyses were run in SAS 9.4.

Variation within and between grade one predictors

We first measured the mean and spread of grade one test scores for each sex and tested for differences in mean and variance with a general linear mixed (multilevel) model (GLIMMIX procedure), with sex as a fixed effect, class as random effect, and with different variances for boys and girls. To explore the extent to which the six different grade one test scores (four strategy use variables, MAT, and CHIPS) correlated internally, we created a correlation matrix of these predictor variables.

Variation of grade four achievement in general and as univariate functions of grade one predictors

We described the mean and spread of the grade four measure of mathematics achievement, T-Mat, for each sex and tested for sex differences in mean and variance with a general linear mixed model (GLIMMIX procedure), with sex as a fixed effect with different variance for boys and girls and class ID as random effect. We then assessed the extent to which grade four T-Mat scores correlated with the different grade one predictors by creating a correlation matrix between the grade four mathematical achievement scores and each of the grade one predictors. The amount of variation explained in the grade four achievement score by the individual grade one predictors was assessed from the squared values of Pearson’s r.
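For readers who want to reproduce the univariate step outside SAS, the following sketch computes Pearson’s r from two equally long lists of scores; squaring the result gives the proportion of variation explained. The function name is hypothetical, and this is a plain textbook implementation rather than the procedure used in the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

A correlation of r = −0.5, for instance, corresponds to 25% explained variation, regardless of its sign.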

Grade four achievement as complex functions of grade one predictors

To investigate how much variation in grade four achievement (T-Mat) could be explained by the available grade one predictors, we used a forward selection approach, where the most significant predictors were included one by one as long as they contributed significant (p < 0.05) additive information to the model (PHREG procedure). By using this model selection approach, we ensured that the models were not over-parameterised, as all selected predictor variables were statistically justified. As the use of the four strategy variables summed to 1, at most three strategy use predictors could enter the same model, as inclusion of the first three variables made the fourth variable redundant. In practice, we expected at most two strategy use predictors to be selected in the same model, as the proportional use of the different strategies was highly intercorrelated (see Results). For the selected models, we assessed the extra and accumulated amount of variation explained by each predictor from Type-1 sums of squares (GLM procedure). We also estimated the partial coefficients, adjusting for class as random effects (GLIMMIX procedure).
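The redundancy argument and the sequential (Type-1) decomposition can be written out as follows. The notation is introduced here for illustration only and is not taken from the original analysis: p denotes a child's proportional use of a strategy on the raw (untransformed) scale, SS_k the Type-1 sum of squares of the k-th predictor entered, and SS_total the total sum of squares.

```latex
\[
p_{\text{C-all}} + p_{\text{C-on}} + p_{\text{DiR}} + p_{\text{DeF}} = 1
\quad\Longrightarrow\quad
p_{\text{DeF}} = 1 - p_{\text{C-all}} - p_{\text{C-on}} - p_{\text{DiR}}
\]
\[
\Delta R^2_k = \frac{SS_k}{SS_{\text{total}}},
\qquad
R^2_{(m)} = \sum_{k=1}^{m} \Delta R^2_k
\]
```

The first line shows why the fourth proportion is fully determined by the other three. The second shows how the extra variation explained by each predictor, in its order of entry, accumulates to the total explained variation of a model with m predictors.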

We created three types of predictive models: (1) Model A selected stepwise among the four strategy variables (C-all, C-on, DiR, and DeF); (2) Model B had MAT and CHIPS forced in as mandatory variables to which a step-wise selection procedure added significant predictors among the four strategy variables and sex; and (3) Model C was similar to Model B, except that all predictors were equally eligible to be selected.

The rationale for Model A was to establish how much significant variation in grade four achievement could be explained by grade one strategy use predictors only. The purpose of Model B was to explore how much of the variation in grade four mathematical achievement not already explained by the general grade one mathematics achievement test and non-verbal reasoning skills could be explained by strategy use in grade one. Hence, Model B would only include strategy use variables and/or sex if they contributed significant information after the MAT test and CHIPS variable had been forced into the model. The purpose of Model C was to explore how much significant variation in grade four achievement could be explained in total from all the available predictor information from grade one. Model C also showed whether strategy use information from early grade one was sufficient to explain all relevant variation in grade four mathematical achievement without the inclusion of information from the MAT and/or the CHIPS test. In that case, the selected model would include strategy use predictors but not the MAT and/or the CHIPS test variable.

To ensure that the aforementioned models did not blur any possible differential response patterns by boys and girls, as a post hoc model check, we tested for possible interactions between sex and the covariate(s) if included in the model selected as Model C (no such interactions were revealed).

Results

Strategy use, achievement, and non-verbal reasoning skills in grade one

In grade one, the 61 children on average solved 33.6 of the 36 single-digit addition problems correctly (93%, median = 35.0 [97%]; range, 22–36 [61–100%]). Of the correct answers, the children on average used C-all for 26%, C-on for 34%, DiR for 27%, and DeF for 13% of the items (Table 2). The frequency with which a specific strategy was used differed significantly between boys and girls for DiR (girls, 23% vs. boys, 32%, Table 2) and DeF (6% vs. 21%, Table 2). Boys also had a higher variance in the frequency with which they used the C-on strategy (Table 2). Boys and girls had similar means and variances in MAT and CHIPS scores (Table 2).

Table 2 Central tendencies and spread of test score results in grade one, divided on and tested between sexes

Several frequencies of strategy use were correlated (Table 3). Most notably, the least advanced strategy C-all correlated negatively with the use of all other strategies (r = −0.64 to −0.43, Table 3). By comparison, the use of the two most advanced strategies, DiR and DeF, correlated positively (r = 0.53, Table 3). MAT scores correlated negatively with C-all (r = −0.41) but positively with C-on (r = 0.25), DeF (r = 0.29), and CHIPS score (r = 0.49) (Table 3).

Table 3 Pearson correlation coefficients between grade one test scores of the 61 children

Mathematics achievement in grade four as simple functions of grade one predictors

In the grade four achievement test, girls and boys had a mean T-Mat score of 27% and 33% correct answers, respectively. Boys had significantly higher mean and variance in T-Mat scores than girls (Table 4).

Table 4 Central tendencies and spread of test score results in grade four, divided on and tested between sexes

The T-Mat achievement score for grade four correlated significantly with all grade one test scores (Table 5), with C-all (negative), MAT (positive), and CHIPS (positive) each explaining 29–31% of the variation (r²). With the exception of the positive correlation between DiR and grade four achievement, which was significantly larger for boys than for girls, the direction and magnitude of the correlation coefficients were similar for boys and girls (Table 5).

Table 5 Correlations (Pearson’s r with 95% confidence intervals) between variables of mathematical achievement in grade four (columns) and test scores in grade one (rows)

Mathematics achievement in grade four as complex functions of grade one predictors

The selection procedure among strategy use variables (Model A) resulted in a model with C-all as the only predictor, explaining 30% of the variation (negative correlation: Table 6, Fig. 1A). In the model with forced entry of MAT and CHIPS (Model B), C-all was still included as a predictor, explaining an additional 12% of the variation beyond the 37% already explained by MAT and CHIPS. Sex (higher scores for boys) was also included in the model, explaining a further 5%, resulting in a total of 54% of the variation in grade four achievement being explained by fixed effects available in early grade one (Table 6, Fig. 1B). The model with the open entry of all predictors (Model C) resulted in the same model as Model B (Table 6). Post hoc examinations revealed no significant interactions between sex and any of the other significant grade one predictors (all p > 0.05).

Table 6 Selected models predicting the total mathematics achievement score in grade four from predictors of strategy use and achievement in grade one, sex, and class identity, identified by different constrained forward step-wise selection procedures (A, only strategy use variables eligible; B, MAT and CHIPS forced into the model; C, all predictors equally eligible). With the exception of predictors forced into the models (B) and class (entered manually last), the order of the variables reflects the order in which they were included in the models (most significant predictor entered first, second most significant predictor entered second, etc.)
Fig. 1

A Scatter plot and predicted function (grey area indicates 95% confidence zone) of T-Mat score as a function of C-all (selected model from Model A in Table 6; adjusted R² = 0.30). B Observed T-Mat scores regressed (grey area indicates 95% confidence zone) against predicted T-Mat scores (multiple function of MAT, CHIPS, C-all and sex: selected model from Models B and C in Table 6; adjusted R² = 0.54)

Discussion

Our analysis shows that more than half of the variation (54%) in grade four mathematics achievement can be explained as a combined function of strategy use (C-all), mathematics achievement (MAT), non-verbal reasoning skills (CHIPS), and sex in grade one. More notably, our results suggest that data gathered on how children solve simple addition problems in grade one (strategies) provide significant predictive information about their future mathematical performance beyond the (performance-based) information that is achievable from standardised tests of mathematics achievement used in primary schools. This conclusion is based on the finding that the strategy use predictor C-all was not only the single most important grade one predictor (30% explained variation), but especially that it explained a significant amount of additional variation in grade four achievement (12%) not accounted for by other predictors of mathematics achievement (MAT) and non-verbal reasoning skills (CHIPS). The mathematics achievement test in grade one could be considered a rather strong control, that is, an autoregressive effect of the expected stability of individual differences in similar measurements (construct) across time, thus contributing to the strength of the results.

Our explanation for this finding is that the mathematically simple problems in the achievement test designed for grade one children could be efficiently solved by the use of the unsophisticated C-all strategy, especially because the achievement test in grade one was not time limited. Thus, children who relied excessively on C-all in their mathematical problem solving in grade one, without having developed any more advanced counting or fact-based strategies, could perform reasonably well in this untimed mathematics achievement test in grade one. This is in line with the findings of Jordan and Montani (1997) that the performance of children with mathematical difficulties equalled that of their non-impaired peers in untimed but not in timed tests, and that these differences between the two test situations could be related to the skilled use of backup strategies by the children with mathematical difficulties. Thus, our findings suggest that more fundamental problems of understanding number and arithmetic, which may influence achievement in conceptually more demanding mathematics 3 years later, were not revealed as clearly by the grade one scores on the standardised achievement test (MAT) alone as they would have been if information on strategy use patterns had been included.

Overall, our findings are in line with previous studies indicating that advanced counting (Geary, 2011; Nguyen et al., 2016) and derived fact strategies (Casey et al., 2017; Foley et al., 2017; Geary, 2011; Geary et al., 2017) are strong positive predictors of mathematical achievement. In this study, we found the less sophisticated counting strategy, C-all, to be the strongest, albeit negative, predictor among the strategy categories measured. C-all is a highly procedural strategy that presupposes fairly simple number understanding (Fuson, 1992) compared to more sophisticated fact-based strategies. Therefore, frequent use of C-all in grade one could be related to a less developed conceptual understanding of numbers and arithmetic relations. This resonates with the findings of Chesney et al. (2014) of the relationship between conceptual knowledge of decomposition and equivalence, further highlighting the need for children with excessive use of counting strategies to engage in opportunities that develop better number sense and arithmetic understanding (Baroody et al., 2009).

An additional, interesting finding in this study was that although sex differences were apparent in both grade one strategy use and grade four achievement, these sex differences were not related, as the raw sex difference in grade four achievement (Table 4) was similar to the partial effect, adjusted for grade one variation in strategy use, mathematical achievement, and non-verbal reasoning (Table 6). This indicates that sex differences in strategy use and mathematics achievement are rooted in different underlying causes (e.g. Casey & Ganley, 2021). Furthermore, the differences for boys and girls in the correlation between DiR and grade four mathematical achievement indicate that the strategies play different roles for boys and girls as predictors. Taken together, these findings outline the importance of always examining for possible sex differences in mathematical cognitive variables, i.e. strategies in arithmetic.

The importance of differentiating between several strategies is apparent in our finding that grade one use of the C-all and DeF strategies correlated equally strongly, but in opposite directions, with mathematics achievement in grade four. The general pattern in the results is that C-all correlated negatively, whereas the other strategies correlated positively, with the grade four measurement. As the proportions of use of all strategies sum to 1, this pattern indicates that high use of C-all and low use of the other three strategies are two sides of the same coin. The results indicate the importance of differentiating not only between counting, direct retrieval, and derived fact strategies but also, at least with young children, between at least C-all and C-on within the counting strategies, as these exhibit opposite trends. It is worth noting that, at least early in grade one, automatisation (that is, DiR) is apparently the least important predictor of later achievement. Our results hint that if we had differentiated only between DiR and other strategies (i.e., combined C-all, C-on, and DeF), as done by, for example, Bailey et al. (2014), DiR could have appeared as the strongest predictor, because pooling these three categories would have masked important variation.
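The sum-to-1 constraint discussed above can be illustrated with a minimal sketch. The data below are simulated and are not the data of this study; the function names, the sample size, and the assumed negative effect of C-all on the later score are hypothetical choices made only for illustration. The sketch shows that when strategy proportions sum to 1, the correlation of C-all with a later score is the exact mirror image of the correlation of the three remaining strategies combined, so a pooled category cannot carry information beyond that of its complement.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_children(n_children=200, seed=0):
    """Simulated (not observed) strategy proportions and a later score."""
    rng = random.Random(seed)
    children = []
    for _ in range(n_children):
        # Four positive weights, normalised so the proportions sum to 1.
        weights = [rng.random() + 0.01 for _ in range(4)]
        total = sum(weights)
        c_all, c_on, de_f, di_r = [w / total for w in weights]
        # Assumption for illustration only: the score falls with C-all use.
        score = rng.gauss(0.0, 1.0) - 4.0 * c_all
        children.append(
            {"C-all": c_all, "C-on": c_on, "DeF": de_f, "DiR": di_r, "score": score}
        )
    return children


children = simulate_children()
scores = [c["score"] for c in children]
c_all = [c["C-all"] for c in children]
# Pooled use of the three remaining strategies, which equals 1 - C-all.
others = [c["C-on"] + c["DeF"] + c["DiR"] for c in children]

r_c_all = pearson(c_all, scores)
r_others = pearson(others, scores)
print(f"r(C-all, score) = {r_c_all:.3f}")
print(f"r(other strategies combined, score) = {r_others:.3f}")
```

The two printed correlations have the same magnitude and opposite signs. The same identity holds for any two-category split, such as DiR versus all other strategies combined, which is why such a split can express only a single association and cannot reveal opposite trends within the pooled category.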

In this study, children’s strategies for single-digit addition were found to be strong predictors of later achievement. A relevant next step is to investigate whether this also holds for children’s strategy use in mixed and multi-digit problems.

Finally, the study revealed differential associations between non-verbal reasoning and the strategies under investigation. This suggests that there might be different cognitive mechanisms underlying the acquisition of these strategies. Indeed, various cognitive skills, such as working memory, rapid automatised naming, and processing speed, have been associated with different types of mathematical skills (De Smedt, 2022), including various strategies. Future studies should further investigate the mechanisms underlying these differential associations.

Limitations

The results of this study are based on a relatively modest number of children and classes. By adopting a multilevel approach in the statistical analyses, we have accounted for statistical dependencies between observations within classes, and we have carefully addressed sex-related variations in the means and variances of predictors and response variables. On this basis, we consider our results statistically robust, although a larger sample could have resulted in more predictors being selected and in a larger proportion of the variation in grade four achievement being significantly explained. As with all empirical studies, our conclusions remain suggestive until the results and predictions obtained from this analysis have been tested on independent data.

Implications

This study suggests that early strategy use is a unique predictor of later mathematics achievement and can thus be used to detect children at risk early in school. This group of children is often identified via cut-off scores on standardised mathematics achievement tests (e.g. Geary, 2013). Our results add to previous findings that composite measurements of achievement are not sufficient for predicting later achievement and the risk of low performance (Dowker, 2005; Geary, 2004). When children are tested at such an early stage of their formal education, information on how they solve single-digit addition problems, and not just how well they solve them, could provide in-depth insight into their understanding of number and arithmetic. More importantly, this could provide valuable information on children at risk of developing mathematical difficulties or becoming low performers and, not least, on how to develop targeted teaching to remedy these difficulties.