Patterns of symbolic numerical magnitude processing and working memory as predictors of early mathematics performance

Although the roles of symbolic numerical magnitude processing (SNMP) and working memory (WM) in mathematics performance are well acknowledged, studies examining their joint effects are few. Here, we investigated the profiles of SNMP (1- and 2-digit comparison) and WM (verbal, visual and central executive) among Norwegian first graders (N = 256), and how these predict performance in counting, arithmetic facts and word problem–solving. Using latent class cluster analysis, four groups were identified: (1) weak SNMP (33.6%), (2) strong SNMP (25.8%), (3) weak SNMP and WM (23.4%) and (4) strong WM (17.2%). Group differences in mathematics performance were significant with explained variance ranging from 7 to 16%, even after controlling for relevant demographics and domain-general cognitive skills. Our findings suggest that children may display relative strengths in SNMP and WM, and that they both have a unique, even compensatory role in mathematics performance.


Introduction
Several cross-sectional and longitudinal studies have looked at numerical magnitude processing (NMP) and working memory (WM) as possible domain-specific and domain-general cognitive precursors of children's mathematics performance and development (e.g. Alloway & Alloway, 2010; De Smedt et al., 2013; Friso-van den Bos et al., 2013; Kroesbergen & van Dijk, 2015; Li et al., 2018; Toll et al., 2016). Both NMP and WM have been linked to mathematics performance. For NMP, studies using a symbolic format (i.e. digits) have shown more consistent findings and stronger effects on mathematics performance than studies using a non-symbolic format (i.e. dots) (De Smedt et al., 2013; Schneider et al., 2017). The inconsistencies regarding the non-symbolic format might partly be due to methodological issues, such as using many different types of non-symbolic comparison measures, or the measures not being sensitive enough. Alternatively, the non-symbolic magnitude processes measured may simply be less critical in the context of school mathematics (De Smedt et al., 2013). Given that these constraints do not seem to apply when NMP is measured using a symbolic format (De Smedt et al., 2013), we focused only on SNMP in this study.
SNMP predicts mathematics achievement within and across different grades in elementary school (Brankaer et al., 2017; Holloway & Ansari, 2009), also after controlling for age, intellectual ability and speed of number identification (De Smedt et al., 2009). Furthermore, students performing well in SNMP exhibit more effective arithmetic strategy use (i.e. being faster in retrieving facts and using procedural strategies), even when taking into account differences in intellectual ability, digit naming and general mathematics achievement (Vanbinst et al., 2012).
Tasks measuring SNMP aim to tap a person's ability to access the number magnitudes in symbols (Rousselle & Noël, 2007). A vast majority of studies have used 1-digit numbers on symbolic comparison tasks (Brankaer et al., 2017). Some studies show that children solve multi-digit comparison tasks more slowly than single-digit tasks (Landerl et al., 2009), which may stem from the fact that in such tasks, children process multi-digit numbers differently than single-digit numbers. Two alternative explanations have been proposed: one suggesting that children process the number as a uniform unit (i.e. holistic view) (Reynvoet & Brysbaert, 1999), and another suggesting that children process the decade digit and the unit digit of the number independently (i.e. compositional model) (Nuerk et al., 2001). In support of the compositional model, research has found children to compare compatible number pairs (i.e. when both digits of one number are bigger than the corresponding digits of the number to be compared, e.g. 25 vs. 68) faster than incompatible number pairs (e.g. 51 vs. 37).

The role of WM in mathematics performance
A significant relation between WM and mathematics performance has been evidenced in several studies (for a meta-analysis, see Friso-van den Bos et al., 2013). Previous research has often employed the multicomponent WM model by Baddeley and Hitch (1974), hence referring to the three subcomponents of WM: the two slave systems, phonological loop and visuospatial sketchpad, for storing verbal and visuospatial information, respectively, and the central executive for coordinating information of the slave systems. This three-component model still draws the most attention in educational research, even though a fourth component, the episodic buffer, has been included in the model (Baddeley, 2010).
According to the meta-analysis by Friso-van den Bos et al. (2013), all three WM components are linked with children's mathematics performance. This relation, however, was dependent on the type of mathematics test used. General mathematics tests, such as national curriculum tests, which demand more switching between different operations and updating sets of information, yielded stronger correlations with WM than those focusing only on some specific mathematical skills. When looking at the relation between WM and specific mathematical skills, both verbal and visuospatial WM seem to be important predictors of counting (e.g. tasks involving number sequences or linking quantities to number words) at kindergarten age (Preßler et al., 2013). Kyttälä et al. (2019) showed that verbal, but not visuospatial WM, predicted word problem-solving from kindergarten to second grade, whereas Andersson (2008) found both verbal and central executive functions to predict word problem-solving among second to fourth graders. The use of central executive resources in solving single-digit arithmetic problems has also been highlighted, and verbal WM seems to play a role if such problems are solved using counting strategies (DeStefano & LeFevre, 2004, for a review).
However, the effect of WM seems to diminish when other factors are controlled. For example, basic academic skills (e.g. reading and calculation) and fluid intelligence have been found to account for some of the effects of WM on word problem-solving accuracy in early grades (Fung & Swanson, 2017; Zheng et al., 2011). Also, the effect of verbal WM on counting skills diminished when vocabulary, morphology, phonology, intelligence, task orientation and gender were controlled (Koponen et al., 2018).

SNMP and WM as predictors of mathematical learning difficulties
Mathematical learning difficulties (MLD) and dyscalculia (i.e. severe and persistent learning difficulties in mathematics) are another relevant issue in the present context. Given the impact of SNMP and WM on mathematics performance and development, children with MLD or dyscalculia would be expected to display inferior SNMP and/or WM skills compared to their peers without such difficulties. Indeed, over the years, different theoretical models of the causes of MLD have been proposed (for an overview, see e.g. Siemann & Petermann, 2018), highlighting a deficit in domain-specific skills, namely in NMP, either non-symbolic NMP (i.e. defective number module, Butterworth, 2005) or SNMP (i.e. access deficit, Rousselle & Noël, 2007), or in domain-general skills, such as WM (i.e. cognitive deficit, Geary, 2004; Karagiannakis et al., 2014), or in both (i.e. double deficit, Kroesbergen & van Dijk, 2015; Wolf & Bowers, 1999). Regarding school beginners, as is the case in the current study, a weak performance in either NMP or WM or both might thus point to a risk for MLD.
Some studies using non-symbolic NMP measures have shown children with dyscalculia to perform poorly in tasks of comparing two magnitudes (Desoete et al., 2012; Landerl et al., 2009; Mazzocco et al., 2011), or matching them (i.e. deciding whether two magnitudes are the same or not) (Lafay et al., 2019), thus supporting the "defective number module" (Butterworth, 2005) as a primary cause of dyscalculia. Other studies, however, have not found such effects (e.g. De Smedt & Gilmore, 2011; Lafay et al., 2019). In contrast, more consistent and robust differences between children with or without MLD or dyscalculia (Cañizares et al., 2012; De Smedt et al., 2013; De Smedt & Gilmore, 2011; Desoete et al., 2012; Landerl et al., 2009) have been found in studies using SNMP measures, thus supporting "access deficit" (Rousselle & Noël, 2007) as an alternative explanation for dyscalculia, according to which the core deficit lies in not being able to access the number magnitude in symbols.
In support of the cognitive deficit as a cause for MLD, not only has it been shown that children with higher WM capacity outperform their peers with lower WM capacity in different mathematics tasks (e.g. Preßler et al., 2013), but also that children with varying degrees of MLD display weaker WM skills compared to their peers without MLD (e.g. Geary et al., 2004; Menon, 2016; Passolunghi & Siegel, 2004). In a meta-analysis comparing children with MLD to average-achieving age-matched children on measures of WM, large effects were found for central executive and visuospatial sketchpad (d = 0.95 and d = 0.59, respectively) and medium effects for phonological loop (d = 0.36), in favour of the average-achieving children (David, 2012).

Investigating the roles of (S)NMP and WM simultaneously in mathematics performance
Surprisingly few studies have investigated the roles of NMP or SNMP and WM in mathematics performance, or in relation to MLD, simultaneously. Passolunghi et al. (2014) found WM to be a stronger predictor of mathematics performance than non-symbolic NMP at the beginning of the first grade (6-year-olds), after controlling for intelligence (i.e. verbal and fluid intelligence). Furthermore, NMP lost its significance by the end of the first grade when predicting teacher-rated mathematics performance in a similar manner. Note, however, that intelligence turned out to be an even stronger precursor of mathematics performance than either WM or NMP.
Recently, Chan and Wong (2019) tested whether the effect of visuospatial WM at grade 1 (7-year-olds) on mathematics achievement at grade 2 was mediated by numerical magnitude representation (i.e. SNMP and computation) and problem representation (i.e. word problems). Both pathways were found to be significant, even after controlling for nonverbal intelligence, reading fluency, processing speed and verbal WM. However, as the direct effect from visuospatial WM to mathematics achievement remained significant as well, the authors concluded that numerical magnitude representation and problem representation failed to fully explain the relation between visuospatial WM and mathematics achievement. Kroesbergen and van Dijk (2015) examined visuospatial WM and NMP (although they used the term "number sense", which was represented by a combined score of non-symbolic and symbolic comparison, and number line tasks) in relation to arithmetic fluency and word problem-solving among 6- to 10-year-olds, also including a subgroup of children with MLD. Using a cut-off point of scoring below the 25th percentile, the participants were first divided into four groups according to their performance in number sense and WM: weakness in number sense, weakness in WM, weakness in both number sense and WM (i.e. double weakness) and without weakness. When comparing these groups on mathematics performance, they found a deficit in either number sense or WM, or both, to be connected with lower performance. Those identified as having double weakness displayed the most inferior performance of all groups, even after controlling for age, IQ and verbal WM, thus supporting the double-deficit hypothesis of MLD.
Partly replicating the study by Kroesbergen and van Dijk (2015), Toll et al. (2016) also investigated the predictions of NMP (or "number sense", in their study) and visuospatial WM from the first year of kindergarten (5-year-olds) to grade one (7-year-olds). Non-symbolic NMP was operationalised in terms of dot comparison tasks, and although conceptualised as "symbolic number sense", which refers to SNMP in some other studies, the authors used counting-based tasks instead of a digit comparison task. The results showed non-symbolic NMP to predict only word problem-solving 2 years later, whereas symbolic number sense predicted both arithmetic facts and word problems. When grouping the participants scoring below the 25th percentiles, weakness in both visual WM and number sense was connected with the lowest performance in arithmetic facts and word problems, even when controlling for fluid intelligence, thus replicating the results of Kroesbergen and van Dijk (2015).

Current study
Expanding on previous findings on the roles of SNMP and WM in mathematics performance, in this study we first explore the patterning of SNMP and WM through children's performance profiles, and then link these profiles with several aspects of children's mathematics performance (i.e. counting, arithmetic facts and word problem-solving). In contrast to previous studies using a priori defined cut-off points in performance (i.e. below or above the 25th percentile) to classify children into different groups, we examined the relative strength in children's SNMP and WM skills by extracting empirical profiles through latent class clustering. Thus, instead of forming fixed categories of specific combinations (e.g. high/low, high/high), we relied on more naturally occurring data-driven patterns representing groups of children similar to each other, but different from the others. In a sense, then, this approach might result in a somewhat less artificial account of the patterning of SNMP and WM among children. Prior studies have mainly used visuospatial WM as an indicator of WM, whereas we included three WM components (verbal, visual and central executive) in the classification, thus seeking to have a more comprehensive empirical representation of individual differences in WM. As to the outcome measures, we included verbal counting skills in addition to arithmetic facts (addition and subtraction) and word problem-solving used in previous studies (Kroesbergen & van Dijk, 2015; Toll et al., 2016), as it is one of the core skills developing in this age group, and a significant predictor of later mathematics performance (Aunola et al., 2004). Finally, we controlled for several demographics (i.e. age, gender, parental educational level, status of second language learner) and domain-general cognitive skills (i.e. fluid intelligence, rapid automatized naming and word comprehension) shown to be associated with children's mathematics performance (e.g. Alloway & Alloway, 2010; Koponen et al., 2018; Purpura & Ganley, 2014), to capture the unique contribution of children's skill profiles to mathematics performance.

Participants
The study is part of a project that traces individual differences in children's development of mathematical skills. The participants were 256 Norwegian first graders (M age = 6 y. 9 m., SD = 3.33 m., 46.1% girls), from 12 classrooms in five public schools in the capital region of Norway (see demographics of participants in Table 1). Most of the schools were located near the data collection site, to avoid long transportation times for the children during the school day. Regarding the family socio-economic status, we used parents' highest educational level as a proxy, reported by parents in a questionnaire. These data indicate that our sample of children came mainly from families having middle or high levels of education. As reported by their parents, 10.6% of the children had Norwegian as their second language.
Before the data collection, ethical approval was applied for and granted by the Norwegian Centre for Research Data, and consent for participation was collected from the children's parents and teachers accordingly.

Symbolic numerical magnitude processing
The SYmbolic Magnitude Processing (SYMP) Test (Brankaer et al., 2017) was used to measure children's SNMP. The SYMP consists of two subtests to be used with elementary school-aged children: a subtest with digits between 1 and 9 (SYMP 1-digit) and a subtest with digits ranging from 11 to 99 (SYMP 2-digit). Each subtest comprises 60 digit pairs, presented in four columns of 15 pairs. During the test, the child is asked to cross out the larger of the two digits. For both subtests, the children are given 30 s to solve as many items as possible. To ensure that the child understands the task, four practice trials are included in both subtests (for more details on item description and test validation, see Brankaer et al., 2017). One point is given for a correct answer and zero for an incorrect answer, the sum score for each subtest thus being the number of the items correctly solved in 30 s.

Working memory
WM was measured using three subtests of the standardised Wechsler Intelligence Scale for Children ages 6 through 16 (V Norwegian version) (Wechsler, 2017) to tap the different components of WM: Forwards Digit Span for the phonological loop, Backwards Digit Span for the central executive and Picture Span for the visual WM. Both Digit Span subtests comprise nine blocks, each consisting of two items. The child needs to recall orally given digits (2-8) either forwards or backwards. The test is stopped if both items in a block are incorrect. One point is given for a correct answer and zero for an incorrect answer. In the Picture Span, the child is shown an increasing number of pictures (2-8; such as a ball, flower and car) for 5 s in the stimulus book. After this, the child needs to point to these pictures in the same order from another collection of pictures. There are 25 items in the test (including 2 practice items). One point is given if the child is able to recall the correct pictures and two points if their order is also correct, the total maximum score thus being 46 points. The test is stopped after three consecutive errors, as instructed in the handbook.

Mathematics outcome measures
Three different mathematical skills were measured: verbal counting and arithmetic facts (addition and subtraction), as well as word problem-solving. To measure children's verbal counting skills, we used the Number Sequences task, which is a subtest from the standardised LukiMat Mathematics test battery for first graders (Salminen & Koponen, 2011). This test measures children's verbal counting skills as number sequences (counting by 1s, 2s, 5s and 10s forwards and backwards) within the number range 1-100. There are 29 items in the task (13 counting forwards and 19 counting backwards), and the child responds orally to all tasks (i.e. reciting number sequences). One point is given for a correct answer and zero for an incorrect answer. Addition and subtraction facts were measured using the standardised test Regnefaktaprøven [Test of arithmetic facts] (Klausen & Reikerås, 2016), designed for grades 2-10. We chose this measure for first graders because the data collection was conducted at the end of the first grade, because we needed the same measure to be used in different grades throughout the project years, and because it had been developed and standardised in Norway. The test includes 45 items per page, either addition or subtraction problems, and 2 min are given for each subtest to solve as many problems as possible. One point is given for a correct answer and zero for an incorrect answer, the sum score for each subtest thus being the number of problems correctly solved in 2 min.
Word problem-solving was measured using the standardised test WISC-V: Regning [Arithmetic] (Wechsler, 2017), to be used with 6-16-year-olds. The test comprises 34 items. An arithmetic word problem is read aloud to the child one at a time. For the first five items, visual materials (i.e. pictures) are provided to support solving the problem, and 30 s are given to solve each problem. The test is stopped after three consecutive errors, as guided in the manual. One point is given for a correct answer and zero for an incorrect answer, or if the problem was not solved in time.

Covariates
To take into account additional relevant individual differences, a set of cognitive skills shown to affect mathematics performance was used as covariates in the analyses. Fluid intelligence was measured using the standardised Raven's Coloured Progressive Matrices test (Raven et al., 1990), designed for 5-11-year-old children. The test consists of 36 items in 3 sets (A, Ab, B), with 12 items per set. The child needs to find the piece, out of six given possibilities, that fits into the picture pattern. One point is given for a correct answer and zero for an incorrect answer. As the first two items in set A are practice items, the maximum total score is thus 34 points. Rapid automatized naming was measured using the subtest Hurtig Benevning [Rapid naming] from the standardised test battery Clinical Evaluation of Language Fundamentals, Fourth Edition (CELF-4) (Semel et al., 2003), designed for 5-12-year-old children. This subtest consisted of two tasks, naming of colours and figures (i.e. non-alphanumeric RAN). The child needs to name as fast as possible 36 colours (i.e. yellow, red, green and blue) or figures (i.e. circle, square, triangle and star) in a given order, presented on a sheet in six rows of six items each. For both tasks, a composite score was calculated: the total number of correct responses was divided by the total naming time, meaning that the higher the value, the more items the child could name correctly per second, thus displaying better RAN. Language skills were measured using Ordforståelse [Word comprehension], a subtest from the standardised WISC-V (Wechsler, 2017), where the child needs to name pictures and define words. There are 29 items in total, and the task is stopped after three consecutive errors. For picture items (1-4), either 0 or 1 point is given, and for the rest of the items (5-29), 0, 1 or 2 points are given, based on the correctness of the definition the child gives. Therefore, the total maximum score is 54 points.
To control for demographic factors, we included children's age (date of birth), gender, status of second language learner (i.e. "Norwegian is a child's second language": yes/no) and mother's and father's highest educational level (1 = comprehensive school, 2 = upper secondary school, 3 = bachelor, 4 = master, 5 = PhD), as reported by the parents in a questionnaire.
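The scoring rules described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated in the text, not the scoring code used in the study; the function names and the example naming time of 30 s are hypothetical.

```python
def ran_composite(n_correct: int, naming_time_s: float) -> float:
    """Correctly named items per second; higher values mean better RAN."""
    if naming_time_s <= 0:
        raise ValueError("naming time must be positive")
    return n_correct / naming_time_s


def raven_max_score(n_items: int = 36, n_practice: int = 2) -> int:
    """One point per scored item; practice items are not scored."""
    return n_items - n_practice


def word_comprehension_max_score(n_items: int = 29, n_picture_items: int = 4) -> int:
    """Picture items give at most 1 point, the remaining items at most 2."""
    return n_picture_items * 1 + (n_items - n_picture_items) * 2


# Hypothetical child: all 36 colours named correctly in 30 seconds.
example_rate = ran_composite(36, 30.0)  # items per second
```

With the item counts given in the text, the two maximum-score helpers reproduce the reported maxima of 34 and 54 points.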

Procedure
When needed, the test instructions were translated first into English and then into Norwegian, and the item level questions were also back-translated to ensure the correspondence and quality of the translations, either by the research team members fluent in both languages or using authorized translators. Children's mathematical and other cognitive skills were measured as part of the larger data collection in the project. Apart from Raven, which was administered in the classrooms, all tests were conducted when the children participated in one 4-h session at the data collection site, during which they were tested individually (WISC-V Arithmetic, LukiMat, WISC-V Digit and Picture Spans, RAN, Word comprehension) and in small groups (SYMP, Regnefaktaprøven) by trained research assistants. The research assistants were bachelor and master students studying (special) education. Small breaks and a lunch break were given during the testing session.
Children in Norway start school in August of the year they turn six, and teaching follows the guidelines of the national core curriculum (The Norwegian Directorate for Education & Training, 2013). Accordingly, children receive around 3 h of mathematics instruction per week, and the main foci in mathematics learning are on Numbers in the number range up to 20 (e.g. counting, addition and subtraction, comparing numbers), Geometry (e.g. recognise and describe characteristics of simple two- and three-dimensional figures), Measurement (e.g. length and money) and Statistics (e.g. illustrate data using tally marks and bar graphs). The data collection took place between mid-March and the beginning of May 2019, meaning that the children had received 7-9 months of formal mathematics instruction.
The tests were scored and coded by three trained research assistants, and around 14% of the data (i.e. the data of three randomly chosen participants from each classroom for each test) were double coded by the first author. The correlations of sum scores in each test ranged between r = 0.95 and 1.00 after double coding. When needed, the original data (paper sheets) were checked regarding the non-matching sum scores, and the final data matrices were corrected at item level accordingly.

Data analysis
The grouping on the two measures of SNMP and three measures of WM was conducted using latent class cluster analysis (LCCA) as implemented in Latent GOLD 5.1 software (Vermunt & Magidson, 2016). The measures were first discretized to balance the distributions, and then used as clustering variables. Information criteria (e.g. AIC, AIC3, BIC, CAIC and SABIC, as implemented in Latent GOLD) were used for evaluating the appropriateness of the models, and a conditional bootstrap procedure was performed to compare sets of competing models. Based on the final model, each participant was classified into one group according to the classification probabilities, and the resulting variable was then used as a grouping variable for further analyses. Distribution of gender in different groups was examined using configural frequency analysis (CFA; Eye et al., 1996). Group differences in mathematics outcome measures were examined by a series of ANOVAs, which were then extended to ANCOVAs with measures of nonverbal intelligence, rapid automatized naming and word comprehension as well as children's age, gender, status of second language learner and mother's and father's educational level as covariates. Both sets of analyses were conducted with statistics software Jamovi v. 1.2.5.0 (The jamovi project, 2020).
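As an illustration of how the information criteria mentioned above relate to one another, the following Python sketch computes them from a model's log-likelihood using their common textbook definitions. It is an assumption that Latent GOLD's output corresponds to these formulas in every detail, and the log-likelihood and parameter counts in the example are hypothetical, not values from the present study.

```python
import math


def information_criteria(log_lik: float, n_params: int, n_obs: int) -> dict:
    """Textbook information criteria; lower values indicate better fit."""
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2 * n_params,
        "AIC3": deviance + 3 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1),
        # Sample-size adjusted BIC replaces n with (n + 2) / 24.
        "SABIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }


# Hypothetical model: log-likelihood of -1000 with 10 parameters, N = 256.
ic = information_criteria(log_lik=-1000.0, n_params=10, n_obs=256)
for name, value in ic.items():
    print(f"{name}: {value:.2f}")
```

Because the criteria penalise the number of parameters to different degrees, they need not all favour the same number of classes, which is why several were inspected together.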

Results
Descriptive statistics and correlations between all measures are given in the Appendix, Tables A1 and A2, respectively. All measures showed acceptable reliability values in terms of McDonald's omega (ω = 0.73-0.94). Small statistically significant correlations between SNMP and WM measures were found only between 1-digit SYMP and digit backwards (r = 0.26), 1-digit SYMP and picture span (r = 0.21) and 2-digit SYMP and digit backwards (r = 0.15), thus reflecting the relative independence of the two constructs. Measures of mathematics performance correlated strongly with each other (r = 0.53-0.79), relatively weakly with digit forwards (r = 0.15-0.29) and picture span (r = 0.15-0.26) and moderately with digit backwards (r = 0.38-0.45). Both SNMP measures were also moderately associated with the measures of mathematics performance (r = 0.41-0.50 for 1-digit SYMP and r = 0.34-0.54 for 2-digit SYMP).

Grouping
An examination of the different information criteria from a series of LCCAs showed the lowest values (i.e. indicating best fit) for solutions with two to six classes, depending on the criterion, but the values levelled off between two and four classes across all indices (Table 2). Such variation is to be expected given the different characteristics of the information criteria (Morgan, 2015). After narrowing down the most likely solutions, we used conditional bootstrapping to compare consecutive models, starting from the model with two groups. These analyses showed that the three-group solution was superior to the two-group solution (-2LL Diff = 30.58, p < 0.001), and the four-group solution superior to the three-group solution (-2LL Diff = 21.76, p < 0.01), but the five-group solution did not significantly add to the description of the data (-2LL Diff = 14.278, p = 0.15). Therefore, we chose the model with four groups for further analyses.
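The stepwise logic of these comparisons can be sketched as follows. This is a minimal Python illustration of the decision rule only, not of the bootstrap procedure itself; the function name is hypothetical, a significance level of 0.05 is assumed, and the reported bounds (p < 0.001 and p < 0.01) are entered as their upper limits.

```python
from typing import Dict


def select_n_classes(step_p_values: Dict[int, float], alpha: float = 0.05) -> int:
    """Return the largest number of classes retained.

    step_p_values maps k to the p-value of the test comparing the
    k-class model against the (k - 1)-class model. Classes are added
    until the first comparison that is not significant.
    """
    chosen = min(step_p_values) - 1  # the baseline model
    for k in sorted(step_p_values):
        if step_p_values[k] < alpha:
            chosen = k
        else:
            break
    return chosen


# p-values as reported in the text (upper bounds for the first two).
reported = {3: 0.001, 4: 0.01, 5: 0.15}
n_classes = select_n_classes(reported)
print(n_classes)
```

Applied to the reported comparisons, the rule stops at the four-group model, because the step from four to five groups is the first that is not significant.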
The clustering solution explained variance significantly in all criterion variables, ranging from 25% (Backwards Digit Span) to 55% (SYMP 2-digit). In relative terms, groups one (33.6% of participants) and three (23.4%) had the lowest scores in SNMP, but compared to group one, members of group three scored significantly lower on WM tests, particularly on Forwards Digit Span and Backwards Digit Span. Group two (25.8%) had the highest scores on both tests of SNMP, but scores close to sample mean in WM. Group four (17.2%), in contrast, performed relatively well on all measures, especially on WM and most distinctively on visual WM. Based on absolute and standardised means, the groups from one to four were labelled according to the most pronounced characteristics as (1) weak SNMP, (2) strong SNMP, (3) weak SNMP and WM, and (4) strong WM. Group differences on measures of SNMP and WM are reported in Table 3, and profiles are illustrated in Fig. 1.
Configural frequency analysis (using Lehmacher's test with continuity correction) suggested some gender differences in the distribution of girls and boys across the groups. The overall variation was significant, χ²(3) = 28.72, p < 0.001, and the analyses flagged two types (i.e. frequency higher than expected by chance alone) and two antitypes (i.e. frequency lower than expected by chance alone). As can be inferred from Table 4, girls were overrepresented in the weak SNMP group (type) and underrepresented in the strong SNMP group (antitype), while boys, conversely, were overrepresented in the strong SNMP group (type) and underrepresented in the weak SNMP group (antitype).
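To make the notion of "expected by chance alone" concrete, the following Python sketch derives the expected cell frequencies for the group-by-gender table, assuming the usual independence base model. It uses only the group proportions and the share of girls reported in the text, and it assumes that gender was available for all 256 children. It does not reproduce Lehmacher's test or the observed frequencies in Table 4.

```python
N = 256  # sample size reported in the text

# Group proportions and share of girls as reported in the text.
group_shares = {
    "weak SNMP": 0.336,
    "strong SNMP": 0.258,
    "weak SNMP and WM": 0.234,
    "strong WM": 0.172,
}
girls_share = 0.461

# Approximate counts recovered by rounding the reported percentages.
group_n = {group: round(share * N) for group, share in group_shares.items()}
n_girls = round(girls_share * N)
n_boys = N - n_girls


def expected_count(row_total: int, col_total: int, n: int) -> float:
    """Expected cell frequency if group and gender were independent."""
    return row_total * col_total / n


expected = {
    group: {
        "girls": expected_count(size, n_girls, N),
        "boys": expected_count(size, n_boys, N),
    }
    for group, size in group_n.items()
}

for group, cells in expected.items():
    print(f"{group}: girls {cells['girls']:.1f}, boys {cells['boys']:.1f}")
```

A cell is a type when its observed frequency is significantly above the corresponding expected value, and an antitype when it is significantly below.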

Mathematics performance by groups
Next, group differences on mathematics outcome measures were examined. Again, the grouping explained individual differences in the target variables significantly, with explained variance ranging from 21% (word problem-solving) to 30% (counting). The strong SNMP and strong WM groups did not differ from each other in any of the measures, and the weak SNMP group and the weak SNMP and WM group did not differ from each other in addition and subtraction facts. Other than that, pairwise group differences were all significant (see Table 5). The strong SNMP and strong WM groups received the highest scores on all measures, followed by both the weak SNMP group and the weak SNMP and WM group on addition and subtraction facts, and first by the weak SNMP and then by the weak SNMP and WM group on both counting and word problem-solving. In order to control for the effects of factors that likely contribute to children's mathematics performance (i.e. domain-general skills, language and some key demographics), we ran a series of ANCOVAs. Group differences remained significant (Table 6), yet the explained variance was somewhat reduced, now ranging from 7% (word problem-solving) to 16% (addition facts). Gender (F = 6.39, p = 0.012, partial η² = 0.03), fluid intelligence (F = 5.27, p = 0.023, partial η² = 0.03) and RAN Colours (F = 6.32, p = 0.013, partial η² = 0.03) had a significant effect on counting; word comprehension (F = 25.30, p < 0.001, partial η² = 0.12) on word problem-solving; and RAN Figures on both addition (F = 4.76, p = 0.030, partial η² = 0.02) and subtraction facts (F = 5.77, p = 0.017, partial η² = 0.03) (see Table 7).
An inspection of adjusted means showed that the covariates moderated the group differences, so that the weak SNMP and strong WM groups no longer differed from each other in counting skills. In word problem-solving, only the strong SNMP and the weak SNMP and WM groups differed from each other, and in both addition and subtraction facts, only the weak SNMP and strong SNMP groups, and the strong SNMP and the weak SNMP and WM groups, differed from each other, respectively. These results, however, need to be interpreted with caution, due to considerable missing data in some of the covariates.

Discussion
The aim of this study was to investigate the patterning of symbolic numerical magnitude processing (SNMP) and working memory (WM) by identifying different performance profiles among first graders, and, furthermore, how those profiles predict children's performance in mathematics. The novelty of this study lays in (i) examining the two constructs, SNMP and WM, simultaneously, (ii) including three components of WM (verbal, visual and central executive), (iii) extracting empirical performance profiles of SNMP and WM by means of latent class clustering and (iv) predicting mathematics performance while controlling for differences in key demographics and cognitive factors. The main findings, which are elaborated in the following, were as follows: (i) four different profiles of SNMP and WM performance were identified among first graders with children displaying relative strengths in SNMP and WM, and (ii) SNMP and WM were shown to have a unique, even compensatory role in children's mathematics performance, yet their individual contribution might depend on the types of mathematical skill in question (e.g. arithmetic facts vs. word problem-solving). Four groups of children with different profiles in their performance on SNMP and WM were identified. Congruent with previous studies (Kroesbergen & van Dijk, 2015;Toll et al., 2016), we found a group with weak SNMP skills (33.6%) and a group with weak SNMP and WM (23.4%). We did not find a group with weak WM skills only, which is different from previous studies using a priori defined cut-off scores (Kroesbergen & van Dijk, 2015;Toll et al., 2016). Note, however, that even then, Toll et al. (2016) reported only relatively few children (5-year-olds) in the weak WM group compared to other groups. 
Furthermore, in contrast to previous studies (Kroesbergen & van Dijk, 2015; Toll et al., 2016), which treated children having no weakness in either NMP or WM as one group, our clustering revealed two separate groups, one characterized by strong SNMP (25.8%) and another by strong WM, especially visual WM (17.2%). Moreover, girls were slightly overrepresented in the weak SNMP group and boys in the strong SNMP group, whereas no gender differences were detected in Toll et al.'s (2016) study. The type of tasks used for measuring SNMP in each study might partly explain this difference.
When comparing our results to previous research, two differing aspects need to be noted. Firstly, as mentioned before, prior studies (Kroesbergen & van Dijk, 2015; Toll et al., 2016) used a priori defined cut-off points to classify children into different groups and treated all those with performance above the 25th percentile on both NMP and WM as one group (i.e. without weakness). The empirically derived classification in the present study may thus provide a different and, in a sense, less artificial account of the patterning of SNMP and WM among children, which, consequently, also influences the predictions on mathematics performance. Secondly, previous studies used only visuospatial WM in the grouping as an indicator of WM, whereas we included three WM components in our study. As the correlations between the WM components were not particularly strong (r < 0.30), thus reflecting some degree of independence, those were treated as separate indicators in the clustering. The patterning of the different components of WM within each group was nevertheless rather similar, with the exceptions of the weak SNMP and WM group displaying relatively low verbal WM and central executive in relation to other groups, and the strong WM skills group exhibiting the most distinctive level of visual WM. To sum up, the use of latent class clustering enabled us to identify four groups with different performance profiles on SNMP and WM among first graders: two groups similar to those in previous research (that is, the weak SNMP group and the weak SNMP and WM group) and two additional groups characterized by either strong SNMP or strong WM.
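To make the contrast concrete, the a priori cut-off grouping used in the earlier studies can be sketched as follows. This is only an illustration, not the procedure of any cited study: the function name `cutoff_groups`, the percentile method, the handling of scores exactly at the cut-off and the scores themselves are all hypothetical assumptions.

```python
from statistics import quantiles


def cutoff_groups(nmp, wm, pct=25):
    """Assign each child to an a priori group using percentile cut-offs.

    Hypothetical helper: a score at or below the pct-th percentile
    counts as a weakness on that measure (an assumed convention).
    """
    # quantiles(..., n=100) returns the 1st to 99th percentile cut points
    nmp_cut = quantiles(nmp, n=100, method="inclusive")[pct - 1]
    wm_cut = quantiles(wm, n=100, method="inclusive")[pct - 1]

    labels = []
    for n_score, w_score in zip(nmp, wm):
        weak_nmp = n_score <= nmp_cut
        weak_wm = w_score <= wm_cut
        if weak_nmp and weak_wm:
            labels.append("weak NMP and WM")
        elif weak_nmp:
            labels.append("weak NMP")
        elif weak_wm:
            labels.append("weak WM")
        else:
            # Everyone above both cut-offs is lumped into a single group
            labels.append("no weakness")
    return labels


# Made-up scores for eight children (not data from the study)
nmp_scores = [12, 30, 25, 8, 27, 22, 29, 24]
wm_scores = [5, 9, 3, 2, 8, 7, 10, 6]
groups = cutoff_groups(nmp_scores, wm_scores)
```

In such a scheme every child above both cut-offs receives the same label, so a split between a strong SNMP and a strong WM profile cannot emerge; a data-driven clustering places no such constraint on the groups.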
When group differences in mathematics outcome measures (i.e. verbal counting skills, word problem-solving, addition and subtraction facts) were examined, the grouping explained variance ranging from 21 to 30%. In a sense, strong SNMP or WM skills seem to be compensatory in relation to mathematics performance, as those groups (i.e. strong SNMP and strong WM) outperformed the others, but did not differ from each other on mathematics performance. Previous research has also highlighted that good SNMP and WM skills are often conjoined with good mathematics performance. For example, SNMP has been connected with arithmetic fluency (Vanbinst et al., 2012) and WM with word problem-solving (Fung & Swanson, 2017). As in previous research (Kroesbergen & van Dijk, 2015; Toll et al., 2016), in our study, weakness either in SNMP alone or in both SNMP and WM was related to lower mathematics performance. The weak SNMP group and the weak SNMP and WM group were inferior in addition and subtraction facts compared to groups with better SNMP skills. However, there was no difference between these two groups, meaning that weak SNMP skills alone were already a sufficient condition for lower performance in arithmetic facts. The results regarding counting and word problems were similar to those on arithmetic facts although, here, the weak SNMP group outperformed the weak SNMP and WM group, suggesting that in tasks of counting and word problem-solving, WM skills may compensate for weak SNMP skills. This is in line with previous research showing a connection between weak WM capacity and inferior performance in counting (Preßler et al., 2013).
The inclusion of covariates reduced the explained variance to a range of 7 to 16%. Even though the covariates individually had a relatively small impact on mathematics outcome measures, they did moderate group differences especially on arithmetic facts and word problem-solving. The strongest effect (partial η² = 0.12) was found from word comprehension on word problems, which was expected, as language skills (both receptive and expressive) are required in word problem-solving tasks. That is, the child needs to both understand the meaning of the (mathematical) words given verbally to them and express the answer for the problem orally. Hence, in word problem-solving, the only group difference was now found between the strong SNMP group and the weak SNMP and WM group, favouring the former. In contrast to previous research (Kroesbergen & van Dijk, 2015), where differences in word comprehension skills were not taken into account, the group having weakness only in SNMP did not perform worse in word problem-solving than the groups without such weakness. In arithmetic facts, the results were similar to those without covariates, except that no differences were found between the strong WM group and the groups having at least one weakness. This further supports the inference that individual differences in SNMP contribute to arithmetic facts. In counting, the group differences remained the same, except that the weak SNMP and strong WM groups did not differ anymore in their performance.
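For readers less familiar with the effect size reported above, partial eta squared is conventionally defined as shown below. The formula is not printed in the source; the symbols for the sums of squares are assumed from standard analysis of covariance usage.

```latex
\eta_p^2 = \frac{SS_{\mathrm{effect}}}{SS_{\mathrm{effect}} + SS_{\mathrm{error}}}
```

Under this definition, the value of 0.12 for word comprehension means that it accounts for about 12% of the variance in word problem-solving that is left once the other terms in the model have been partialled out.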
In addition to contributing to research on the roles of a domain-specific (i.e. SNMP) and a domain-general (i.e. WM) factor as precursors of early mathematics performance, the present findings also have implications for the discussion on the extent to which underlying factors could or should be emphasized in early screening (e.g. for identifying children at risk for mathematical learning difficulties). Based on our results, assessing children's SNMP skills already in the first grade might be sufficient for such screening. A valid and reliable measure of SNMP, such as SYMP (Brankaer et al., 2017), which can be administered effortlessly by classroom teachers repeatedly over the school year, might serve as a good screening tool for this purpose. This suggestion is also supported by previous studies showing that children with mathematical difficulties perform lower on SNMP tasks than their peers without such difficulties (e.g. Brankaer et al., 2017; Landerl et al., 2009).

Limitations and future directions
Due to being cross-sectional, the findings of our study on the patterning of SNMP and WM and their predictions on early mathematics performance can only be considered correlational, not causal. However, we believe they set a good foundation for further hypotheses that could be tested within a longitudinal design, both in terms of the stability and change in the given patterning (i.e. qualitative changes in profiles or shifts in group membership) and the developmental trajectories or long-term predictions over time. Our measurements were somewhat limited in the sense that single measures were used to represent different constructs, which was mostly due to practical reasons. The children already completed numerous tasks and tests, including the ones we used as covariates in the study. Future studies would nevertheless benefit from including more than one measure for different constructs to enhance validity and to better take into account unreliability and sources of variation. Another important limitation was the missing information on key demographics, due to which the number of cases in the analyses of covariance was reduced. Because of this, the effects of covariates and how they moderated differences in mathematics outcome measures must be interpreted with caution. Also, the fact that our sample of children was somewhat selective in the sense that they came mainly from families of middle or high educational level must be taken into consideration, even though it is not clear whether this, in fact, would have any influence on the relations, and hence also predictions, between the target constructs. To better account for the role of familial background, a more heterogeneous sample would be needed, and perhaps also a better proxy for SES (e.g. yearly income of household).

Conclusions
Our empirical approach enabled us to identify four groups of children with different patterns of SNMP and WM performance. Two groups resembled those found in previous studies (the weak SNMP group and the weak SNMP and WM group), while two were somewhat unique (the strong SNMP group and the strong WM group), which potentially provides us with new insights into the complexity of factors underlying the development of children's mathematical skills. Our findings suggest that WM and particularly SNMP may play an important role in mathematics performance, even after taking into account various key demographics and cognitive factors, and that their contribution might be partly compensatory. As an implication for practice, screening SNMP skills already in the first grade might be an effective and feasible approach to identify children with potential problems in mathematics learning and at risk for mathematical learning difficulties later on.