1 Introduction

Specific difficulties in dealing with numbers are often hard for teachers and parents to grasp. Children have difficulties in understanding basic arithmetic concepts and use their fingers to solve basic calculations. However, many of them perform well in other school subjects. Such specific mathematical learning difficulties (MLD) are not a uniform phenomenon, but rather appear in many different forms (Von Aster and Lorenz 2013). If children are not supported consistently at an early age, this may negatively affect their future in and after school in many different areas. Standard classroom teaching is not enough for these children, and their difficulties remain even if they practice arithmetic tasks a lot. Even though the effects of untreated MLD are as severe as those of other related learning impairments, there is much less research on MLD than on dyslexia or ADHD (Butterworth and Laurillard 2010). To gain a better understanding of MLD and to derive conclusions to support children with MLD in school practice, research in mathematics education provides significant findings on diagnosis, prevention and support of MLD (e.g., Benz et al. 2017; Dornheim 2008; Fuchs et al. 2020; Graß and Krammer 2018; Häsel-Weide 2016; Schindler et al. 2020; Schipper 2002). Results emphasize the importance of effective mathematics teaching with natural differentiation for inclusive classrooms as well as of substantial learning environments for the development of a sustainable understanding of numbers and numerical operations in primary school (Gaidoschik et al. 2021, p. 7 f.; Scherer et al. 2017).

At the same time, research in cognitive neuroscience and developmental psychology highlights the importance of more domain-general cognitive resources alongside math-specific approaches and prior knowledge. The role of working memory (WM), in particular, has already been explicitly acknowledged in several theories of mathematical problem solving (Verschaffel et al. 2020, p. 8) and there is a wide consensus that mathematical ability does not only depend on skills in the mathematical domain, but that it is also related to other domain-general basic cognitive skills like intelligence, working memory, attention and language (Benz et al. 2017, p. 95; Schneider et al. 2021, p. 64 ff.). Working memory is generally responsible for holding information actively in mind while processing it and connecting new input to prior knowledge (Baddeley 1983). When learning mathematics or solving mathematical problems, several pieces of information like facts, tasks, questions, rules, patterns, numerical values, solution approaches or interim results must be held in mind at once, and linked together to understand mathematical contexts or to perform arithmetic or geometric operations (Passolunghi and Costa 2019). Likewise, WM capacity is needed when dealing with mathematical word problems (e.g., Fuchs et al. 2020). A child needs to make sense of the meaning of the text, memorize the key information, model the problem mathematically and keep all these pieces of information in mind to solve the problem. Since the capacity of human working memory, in contrast to the capacity of sensory memory and long-term memory, is clearly limited, it is also referred to as a “bottleneck for learning in classroom activities” (see Gathercole et al. 2006, p. 278). For children whose WM capacity is exhausted for a moment, mathematical learning or problem solving becomes harder, slower and more prone to error, even if the child normally has no limitations in mathematical understanding (Passolunghi and Costa 2019).

Until recently, it was assumed that WM cannot be improved by training (e.g., Häsel-Weide 2016, p. 29). It is thus hardly surprising that WM has been given little attention in mathematics education. The effectiveness of WM training is still a subject of debate, but increasing evidence over the past couple of years allows this assumption to be rejected. Several studies reveal that WM can indeed be improved by training. A study in children with learning disabilities (Peijnenborgh et al. 2016) and two recent intervention studies with typically developing primary school children show that digital adaptive training can lead to increased visual-spatial WM, and that this increased WM transfers to better achievement in arithmetic (Judd and Klingberg 2021) and in geometry over time (Berger et al. 2020). Likewise, regular mental arithmetic exercises with visual representations of numbers can cause an increase in visual-spatial WM capacity (Wang et al. 2019). It can be assumed that WM and mathematical learning activity influence one another (e.g., Benz et al. 2017, p. 95 ff.). The prospect of being able to promote WM skills in the classroom invites a reconsideration of WM within the field of mathematics education.

One general recommendation derived from research is that the MLD treatment should be adapted to individual children to increase the effectiveness of the intervention (e.g., DGKJP 2018a; Kaufmann and von Aster 2012). While claiming causality is difficult in the case of individual MLD, one can identify risk factors or predictors for MLD (Schipper 2002). The evidence of WM being a strong and also malleable predictor of MLD (see Berger et al. 2020; Judd and Klingberg 2021) supports Schneider et al.’s argument (2021, p. 164) that WM should be included in the early diagnosis of MLD. In order to be able to customize MLD treatment after diagnosing WM difficulties, more detailed knowledge on interrelations between components of WM and specific mathematical skills is needed. The aim of the present paper is to contribute to this by using data from digital tests of accuracy and response times in all three WM components as well as tests on number and arithmetic processing in the first year of primary school.

Existing evidence on WM components as predictors for MLD is summarized in a meta-analysis on diagnosis and treatment of MLD (DGKJP 2018a; Haberstroh and Schulte-Körne 2019). With reference to Ruwisch and Lorenz (2018, p. 43), the present paper takes this meta-analysis by Haberstroh and colleagues as a prompt to choose similar research methods and at the same time discuss the results from the perspective of mathematics education and school practice. However, before we go into more detail about our specific research questions derived from the aforementioned meta-analysis, we clarify how MLD is conceptualized in this study, document the current state of research on the association between WM and MLD and identify the specific research gap to be addressed in the present paper.

2 Background: Mathematical Learning Difficulties and Working Memory

2.1 Conceptualizing Mathematical Learning Difficulties

Mathematical ability has to be considered in a multidimensional way and requires a variety of skills such as knowledge of number facts, arithmetic procedures and concepts (Graß and Krammer 2018) as well as the ability to deal with multiple representations (Gagatsis and Shiakalli 2004). Specific difficulties in number and arithmetic processing are widespread and affect about 2 to 8% of all children (DGKJP 2018a). However, neither the existing terms for these difficulties nor their definitions or inclusion criteria are used fully consistently. Terms like ‘dyscalculia’ or ‘Rechenstörung’ (in German) are typically used in clinical and psychological approaches (Gaidoschik et al. 2021). In mathematics education, paraphrases like ‘specific mathematical learning difficulties’ are preferred to express that causes are not exclusively attributable to individual factors and that it is a multifaceted phenomenon with multiple causes (Gaidoschik et al. 2021, p. 4). Furthermore, in the latest version of the ICD-11 classification the term ‘dyscalculia’ is no longer used for young children (WHO 2022, code MB4B.5). The official clinical term has recently become “developmental learning disorder with impairment in mathematics” (WHO 2022, code 6A03.2). According to the World Health Organization, this developmental learning disorder “is characterized by significant and persistent difficulties in learning academic skills related to mathematics or arithmetic, such as number sense, memorization of number facts, accurate calculation, fluent calculation, and accurate mathematic reasoning” (WHO 2022, code 6A03.2). Accordingly, the diagnosis requires the exclusion of more severe intellectual or neurological disorders, vision or hearing impairments and a lack of language skills in the teaching language.
Besides comparing mathematical performance to the child’s age, the ICD-11 requires the application of an IQ discrepancy criterion by the condition of a minimum difference between mathematics achievement and the respective child’s IQ. In contrast, scientists in mathematics education suggest not to use the IQ discrepancy criterion (Gaidoschik et al. 2021, p. 9) and the current discussion supports this suggestion with several scientific and practical arguments against the application of the IQ discrepancy criterion (e.g., Fischbach et al. 2013, p. 66; Maehler 2021, p. 219). Unaffected by this discussion, most studies do use a general IQ cut-off to separate between MLD and more severe disorders (e.g., of intellectual development). The purpose of this common approach is to derive particular conclusions for mathematics beyond comorbidities and low intellectual functioning. On the other hand, for an applied perspective in inclusive classrooms the cut-offs are to be considered as questionable since a child’s need for mathematical support remains regardless of additional comorbidities.

In the domain of mathematics, several skills on number and arithmetic processing are affected in children with MLD (WHO 2022). Number sense as a measure for the development of the cardinal number concept is a strong predictor of MLD (Dornheim 2008; Jordan et al. 2010; Schneider et al. 2021, p. 74) and a recent eye tracking study identifies more counting strategies and longer response times in number sense tasks in children with MLD (Schindler et al. 2020). Basic arithmetic processing is one of the most fundamental skills to be learned in primary school mathematics and later mathematics instruction is constantly building on it.

2.2 State of Research on the Association Between Working Memory and Mathematics

Many studies in the field attest low WM to be a strong predictor of MLD (e.g., De Weerdt et al. 2013; Schuchardt et al. 2008; Szucs et al. 2013) and meta-analyses confirm these findings (Friso-van den Bos et al. 2013; Peng and Fuchs 2016). However, how exactly WM and mathematical ability are interrelated still lacks a conclusive answer (Viesel-Nordmeyer et al. 2020). Answering this question is challenging because cognitive profiles differ between children. The extent to which WM abilities are needed also depends on the mathematical problem, as well as the individual developmental stage and learning level of a child. The association between mathematical learning and WM evolves over time (Menon 2016). One reason for this is that the procedures and strategies used vary across different learning stages and mathematical topics, and that they require different levels of WM involvement. Geary (1993) points to a relationship with long-term memory: low WM capacities hinder children from correctly retrieving number facts from long-term memory and therefore trigger higher computational error rates. Difficulties in retrieving number facts from long-term memory may result, however, in higher WM load. If WM fails or becomes overwhelmed, crucial information gets lost and learning becomes more difficult (Gathercole et al. 2016).

2.3 Working Memory and Its Components

The most frequently cited WM model was developed by Baddeley and Hitch (1974). WM is a brain system “needed for holding and manipulating information and for transferring it to the more permanent long-term memory system” (Baddeley 1983, p. 74). In this model, WM is divided into three components: the central executive and its two subsystems, the phonological loop and the visual-spatial sketchpad. The central executive system forms the core of the model and serves as a supervisory system of limited capacity. It is responsible for the coordination of information provided by the phonological loop and the visual-spatial sketchpad: first, the phonological loop represents the verbal WM and covers the processing of verbal stimuli. Second, the visual-spatial sketchpad representing the visual WM is responsible for coping with visually presented information (Baddeley 1983). This multicomponent approach has been referred to in many studies which focus on the link between single components of the WM system and mathematical difficulties (for an overview see Dornheim 2008). The central executive system requires a high degree of cognitive flexibility in handling cues from different inputs. Tasks designed to measure the capacity of the central executive system are more complex than tasks designed to measure the capacity of the verbal or visual-spatial WM components. While evidence points to difficulties of the central executive system in children with MLD (see e.g., Schuchardt et al. 2008), findings for the verbal and visual subsystems are heterogeneous. Allen et al. (2020) claim that this may be the case because different components of WM are involved in different domains of mathematics. In general, the differences in WM capacity between children with difficulties in learning mathematics and their typically developing peers are larger for visual-spatial WM than for verbal WM.
The studies included in the aforementioned meta-analysis by Haberstroh and Schulte-Körne (2019) show on average a significant medium to large effect size for the central executive system (Hedges’ g = 0.65), a significant large mean effect size (g = 0.84) for the visual-spatial WM and a smaller, but still significant effect size (g = 0.37) for the verbal WM (DGKJP 2018a). Spatial abilities that are linked to visual-spatial WM explain roughly one fifth of the variance in arithmetical ability in the age group of primary school children (Graß and Krammer 2018). Considering the many visual representations and spatial patterns and structures in mathematics, it is plausible that visual-spatial WM capacity is essential for dealing with visually presented numbers and further spatially presented mathematical information (Toll et al. 2016). On the other hand, within the age group of 7‑ to 8‑year-old children, Allen et al. (2020) found a higher correlation with verbal WM than with visual-spatial WM.
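For orientation, Hedges’ g is commonly defined as a standardized mean difference between two groups with a small-sample correction. The notation below is our own shorthand (group means, standard deviations and sizes indexed by 1 and 2) and is not taken from the meta-analysis, which may have used a variant of this estimator.

```latex
g = J \cdot \frac{\bar{x}_{1} - \bar{x}_{2}}{s_{p}}, \qquad
s_{p} = \sqrt{\frac{(n_{1}-1)\,s_{1}^{2} + (n_{2}-1)\,s_{2}^{2}}{n_{1}+n_{2}-2}}, \qquad
J \approx 1 - \frac{3}{4\,(n_{1}+n_{2}) - 9}
```

Under this definition, g = 0.84 means that the two group means lie a little less than one pooled standard deviation apart.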

2.4 Accuracy and Response Times Measures

When measuring differences between children with MLD and their typically developing peers, there are two relevant measures in quantitative research in the field: Accuracy (or error rates) and response times. The meta-analysis by Haberstroh and colleagues (DGKJP 2018a; Haberstroh and Schulte-Körne 2019) lists these two outcome dimensions separately in a large table for various test measures. For example, in tests on numerical processing, basic arithmetic operations or word problems, children with MLD make significantly more mistakes and they also need more time to solve the tasks. Likewise, children with MLD make more mistakes in all tests on WM. According to Ranger and Kuhn (2012), accounting for response times is increasingly important. Gordon et al. (2020) provide evidence for 7‑ to 8‑year-old children that response times in working memory tests are predictive for performance. Their results suggest that deficiencies in WM not only channel through lower accuracy measures (or higher error rates, respectively) but also through longer response times in the same tests. Due to the limited capacity of human WM, the major challenge in WM tasks is to temporarily keep several pieces of information in mind for a short period of time and to relate them to each other. A longer time period in turn increases the probability of losing task-critical information and thus making additional mistakes (Cornoldi and Giofrè 2014). Hence, fast processing in WM is crucial to keep information active in WM (Gordon et al. 2020). For arithmetic learning, this means that if numbers, interim results and operations are processed faster, the probability of losing information during the calculation process is lower. The same task could be more challenging for a child who needs more time for the answer because this child has to hold the same task-critical information (e.g., numbers) actively in mind for a longer time.

Response time in WM tasks needs to be distinguished from response time in processing speed tasks. Processing speed is a related but distinct concept that addresses mental processing at a very low cognitive level (Fry and Hale 2000). Processing speed measures the time a person needs from being presented a stimulus over accessing this incoming information to responding to this information (ibid.). For example, processing speed in this basic sense is the time between the moment when three cards with single digit numbers are presented and the moment when the child taps the card with the largest number (when assuming that the child has developed an ordinary understanding of numbers already). Importantly, processing speed measures comprise only very basic tasks between stimulus perception and response. Hence, processing speed cannot be a measure for number fact retrieval because the task can be considered simple enough only when it is assumed that all children have memorized the numerical facts. Processing speed is thus neither a measure of the ability to recall number facts from long-term memory nor is it relevant for solving mathematical problems.

Despite this importance of the time dimension in information processing in WM, research on MLD predictors still lacks studies with this focus (DGKJP 2018a, p. 15). The only study included in the meta-analysis by the DGKJP was conducted by De Weerdt et al. (2013) and reported response times for their sample of 112 children, but they could not detect any differences between children with MLD and controls. To our knowledge, there are no recent studies focusing on response times in WM in children with MLD either.

3 Objective and Research Questions

Our study contributes to the strand of literature that investigates the relationship between MLD and WM. It aims to provide new evidence on the cognitive correlates of MLD. More specifically, we focus on accuracy and response time differences between children with and without MLD with respect to the central executive system and verbal and visual-spatial WM. Based on the findings of Gordon et al. (2020), who emphasize the role of response times in WM as a measure for higher order cognitive processing speed, we hypothesize that children with MLD do not only differ regarding accuracy but also regarding response times in WM tasks. Thus, we expect both accuracy and the time a child needs for higher order cognitive processing to be related to the ability to process numbers and operations.

This leads us to the following two research questions:

  1. How do children with mathematical learning difficulties differ from typically developing peers regarding their accuracy in working memory tests? Does our data replicate current evidence?

  2. How do children with mathematical learning difficulties differ from typically developing peers regarding their response times in working memory tests?

4 Methodology

4.1 Data

Our sample contains over five hundred first-grade students (N = 572) from 12 primary schools in Mainz, Germany. All tests (except IQ) were specifically developed and programmed for our main study (Berger et al. 2020), which focused on causal effects of a randomized WM intervention. The present paper only uses correlational data from the first testing period before the intervention started. Data collection was conducted in the schools in groups of five children, supervised by interviewers experienced in standardized testing procedures with children.

In order to obtain precise time and accuracy data from children who cannot yet read or write fluently, we conducted digital tests and used large touchscreens and headphones with fully programmed instructions and practice trials for testing the children. Children entered their responses by tapping on the screen with their fingers. All inputs and precise response times were logged digitally. The children were tested in spring after they had been in school for half a year; mean age at test was 7.0 years. Children completed several tests on academic and cognitive skills. Full data for all relevant variables was available for 543 children. For this study, we use data from tests on number and arithmetic processing and WM as well as tests on reading comprehension and fluid IQ. Interviewers were blind to classroom behavior and teachers were blind to testing. The class teachers filled out a questionnaire for each child. However, in the current study, we mainly use the digital test data from the children.

4.2 Measures

4.2.1 Measures for Mathematical Learning Difficulties

Since we conceptualize MLD as difficulties in number and arithmetic processing, and given that these skills are multidimensional, any test can only capture a subset of the broad range of mathematics. Standardized tests for MLD differ in the mathematical constructs and tasks used and in the way they are implemented and evaluated (see Schneider et al. 2021). Given the limited reading and writing skills in our sample’s age group, the implementation as a group test, and limited testing time, we did not include tests on mathematical reasoning or word problem solving in our study. We operationalize MLD as follows: we focus on number sense related to the cardinal number concept and on basic arithmetic processing in addition and subtraction up to 20. Hence, when we talk about MLD in our study, we are referring to learning difficulties in number and arithmetic processing. In our digital tests, tasks were presented on the screen and via headphones, and children were asked to enter the resulting numbers into an input matrix on the touchscreen. The digits on this matrix were arranged in a way that they could not be used as a visual counting aid (see Fig. 1).

Fig. 1 Examples of digital mathematical subtasks for the assessment of number and arithmetic processing skills

In line with the curriculum for the second half year of grade 1, all items were in the number range up to 20, and some easy items were in the number range up to 10. The difficulty level varies across items. We used a number sense task as well as an auditory arithmetic task with 10 items each and a written arithmetic task on addition and subtraction with 11 items to assess arithmetic skills. Figure 1 shows examples of the three mathematical subtasks.

The number sense task aims to measure number processing skills based on the cardinal number concept. The task demands subitizing skills when there are just a few balls and groupitizing skills when for example 5 + 3 balls are displayed for a short time (Dornheim 2008, p. 257; Schindler et al. 2020). The balls were presented for 1.7 seconds on the screen. This display duration is typically too short to count the balls. Hence, children had to internalize and structure the visual number representation and operate with the numbers in mind in order to determine the number of balls. We used this number sense task because it is known as a strong math-specific predictor of MLD (Schneider et al. 2021, p. 74).

In line with other standardized tests for the assessment of basic arithmetic skills (for an overview see Schneider et al. 2021, p. 171 ff.), we collected data on basic arithmetic processing. To this end, we used two tasks on addition and subtraction, an auditory one and a written one. The auditory task included typical mental arithmetic tasks with two numbers to add and subtract. Each test item in the auditory task was presented only once and required children to memorize and process verbal numerical information. In the written arithmetic task, more difficult items requiring addition and subtraction of several numbers were included. Most items in this task went beyond pure retrieval of number facts. Given that children with MLD often use counting strategies (e.g., Häsel-Weide 2016), we selected items involving obvious disadvantages when using counting strategies and obvious advantages for flexible strategy use. For example, if a child solves the task “1 + 5 + 4 =” by counting without using the commutative law of addition, the child might begin at 1, count 5 steps forward (with fingers) and then count 4 steps forward. This takes time and includes the risk of miscounting. Another child who is skilled in flexible strategy use might combine the first and the last number into the number fact “1 + 4 = 5” by applying the commutative law intuitively. Then the child can recall the remaining task “5 + 5 = 10” as a number fact from long-term memory, too. Hence, the second child’s solution is expected to be much faster and less error-prone than the first child’s counting strategy.

All three mathematical subtasks correlate significantly with each other (p < 0.001). This indicates a common underlying construct and allows for combining into a composite score for mathematical skills. As suggested by Rousselle and Noël (2007), we generated our composite score by adding up the scores from all three subtasks.
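The construction of the composite score can be illustrated with a minimal sketch. The function names and the five example children are hypothetical and chosen only for illustration; the scores are assumed to be the number of correctly solved items per subtask (at most 10, 10 and 11).

```python
import numpy as np

def composite_score(number_sense, auditory, written):
    """Add up the three subtask scores (correct items) for each child."""
    scores = np.column_stack([number_sense, auditory, written]).astype(float)
    return scores.sum(axis=1)

def subtask_correlations(number_sense, auditory, written):
    """Pearson correlation matrix (3 x 3) of the three subtasks."""
    return np.corrcoef(np.vstack([number_sense, auditory, written]))

# Hypothetical scores for five children, not data from the study.
number_sense = np.array([9, 4, 7, 10, 5])
auditory = np.array([8, 3, 6, 10, 5])
written = np.array([10, 2, 7, 11, 4])

composite = composite_score(number_sense, auditory, written)
corr = subtask_correlations(number_sense, auditory, written)
```

Because the subtasks have nearly the same number of items, an unweighted sum gives each of them a similar weight in the composite.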

Additionally, we use teacher assessments of mathematical abilities for each child to validate our math test measures. Teachers were asked to rate children’s overall ability in math on a scale from 1 to 7. We find a highly significant (p < 0.001) correlation coefficient of around r = 0.6. Since teachers were blind to tests and interviewers were blind to teaching, we consider this correlation to be a good indicator for the robustness of our math test scores.

4.2.2 Measures for Working Memory

The selection of tests was motivated by the intention to cover a broad spectrum of WM skills, including the central executive, the phonological loop and visual-spatial sketchpad as well as different degrees of complexity (two complex tasks and one simple) while sticking to the limit of a total of three different WM tasks. The WM tasks are described in Fig. 2.

Fig. 2 Examples of the three working memory subtests

We used a complex span task on memorizing auditorily presented words as a measure for the central executive (cf. DGKJP 2018a, p. 67). We used a simple span task on memorizing a series of digits as a measure for a math-related WM demand in the phonological loop. To measure the capacity of the visual-spatial sketchpad, we used a complex span task in which children had to memorize the positions of shapes in a row. In all three WM tasks, the difficulty level was increased by varying the number of stimuli within an item stepwise from 2 to 7 in the two complex span tasks and from 2 to 9 in the simple span task.

4.2.3 Processing Speed

In line with processing speed tests used in related literature (e.g., Bull and Johnston 1997), our task challenged neither WM nor intelligence nor mathematics skills or understanding (see Fry and Hale 2000). Our processing speed measure was taken as a separate side measure within the visual-spatial span task and was not included in the WM measures. It is based on the child’s selection of the ‘odd’ shape. We measure the time children needed from being presented with the three shapes until they chose the ‘odd’ shape. Although children were not explicitly asked to respond quickly, a comparison of average response times per item and of response time variability between the sample in Bull and Johnston (1997, p. 12) and our sample allows us to assume that children recognized the ‘odd’ shape at first glance and tapped on it immediately.

4.2.4 Construction of Accuracy and Response Time Measures

In all tests, we chose equivalent measures for the scale “Accuracy” (“number of solved items, or number of errors”) and for the scale “Response Time” (“time needed for solving”) as in the meta-analysis (see Haberstroh and Schulte-Körne 2019, p. 109). Our accuracy measures were the number of correctly solved items in a test. We z‑standardized these measures over the whole representative sample with a mean of 0 and a standard deviation of 1 to account for better comparability across tests. Our measures for the time scale were based on the precise response times in the WM tasks. For each child, we calculate average response times for each item in each of our three WM tasks. As an additional proof of robustness for our results, we evaluate our response time measures in two different ways: our main response time measure is based on all items of the respective test. We compute a second response time measure based only on the correctly solved items to control for a student’s outcome orientation (see Landerl et al. 2004; Rousselle and Noël 2007). To make scores of tests with different content, times, and levels of difficulty comparable, we z‑standardize our response time measures on the level of each single item.
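The standardization steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis code: the function names are hypothetical, the sample standard deviation (ddof=1) is an assumption, and for the measure based on correctly solved items we assume that incorrect trials are masked before the item-level standardization.

```python
import numpy as np

def zscore(x):
    """z-standardize a vector of accuracy scores over the whole sample."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def item_level_rt_score(rt, correct=None):
    """Average of item-wise z-standardized response times per child.

    rt:      array of shape (children, items) with response times
    correct: optional boolean array of the same shape; if given, only
             correctly solved items enter the measure (assumption).
    """
    rt = np.asarray(rt, dtype=float).copy()
    if correct is not None:
        rt[~np.asarray(correct, dtype=bool)] = np.nan
    mean = np.nanmean(rt, axis=0, keepdims=True)        # per item
    sd = np.nanstd(rt, axis=0, ddof=1, keepdims=True)   # per item
    return np.nanmean((rt - mean) / sd, axis=1)         # per child

# Hypothetical data: 4 children, 3 items (seconds), not from the study.
accuracy = np.array([3, 5, 7, 9])
rt = np.array([[2.0, 3.0, 5.0],
               [4.0, 5.0, 7.0],
               [6.0, 7.0, 9.0],
               [8.0, 9.0, 11.0]])
correct = np.array([[1, 1, 1],
                    [1, 0, 1],
                    [1, 1, 0],
                    [1, 1, 1]], dtype=bool)

acc_z = zscore(accuracy)
rt_all = item_level_rt_score(rt)
rt_correct = item_level_rt_score(rt, correct)
```

Standardizing each item separately prevents long or difficult items from dominating a child’s average response time.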

4.3 Characteristics of MLD and Control Group

We follow the most common procedure to investigate differences between children with MLD and typically developing children, but we do not apply the IQ discrepancy criterion in the present paper, for the reasons given by Gaidoschik et al. (2021, p. 9) and summarized above.

In the pertinent literature, it has become an established procedure to define cut-off criteria, split up the sample into two groups, one group of children with MLD and one group of typically achieving children, and compare statistical differences between them. More specifically, our cut-off criteria exclude children with a fluid IQ below the 9th percentile (comparable to an IQ < 80), children with reading skills below the 16th percentile and children who are reported as having severe problems with the language of instruction by their teacher. Fluid intelligence was measured based on a subset of 17 items of Raven’s Matrices IQ test scores (Bullheller and Häcker 2002). Reading comprehension was measured on the level of sentences with an age-appropriate test with 10 items of varying difficulty. Children had to choose the missing word in a sentence with a gap from a list of four alternatives (e.g., “Leo is at the [ ____ ].” with four words to choose from: [Mum], [lake], [hat], [name]).

After applying the cut-offs, our sample consists of children with typical intelligence and reading skills who can follow the teaching. The cut-offs are typically applied in order to accentuate difficulties in the area of mathematics versus other difficulties. In addition to the procedure with these typically applied exclusions, we document parallel analyses on our full sample without excluding any children. By doing so, we aim to generalize our results for an applied perspective in inclusive classrooms in primary schools.

In all our analyses, children are assigned to the group of children with MLD if their mean score in tests on number and arithmetic processing is below the 25th percentile. In doing so, we follow the majority of peer studies, although this percentile exceeds estimated prevalence rates (for a discussion see Murphy et al. 2007). Our full sample—analyzed in the online supplement—includes 543 children with full sets of available data. After applying the cut-off criteria, we have a final sample of 409 children for the analyses in the main paper. 57 children with mathematical learning difficulties are assigned to the MLD group and the remaining 352 children to the control group. Table 1 provides summary statistics for these two groups.

Table 1 Descriptive statistics by group

4.4 Statistical Procedure

Our statistical procedure also follows the procedure of most peer studies (cf. DGKJP 2018b). First, we conduct correlation analyses and simple univariate t‑tests to investigate the relationship between our mathematical and WM measures. Further t‑tests are performed to check whether the groups differ from the mean of the population. Second, we regress our accuracy and response time measures for the single WM components on a group indicator and a list of covariates including demographic variables (age and gender) and fluid intelligence. Furthermore, to allow for a more differentiated view at the level of mathematical subtasks, we include continuous measures of mathematical ability in a further specification of our analysis.

Besides these main analyses and the parallel analysis with the full sample, we run different robustness checks. In a first robustness check, we include the response time measure in our accuracy analysis and vice versa to disentangle them. In a second robustness check, we run our regression using an alternative response time measure that is calculated on only correctly solved items. Lastly, we run regressions including teacher and school fixed effects to control for systematic differences between schools.
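The second robustness check can be illustrated with a short sketch. It assumes, hypothetically, that each child's data are available as a list of `(response_time_ms, is_correct)` pairs per task; the function name and data layout are not taken from the study.

```python
from statistics import mean
from typing import List, Optional, Tuple

Trial = Tuple[float, bool]  # (response time in ms, item solved correctly); hypothetical layout


def mean_response_time(trials: List[Trial], correct_only: bool = False) -> Optional[float]:
    """Mean response time over all items, or only over correctly solved items."""
    times = [rt for rt, is_correct in trials if is_correct or not correct_only]
    # A child with no usable items has no defined mean.
    return mean(times) if times else None


# Toy data for illustration only, not from the study.
toy_trials = [(800.0, True), (1200.0, False), (1000.0, True)]
rt_all = mean_response_time(toy_trials)
rt_correct = mean_response_time(toy_trials, correct_only=True)
```

Restricting the measure to correct items removes the influence of fast guesses or slow failed attempts, which is one plausible reason for running both variants.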

5 Results

Correlation analysis shows positive associations between the measures in the mathematical tests and those in the WM tests, for both accuracy and response time. Moreover, our t‑tests indicate that the MLD group deviates from our estimated population mean positively on accuracy measures and negatively on response time measures (see online supplement Table OS.1 to OS.3). Based on these basic findings, we conduct regression analyses to test whether group differences are significant. Our main variable of interest is the group dummy ‘MLD vs. control’. This dummy variable serves as an indicator for the difference between the group of children with mathematical difficulties and typically achieving control children. The coefficient can be interpreted as the difference in accuracy or response time between a child in the MLD group and a child in the control group. Overall, the robustness checks in the online supplement provide evidence that our results are robust to changes in model specification (see online supplement Table OS.8 to OS.16).
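To make the interpretation of the group dummy concrete, the following sketch fits an ordinary least squares regression of an outcome on the dummy and a set of covariates, using only the Python standard library. It is an illustration under stated assumptions, not the authors' estimation code: the helper names (`ols`, `group_coefficient`) are hypothetical, the toy data are invented and constructed without noise, and standard errors are omitted.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to mean 0 and SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def group_coefficient(outcome, mld, covariates):
    """Coefficient on the MLD dummy, holding the covariates constant."""
    X = [[1.0, float(g), *cov] for g, cov in zip(mld, covariates)]
    return ols(X, outcome)[1]


# Toy data for illustration only (not from the study).
mld = [1, 1, 0, 0, 0, 0, 1, 0]
age = zscore([6.5, 7.0, 6.8, 7.2, 6.9, 7.1, 6.6, 7.3])
female = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
iq = zscore([95, 100, 105, 110, 98, 102, 97, 108])
covariates = list(zip(age, female, iq))

# Noise-free outcome in which MLD children score exactly 2 points lower.
accuracy = [10 - 2 * g + 0.5 * a + 1.0 * f + 0.3 * q
            for g, a, f, q in zip(mld, age, female, iq)]

raw_diff = group_coefficient(accuracy, mld, covariates)         # in score points
sd_diff = group_coefficient(zscore(accuracy), mld, covariates)  # in SD units
```

Standardizing the outcome before the regression expresses the dummy coefficient in standard deviations of the outcome, which is the unit in which group differences are reported in the following subsections.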

5.1 Accuracy of Working Memory Tasks

Our regression analysis relates the accuracy measures (number of correctly solved items) of each of our three WM tests to children’s membership of the MLD group (see Table 2). Holding the control variables constant in the linear regressions, children in the MLD group on average scored significantly lower (0.39 to 0.48 SDs) than children in the control group in all three WM subtasks.

Table 2 Differences in the accuracy (number of solved problems) in WM tasks between children with and without mathematical learning difficulties

Our results from the parallel analyses on our full sample without excluding any children are documented in the online supplement (Table OS.4). The results remain almost the same, regardless of whether we apply cut-off criteria or not. This allows us to generalize our results to the full sample.

5.2 Response Times in Working Memory Tasks

In order to analyze the time needed in WM tasks, we regress response time measures in the mathematical and WM tests on our group dummy variable and on controls. Apart from using time measures as dependent variables, the regressions are identical to those we run for the accuracy measures.

Table 3 presents results for the regressions of mean response times on the group dummy and on controls. The coefficients are smaller than those for the accuracy measures, but we find significant differences in response times in the tasks measuring two components of WM: the verbal WM task (0.22 SD, p < 0.001) and the visual-spatial WM task (0.2 SD, p < 0.05). Translated into actual response times, the mean group differences amount to 77 milliseconds in the verbal WM task and 49 milliseconds in the visual-spatial WM task, respectively. The coefficients in the central executive and in the processing speed model are smaller and not statistically significant.

Table 3 Differences in response time in WM tasks between children with and without MLD

5.3 Detailed Results for Mathematical Subtasks

To provide a differentiated view on the mathematical tasks, we rerun our regressions with a continuous composite measure of mathematical ability instead of the group dummy as well as a continuous measure of the mathematical subtasks. We find that accuracy and response times in the verbal and visual-spatial WM are significantly associated with all three mathematical subtasks (see Table 4).

Table 4 Differences of accuracy and response times in working memory tasks, using the math composite score and mathematical subtasks as independent variables

5.4 Effect Sizes

In Table 5, we translate our results into effect sizes and compare them with the effect sizes reported in the meta-analysis that was published in German by the DGKJP (DGKJP 2018a) and in English by Haberstroh and Schulte-Körne (2019).

Table 5 Comparison of effect sizes of belonging to the MLD group for WM components in the meta-analysis (DGKJP 2018a, p. 17) and in our present study

For the first dimension of accuracy, the effect sizes we obtain for the central executive and the visual-spatial sketchpad are similar to the effect sizes reported in the meta-analysis (ibid.). In the WM subcategory phonological loop, our estimated effect size is higher than the effect size reported by the meta-analysis (ibid.). For the dimension of response times, the meta-analysis reported a lack of evidence related to WM (DGKJP 2018a, p. 15). Our results indeed show a difference between children with and without MLD also regarding response times in WM tasks. Compared with the accuracy dimension, the effect sizes for response times are generally smaller. However, in the verbal and the visual WM components, small to medium effect sizes can be found, although the response times concerned are mostly less than one second per item, with group differences of 4 to 11% (cf. Table 1).
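The text does not state which effect size formula underlies Table 5. As a hedged illustration, the sketch below computes one common choice, a standardized mean difference (Cohen's d with a pooled standard deviation) between two groups; both the choice of formula and the toy numbers are assumptions made here for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Toy scores for illustration only, not from the study.
control_scores = [2.0, 4.0, 6.0]
mld_scores = [1.0, 3.0, 5.0]
d = cohens_d(control_scores, mld_scores)
```

By the usual rule of thumb, values around 0.2 are read as small, around 0.5 as medium and around 0.8 as large, which is the scale on which "small to medium" effects are typically described.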

6 Discussion

The present study aims to shed light on the association between MLD and WM skills, beyond fluid intelligence, in first-grade children. Our research questions address differences in accuracy and response times between children with and without MLD.

6.1 Differences in Accuracy of Working Memory Tasks

Concerning our first research question on differences in the accuracy of WM tasks, our study reveals significant differences in all three WM components, the central executive, the verbal and the visual-spatial WM, whereas some studies (e.g., Maehler and Schuchardt 2016; Schuchardt et al. 2008) report significant differences in single WM components only. Our estimated effect size for the phonological loop is much larger than the mean effect size in the meta-analysis by Haberstroh and Schulte-Körne (2019). A potential explanation for this finding might be the young age of children in our sample. Children in the majority of studies included in the meta-analysis were older (DGKJP 2018b). Menon (2016) argues that verbal WM plays a prominent role particularly in the early stages of mathematical learning when children learn to verbally relate numbers and quantities to each other. However, these results must be viewed with caution due to the math-related content of our digit span task and the fact that we do not have a parallel measure from a letter span task. Hence, the large effect size we detect may be an overestimation. Concerning the central executive, our results confirm the finding that children with MLD make more mistakes in these WM tasks (see e.g., De Weerdt et al. 2013; Maehler and Schuchardt 2016; Schuchardt et al. 2008). For visual-spatial WM, this statement applies as well (see Table 5). This strong effect size is in line with a substantial number of studies indicating that MLD is particularly associated with problems in the visual-spatial WM component (e.g., Graß and Krammer 2018; Maehler and Schuchardt 2016; Szucs et al. 2013). In summary, our results confirm the importance of low WM skills as a strong predictor of MLD.

6.2 Differences in Response Times in Working Memory Tasks

Based on recent findings on the crucial need for fast processing in WM to keep information actively in mind (Gordon et al. 2020), we focus on response times as a measure for higher order cognitive processing speed. Our main contribution to the current literature is the following finding: limitations in WM in children with MLD might manifest not only in higher error rates but also in longer response times in WM tasks. The effect sizes in the time dimension are smaller than the effect sizes in the accuracy dimension, but children with MLD needed significantly more time for their responses in two out of three WM tasks compared to their typically achieving peers. The use of touchscreens has made it possible to detect these small differences in response times.

Since the differences are around 50 milliseconds, we assume that we might not have been able to detect them without using a touchscreen as an input device, because children’s differing prior experience with computer mice might have introduced noise into the time data (cf. De Weerdt et al. 2013).

Our results for response time differences reveal an effect size of 0.42 for verbal WM and of 0.31 for visual-spatial WM. The coefficients for the central executive in our main specification point in the same direction but are not significant. Hence, we take a closer look at our WM tasks for verbal and visual-spatial WM. Children with MLD needed more time than their typically achieving classmates to indicate the spatial positions of objects although they were not significantly slower in distinguishing these objects before. The control measure for processing time is not significant. With respect to the close relationship between WM and processing time (Cornoldi and Giofrè 2014), this finding suggests that children with MLD are not generally slower in cognitive processing, but they are slower when it comes to higher order cognitive processing where several pieces of information have to be processed at a time. Concerning mathematical learning, it is plausible that this kind of higher order cognitive processing speed might be related to, for example, multistep arithmetic operations, whereas our more basic control measure for processing speed might be more related to the retrieval of, for example, basic number facts. Unfortunately, we did not collect systematic data on number fact retrieval to test this hypothesis. Accordingly, we can only speculate that children with MLD are slower in processing visual information. Szucs et al. (2013) argue that the visual-spatial WM component is responsible for transformations of visual representations and operations that are relevant in mathematical problem solving. If more time is needed for these processes, it could be concluded that encoding information that is required for accurate transformation is hindered, and visual information cannot be properly processed in the visual-spatial workspace (Toll et al. 2016, p. 430 f.). Since WM is considered a system of limited capacity (Baddeley 1983; Gathercole et al. 2016), our results match the points raised by Cornoldi and Giofrè (2014). They argue that response times could be interrelated with error rates in a way that longer response times might lead to losses of task-critical information that is held in temporary storage before being adequately manipulated or connected to long-term memory contents. Considering the role of fact retrieval from long-term memory for mathematical problem solving (e.g., Geary 1993) and the higher WM load in the case of using counting strategies instead of retrieving number facts from memory, we conclude that prolonged processing in WM might also affect mathematical performance.

A similar argument could be applied to account for longer response times in the verbal span task that we applied to assess verbal WM. Children with MLD needed significantly more time when they were asked to recall several numbers in a row. When children need longer to process several numbers simultaneously in WM, more of the numbers held in mind might get lost.

6.3 Differences in Mathematical Subtasks

With regard to the separate regressions on the different mathematical subtasks, we find accuracy and response time measures for verbal and visual-spatial WM tasks to be significantly related to all mathematical subtasks. This highlights WM as an important domain-general cognitive resource for the mental processing of mathematical information. The fact that coefficients do not only become significant in tasks with an obvious relation to either verbal or visual-spatial type of information highlights that the cognitive processes underlying these relatively basic mathematical tasks are already multidimensional. Whenever we observe larger coefficients, they can be explained by the type of information which needs to be processed in the specific task. Verbal WM is particularly associated with the auditory arithmetic task. This might be due to the auditorily presented information in both the verbal WM task and the auditory arithmetic task. This also applies to the central executive task, although the auditorily presented information to be held active in the phonological loop consists of words instead of numbers. The need for visual-spatial processing in our written arithmetic task might not be obvious, since the task has only written numbers as inputs and as results, but the high coefficient indicates that visual-spatial processing is relevant nonetheless.

From research in mathematics education, it is well known that children with MLD typically have more difficulties switching between different forms of mathematical representations and that this skill is crucial in arithmetic problem solving (Gagatsis and Shiakalli 2004). In a similar manner, McCloskey et al. (1985) reasoned that number processing from verbally presented stimuli requires processing number words. Accordingly, successful translation depends on the child’s ability to identify digits or numbers and to assemble the transmitted input into processable information. Moreover, the translation process involves the internalization of one external representation to produce another external representation (Pape and Tchoshanov 2001). If children fail to create an appropriate mental representation of the external one, this may already result in incorrect or slowed processing of information in WM.

7 Conclusion and Outlook

Taken together, our results indicate that children with difficulties in learning mathematics do not only make more mistakes in WM tasks, but that they also need more time. This new result provides insights into the cognitive patterns underlying MLD.

7.1 Limitations

Nevertheless, we have to mention several limitations. It must be emphasized that WM and MLD are highly complex constructs that have to be simplified for the purpose of our research. The relationship between WM and MLD differs depending on age and on the specific WM and mathematical tasks (e.g., Menon 2016). For our study, this implies specific limitations. First, our phonological loop task was a forward digit span task. This is in line with the handling in the meta-analysis by Haberstroh and colleagues (DGKJP 2018a, p. 67), but the task is also used as a measure for short term memory in other literature (e.g., Bull and Johnston 1997) and backward span tasks are used more often as measures for the phonological loop. A stronger limitation results from the use of numbers as the information to be remembered in this task. This confounds the WM measure with content-specific processing difficulties of children with MLD. Hence, it is likely that the remarkably high effect size is overestimated relative to, for example, a letter span task. Second, our three mathematical subtasks were methodologically restricted to tasks which could be solved on a touchscreen after a relatively short and highly standardized auditory instruction without any text and which could be rated as correct versus incorrect. This remains an obvious limitation. Our tests only cover a small part of the multidimensional concept of mathematics. At the same time, peer studies cover slightly different parts in similar age groups (e.g., Klesczewski et al. 2018). Hence, comparisons or generalizations have to be treated with caution until more evidence is available. For example, we did not include mathematical word-problem solving tasks as in other studies in mathematics education (Fuchs et al. 2020; Verschaffel et al. 2020). Moreover, we did not include a task focusing on retrieval of number facts although this is also observed as a crucial difficulty in children with MLD (e.g., Häsel-Weide 2016). This additional measure would have enabled us to better disentangle the effect of fact retrieval (with a low WM load) versus our tasks with a higher WM load as risk factors of MLD.

7.2 Outlook on Classroom Practice and on Further Research in Mathematics Education

Taking into consideration the previously mentioned evidence that WM is malleable by training and that this WM training can transfer to arithmetic and geometric performance in the long run (Berger et al. 2020; Judd and Klingberg 2021), our study highlights the importance of considering WM skills in the diagnosis and treatment of MLD. Following on from this, we discuss what can be derived for school practice and for future research in mathematics education.

First, including the diagnosis of WM skills and time dimensions in early digitally supported screenings might provide an opportunity to adapt the treatment more closely to the needs of the individual child. For example, if two children with comparably low mathematical skills differ in their WM capacity, the child with lower WM capacity might benefit more from a math-related digital WM training parallel to the MLD treatment than the other child with typical WM skills. The more detailed knowledge we have on the interrelations between WM and MLD, the more we can customize digitally supported treatment and prevention programs to the needs of individual children in inclusive classrooms.

Second, to ensure that WM does not become a limiting factor for children with MLD in mathematics learning, it is important to keep this cognitive resource in mind when planning lessons. Our results emphasize the need for children with MLD to use visual mathematical aids (see e.g., Gaidoschik et al. 2021, p. 11) or (digitally generated) visual representations to reduce WM load and free WM capacities for mathematical learning (see Ladel 2020). Further practical implications for supporting children with MLD emerge when we consider our results in the context of recent evidence on the malleability of WM mentioned above: adaptive digital WM exercises (Berger et al. 2020) or mathematical exercises on mental operations with visual representations of numbers (Wang et al. 2019) could be offered to children after diagnosing low mathematical and low WM skills to adaptively increase their WM capacity for future learning (see Winkel and Ladel 2022). The evidence by Wang and colleagues reminds us that our correlational results cannot simply be interpreted as causal evidence in one direction, but that WM skills and mathematical skills can influence each other. Likewise, this raises new research questions about how we can design MLD treatments or everyday mathematics exercises to simultaneously promote WM and mathematical skills in children with MLD.

How exactly visual-spatial and verbal WM skills translate into mathematical skills or where exactly individual difficulties are anchored needs to be examined in depth in further interdisciplinary studies based on knowledge and methods from mathematics education, educational psychology and neurosciences. We see a high potential in the possibilities of digital tools to meet heterogeneous learning needs related to WM and mathematics (see also Winkel and Ladel 2022). Future research is needed to develop and evaluate evidence-based learning environments to make this potential effective for children with MLD in inclusive classrooms.