Introduction

Learning mathematics is often assumed to be learning for everyday life. We share this assumption and frame it in educational standards. According to the Program for International Student Assessment (PISA; Organisation for Economic Co-operation and Development (OECD) 2019), mathematical literacy (ML) is defined as “an individual’s capacity to formulate, employ, and interpret mathematics in a variety of contexts.” The importance of ML as an element in the definition of educational standards has been made apparent in, for example, the USA and Germany (National Council of Teachers of Mathematics 2003; Standing Conference of the Ministers of Education and Cultural Affairs of the Federal Republic of Germany 2004). ML is crucial for students’ understanding of mathematics in today’s life contexts (Baumert et al. 2007).

International educational studies such as PISA and the Trends in International Mathematics and Science Study (TIMSS) aim to assess students’ ML by having them solve everyday problems with mathematical means (Mullis et al. 2009; OECD 2003). Researchers have investigated the development of ML by using large-scale longitudinal studies, for instance, in Germany, PISA studies (PISA Plus 2012–2013: OECD 2013; PISA-I-Plus: Prenzel 2006), and by conducting national studies, for instance the COACTIV research program (Kunter et al. 2013), the Study of Initial Achievement Levels and Academic Growth in Secondary Schools in the City of Hamburg (e.g., Caro and Lehmann 2009), and the longitudinal Element study (Lehmann and Nikolova 2007).

This large body of research investigating predictors and outcomes of ML has generated partly contradictory results. Among others, prior achievement, migration, and social background (Kiemer et al. 2017), socioeconomic status (Caro and Lehmann 2009), self-efficacy, self-concept, interest, and learning goals (Kriegbaum et al. 2015) were identified as relevant predictors of ML. For the relationship between ML and achievement in other domains, such as reading, studies using longitudinal data found covariation effects in Grades 1 to 7 (Korpipää et al. 2017) and predictive effects of third-grade reading comprehension on ML throughout early primary school when controlling for prior achievement (Grimm 2008).

Hence, ML is also thought to determine later academic achievement in many ways (cf., Duncan et al. 2007; Gut et al. 2012; Siegler et al. 2012). In the context of solving realistic problems, studies on mathematical word problems, mathematical modeling competence, and mathematical problem-solving in general showed that predictors such as calculation skills, mathematical self-concept, reading comprehension, and cognitive skills are relevant for ML development and the relationship to later achievement (Blum and Borromeo Ferri 2009; Brown and Stillman 2017; Leiss et al. 2010; Leutner et al. 2012; Phonapichat et al. 2014). Studies on mathematical modeling, which is typically considered a cognitive process consisting of different phases of solving a real-world problem by means of mathematics, showed that reading comprehension, cognitive skills, and self-concept are important predictors of problem-solving success (Jensen 2007; Leiss et al. 2010; Maass 2006).

However, as argued in a recent study on teaching practice in ML (Kuger et al. 2017), the influence of single predictors is often overestimated, especially in cross-sectional analyses. Therefore, longitudinal studies that take various predictors comprehensively into account are necessary. Furthermore, the empirical support that does exist for predictors and outcomes of ML is mostly restricted to early primary school (e.g., Grimm 2008; Korpipää et al. 2017) or tertiary education (e.g., Hwang and Riccomini 2016; Pape and Wang 2003; Sokolowski 2015). Only a few studies are related to secondary school (Caro and Lehmann 2009; Kriegbaum et al. 2015). Hence, we argue that longitudinal studies across secondary school are needed to examine the influence of ML on later academic achievement while at the same time taking its crucial predictors into account.

Theoretical background of ML

The relationship between ML, reading, and achievement in other domains

We understand academic achievement as achievement in different school domains, for instance, mathematics, language, and science. Achievement in most studies is either operationalized as grade point average (GPA) or assessed with achievement tests in the respective domain, sometimes by using multiple-domain tests such as the Wide Range Achievement Test (Wilkinson 1993), the California Achievement Test, or the Stanford Achievement Test (e.g., Sirin 2005). In line with studies on the development of problem-solving competence, we argue that fostering ML results in higher achievement in other domains later on (Leutner et al. 2012). Later academic achievement is assumed to be linked with success in mathematical tasks, because a strong relationship has been found with mathematical performance in general (Duncan et al. 2007).

Recent research on the covariation of ML and reading achievement indicated that gains in ML-related skills such as problem-solving and reasoning as well as cognitive abilities in general lead to better achievement in other domains (Baumert et al. 2012). The authors suggested the cumulative advantage effect (DiPrete and Eirich 2006) as a possible explanation. Additionally, a transfer effect can be assumed; ML involves skills that are shared with other processes, such as reasoning and general cognitive abilities, so gains in ML will support students’ progress in other achievement domains. A study comparing adults with PISA students showed that the average ML in adults was on the level of a secondary school student (Ehmke et al. 2005). These authors also showed that ML in adults was linked to an individual’s vocational degree. However, research on mathematical modeling mainly focused on distinct phases of the problem-solving process (Baumert et al. 2007; Blomhoj and Jensen 2003; Blum et al. 2004; Jensen 2007; Leiss and Tropper 2014). Intervention studies have shown that teaching students to construct a situational model of a problem given in text or pictorial form improves their ability to solve mathematical problems (English and Watters 2005; Hwang and Riccomini 2016; Kaiser et al. 2015; Schukajlow et al. 2015). This seems to be especially relevant for students with difficulties in learning mathematics (Phonapichat et al. 2014).

With respect to achievement in other domains, a meta-analysis on applying mathematical modeling to support students’ mathematical knowledge acquisition at the high school and college level found positive effects of mathematical-modeling techniques on achievement in different content domains (Sokolowski 2015). Hoffman and Spatariu (2008) examined influences on problem-solving efficiency and found middle to high cross-sectional correlations between GPA and performance on a math achievement test as well as between GPA and problem-solving efficiency. When predictors such as reading competence and self-concept were considered using path analysis, lower coefficients were found (Schommer-Aikins et al. 2005). Jordan et al. (2002) showed longitudinally that over 2 years, growth in reading competence was diminished for children with difficulties in mathematical problem-solving. This, again, accounts for the assumption that ML affects achievement in other domains. Moreover, Korpipää et al. (2017) found that reading and arithmetic in Grades 1 to 7 covaried substantially over time.

In summary, reported covariations of ML and achievement in other domains (Jordan et al. 2002; Korpipää et al. 2017; Sokolowski 2015) as well as a relationship between ML and gains in general cognitive abilities (Baumert et al. 2012) hint at common elements in ML and skills relevant to various school domains. This leads to assuming transfer effects of ML on academic achievement. Current research on the relationship between ML and academic achievement is barely conclusive because important predictors remain unconsidered. The necessity for research that takes an integrative view of ML that considers comprehensive predictors and outcomes using longitudinal data is evident.

The role of predictors of ML

A longitudinal study by Chu et al. (2016) on the development of ML that followed children’s gains in reading and mathematics achievement while also assessing preliteracy knowledge, intelligence, executive functions, and parental educational background identified all variables assessed as being predictive of children’s ML from preschool to kindergarten. The authors concluded that a combination of domain-general and domain-specific abilities plays an important role in ML development. Using a large sample (N = 6020) of 15-year-old German PISA students, Kriegbaum et al. (2015) showed that besides task-specific self-efficacy, intelligence and prior achievement predicted ML 1 year later.

Current research indicates that multiple predictors play a role in the development of ML. These empirically investigated predictors of ML can also be derived from theories on mathematical modeling that view the cognitive process of solving realistic problems as consisting of several distinct but interdependent phases (Leiss and Tropper 2014). Depending on a given task, certain challenges (e.g., reading correctly, extracting the mathematical information, understanding the context) are essential to solving the problem (Kaiser et al. 2015). Predictors of success in mathematical tasks can be deduced from studying these challenges.

As problems are mainly given in text form, reading comprehension was found to be crucial to understanding the problem and its context (Borromeo Ferri 2006). Qualitative studies showed that many students have difficulties comprehending key words (Phonapichat et al. 2014). A middle to high correlation was reported between mathematical reading comprehension and modeling competence (Leiss et al. 2010). Lee et al. (2004) showed that the strength of this relationship is comparable with that of the influence of cognitive skills on solving mathematical word problems. Reading comprehension for a given problem, additionally, seems independent of technical reading skills such as reading speed and accuracy (Vilenius-Tuohimaa et al. 2008).

To correctly solve the mathematics extracted from a problem, basic calculation skills are needed. Leiss et al. (2010) found a positive correlation between students’ results in a general mathematics test used as a measure of non-subject-specific mathematics skills and modeling competence. Counting skills were found to be a valid predictor of later problem-solving skills (Aunola et al. 2004). Using multiple regression analysis, Andersson (2007) showed that calculation had an influence on solving word problems that was larger than that of reading comprehension.

Mathematical self-concept is also considered crucial for problem-solving achievement (Pajares and Miller 1994). Additionally, academic self-concept was demonstrated to play a role in achievement in many school domains (e.g., Marsh et al. 2005). Examining reciprocal effects of mathematical self-concept and achievement, Marsh et al. (2005) found significant path coefficients favoring the effect of self-concept on later achievement. This finding seems to be domain specific (Schöber et al. 2018). Self-efficacy was found to be linked with efficient problem-solving (Hoffman and Spatariu 2008). Belief in one’s own capability to solve mathematical problems was also found to be linked with problem-solving performance (Schommer-Aikins et al. 2005). Gender differences also seem to play a role in this relationship: Studies found that boys, especially when stereotypes were evident, outperformed girls when they had higher scores on a self-concept measure (Ehrtmann and Wolter 2018; Preckel et al. 2008).

Besides math-related predictors such as basic calculation skills and mathematical self-concept, domain-general abilities, that is, cognitive processes, are found to be associated with ML. Baumert et al. (2007) argued that reasoning skills and mathematical modeling cannot be investigated independently. There exists research on the relationship of subskills of ML and cognitive skills such as working memory and fluid intelligence (Lee et al. 2004; Swanson 2011; Swanson et al. 2008) as well as executive functioning and intelligence (Arán Filippetti and Richaud 2016; Best et al. 2011). Fuchs et al. (2006) conducted path analyses and found significant path coefficients for language comprehension and nonverbal problem-solving skills on solving arithmetic word problems. Taken together, prior calculation skills, mathematical self-concept, reading comprehension, and cognitive skills are theoretically derived as well as empirically studied predictors of ML.

Socioeconomic status and gender

In studies on ML and academic achievement, among several control variables, two in particular seem to play a prominent role: socioeconomic status (SES) and gender (e.g., Grimm 2008). Children of higher SES tend to receive better grades (Lekholm and Cliffordson 2008) and perform better on academic achievement measures (Sirin 2005). Kiemer et al. (2017) found in PISA data that migration status and SES were interconnected because much of the difference in achievement was due to financial resources when prior achievement was controlled for. Over the secondary school years, the achievement gap associated with SES seems to narrow (Caro and Lehmann 2009).

Gender differences have been found in some studies, presumably depending on the operationalization of outcome variables. For instance, Robinson and Lubienski (2011) found that teachers rated female students higher on mathematics and reading, while cognitive assessments suggest males have an advantage in mathematics. We can conclude that it is important to consider gender and SES when investigating the effects of ML on academic achievement.

Objectives of the current study

The current state of research lacks empirical evidence for the relationship between ML and achievement in domains outside mathematical development. However, this is very relevant because educational standards as well as international studies have focused on promoting ML as a means of enabling students to use their mathematical knowledge in their everyday lives (Hwang and Riccomini 2016; Kaiser et al. 2015; Schukajlow et al. 2015). Moreover, studies so far have not paid enough attention to the comprehensive influence of ML on academic achievement in different school domains. Theories on ML indicate that a gain in ML leads to domain-general problem-solving abilities from which students’ overall academic achievement could profit in the sense of transfer effects from learning mathematics on other school domains (Baumert et al. 2012; Chu et al. 2016; Korpipää et al. 2017). Assuming common skill sets for reasoning, reading comprehension, and problem-solving, transfer effects from ML to achievement in other school domains are expected.

Several predictors have been empirically documented as having an effect on or being related to ML (Chu et al. 2016; Kriegbaum et al. 2015; Leiss et al. 2010; Marsh et al. 2005). Although calculation skills (Andersson 2007; Aunola et al. 2004), mathematical self-concept (Hoffman and Spatariu 2008; Marsh et al. 2005), reading comprehension (Lee et al. 2004; Leiss et al. 2010), other prior achievement (Kriegbaum et al. 2015), and reasoning (Fuchs et al. 2006) were separately found to be empirically related to ML, research so far has lacked an integrative view of these predictors using longitudinal data to account for effects on both ML and academic achievement in general.

Applying mathematical knowledge in the sense of ML becomes crucial for further mathematical development above the primary-school level, particularly throughout secondary school (United Nations Educational, Scientific and Cultural Organization Institute for Statistics 2013), yet several studies examining mathematical development have focused on primary school (e.g., Duncan et al. 2007; Geary 2011; Korpipää et al. 2017). Furthermore, studies on SES indicate that development in secondary school is a determinant for later achievement because cumulative advantages (Baumert et al. 2012) and the gap between low and high SES (Caro and Lehmann 2009) play an important role at this stage. Thus, it becomes apparent that studies on ML investigating the secondary school years, which constitute an important phase in ML development, are needed.

Hypotheses and research questions

Our study extends previous research by taking an integrative view of ML that considers multiple predictors to explore the effects of ML on later academic achievement, using longitudinal data from a large sample in Grades 5 to 9. Our goal was to investigate how ML predicts academic achievement (information and communication technology (ICT) literacy, scientific literacy, reading comprehension, and listening comprehension) in different school domains throughout secondary school while controlling for prior achievement. We assumed that ML would still have an effect on later academic achievement in different domains when prior achievement in the respective domain is controlled for—in the sense of a transfer effect of ML on achievement in other domains.

We investigated whether existing results regarding predictors of ML can be replicated when a comprehensive set of predictors is studied simultaneously. On the basis of previous findings, we assumed that calculation skills, mathematical self-concept, reasoning, and prior achievement in ML as well as in other domains are linked to later ML. We presumed that these predictors would show effects on ML throughout secondary school when effects of prior achievement are controlled for. Therefore, our research questions and hypotheses are as follows:

  1. How does ML predict academic achievement in different school domains (i.e., ICT literacy, scientific literacy, reading comprehension, and listening comprehension) throughout the secondary school years?

  • Hypothesis 1. ML predicts achievement in different school domains later on even when prior achievement in the respective domain is controlled for.

  2. How do calculation skills, mathematical self-concept, reading comprehension, and reasoning affect ML when studied comprehensively?

  • Hypothesis 2. Calculation skills, mathematical self-concept, reading comprehension, and reasoning predict ML later on when the respective other predictors are controlled for.

As minor questions, we investigated whether hypotheses 1 and 2 are still valid when potential effects of the control variables SES and gender are taken into account.
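As a reading aid, the two hypotheses can be sketched in regression form. The notation below is illustrative and not taken from the study: Y_d stands for achievement in domain d, Calc for calculation skills, MSC for mathematical self-concept, Read for reading comprehension, and Reas for reasoning. The estimated model may contain further paths and covariances that this minimal form leaves out.

```latex
\begin{align}
Y_{d,\mathrm{later}} &= \beta_{0d} + \beta_{1d}\, Y_{d,\mathrm{prior}} + \beta_{2d}\, \mathrm{ML}_{\mathrm{prior}} + \varepsilon_{d} \\
\mathrm{ML}_{\mathrm{later}} &= \gamma_{0} + \gamma_{1}\, \mathrm{ML}_{\mathrm{prior}} + \gamma_{2}\, \mathrm{Calc} + \gamma_{3}\, \mathrm{MSC} + \gamma_{4}\, \mathrm{Read} + \gamma_{5}\, \mathrm{Reas} + \zeta
\end{align}
```

In this notation, Hypothesis 1 corresponds to a nonzero coefficient on prior ML in the first equation for each domain d, and Hypothesis 2 to nonzero coefficients on the four predictors in the second equation.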

In sum, it is notable that recent research focused mainly on either preschool and the early school years or high school and college. We examined the postulated relationships from Grades 5 to 9, hence across secondary school.

Method

Sample

We examined our research questions using data from the National Educational Panel Study (NEPS), a longitudinal study conducted in Germany designed for research on educational trajectories (Blossfeld et al. 2011). We used data provided as scientific-use files for registered users of NEPS, which began investigating fifth-grade students in 2010, at two measurement points (Grades 5 or 6, and 9). NEPS provided separate files for competence measures, cohort information, and both student and parent questionnaires, which were prepared for scientific use following professional guidelines and by publishing reports about validity, scaling, and reliability. The original panel cohort consisted of 6112 students, of whom 5778 participated in the first wave in Grade 5. Four years later, 5452 students from the original sample were targeted, of whom 4001 participated in Grade 9 (cf., Zinn et al. 2018). The sample is suited for answering our research questions because this cohort follows students through the course of secondary school (Fabian et al. 2019).

Analyses for testing the postulated hypotheses were conducted with domain-specific competence data (ML, reading, and reasoning) and questionnaire information (math grade, mathematical self-concept, and gender) in Grade 5, domain-specific competence data in Grade 6 (ICT literacy, scientific literacy, and listening comprehension), and the corresponding domain-specific competence data in Grade 9. Information about SES was obtained from parent questionnaires.

The final sample consisted of 4001 students (N = 1963 female (49.4%), missing values in gender: N = 26). Students in Germany in most federal states follow one of three school tracks (Hauptschule, Realschule, or Gymnasium, typically considered general, intermediate, and advanced secondary school, respectively) after the end of Grade 4. In our sample, 51.9% (N = 1701) were allocated to Gymnasium, 30.0% (N = 917) to Realschule, and 7.6% (N = 249) to Hauptschule, thus being fairly representative regarding German school tracks. Because of the federal structure of the German school system, students attending some schools were not yet divided into school tracks (N = 238), and for some students, the declared school track was unclear (N = 722) or missing (N = 174). Regarding SES, 13.8% (N = 418) of students were categorized as having low, 62.9% (N = 1771) intermediate, and 23.3% (N = 668) high SES (missing: N = 1144).

Measures

ML

A mathematical competence assessment was administered consisting of 24 items in Grade 5 and 34 items in Grade 9. The theoretical framework for the mathematics test construction was based on the PISA as well as national educational standards and was designed to measure ML on the basis of mathematical problems from students’ life contexts (Neumann et al. 2013). Students were asked to solve mathematical problems and answer mostly multiple-choice questions. As in the PISA, items could be assigned to one of four content areas: quantity, change and relationship, shape and space, and data and chance. Six different cognitive components were distributed over these items with modeling as one of these six components. A sample item titled “The Fence” goes as follows: “Mr. Brown owns a rectangular piece of land that he wants to fence. After calculating, he buys 40 meters of fence. The piece of land has a width of 8 meters. How long is the piece of land?” This item involving modeling belongs to the content area space and shape (Schnittjer and Duchhardt 2015).

Weighted maximum likelihood estimates (WLEs), which are estimates of a student’s most likely competence (Pohl and Carstensen 2013), were computed to indicate domain-specific competence. Scaling relies on item response theory (IRT; Pohl and Carstensen 2012). For the following analyses, WLEs of ML from Grades 5 and 9 were used, for which sufficient reliabilities (.78 and .81, respectively) have been reported (Duchhardt and Gerdes 2012; Van den Ham et al. 2018). WLEs, theoretically, are standardized scores with M = 0.00 and SD = 1.00, but the values can differ from zero or one, respectively, due to sample selection procedures. Analyses on panel attrition in the NEPS showed that students in this sample with good or medium ML have a stronger tendency to drop out (Zinn et al. 2018). This could mean that students with lower ML are overrepresented in our sample.
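To make the idea of a weighted likelihood estimate more concrete, the following is a minimal sketch for a dichotomous Rasch model, which may be simpler than the scaling model actually used for the NEPS tests. The item difficulties and all function names are hypothetical. The estimating equation adds a bias-correction term to the ordinary likelihood score, which is why even a zero or perfect raw score receives a finite estimate.

```python
import math


def rasch_prob(theta, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def wle_score_function(theta, raw_score, difficulties):
    """Weighted likelihood estimating equation for the Rasch model.

    The first part is the ordinary likelihood score (raw score minus expected
    score); the second part, J / (2 * I), is the bias correction.
    """
    probs = [rasch_prob(theta, b) for b in difficulties]
    info = sum(p * (1 - p) for p in probs)                  # test information I
    j_term = sum(p * (1 - p) * (1 - 2 * p) for p in probs)  # derivative of I
    return raw_score - sum(probs) + j_term / (2 * info)


def wle(raw_score, difficulties, lower=-10.0, upper=10.0, iterations=80):
    """Find the root of the estimating equation by bisection."""
    for _ in range(iterations):
        mid = (lower + upper) / 2
        if wle_score_function(mid, raw_score, difficulties) > 0:
            lower = mid
        else:
            upper = mid
    return (lower + upper) / 2


# Hypothetical item difficulties (logits), not NEPS item parameters.
difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]
estimates = [wle(score, difficulties) for score in range(len(difficulties) + 1)]
```

Here `estimates` holds one ability estimate per possible raw score, from zero to all items correct.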

Math grade

Students reported their final math grade from the previous year, which, therefore, referred to Grade 4. Students in Grade 4 were not yet divided into school tracks, which enhances the comparability and validity of our measure. Germany uses a grading system of 1 to 6, with 1 indicating the best grade possible. In German-speaking countries, basic calculation skills, for instance, unit operations, are learned in primary school during Grades 1 to 4. For this reason, we argue that math grade at this age represents students’ ability in the basic calculation skills needed to solve ML problems, which is why we used it as a measure of mathematical performance in terms of students’ calculation skills.

Mathematical self-concept

Mathematical self-concept was conceptualized as domain specific and was geared to the PISA (Wohlkinger et al. 2011). Domain-specific self-concept is widely seen as a subdimension of students’ overall academic self-concept and can be investigated in a subject-specific way (Wohlkinger et al. 2016). It was measured using three items (i.e., “I get good grades in mathematics,” “Mathematics is one of my best subjects,” and “I have always been good at mathematics”) with answers ranging from 1 (does not apply at all) to 4 (applies completely) on a 4-point Likert scale. A mean score was calculated from these items. Mathematical self-concept was assessed in Grade 5; Cronbach’s alpha for this scale was .87.
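The scale score and its internal consistency can be illustrated with a short sketch. The responses below are hypothetical, and the function names are chosen for illustration only; the sketch uses the usual formula for Cronbach's alpha based on item variances and the variance of the sum score.

```python
from statistics import mean, variance


def scale_scores(responses):
    """Mean score per student across the items of a scale."""
    return [mean(row) for row in responses]


def cronbach_alpha(responses):
    """Cronbach's alpha from a students-by-items table of responses."""
    n_items = len(responses[0])
    item_columns = list(zip(*responses))  # one tuple of answers per item
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five students to three items on a 1-4 scale.
responses = [
    [4, 4, 3],
    [2, 1, 2],
    [3, 3, 3],
    [1, 2, 1],
    [4, 3, 4],
]
scores = scale_scores(responses)
alpha = cronbach_alpha(responses)
```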

ICT literacy

ICT literacy was conceptualized as computer literacy from a functional perspective and relies on everyday problems in modern-day societies (Weinert et al. 2011). Students were presented corresponding problems and asked to accomplish computer-based tasks mostly with screenshots of applications. Moreover, students were asked to answer 30 (Grade 6) or 36 (Grade 9) multiple-choice items. The test was designed to measure students’ ability to access, create, manage, and evaluate software applications. ICT literacy scores from Grades 6 and 9 were used, for which sufficient reliabilities (.69 and .81, respectively) were reported (Senkbeil and Ihme 2017; Senkbeil et al. 2014).

Scientific literacy

Scientific literacy was conceptualized as the ability to apply scientific knowledge of personal, social, and global importance in the contexts of environment, technology, and health (Hahn et al. 2013). The test was designed to measure knowledge about matter, systems, development, and interactions in scientific inquiry and reasoning. WLEs were calculated from 27 items in Grade 6 and 28 items in Grade 9. Good reliabilities of .77 (Grade 6) and .83 (Grade 9) were reported (Funke et al. 2016; Hahn et al. 2013).

Reading comprehension

Reading comprehension was conceptualized as functional understanding of texts (Gehrer et al. 2013). Students were asked to answer multiple-choice questions about texts meant to represent everyday reading such as information, commentaries, argumentations, instructions, or advertisements. Requirements were categorized into finding information, drawing conclusions, and reflecting on texts. Reading comprehension scores (WLEs) from Grades 5 and 9 were used, for which sufficient reliabilities (.77 and .79, respectively) were reported (Pohl et al. 2012; Scharl et al. 2017).

Listening comprehension

Listening comprehension in the NEPS was assessed differently over time. In Grade 6, receptive vocabulary was assessed using an adapted version of the Peabody Picture Vocabulary Test (PPVT; Roßbach et al. 2005). In the PPVT with 77 items in total, students choose one of four pictures based on a given word. Sum scores were calculated for listening comprehension in Grade 6. Cronbach’s alpha for this scale was .88.

In Grade 9, listening comprehension was conceptualized as the ability to extract information from spoken texts and to draw conclusions that are implied in these texts (Hecker et al. 2015). In line with the literacy perspective, texts were based on realistic contexts. Students heard two texts (a conversation and a narration) and were asked to answer two sets of eight complex multiple-choice items. Listening comprehension (WLE) was used from Grade 9, for which sufficient reliability (.76) was reported (Rohm et al. 2017).

Reasoning

To measure basic cognitive skills, two tests were constructed for the NEPS: a picture symbol test assessing perceptual speed, and a matrices test assessing reasoning (Haberkorn and Pohl 2013). It has been argued that these two indicators are suitable for assessing fluid intelligence because they are theoretically central and have been empirically found to be crucial for successful development (Brunner et al. 2014). For our study, students’ results from the matrices test in Grades 5 and 9 were used to investigate reasoning. Reasoning was assessed using three sets of four items, with Cronbach’s alpha = .66 in both grades. Sum scores were calculated, which resulted in a maximum of 12.

SES

In the parent questionnaire at Grade 5, a parent reported his or her highest educational attainment. Responses were rated based on the Comparative Analysis of Social Mobility in Industrial Nations Scale (Brauns et al. 2003). The scale was recoded into three categories: low (no degree or degree with basic work-related training), intermediate (advanced work-related training or postsecondary school), and high (university level or higher), as has been suggested for studies using NEPS data (cf., Zinn et al. 2018).

Modeling issues and missing data

We estimated a structural equation model with regression analyses for assumed paths using the lavaan package in R (R Development Core Team 2008). Fit indices (root mean square error of approximation (RMSEA), comparative fit index (CFI), Tucker–Lewis index (TLI)) were used to examine model fit. We applied cutoff values of .06 for RMSEA and .95 for CFI and TLI, which according to Hu and Bentler (1999) indicate a good fit between a hypothesized model and the observed data.
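For readers less familiar with these indices, the sketch below computes them from chi-square statistics using common textbook formulas and then applies the cutoffs named above. It is not the lavaan implementation. The chi-square values are hypothetical, and software packages differ in details such as whether N or N - 1 enters the RMSEA.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index relative to the baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis index (not bounded to the 0-1 range)."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


def meets_cutoffs(rmsea_value, cfi_value, tli_value):
    """Apply the cutoffs named in the text: RMSEA <= .06, CFI and TLI >= .95."""
    return rmsea_value <= 0.06 and cfi_value >= 0.95 and tli_value >= 0.95


# Hypothetical chi-square statistics, chosen only to exercise the formulas.
fit = {
    "rmsea": rmsea(chi2=30.0, df=10, n=1000),
    "cfi": cfi(30.0, 10, 2000.0, 45),
    "tli": tli(30.0, 10, 2000.0, 45),
}
good_fit = meets_cutoffs(fit["rmsea"], fit["cfi"], fit["tli"])
```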

Measurement invariance for ML, ICT literacy, scientific literacy, reading comprehension, and listening comprehension (all by means of WLE) was ensured with an elaborated conceptualization in the NEPS based on models of IRT as well as an anchor-item design (cf., Pohl et al. 2015). It was strengthened by testing for unidimensionality and demonstrating the absence of differential item functioning (e.g., Fischer et al. 2016). Reasoning was measured using the same items on both occasions.

Missing data in single competence tests were already treated when calculating WLEs (cf., Pohl and Carstensen 2013). For missing data in one of the other instruments as well as for participants missing whole competence test assessments, we used a full information maximum likelihood (FIML) approach because statistical power is maintained and FIML typically produces less biased results than listwise deletion (Enders 2010).
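For orientation, FIML under multivariate normality can be written as a sum of casewise log-likelihoods in which each student contributes only through the variables actually observed. The expression below is the standard textbook form, not a formula taken from the study, and the symbols are introduced here for illustration.

```latex
\log L(\theta) = \sum_{i=1}^{N} \left[ -\frac{p_i}{2}\log(2\pi)
  - \frac{1}{2}\log\left|\Sigma_i(\theta)\right|
  - \frac{1}{2}\,\bigl(y_i - \mu_i(\theta)\bigr)^{\top}\,\Sigma_i(\theta)^{-1}\,\bigl(y_i - \mu_i(\theta)\bigr) \right]
```

Here y_i is the vector of observed values for student i, p_i is the number of observed variables for that student, and μ_i(θ) and Σ_i(θ) are the parts of the model-implied mean vector and covariance matrix that correspond to those observed variables.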

Results

Descriptive statistics

Table 1 shows manifest correlations, minima, maxima, means, and standard deviations for all variables. Bivariate correlations were found between variables of interest, of which most appeared to be highly significant. All predictor and outcome variables correlated positively with each other, showing that ML was associated with academic achievement in different school domains 4 years later. Moreover, the correlations among predictors support our assumption that calculation skills, mathematical self-concept, reading comprehension, and reasoning need to be studied together. With respect to control variables, gender differences were found for all variables except SES, ICT literacy, and reasoning in Grade 9. Male students scored higher on all variables except listening comprehension in Grade 9 and reading comprehension in Grades 5 and 9, on which female students scored higher. This finding indicates that gender is an important control variable for our analyses. For all variables of interest apart from mathematical self-concept, significant correlations with SES were found, with higher SES students scoring higher on all measures.

Table 1 Manifest correlations of mathematical literacy, math grade, mathematical self-concept, information and communication technology (ICT) literacy, scientific literacy, reading comprehension, listening comprehension, reasoning, socioeconomic status (SES), and gender; minima, maxima, means, and standard deviations of all variables

Structural equation model

As our descriptive statistics indicate, ML was associated with later academic achievement in different school domains (research question 1). Furthermore, predictors of ML were correlated with each other, implying that they should be studied together (research question 2). To explore our hypotheses about controlling for prior achievement in the respective domain when looking for predictive effects of ML on achievement in different domains (hypothesis 1) and studying predictors comprehensively (hypothesis 2), we calculated a structural equation model. The estimated model is presented in Fig. 1. Since, as expected, we did not find math grade and mathematical self-concept to significantly add to later achievement in school domains other than ML when controlling for prior achievement, we did not include corresponding paths in the model. We found significant covariances for all exogenous variables (values are not depicted in Fig. 1 for better readability). For better readability of the figure, the corresponding path coefficients for our regression analyses are presented in Table 2. Given skewness ranging from −1.12 (reasoning) to 0.63 (math grade) and kurtosis ranging from −0.69 (mathematical self-concept) to 1.21 (reasoning), normal distribution of the data was assumed (cf., Kaplan 2009). The model produced a good fit of the data (cf., Hu and Bentler 1999), with RMSEA = .035, CFI = .998, and TLI = .982.
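The skewness and kurtosis statistics mentioned above can be illustrated with moment-based formulas. This is a sketch on hypothetical data; whether the reported kurtosis values are excess kurtosis (normal distribution = 0), and which estimator was used, are assumptions made here.

```python
from statistics import mean


def central_moment(values, order):
    """Average of the deviations from the mean raised to the given power."""
    center = mean(values)
    return sum((v - center) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical sum scores on a 0-12 scale with most students near the top.
scores = [12, 11, 11, 10, 10, 10, 9, 9, 8, 4]
skew = skewness(scores)
kurt = excess_kurtosis(scores)
```

A distribution with most scores near the maximum and a few low values, as in this toy example, yields negative skewness.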

Fig. 1 Estimated model without coefficients. Root mean square error of approximation = .035, comparative fit index = .998, Tucker–Lewis index = .982. ICT = information and communication technology; SES = socioeconomic status. On the left, ICT literacy, scientific literacy, and listening comprehension from Grade 6 (indented), math grade from Grade 4, others from Grade 5. Paths for the effect of math grade and mathematical self-concept on later achievement in school domains other than mathematical literacy were not included because no significant effects were found

Table 2 Standardized path coefficients (β) of the structural equation model in Fig. 1

Because we wanted to investigate only the effects of ML on later academic achievement while controlling for prior achievement in the respective domain (research question 1), we did not need to predict mathematical self-concept in Grade 9. Math grade served as a predictor representing students’ basic calculation skills in Grade 5, but we did not include it as an outcome variable in Grade 9. Moreover, SES was assessed in Grade 5 and used as a control variable, which is why SES in Grade 9 was not included in the model. To account for the nested structure of the data, we also tested models with values centered around school means and around classroom means but found no changes in the significance of the path coefficients.
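Centering values around cluster means, as described for the nested data above, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the scores, and the classroom labels are hypothetical and do not come from the NEPS data.

```python
from collections import defaultdict
from statistics import fmean


def center_within_groups(scores, groups):
    """Subtract each group's mean from the scores of its members."""
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)
    group_means = {group: fmean(vals) for group, vals in by_group.items()}
    return [score - group_means[group] for score, group in zip(scores, groups)]


# Hypothetical test scores for five students in two classrooms.
scores = [10.0, 12.0, 14.0, 20.0, 22.0]
classrooms = ["a", "a", "a", "b", "b"]
print(center_within_groups(scores, classrooms))  # [-2.0, 0.0, 2.0, -1.0, 1.0]
```

After centering, every classroom has a mean of zero, so the remaining variation reflects differences between students within the same classroom rather than differences between classrooms.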

Because we found bivariate correlations with gender for several variables, we estimated a multigroup model to test for gender differences. Comparing model fit indices across models, we found no significant gender differences in loadings or path coefficients. Regarding our minor hypothesis on gender effects, we conclude that the analyses for our other hypotheses are valid because we found no gender differences. We also tested models with paths from SES to later reasoning but did not find a significant effect.
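One common way to compare fit indices between a multigroup model with freely estimated parameters and one with parameters constrained to be equal across groups is sketched below. The decision rule is an assumption: a drop in CFI of at most .01 and a rise in RMSEA of at most .015 are frequently cited conventions, but the text does not state which criteria were applied. The class and function names are hypothetical, and the fit values of the constrained model are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fit:
    """Fit indices of one estimated model."""
    cfi: float
    rmsea: float


def constraints_tenable(free, constrained, max_cfi_drop=0.01, max_rmsea_rise=0.015):
    """Return True if equality constraints do not worsen fit beyond the assumed cutoffs."""
    cfi_drop = free.cfi - constrained.cfi
    rmsea_rise = constrained.rmsea - free.rmsea
    return cfi_drop <= max_cfi_drop and rmsea_rise <= max_rmsea_rise


# Hypothetical values: paths free across gender vs. constrained to be equal.
free_model = Fit(cfi=0.998, rmsea=0.035)
constrained_model = Fit(cfi=0.996, rmsea=0.036)
print(constraints_tenable(free_model, constrained_model))  # True
```

If the constrained model fits about as well as the free model, the paths can be treated as equal across groups, which corresponds to finding no gender differences in path coefficients.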

Research question 1: impact of ML on academic achievement

Our model revealed significant path coefficients from ML in Grade 5 to achievement in four school domains and to reasoning in Grade 9 when prior achievement in the respective domain was taken into account. First, ML had a significant effect (β = .19) on ICT literacy in Grade 9 (R² = .54 for all predictors). Second, a significant path coefficient (β = .17) was found from ML in Grade 5 to scientific literacy in Grade 9 (R² = .57). Third, for reading comprehension in Grade 9 (R² = .49), a significant path coefficient (β = .06) was found from ML in Grade 5. Fourth, a significant path coefficient (β = .09) was found from ML in Grade 5 to listening comprehension in Grade 9 (R² = .42). Fifth, regarding reasoning in Grade 9 (R² = .33), a significant path coefficient (β = .23) was found from ML in Grade 5. Prior achievement in other domains showed no significant effects on reasoning in Grade 9, apart from scientific literacy (β = .10). Thus, although we had made no assumptions about effects of ML on reasoning, ML predicted reasoning 4 years later. Moreover, significant paths were found from SES to later academic achievement in all school domains except ICT literacy, confirming the importance of SES as a control variable. Our results confirm hypothesis 1, showing that ML predicted achievement in different school domains 4 years later when prior achievement in the respective domain was taken into account. This provides evidence for the presumed transfer effect of ML on achievement in other school domains over time.
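What a standardized path coefficient means when prior achievement is controlled for can be illustrated with the simplest case of two correlated predictors. The sketch below uses the closed-form solution for standardized regression weights from correlations. It is a simplification and not the estimated model, which contains more predictors and controls. The function name and all correlation values are hypothetical.

```python
def two_predictor_betas(r_y1, r_y2, r_12):
    """Standardized weights and R^2 for an outcome y regressed on predictors 1 and 2.

    r_y1, r_y2: correlations of each predictor with the outcome
    r_12: correlation between the two predictors
    """
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta_1 * r_y1 + beta_2 * r_y2
    return beta_1, beta_2, r_squared


# Hypothetical correlations: predictor 1 = earlier ML, predictor 2 = prior
# achievement in the outcome domain, y = later achievement in that domain.
beta_ml, beta_prior, r_squared = two_predictor_betas(r_y1=0.5, r_y2=0.6, r_12=0.5)
print(round(beta_ml, 3), round(beta_prior, 3), round(r_squared, 3))
```

The example shows why a path coefficient for ML can be much smaller than its bivariate correlation with the outcome: the part of that correlation that ML shares with prior achievement is attributed to the autoregressive path.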

Research question 2: predictors of ML

Significant paths to ML in Grade 9 were found for all assumed predictors, namely math grade, mathematical self-concept, prior achievement in different domains (i.e., ICT literacy, scientific literacy, reading comprehension, and listening comprehension), reasoning, and SES, with the autoregressive path showing the largest effect (β = .35). The predictors explained 60% of the variance in ML in Grade 9. This finding confirms hypothesis 2, supporting the assumption that predictors found in previous studies still have an effect on ML throughout the secondary school years even when predictive effects among them and prior achievement in ML and other domains are accounted for. This supports taking an integrative view of predictor variables when investigating ML.

Discussion

The main goal of this study was to examine transfer effects of ML on later academic achievement in different school domains (i.e., ICT literacy, scientific literacy, reading comprehension, and listening comprehension) while taking known predictors into account and considering prior achievement in the respective domain, SES, and gender. In an extension to previous research, we focused on ML across the secondary school years using representative longitudinal data from the NEPS (Blossfeld et al. 2011).

Consistent with our hypotheses, we found effects of ML on later academic achievement in different school domains, which we interpret as a transfer effect. In line with previous findings from Baumert et al. (2012), Chu et al. (2016), and Korpipää et al. (2017), we suggest that the transfer effect of ML is due to the promotion of domain-general abilities such as problem-solving and a deeper understanding of texts, realistic contexts, and students’ everyday life through competence development linked with students’ ML. We even found a significant effect of ML in Grade 5 on reasoning in Grade 9 while controlling for prior reasoning, which strengthens this explanation.

In line with current research on predictors of ML, our study confirmed the role of calculation skills, mathematical self-concept, reading comprehension, and reasoning as predictive factors of ML (Andersson 2007; Baumert et al. 2007; Borromeo Ferri 2006; Chu et al. 2016; Kriegbaum et al. 2015; Pajares and Miller 1994). We assumed these predictors, found in separate studies to have an effect on ML, to predict later ML when examined at the same time and while controlling for prior achievement. Our results support an integrative view of different phases of the process of solving mathematical problems, as has been suggested in various theoretical articles and qualitative studies (e.g., Kaiser et al. 2015; Leiss and Tropper 2014). Concerning cognitive models of ML, which typically view the process of solving mathematical problems as consisting of different phases, our results reveal that mathematical self-concept still has an effect on ML when prior achievement is controlled for. This is in line with findings from Marsh et al. (2005) and self-enhancement models of self-concept (Bandura 1997), which suggest there is an underlying motivational basis to the effects of mathematical self-concept on ML. Students profit from high self-concept because it enhances their motivation, which is especially useful when attempting the complex tasks involved in ML. We found that reasoning also showed an effect on later ML, which is in accordance with previous arguments on the role of cognitive skills in ML (Baumert et al. 2007) and underlines the role of general cognitive abilities, in the sense of shared skill sets of ML and achievement in other domains.

In contrast to what previous research showed (e.g., Robinson and Lubienski 2011), we found no gender differences in paths from comprehensive predictors to ML or from ML, prior achievement, and reasoning to achievement in different school domains. We presume that this may be due to our operationalization of outcome variables as competence measures (in contrast to GPA) and the fact that taking other explanatory variables such as math grade or self-concept into account may diminish gender effects.

Regarding SES, we found significant paths to ML and achievement in different school domains 4 years later, which is in line with previous research on achievement measures (Sirin 2005). Though an SES-related achievement gap appears to narrow through the course of secondary school (Caro and Lehmann 2009), students with higher SES still score higher on different achievement measures in Grade 9.

Limitations and directions for future research

We examined the influence of ML on later academic achievement longitudinally throughout secondary school with a nationally representative sample. While this is an important time span previously neglected in studies on ML, a longer time span reaching into students’ vocational years would shed light on further academic as well as nonacademic outcomes.

Because we used data from the large-scale NEPS assessments, our analyses were restricted to the scales used in that study. Mathematical self-concept, for instance, while having the strength of being domain specific (Schöber et al. 2018), consisted of only three items. Broader and more differentiated constructs such as self-efficacy, attitudes, or motivational variables would be worth taking into account alongside self-concept. The role of self-efficacy in ML in particular warrants further attention (Krawitz and Schukajlow 2017; Schukajlow et al. 2012). With respect to cognitive skills, which have been argued and empirically found to play an important role in mathematical problem-solving, verbal intelligence should be examined as well, because reading and listening comprehension were found to influence ML.

Our research approach disregarded other factors in the cognitive process of ML, such as metacognitive skills (Maass 2006), working memory (Swanson 2011), and personality (Phonapichat et al. 2014), all of which have been argued to influence the solving of mathematical problems. Concerning individual differences in competence development, differences regarding migration background are broadly discussed in the current literature. By controlling for SES, we have partly addressed this issue because SES is considered to be linked to migration background (e.g., Lenkeit et al. 2015). Future studies should address the influence of migration background directly to broaden the integrative view of predictive factors of ML, as differences in mathematical development were found to be entangled with reading competence (e.g., Lehner et al. 2017).

From a methodological perspective, although our data are longitudinal, they are not interventional. In addition, paths that were not analyzed may exist, which limits causal interpretation. The argument for causal transfer effects of ML on achievement in different school domains relies on theoretical assumptions (i.e., shared skill sets being fostered by gains in ML) and is methodologically strengthened by controlling for prior achievement in the respective domains as well as taking ML 4 years earlier into account. Nevertheless, we tested only for predictive patterns, so experimental manipulations (e.g., instructional approaches to teaching ML) are needed to obtain more evidence of causal transfer effects. Moreover, Leiss et al. (2010) argued that addressing how teachers themselves are taught to foster their students’ ML is essential. Lehner et al. (2017) postulated encouraging, motivating lessons, classroom management, and schooling structure to be central for mathematical competence development. Applying a solution plan when solving mathematical problems could foster ML as well as mathematical self-efficacy (Schukajlow et al. 2015). Class size also seems to play a role, though findings are still inconclusive (Ehrenberg et al. 2001; Hattie 2009; Schukajlow and Blum 2011). To enlarge the integrative view, interventional studies are needed to further investigate ML and its role in achievement and life success.

Conclusion and practical implications

We found consistent support for the importance of ML for general academic achievement. ML was found to be linked with achievement in math-related domains (ICT literacy, scientific literacy) as well as domains not directly related to mathematics (listening and reading comprehension). We argue that this not only underlines the effect of ML on understanding mathematics in today’s life contexts (Baumert et al. 2007) but also indicates the existence of a transfer effect of applying this understanding for better life competence in domains other than mathematics. In our view, this means that students profit in terms of their future academic achievement when educators and policy makers invest in students’ ML. Our results suggest that the success of such investment is comparable with that obtained by investing in other competences such as reading comprehension. The universality of our model regarding gender differences reinforces the idea of not differentiating between boys and girls concerning the support they get in learning mathematics and reading.

The consistent support we found in line with previous studies regarding the influence of predictive factors on ML could encourage teachers and parents to foster students’ mathematical self-concept as well as their ML from early on. This also supports the current directions in international educational standards to implement the literacy perspective in the domain of mathematics in the conceptualizations of school curricula (National Council of Teachers of Mathematics 2003; Standing Conference of the Ministers of Education and Cultural Affairs of the Federal Republic of Germany 2004). Given our findings and following the ML concept of PISA 2021 (OECD 2019), teachers should, when designing their mathematics lessons and motivating their students, keep in mind the applicability for current and post-school participation in a culture. The theoretical insights and evidence of our study indicate that making ML part of a daily routine elicits mathematical development and trains skills that advance students in other school domains. We assume that this different perspective on ML leads to a deeper, broadened understanding of the significance of mathematics in students’ everyday lives. Keeping in mind the relationships of reasoning and self-concept as well as the links between SES and ML, we strongly suggest integrating encouraging, vivid, and resource-based teaching methods that foster students’ mathematical self-concept when guiding students to tackle the challenging elements of ML problems. Applying a solution plan (Schukajlow et al. 2015) might prove to be useful in this regard. Following these recommendations, we are convinced that ML is key to motivating a diverse range of students. Finally, the impact we found of SES on later achievement in different school domains when controlling for prior achievement highlights the need for educational equity.