Introduction

Research shows numeracy skills to be influential in many areas of life, including school attainment and dropout, employment, and even psychological well-being (Hakkarainen et al., 2016; Parsons & Bynner, 2005). Given that student motivation in mathematics seems to decline over the school years (Jacobs et al., 2002; Pinxten et al., 2014), education faces an important challenge: how to foster mathematics learning and motivation already during the early years of education, when the foundations of mathematics skills and attitudes are gradually formed. This global challenge is also relevant in Finland, where mathematics achievement, despite its high level by international comparison, has been steadily declining in recent years, while students simultaneously report relatively low mathematics-related motivation and attitudes (Mullis et al., 2016; OECD, 2013).

Co-teaching has been considered a means of increasing teacher responsiveness and instructional quality, and of providing better support for students’ different learning needs (e.g., Friend & Cook, 2013; Villa et al., 2008). The promise of co-teaching lies in increased student-teacher interaction and effective teaching practices, such as individualisation and systematic feedback (Friend & Cook, 2013; Sweigart & Landrum, 2015; Villa et al., 2008). The motivational implications are of particular importance here, as research shows that both feedback and individualised support reinforce students’ positive self-concept (Lüdtke et al., 2005; O’Mara et al., 2006) and higher interest (Kiemer et al., 2015). In this study, we investigated the potential of co-teaching in the context of primary school mathematics. More specifically, using a unique design, we compared the development of sixth-grade students’ mathematics motivation (i.e., self-concept and individual interest) and achievement between co- and solo-taught classes.

Mathematics self-concept and individual interest

Research on mathematics achievement shows that students’ evaluations both of themselves in relation to mathematics and of mathematics as a school subject play an important role in mathematics learning (Middleton & Spanias, 1999). Self-concept and interest in mathematics appear to be particularly relevant (e.g., Marsh et al., 2005). Mathematics self-concept refers to an individual’s perception and evaluation of their mathematics abilities (Bong & Skaalvik, 2003). These perceptions are formed through experiences of interacting with the learning environment and are influenced by environmental reinforcements, past achievements, and significant others (Shavelson et al., 1976). How students judge their achievements is considered a major determinant of self-concept, and this judgement is grounded in internal and external comparisons (Marsh & Craven, 2000; Skaalvik & Skaalvik, 2002). Internal comparison refers to an individual’s comparison of their current abilities or achievement with their past achievement within the same subject, or of their achievement across different subjects (Marsh & Craven, 1997; Skaalvik & Skaalvik, 2002), while external comparison refers to comparing one’s own achievement with that of others, partly through feedback from peers, parents, and teachers (Gniewosz et al., 2012; Marsh & Craven, 1997; Skaalvik & Skaalvik, 2002).

Interest is another important factor facilitating learning (Harackiewicz et al., 2016). Dewey (1913) already noted that learning differs depending on whether interest is present or absent: interest can “catch and hold” our attention, so that we approach learning in a “whole-hearted way”. Interest is thus characterised by increased attention, concentration, and affect that are specific to a person, object, activity, or subject, and it can be further divided into situational and individual forms (Hidi, 2006; Kaplan & Patrick, 2016; Schiefele, 2009). Situational interest refers to a short-lived, temporary state in which environmental cues and factors catch and maintain the individual’s attention, thus facilitating motivation to act in a certain way (Schiefele, 2009). Individual interest, in contrast, reflects a relatively enduring affective and evaluative orientation towards a specific subject or object (Hidi & Ainley, 2002; Schiefele, 2009). Sustained situational interest may gradually develop into individual interest over time, although this may require support from the environment (Hidi & Renninger, 2006). Existing individual interest may, in turn, facilitate the triggering and maintenance of situational interest (Nuutila et al., 2020; Schiefele, 2009; Tapola et al., 2013); a student with high individual interest in mathematics is more likely to enjoy and engage in mathematics tasks than a student with low individual interest.

Research shows mathematics self-concept to promote a range of positive educational outcomes, such as academic choices and aspirations (e.g., Marsh et al., 2005), and contribute to achievement in a reciprocal manner (e.g., Marsh et al., 2005; Marsh & Craven, 2006; Marsh & Martin, 2011). That is, prior self-concept may boost later achievement, which, in turn, is likely to enhance subsequent self-concept. However, this reciprocity may not be as evident with younger students (Ganley & Lubienski, 2016; Viljaranta et al., 2014), as their self-concept tends to be inflated due to unrealistic appraisals (Wigfield & Eccles, 2000). Interest, instead, has been shown to facilitate attention as well as cognitive and affective processing (Ainley et al., 2002), and to predict effort (Arens & Hasselhorn, 2015) and course selection (Köller et al., 2001). High interest may also act as a buffer against the negative effects of non-optimal learning conditions (Hidi & Renninger, 2006; Katz et al., 2006; Tsai et al., 2008). Consequently, while self-concept seems to be more directly linked with achievement outcomes (e.g., Marsh et al., 2005), individual interest may serve as the mediating fuel to act towards learning as described by Dewey (1913).

There is also some evidence for self-concept and interest being interrelated both concurrently (Marsh et al., 2005) and longitudinally (Petersen & Hyde, 2017). That is, when students feel competent in mathematics, they are also more likely to experience interest and enjoyment in mathematics-related activities. However, the linkage between the two seems complex; previous findings provide support for both reciprocal relationships (Marsh et al., 2005) and the causal predominance of self-concept (Jacobs et al., 2002; Viljaranta et al., 2014; Wigfield & Eccles, 2000).

Research further indicates that students’ mathematics self-concept (Jacobs et al., 2002; Petersen & Hyde, 2017; Wigfield & Eccles, 2000) and interest (Lazarides et al., 2019; Petersen & Hyde, 2017; Renninger & Hidi, 2011) decrease as students get older, although the level of interest may plateau across the later school years (Frenzel et al., 2010). It has been suggested that these developments, especially after the mid-primary grades, are linked with more realistic appraisals of ability (Wigfield & Eccles, 2000), increased social comparison processes, and changes in instructional practices and curriculum (Stipek & Mac Iver, 1989). Most prior research has focused on long-term changes over several years (e.g., Denissen et al., 2007), while less is known about the short-term (e.g., within a school year) developmental dynamics between mathematics self-concept and interest, and how these dynamics are connected with the pedagogical context and changes in achievement. Addressing these questions is one of the main aims and contributions of this study.

Supporting self-concept and interest in mathematics learning

Despite the consistent observation of a developmental decline in mathematics motivation, research also shows that students’ mathematics self-concept (Watson et al., 2019; for a review, see O’Mara et al., 2006) and interest (Høgheim & Reber, 2015; Rotgans & Schmidt, 2017) can be supported through pedagogical practices. Teachers play an important role in the development of students’ self-concept, as their actions and communication in the classroom provide students with information for external and internal comparisons (e.g., Skaalvik & Skaalvik, 2002). One example is the teacher’s use of a social frame of reference standard, meaning that the teacher evaluates student achievement by comparing students with one another, which may prompt students to view learning as a competition (Lüdtke et al., 2005). In contrast, when the teacher uses an individual frame of reference standard, students’ achievement is compared with their own past achievement, which may facilitate personal effort and diminish social comparison (Lüdtke et al., 2005). Further, conveying high expectations of students while downplaying social comparison may also help to maintain mathematics self-concept (Watson et al., 2019). Teacher feedback focusing on effort, improvement over time, and individual performance is likely to have a positive impact not only on students’ mathematics self-concept (Lüdtke et al., 2005; O’Mara et al., 2006) but also on achievement emotions (Pekrun, 2016) and interest (Kaplan & Patrick, 2016; Kiemer et al., 2015). Conversely, uninformative and controlling feedback (Deci et al., 1999; Tsai et al., 2008), as well as feedback endorsing social comparison or competition, may not only impede learning but also undermine intrinsic motivation (including interest) and threaten positive self-evaluations (Brophy, 2011; Marsh & Craven, 1997; Pekrun, 2016; Ryan & Deci, 2017).

Student interest can be further supported through interest-provoking didactic means such as introducing specific learning problems (Rotgans & Schmidt, 2017). Personalisation (Bernacki & Walkington, 2018; Høgheim & Reber, 2015) and differentiation seem to be the key issues here. For example, Durik et al. (2015) found that providing students with utility value information boosted their mathematics interest, but this was mostly the case for those with high success expectancies. In contrast, students with lower expectations benefited more from encouraging feedback that stressed effort and the belief that they possessed the required potential to perform well. Although such targeted pedagogical practices might seem rather self-evident, they do require effort and time. Personalising learning tasks for each student in order to spark their interest in a classroom full of students is no simple task (e.g., Hidi, 1990).

Co-teaching as a means to improve instructional practices

One potential way to organise teaching so as to facilitate the previously mentioned instructional practices and better meet students’ needs is co-teaching. This is a collaborative effort by two or more educators who combine their expertise to teach a heterogeneous class, usually within the same physical space, and who share all aspects of teaching, planning, and evaluation (Friend & Cook, 2013; Villa et al., 2008). The main goal is to provide an educational environment in which all students can learn and succeed, through an emphasis on effective instructional practices that a single teacher alone cannot provide (Friend, 2008).

Co-teaching has been referred to, often erroneously, by many names, including collaborative teaching and team teaching. Co-teaching differs from team teaching in that it pairs professionals with different expertise within one class, whereas in team teaching two or more teachers pool their classes into one; co-teaching therefore results in a higher teacher-student ratio (t-s ratio) (Conderman & Hedin, 2015; Friend et al., 2010; Sweigart & Landrum, 2015; Villa et al., 2008). It has been suggested that when co-teachers have more time to interact with their students, this may lead not only to increased scaffolding and individual support but also to more frequent and thorough feedback (Friend & Cook, 2013; Sweigart & Landrum, 2015).

Co-teaching thus involves much more than just adding adults to the classroom (Sweigart & Landrum, 2015; Villa et al., 2008) or reducing class sizes. Pairing up with another teacher can offer several advantages for both teaching and planning. For instance, teachers may be better able to identify and support different learners, as some students benefit more from feedback and others from teacher modelling and scaffolding (Kaplan & Patrick, 2016). When teachers have time to reflect on their instructional processes with another professional, this may foster a reflective stance on their own professionalism that can enrich and further improve educational practices (Brophy, 1983; Rytivaara & Kershner, 2012). Such co-teacher discourse and planning can break ritualised routines that might hold back student learning (Nuthall, 2005), whereas a solo teacher may be more restricted to internal dialogue, where self-justification might be more likely than critical reflection (Bright, 1996). The issue may be less about having the necessary skill set than about recognising the need for change. The dialogue, together with the increased student support, may also shift the assessment of learning from the traditional end-of-period assessment to a daily routine that genuinely facilitates the teachers’ use of an individual frame of reference standard (e.g., Lüdtke et al., 2005).

Although the idea of co-teaching is appealing, research-based evidence on how it contributes to student outcomes is limited and has often focused on students with special educational needs (see Cook et al., 2017; Strogilos et al., 2023). Murawski and Swanson’s (2001) meta-analysis found co-teaching to be moderately effective in terms of learning outcomes. However, firm conclusions could not be drawn, since only six of the 89 reviewed studies were eligible for the actual analysis: some studies lacked comparison groups, and in some, treatment fidelity was not adequately documented (Murawski & Swanson, 2001). One recent study showed a positive effect of co-teaching on students’ mathematics test scores (Jones & Winters, 2022). In English classes, students reported having better access to individualised assistance and felt that co-teaching contributed positively to their improvement (Wilson & Michaels, 2006). Lochner et al. (2019) compared solo- and co-taught classes using observational data and found co-taught students to be more cognitively engaged in their learning, a finding similar to that of class-size reduction studies (Blatchford et al., 2011; Finn et al., 2003). This finding seems particularly relevant for the present study: could such an increase in engagement also boost student motivation?

Present study

Although research on mathematics self-concept, individual interest, and their development over time is relatively rich, we know less about how changes in them are connected with each other and, particularly, how they relate to changes in mathematics learning and achievement. Moreover, although instructional practices found to support student learning and motivation (e.g., individualised tasks, feedback, support, and time to interact with different students) have been associated with co-teaching (Villa et al., 2008), surprisingly little research has examined its relative effectiveness. This would seem particularly relevant in the context of challenging, task-intensive, and error-prone school subjects such as mathematics.

In this study, we investigated (RQ1) how mathematics self-concept and individual interest change over one school year, and how these changes are related to each other; (RQ2) whether the levels and changes in self-concept and individual interest are predicted by the teaching condition (co-teaching versus solo teaching), after controlling for teacher-rated achievement and test performance; and, most importantly, (RQ3) how the teaching condition as well as changes in self-concept and interest further predict later achievement and test performance (i.e., teacher-rated grades and test performance at the end of sixth grade). As previous research has found some relatively consistent gender differences in mathematics motivation (Ganley & Lubienski, 2016; Jacobs et al., 2002), we also accounted for gender by including it as a covariate. To our knowledge, no previous studies have examined these developmental dynamics in the context of co-teaching.

Regarding the first research question, we expected both self-concept and interest to decline over time (Denissen et al., 2007; Frenzel et al., 2010; Jacobs et al., 2002; Pinxten et al., 2014), and these changes to be correlated (Petersen & Hyde, 2017).

As to the second research question, we anticipated co-teaching (as compared to solo teaching) to have a positive effect on the change in students’ self-concept and interest due to more personalised teaching and pedagogical practices (e.g., Villa et al., 2008), while we assumed an effect of mathematics achievement and test performance on the onset of both self-concept and interest (Marsh et al., 2005; Ganley & Lubienski, 2016; Petersen & Hyde, 2017) as well as their changes (Denissen et al., 2007). That is, high achieving students were expected to have not only more positive self-concept and interest at the beginning of the sixth grade, but also show more positive changes in them.

Regarding the third research question, we expected both the initial level (Marsh et al., 2005) and relative improvement (or less steep decline) in self-concept and interest to contribute to better mathematics achievement and test performance by the end of the year (Denissen et al., 2007; Petersen & Hyde, 2017), after controlling for the effects of prior achievement. Most importantly, changes in self-concept and interest were in turn presumed to mediate the anticipated positive effect of co-teaching on mathematics achievement (e.g., Marsh et al., 2005; Marsh & Martin, 2011).

Method

Design

This study could be characterised as a natural experiment with some features of a quasi-experiment (Remler & Van Ryzin, 2011; Shadish et al., 2002). The initiative and design for the study came from the researchers, but the implementation was organised together with the participating schools and teachers. Following this, teachers were recruited to both conditions, and the practices to be implemented by the “experimental group” (i.e., co-teaching) were facilitated through workshops (for more details, see below). However, the co-teachers were completely free to plan and carry out their classes without any moderation or intervention by the researchers. In this sense, the events and activities that took place in both conditions over the school year were naturally occurring and ecologically valid, despite the experimental setup (see Bronfenbrenner, 1977).

The empirical arrangement followed a repeated-measures design with a comparison of two teaching conditions: (1) the co-teaching group, where mathematics was taught by pairs of class teachers and special education teachers, and (2) the solo teaching group, where mathematics was taught by individual class teachers who received some assistance from a special education teacher, which is common policy in Finland. The students had three lessons of mathematics (45 min each) per week in both teaching conditions. The study design is illustrated in Fig. 1.

Fig. 1 Study design

Participants and procedure

Four primary schools and ten teachers within a city in Eastern Finland participated in the study. The student participants (Table 1) were 146 sixth-grade students aged 12–13 years. The co-teaching condition consisted of three classes (class sizes ranging from 23 to 27) with a total of 70 students (47 girls and 23 boys). The solo-teaching condition consisted of four classes (class sizes ranging from 21 to 23) with a total of 76 students (40 girls and 36 boys). The gender distribution did not differ significantly between conditions, χ²(1) = 3.19, p = .09.
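The gender comparison above can be recomputed from the reported counts. The sketch below is a minimal check, assuming a Pearson chi-square test without continuity correction (the text does not state which variant was used); the function name `pearson_chi_square_2x2` is ours and not part of any package.

```python
import math


def pearson_chi_square_2x2(row_a, row_b):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table.

    With df = 1, the chi-square survival function equals erfc(sqrt(x / 2)).
    """
    table = [row_a, row_b]
    n = sum(row_a) + sum(row_b)
    row_totals = [sum(row) for row in table]
    col_totals = [row_a[j] + row_b[j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# (girls, boys) per condition, as reported in the text: co-teaching, then solo teaching.
chi2, p = pearson_chi_square_2x2((47, 23), (40, 36))
print(f"chi2(1) = {chi2:.2f}, p = {p:.3f}")
```

Applying Yates’ continuity correction would give a smaller statistic, so the exact p-value depends on which variant is used.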

The data were collected by questionnaires at three timepoints, as shown in Fig. 1: at the beginning of the school year in August (T1), at mid-term in January (T2), and in April (T3). Mathematics performance was tested at the beginning and at the end of the school year. The minor loss of participants at different timepoints was due to ordinary school absence. All data were collected in the classrooms by the first author, with the exception of the T3 measurement, which was conducted during an online lesson in April due to COVID-19 restrictions.

The recruitment began by contacting the school principals, who forwarded the request to participate to their teachers. The participating teachers had to have a Master’s degree in education and sufficient teaching experience (i.e., a minimum of five years), and they were required to join the project voluntarily (i.e., not be assigned by administrators) to ensure sufficient comparability of participant backgrounds, as recommended by previous research (e.g., Friend & Cook, 2013; Saloviita & Takala, 2010; Scruggs et al., 2007). We also allowed teachers to form their co-teaching pairs within their respective schools themselves, to facilitate more effective and equal partnerships with matching interests and pedagogical views (Pratt, 2014). All participants signed a written consent form that followed the ethical guidelines of The Finnish National Board on Research Integrity, the university guidelines, and European Union GDPR requirements. According to the Finnish National Board on Research Integrity (2019) guidelines, no ethical review was necessary.

As many co-teaching studies suggest lack of knowledge to be one of the main obstacles to successful implementation of co-teaching (Friend et al., 2010; Saloviita & Takala, 2010; Scruggs et al., 2007), the project began with a workshop for each co-teacher pair. The workshop was led by one of the researchers and included material on the basics and characteristics of co-teaching (Friend & Cook, 2013; Villa et al., 2008), teacher roles, and what is required for successful co-teaching (Murawski & Lochner, 2011; Pratt, 2014; Scruggs et al., 2007). The key aspects of co-teaching were emphasised: planning, teaching, and evaluation are to be shared, and these practices should include thorough reflection (Fluijt et al., 2016; Murawski & Lochner, 2011). Note that the workshops did not include any material on self-concept or individual interest, or on how to support them through teaching. The solo teachers did not receive any training from the researchers.

Measures

Mathematics self-concept and interest

The Self Description Questionnaire I (Marsh, 1990) was used to assess mathematics self-concept and individual interest. It is considered a reliable and valid instrument across different cultures (Arens & Hasselhorn, 2015) and has been validated in the Finnish context and in this age group (e.g., Savolainen et al., 2018). Following recommendations (Arens et al., 2011), we divided the scale into two components: competence self-perceptions and affect perceptions. Given the wording of the items within both components, we considered them accurate representations of mathematics self-concept (i.e., the students’ subjective evaluation of their competence in mathematics: “I am good at mathematics”, “I find mathematics easy”, “I learn mathematics quickly”, and “I get good marks in mathematics”) and mathematics interest (i.e., students’ affective and cognitive appraisals: “I look forward to mathematics”, “I am interested in mathematics”, “I like mathematics”, and “I enjoy mathematics”), respectively. The students rated each item on a Likert-type scale ranging from 1 (strongly disagree) to 5 (strongly agree).

Both measures showed excellent reliability for all timepoints (T1-T3): the respective McDonald’s omegas (ω) for self-concept were 0.91, 0.90, and 0.92, and for interest 0.96, 0.95, and 0.96.
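McDonald’s omega expresses the share of total score variance attributable to the common factor. The sketch below computes omega total for a single-factor (congeneric) model from item loadings and residual variances; the loadings are hypothetical placeholders, since item-level estimates are not reported here, and the function name is ours.

```python
def mcdonalds_omega(loadings, residual_variances):
    """Omega total for a single-factor (congeneric) model:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances)."""
    if len(loadings) != len(residual_variances):
        raise ValueError("need exactly one residual variance per loading")
    common = sum(loadings) ** 2
    return common / (common + sum(residual_variances))


# Hypothetical standardised loadings for four items (not the study's estimates);
# for standardised items each residual variance is 1 - loading**2.
loadings = [0.80, 0.85, 0.78, 0.82]
residuals = [1 - lam ** 2 for lam in loadings]
print(round(mcdonalds_omega(loadings, residuals), 2))
```

Unlike Cronbach’s alpha, omega does not assume equal loadings across items, which is why it is often preferred for short scales such as these four-item measures.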

Mathematics performance and achievement

Students’ mathematics performance was measured twice (at the beginning and at the end of sixth grade) with the standardised RMAT test (Räsänen, 2004), a time-constrained (10 min) test of basic numeracy skills in mathematics. The test comprises 56 basic arithmetic items covering addition, subtraction, multiplication, division, fractions, units of measurement, and equations. The omegas for the two test administrations were 0.85 and 0.87, respectively. The students were told that their test performance would not affect their teacher-rated mathematics grade (i.e., a low-stakes test).

For a more comprehensive measure of students’ mathematics achievement, teacher-rated mathematics grades at the end of fifth and sixth grades were retrieved from the school records. The Finnish grading scale in comprehensive education ranges from 4 (fail) to 10 (excellent) and is based on the Finnish National Core Curriculum. The criteria for grade 8 are specified in the curriculum, and teachers use them as a guideline for determining other grades. The focus of student assessment is on the students’ learning and working skills. Although formative and dialogical assessment practices and students’ active role in these processes are emphasised in the curriculum, summative assessment has a strong position among Finnish teachers (e.g., Nieminen & Atjonen, 2023).

Fidelity

Various indirect measures were taken to ensure and evaluate the fidelity of implementation (e.g., Gresham et al., 2000), particularly in terms of adherence (i.e., the extent to which all activities were delivered as designed; Carroll et al., 2007). For the co-teachers, these measures included the above-mentioned workshops at the beginning of the school year, interviews on three occasions about implementation and progress, and a weekly checklist-type diary. The diary covered the various principles of co-teaching: whether all aspects of teaching were shared, how much time was spent on planning and reflection, and an evaluation of the success of lessons on a scale from 1 (not at all satisfied) to 5 (very satisfied). The purpose of the evaluation was to prompt further reflection on what needed to be improved (e.g., Cook et al., 2017; Fluijt et al., 2016; Murawski & Lochner, 2018). To ensure the commitment of both schools and co-teachers, the special education teachers in the co-teaching dyads received monetary compensation from the municipality. This ensured that the special educators were not pulled away from the co-taught mathematics classes to other tasks or classes.

Similar diaries and interviews were also completed by the solo teachers to evaluate whether they taught and planned mathematics by themselves (i.e., whether the aforementioned standard practice was maintained), or showed signs of reactivity to the experimental situation or compensatory rivalry (Shadish et al., 2002). The solo teachers’ diary included weekly hours of special education teacher support in mathematics, whether this was in-class or pull-out, and if the special education teachers participated in the planning and evaluation processes.

According to the co-teacher interviews, the teachers planned, taught, and evaluated the mathematics lessons together, and this was corroborated by the co-teacher diaries. The diaries also revealed that, mostly due to teacher absence (e.g., sick days and in-service training), co-teaching occurred in 70% of mathematics classes. However, the higher t-s ratio was still maintained in the remaining classes because substitute teachers were used, although joint planning and assessment were not carried out at these times. Around three quarters of an hour per week was used for planning and reflection, and different co-teaching models, such as team and alternative teaching (see Friend & Cook, 2013; Gardesten, 2023), were used flexibly. The co-teachers reported being satisfied with their co-teaching partnership, overall progress, and how the lessons were implemented throughout the year. These positive experiences were also reflected in the diaries: the perceived success of lessons was high (M = 4.4, SD = 0.69). The solo teacher diaries and interviews revealed that the solo-taught classes were supported mainly by pull-out special education for approximately one hour a week, and that the class teachers were responsible for planning and assessing students’ progress.

Accordingly, our fidelity measures suggest that the design and implementation were realised as intended: the co-teachers planned, taught, and reflected as dyads, while the solo teachers planned and taught on their own, with some assistance from the special education teachers.

Data analyses

Latent growth curve modelling (LGCM) within the structural equation modelling framework was used, as it offers advantages for studying change and development in multiwave data (for an overview, see Duncan & Duncan, 2009). In LGCM, observed variables are used to estimate latent factors that represent both the onset (initial level) and the rate of change (slope) of the measured construct over time.
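For readers less familiar with LGCM, the model for a single construct measured at three occasions can be written in standard notation as follows (the symbols are ours, not taken from the article):

$$
\begin{aligned}
y_{ti} &= \eta_{0i} + \lambda_t\,\eta_{1i} + \varepsilon_{ti}, \qquad t = 1, 2, 3,\\
\eta_{0i} &= \alpha_0 + \zeta_{0i}, \qquad \eta_{1i} = \alpha_1 + \zeta_{1i},\\
\operatorname{E}(y_{t}) &= \alpha_0 + \lambda_t\,\alpha_1 .
\end{aligned}
$$

Here η₀ and η₁ are the onset and slope factors, α₀ and α₁ their means, and the variances of ζ₀ and ζ₁ capture individual differences in onset and change. In a linear model all λ_t are fixed to values that code the measurement occasions; in a latent basis model λ₁ = 0 and λ₃ = 1 are fixed and λ₂ is estimated, so that α₁ represents the total expected change from T1 to T3. A parallel process model estimates two such systems, one per construct, and allows their growth factors to covary.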

The analyses were carried out in five steps. First, to ensure that the measures reflected the same constructs at each occasion, we tested longitudinal measurement invariance stepwise: configural (same number of factors and loading pattern over time), weak (identical factor loadings over time), strong (identical item intercepts over time), and strict (identical error variances over time) (Widaman & Reise, 1997). Second, separate unconditional univariate latent growth models for self-concept and individual interest were estimated to describe trajectories over time. Third, an unconditional parallel process model was estimated to investigate the relations between the levels and changes of self-concept and interest. Fourth, the parallel process model was expanded to include predictors: teaching condition, test performance (at the beginning of sixth grade), teacher-rated mathematics grade (at the end of fifth grade), and gender. Finally, the conditional model was extended to include outcome variables: test performance and teacher-rated mathematics grade at the end of sixth grade. The full model is illustrated in Fig. 2.

To evaluate measurement invariance, the fit of each model was compared with that of the preceding, less constrained model using scaled difference χ² tests (Satorra & Bentler, 2010) and the change-in-fit criteria recommended by Chen (2007). For weak invariance, a decrease in CFI of at least 0.005 (ΔCFI ≤ −0.005), supplemented by an increase in RMSEA of at least 0.010 or an increase in SRMR of at least 0.025, would indicate noninvariance; for strong and strict invariance, the same CFI and RMSEA criteria applied, together with an SRMR increase of at least 0.005. For the latent growth curve models, model fit was evaluated using the Comparative Fit Index (CFI: cutoff value > 0.95), the Root Mean Square Error of Approximation (RMSEA: cutoff value < 0.06), and the Standardised Root Mean Square Residual (SRMR: cutoff value < 0.08) (Hu & Bentler, 1999).
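The decision rules above can be written out explicitly. The sketch below encodes one reading of them: Chen’s small-sample cutoffs for nested invariance models (a CFI drop supplemented by an RMSEA or SRMR increase) and the Hu and Bentler cutoffs used for the growth models. It also includes the Satorra–Bentler scaled difference formula for MLR chi-squares in its 2001 form, as commonly documented for Mplus; the 2010 version cited in the text adds a correction for a non-positive scaling factor, which this sketch does not implement and instead reports as an error. All function and class names are ours.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Fit:
    """Fit indices for one estimated model."""
    cfi: float
    rmsea: float
    srmr: float


# SRMR thresholds per invariance step, following the criteria stated in the text.
SRMR_THRESHOLD = {"weak": 0.025, "strong": 0.005, "strict": 0.005}


def noninvariant(previous: Fit, constrained: Fit, step: str) -> bool:
    """Flag noninvariance when CFI drops by at least 0.005 and, in addition,
    RMSEA rises by at least 0.010 or SRMR rises by at least the step's threshold."""
    d_cfi = constrained.cfi - previous.cfi
    d_rmsea = constrained.rmsea - previous.rmsea
    d_srmr = constrained.srmr - previous.srmr
    return d_cfi <= -0.005 and (d_rmsea >= 0.010 or d_srmr >= SRMR_THRESHOLD[step])


def acceptable_fit(fit: Fit) -> bool:
    """Growth-model cutoffs used in the text: CFI > .95, RMSEA < .06, SRMR < .08."""
    return fit.cfi > 0.95 and fit.rmsea < 0.06 and fit.srmr < 0.08


def scaled_chi2_difference(t0: float, c0: float, d0: int,
                           t1: float, c1: float, d1: int) -> tuple[float, int]:
    """Satorra-Bentler (2001) scaled difference test for MLR chi-squares.

    Model 0 is the more constrained model (more df); t0 and t1 are the scaled
    chi-square values and c0 and c1 their scaling correction factors.
    """
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    if cd <= 0:
        # The 2010 refinement handles this case; this sketch does not.
        raise ValueError("non-positive difference-test scaling correction")
    return (t0 * c0 - t1 * c1) / cd, d0 - d1


# Usage with the self-concept growth-model fits reported in the Results:
print(acceptable_fit(Fit(cfi=0.992, rmsea=0.136, srmr=0.015)))  # linear model: False
print(acceptable_fit(Fit(cfi=0.998, rmsea=0.046, srmr=0.027)))  # latent basis model: True
```

Writing the rules this way makes the AND/OR structure of Chen’s criteria explicit; other readings (e.g., treating each index separately) are also used in the literature.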

IBM SPSS Statistics v27 and Jamovi 2.3.18 software (the jamovi project, 2022) were used for descriptive statistics, while all other analyses were conducted using Mplus statistical software 8.6 (Muthén & Muthén, 2019). We used maximum likelihood estimation with robust standard errors (MLR) for all models, and missing data (ranging from 0 to 9% across all included variables; see Table 1) were handled with full-information maximum likelihood estimation (Little’s MCAR test: χ²(53) = 59.71, p = .245).
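Mplus handles missing data internally; the sketch below only illustrates the principle behind full-information maximum likelihood (FIML): each case contributes the multivariate normal log-likelihood of its observed variables only, evaluated at the model-implied mean vector and covariance matrix, so no values are imputed. It evaluates the likelihood for given parameters rather than maximising it, and the function name and all example values are hypothetical.

```python
import numpy as np


def fiml_loglik(data, mu, sigma):
    """Full-information ML log-likelihood for a given mean vector and covariance.

    Each row contributes a multivariate-normal log-density computed only over
    its observed entries (np.nan marks a missing value); nothing is imputed.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    total = 0.0
    for row in np.asarray(data, dtype=float):
        observed = ~np.isnan(row)
        k = int(observed.sum())
        if k == 0:
            continue  # a case with no observed values adds nothing
        diff = row[observed] - mu[observed]
        sub_sigma = sigma[np.ix_(observed, observed)]
        _sign, logdet = np.linalg.slogdet(sub_sigma)
        quad = diff @ np.linalg.solve(sub_sigma, diff)
        total += -0.5 * (k * np.log(2 * np.pi) + logdet + quad)
    return total


# Hypothetical three-wave scores (T1-T3); two cases each miss one occasion.
data = [[3.50, 3.25, np.nan],
        [2.00, np.nan, 2.25],
        [4.75, 4.50, 4.50]]
mu = [3.40, 3.35, 3.30]
sigma = [[0.60, 0.45, 0.42],
         [0.45, 0.60, 0.46],
         [0.42, 0.46, 0.62]]
print(fiml_loglik(data, mu, sigma))
```

Because incomplete cases still contribute their observed information, FIML yields unbiased estimates under the MCAR (and the weaker MAR) assumption, which is what the reported Little’s test speaks to.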

Fig. 2 Hypothetical model with predictors and outcomes

Results

Descriptive statistics and measurement invariance

Descriptive statistics and bivariate correlations are shown in Table 1. Both self-concept and interest demonstrated high rank-order stability over time, with between-measurement correlations ranging from 0.80 to 0.85 for self-concept and 0.78 to 0.83 for interest. Similarly, both teacher-rated mathematics achievement (r = .82, p < .001) and test performance (r = .79, p < .001) showed high intraindividual stability over time. The relatively high intercorrelations between teacher-rated grades and test performance (r = .61–.67, p < .001) demonstrated a strong link between the different indicators of mathematics performance. Gender was somewhat associated with earlier (r = –.21, p = .012) and later (r = –.15, p = .064) mathematics achievement, showing girls to have slightly higher teacher-rated grades at the end of fifth and sixth grade, but not with test performance. The stepwise tests of longitudinal measurement invariance (Table 2) showed strict invariance in both self-concept and interest, thus permitting valid inferences of changes in means over time.

Table 1 Descriptive statistics and bivariate correlations for all variables
Table 2 Tests of longitudinal measurement invariance

Mean-level changes in mathematics self-concept and interest

To address the first research question, separate univariate latent growth models were estimated for self-concept and interest. The linear model for self-concept showed mixed fit, χ²(1) = 3.70, p = .055, CFI = 0.992, RMSEA = 0.136 (90% CI: 0.000–0.295), SRMR = 0.015: CFI and SRMR met their cutoffs, whereas RMSEA did not. Given this and the pattern of observed means (Table 1), we also estimated a latent basis model, in which the first and third loadings were fixed to zero and one, respectively, and the second loading was freely estimated (Grimm et al., 2011; Wang & Wang, 2020). Residual variances were constrained to be equal. Because this model fit the data better, χ²(2) = 2.62, p = .270, CFI = 0.998, RMSEA = 0.046 (90% CI: 0.000–0.177), SRMR = 0.027, we retained it for the subsequent analyses. This model showed a negative mean slope for self-concept (M = −0.081, p = .077), indicating a small overall decline in mathematics self-concept over time, although the estimate did not reach conventional statistical significance (see Fig. 3). The variances of both the onset (S² = 0.470, p < .001) and the slope (S² = 0.141, p = .020) were statistically significant, demonstrating substantial individual variability in students’ initial levels of mathematics self-concept and in its change over time.
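
To make the latent basis parameterisation concrete, the sketch below computes the model-implied means and covariances of a three-wave latent basis growth model with loadings (0, λ₂, 1) and equal residual variances, as described above. Because the last loading is fixed to one, the slope mean equals the total expected change from the first to the third measurement, and λ₂ is the share of that change reached by the second measurement. The slope mean (−0.081) and the onset and slope variances (0.470 and 0.141) are taken from the text; the intercept mean, λ₂, the intercept-slope covariance, and the residual variance are hypothetical placeholders, not estimates from this study.

```python
"""Model-implied moments of a three-wave latent basis growth model (illustrative sketch)."""


def implied_moments(alpha0, alpha1, lam2, psi_i, psi_s, psi_is, theta):
    """Return implied means and covariance matrix for loadings (0, lam2, 1).

    alpha0/alpha1: intercept and slope means; psi_i/psi_s: their variances;
    psi_is: intercept-slope covariance; theta: common residual variance.
    """
    loadings = (0.0, lam2, 1.0)  # first loading fixed to 0, third fixed to 1
    means = [alpha0 + lam * alpha1 for lam in loadings]
    # Cov(y_i, y_j) = psi_i + (l_i + l_j) * psi_is + l_i * l_j * psi_s (+ theta if i == j)
    cov = [
        [
            psi_i + (li + lj) * psi_is + li * lj * psi_s + (theta if i == j else 0.0)
            for j, lj in enumerate(loadings)
        ]
        for i, li in enumerate(loadings)
    ]
    return means, cov


# From the text: slope mean and the two growth-factor variances.
# HYPOTHETICAL: alpha0, lam2, psi_is, and theta.
means, cov = implied_moments(
    alpha0=3.5, alpha1=-0.081, lam2=0.6, psi_i=0.470, psi_s=0.141, psi_is=-0.05, theta=0.12
)
total_change = means[2] - means[0]                   # equals alpha1
share_by_t2 = (means[1] - means[0]) / total_change   # equals lam2
print(round(total_change, 3), round(share_by_t2, 2))  # -0.081 0.6
```

The same function with the second loading fixed instead of estimated gives the corresponding linear model, which makes the difference between the two parameterisations easy to inspect.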

The univariate linear model for interest fit the data well, χ²(1) = 0.20, p = .653, CFI = 1.000, RMSEA = 0.000 (90% CI: 0.000–0.168), SRMR = 0.005. As with self-concept, the results showed an overall decrease in individual interest (M = −0.154, p < .001), as well as significant individual variability in both the onset (S² = 0.938, p < .001) and the change over time (S² = 0.135, p = .017).

Fig. 3 Model-estimated means of the developmental trajectories of mathematics self-concept and interest

Concurrent and longitudinal relationships between self-concept and interest

To investigate how the levels and changes of self-concept and interest were mutually related, an unconditional parallel process model was estimated. The model had an excellent fit, χ²(8) = 10.14, p = .256, CFI = 0.996, RMSEA = 0.043 (90% CI: 0.000–0.112), SRMR = 0.025. The latent correlations (Table 3) showed that both the initial levels (r = .65, p < .001) and the slopes (r = .64, p < .001) of self-concept and interest were highly correlated. In other words, students’ mathematics self-concept and interest were strongly connected both concurrently and longitudinally: students whose self-concept declined more also tended to show a steeper decline in interest, and vice versa.

Table 3 Descriptive statistics and latent correlations for the initial levels and changes in self-concept and interest

Predictions of the levels and changes in self-concept and interest

Next, the predictors (i.e., teaching condition, test performance, teacher-rated grade, and gender) were added to the parallel process model (Table 4). This conditional model fit the data well, χ²(16) = 19.27, p = .255, CFI = 0.995, RMSEA = 0.038 (90% CI: 0.000–0.091), SRMR = 0.023. Only teacher-rated grade significantly and positively predicted the onset of self-concept (β = 0.61, p < .001): students who had higher teacher-rated grades at the end of fifth grade reported higher mathematics self-concept at the beginning of sixth grade. Gender had a small, marginally significant effect on the onset of self-concept (β = 0.27, p = .064), with boys reporting somewhat higher mathematics self-concept at the beginning of sixth grade. Contrary to our expectations, teaching condition did not predict changes in self-concept or interest.

Predictions of the outcomes

In the final step, the conditional model was expanded to include our two outcomes: teacher-rated mathematics grade and mathematics test performance at the end of sixth grade. The full model had an excellent fit, χ²(20) = 20.30, p = .440, CFI = 1.000, RMSEA = 0.010 (90% CI: 0.000–0.073), SRMR = 0.019. Interestingly, teaching condition predicted teacher-rated grade (β = 0.25, p = .008) over and above the effects of previous teacher-rated grade (β = 0.62, p < .001) and test performance (β = 0.11, p = .094). Contrary to our assumptions, however, neither the initial levels nor the changes in self-concept and interest predicted later mathematics performance. All effects are reported in Table 4 (Footnote 3).

Table 4 Standardised effects from the full model

Discussion

The purpose of this study was to investigate the role of co-teaching in the development of students’ motivation and achievement in mathematics. More specifically, we investigated how sixth graders’ mathematics self-concept and individual interest changed over the final year of primary school, how these changes were related to each other and to mathematics achievement, and, most importantly, whether these changes differed between the solo and co-teaching groups.

Regarding changes in motivation, students’ mathematics self-concept and interest declined over time, which concurs with previous findings (Denissen et al., 2007; Frenzel et al., 2010; Jacobs et al., 2002; Pinxten et al., 2014; Savolainen et al., 2018). More interestingly, these changes were also highly correlated, in line with Petersen and Hyde (2017): a steeper decline in self-concept was associated with a steeper decline in interest, and vice versa. This lends some support to the view that mathematics self-concept and interest develop in a reciprocal, cyclical manner (Marsh et al., 2005; Petersen & Hyde, 2017), although our data do not permit any inferences about their causal predominance.

Whilst our findings concerning the decline in mathematics motivation concur with previous research, it is not clear why this decline occurred, particularly within a single school year. It has been suggested that such negative changes may stem from normative changes associated with adolescence; from the nature of mathematics as a school subject, in terms of both content (e.g., cumulative learning and increasingly complex topics) and pedagogical practice (e.g., repetitive tasks with a focus on correctness); or from more general contextual changes, such as a decrease in students’ sense of autonomy and relatedness (Frenzel et al., 2010; Ryan & Deci, 2017; Schiefele, 2009; Stipek, 1996; Stipek & Mac Iver, 1989). Because we found both an overall decline and high rank-order stability in mathematics self-concept and interest (that is, motivation decreased while students’ relative positions remained largely unchanged), such systemic explanations seem the most likely. Considering that an extensive body of research has found both self-concept and interest to be consequential for learning, more attention should be paid to effective instructional practices that could counter these negative changes. It is also important to note that the COVID-19 pandemic might have played a role. Although the last measurement took place shortly after the transition to online teaching, concerns about the impact of the pandemic on schools had arisen earlier. Nevertheless, the consistency of the observed changes suggests that the pandemic did not necessarily amplify the decline.

Although teacher-rated grades were concurrently associated with self-concept and interest, in accordance with prior research (Marsh et al., 2005; Petersen & Hyde, 2017), the changes in self-concept and interest were independent of previous achievement. This contrasts with our expectations, as well as with the limited available evidence linking better mathematics achievement with less steep declines in motivation (Denissen et al., 2007). In other words, higher achievement at the beginning of sixth grade did not seem to guarantee sustained motivation in mathematics across the school year. This independence from preceding achievement may also point to more systemic reasons for the unfavourable changes in self-concept and interest, implying that motivational support would be relevant for all students regardless of their skill level. Following students’ classroom experiences more intensively (e.g., through experience sampling methodology) might reveal more about the sources of their declining motivation in mathematics, including their perceptions of the classroom climate and of the subject itself (e.g., Neubauer et al., 2022; Talić et al., 2022).

Regarding the main theme, the role of co-teaching, we found co-teaching to be no more successful than solo teaching in countering the decline in students’ mathematics motivation. This is somewhat disappointing, given previous findings showing both that teachers can be effective in reinforcing students’ self-concept (O’Mara et al., 2006) and interest (Kiemer et al., 2015; Renninger & Hidi, 2011; Rotgans & Schmidt, 2017), and that indirect interventions aiming to improve student-teacher interaction and pedagogical practices, such as the one in this study, may well work (Marsh & Craven, 1997; O’Mara et al., 2006; Watson et al., 2019).

Interestingly, however, teaching condition predicted teacher-rated grades at the end of the school year, although not through changes in self-concept and interest as hypothesised. That is, compared with students in the solo teaching group, students in the co-teaching group showed greater improvement in their mathematics grades. This may reflect a genuine effect of changes in teaching or assessment practices, or an artefact of experimental bias. It is indeed possible that the gain in achievement was genuine and resulted from co-teaching (e.g., Jones & Winters, 2022), but was neither mediated by motivation nor reflected in test performance. Perhaps students were more cognitively engaged in the co-teaching condition (Blatchford et al., 2011; Finn et al., 2003; Lochner et al., 2019), which, in turn, translated into better achievement. That these gains were not mirrored in test performance might be because the tests mostly measured basic mathematics skills, whereas teacher evaluations likely reflected a more comprehensive view of students’ mathematics achievement (see Brookhart et al., 2016). Including a broader set of mathematics tests might thus be worth considering in future studies.

It is also possible that the relatively greater gain in co-taught students’ achievement resulted from improved assessment practices. The higher teacher-to-student ratio and the opportunity to reflect on student learning with a teaching partner might have led to more frequent and thorough student assessments, including the use of an individual frame-of-reference standard, as opposed to a focus on end-of-period summative evaluation. That is, the improvement of students in the co-taught group may have stemmed from continuous evaluation of their progress, which provided them with more accurate feedback.

However, we cannot exclude the possibility that this gain reflected bias in assessment. Although teacher ratings have been consistently linked with test performance and other measures of achievement, they can also be prone to subjectivity and bias, as such assessments reflect various cognitive and non-cognitive factors valued by teachers themselves (Brookhart et al., 2016; Lekholm Klapp & Cliffordson, 2009). Having a second teacher in the classroom may have increased this subjectivity even further. It is thus possible that the co-teachers’ positive experiences of co-teaching (as revealed in their diaries), together with an experimental effect, introduced additional bias (e.g., confirmation bias) into their grading. Clearly, more research is needed to address the validity of the observed gains as well as the possible sources of bias.

Given the above, why did co-teaching fail to counter the negative changes in students’ mathematics motivation? One possibility is that the co-teaching practices were simply not pertinent to students’ self-concept and interest, and instead promoted student engagement. It may also be that a single school year is too short a period for indirect influences to have a meaningful impact on either self-concept or interest. Together, these findings suggest that it might be beneficial to incorporate into co-teaching explicit instructional practices proven to support student motivation in mathematics (e.g., differentiation, context personalisation, and feedback), or even interventions directly targeting self-concept and interest.

All of the above also illustrates that the nature of our study can be considered both a strength and a weakness. On the one hand, the study was ecologically more valid than a highly controlled experiment (Bronfenbrenner, 1977), as everyday classroom events occurred naturally without external influence or interference. On the other hand, despite our fidelity measures indicating successful implementation, we had no means of ensuring what exactly happened in the classrooms of either the solo or the co-teaching groups, or of determining the extent to which the learning and teaching activities over the year were influenced by factors other than those we intended or were aware of. We also acknowledge that our relatively small sample size (which was mainly due to practical constraints) might have limited the statistical power to detect small effects.

Nevertheless, we consider the outcomes of this rather laborious endeavour promising from the perspective of practice and informative from the perspective of research. Our experiences are consistent with previous conclusions that implementing co-teaching is resource-intensive (Cook et al., 2017) and far from straightforward (e.g., Friend, 2008; Pratt, 2014), and there is much to be learned from the present efforts; still, we consider the findings encouraging. Naturally, there is room for improvement, and many issues remain open, but in conclusion, we view co-teaching as a potentially promising approach to developing and supporting not only students’ learning but also teaching itself.