Introduction

Structure of academic self-concept and its operationalization

According to the model proposed by Shavelson et al. (1976), academic self-concept is one aspect of the hierarchical and multi-faceted self-concept construct. This complex structure has its origins in individual experience and is influenced by the environment and significant others but, in turn, also influences the way an individual acts. Self-concept is subject to developmental differentiation, with increasing separation of specific aspects with age (Marsh 1989; Marsh and Ayotte 2003; Marsh and Shavelson 1985). The structure of academic self-concept has itself been found to be internally differentiated. Two uncorrelated facets (mathematical and verbal academic self-concept) are required to properly reflect the relationships between the multiple subject-specific self-concepts and the non-specific school self-concept (Marsh 1986; Marsh et al. 1988). Of the many possible operational definitions of academic self-concept, the one based on a general school self-concept scale is recommended when more specific measures are not available or are not applicable because the curriculum lacks a clear subject division (Marsh et al. 1988; Marsh and Yeung 1997; Szumski and Karwowski 2015).

Implications for a causal relationship between academic self-concept and achievement

Academic self-concept is demonstrably an important predictor of many types of behaviour and belief with long-term consequences for individual future prospects and labour-market position, including motivation to learn (Guay et al. 2010; Marsh et al. 2006), educational and occupational aspirations (Marsh 1991; Nagengast and Marsh 2012), coursework selection (Marsh and Yeung 1997), career intentions and choices (Guay et al. 2004), psychological well-being (Craven and Marsh 2008), life satisfaction (Chang et al. 2003) and mental health (Marsh et al. 2004). Academic self-concept is also closely tied to academic achievement (e.g. Falchikov and Boud 1989; Hansford and Hattie 1982; Mabe and West 1982; Zell and Krizan 2014). This positive correlation does not necessarily imply causality (Marsh 1990b). Therefore, the causal ordering between these constructs poses a major research question (Byrne 1984, 1986) in the study of academic self-concept. This critical aspect has not only theoretical but also important practical implications, since intervention to enhance academic achievement through enhancing self-concept (O’Mara et al. 2006a, b) would be redundant if academic self-concept could not be proved to benefit academic achievement (O’Mara and Marsh 2006).

Self-enhancement, skill development and reciprocal effect models

A review of the literature reveals three principal theoretical models for causal ordering between both constructs: self-enhancement (Calsyn and Kenny 1977), skill development (Calsyn and Kenny 1977) and the reciprocal effects model of Marsh (1990b). The self-enhancement model suggests that self-concept is the primary determinant of academic achievement. Support for this model would justify interventions focused on self-concept (Marsh and Martin 2011). By contrast, the skill development model implies a causative influence of academic achievement on academic self-concept. Support for this model would provide stronger justification for the claim that development of academic skills is the best means to enhance self-concept (Marsh et al. 2012).

The reciprocal effects model, in turn, assumes that academic self-concept and academic achievement are reciprocally related and mutually reinforcing, i.e. a cross-lagged relationship exists between academic self-belief and academic achievement. According to this model, improved academic self-concept would lead to better academic achievement and improved academic achievement, to better academic self-concept (Marsh and Craven 2006). In consequence, based on this model, effective intervention should simultaneously impact academic achievement and academic self-concept (Marsh and Martin 2011).

A substantial body of research, reviewed by Marsh and Craven (2006), Marsh (2007) or Rosen (2010) with supporting meta-analysis by Valentine et al. (2004), found—with only a few notable exceptions—consistent support for reciprocal effects between academic self-belief and achievement, although this effect was more commonly observed in adolescents than pre-adolescents (Marsh et al. 2007).

Age and changes in academic self-concept

Academic self-concept is subject to age-related dynamics and declines from a young age through adolescence (e.g. Eccles 1993; Eccles et al. 1993; Fredricks and Eccles 2002; Jacobs et al. 2002; Marsh 1985, 1989, 1990a; Marsh et al. 1984; Marsh et al. 1991; Marsh et al. 2005a, b; Nagy et al. 2010; Wigfield et al. 1997). Studies by Eccles et al. (1993), Marsh (1989) and Marsh et al. (1984, 1991) indicate that in the first grade of primary school, students are not only already able to evaluate their own competence but also to some extent to differentiate between the knowledge-dependent skills in various areas. There are, however, studies that suggest later development of this ability (Craven et al. 2000; Marsh 1990a; Marsh et al. 1984, 1991).

It is relevant that early child self-concept is characterised by unrealistic overestimation of personal abilities (Dweck 2002). Young children's self-perceptions are based only marginally on external criteria, such as grades or test scores (Craven et al. 2000; Marsh 1985; Wigfield and Karpathian 1991). After a few years of schooling, self-concept becomes differentiated and increasingly abstract, includes more psychological descriptors and is based more strongly on social comparison with other children (Anderman and Maehr 1994). It is also more systematically related to external outcomes, i.e. parental or teacher feedback (Herbert and Stipek 2005; Marsh and Gouvernet 1989; Wigfield et al. 1997).

These developmental changes may prevent researchers from observing a reciprocal prospective relationship between achievement and academic self-concept in the younger age groups, since academic self-concept may depend on prior achievement and not vice versa (Chapman and Tunmer 1997; Helmke 1995; Newman 1984; Skaalvik 1997; Skaalvik and Valås 1999). These results may then suggest a skill development model for younger children and a reciprocal effects model for older age groups (Fraine et al. 2007). On the other hand, some research (Guay et al. 2003; Muijs 1997; Quirk et al. 2009) supported the reciprocal effects model even for primary school pupils. In summary, an extensive review of the literature yields only an unclear picture of the relation between these constructs in primary school children.

Gender effect in academic self-concept during pre-adolescence

Numerous studies report gender differences in the level of academic self-concept (see Jansen et al. 2014, for review). In line with the multidimensional model of academic self-concept, a factor that significantly influences their nature is the specific subject to which self-concept refers. In general, girls show higher verbal self-concept but lower self-concept in mathematics, biology and physics. It is particularly significant that girls characteristically demonstrate a greater level of general academic self-concept (Marsh 1985, 1989, 1990a; Marsh et al. 1984).

Empirical studies do not offer a uniform resolution of how gender predetermines changes in self-concept. Some indicate similar trajectories (lack of differences) for both sexes, and others point to increasing differences (for review, see Fraine et al. 2007). For general academic self-concept, findings from empirical research have also produced a somewhat equivocal pattern. Some studies conducted in primary schools (Marsh 1989; Marsh et al. 1984) reported the absence of impact of gender on changes in academic self-concept, whereas others described girls’ greater age-related decline in general academic self-concept (Marsh 1985).

Gender gap in school grades

Recent meta-analyses of gender differences in teacher-assigned grades demonstrated a small but significant female advantage (Fischer et al. 2013; Richardson et al. 2012; Voyer and Voyer 2014). Many studies showed that throughout elementary, middle and high school, girls achieved higher grades than boys in all major subjects, even though their scores were lower or equal in tests, especially maths and science (for research reviews, see Duckworth and Seligman 2006; Ekstrom 1994; Kling et al. 2013). These conclusions were also confirmed by some meta-analyses (Else-Quest et al. 2010; Hyde et al. 1990; Lindberg et al. 2010).

Gender-specific differences in educational achievements in favour of girls are observed and often described, together with the gender gap in college attendance and behaviours, as the ‘boy crisis’ in many international studies (Voyer and Voyer 2014), but for fourth grade students, the issue is more complex (Mullis et al. 2012; Perie et al. 2005). Results from the PIRLS study showed that girls outperformed boys in almost all countries. Little reduction in the reading achievement gender gap has been noticeable over the decade. However, the TIMSS study demonstrated no significant gender difference in mathematics or science achievement in nearly half the countries it covered. Small differences in favour of boys were found in most remaining countries.

Furthermore, the largest nationally representative assessment of American student achievements (NAEP) showed that girls outperformed boys in reading at all levels assessed by the study. However, gaps between girls and boys were smaller in fourth grade and increased in the eighth and 12th grades. Overall, girls outperformed boys in reading and writing by greater margins than boys outperformed girls in mathematics, science and geography. This gender gap at ages 9 and 13 has not changed much since 1971. With regard to gender differences in school achievement measured by teacher-assigned marks, meta-analysis (Voyer and Voyer 2014) demonstrated a female advantage in all fields of study, but effect sizes were generally small. Moreover, female outperformance in terms of marks seemed stable across the years in data ranging from 1914 to 2011. Hence, discussion of the ‘boy crisis’ should concentrate more on the dynamics describing the decline in boys’ performance and improvement of girls’ performance (Mead 2006; Vail 2006).

This gender gap in school marks may have several plausible explanations (for reviews, see Burusic et al. 2012; Hadjar et al. 2014; Voyer and Voyer 2014). It is, of course, possible that grades—in line with the female underprediction effect theory (Shibley Hyde and Kling 2001)—may reflect additional cognitive abilities that are not captured in standardised tests (Conger and Long 2010), but research suggested that girls’ generally better behaviour in the classroom and differential teacher expectations lead to their higher grades (Kimball 1989; Mullola et al. 2012; Spilt et al. 2012).

A review of the limited research showed that gender did not influence the decline in academic performance, as measured by the GPA (e.g. Kling et al. 2013; Wang and Eccles 2012). In studies analysing marks in English and mathematics separately, the results were more mixed. For mathematics, the difference in grades to the girls’ advantage increased with time (Downey and Vogt Yuan 2005; Shapka 2009). By contrast, for English grades, no gender effect on change was observed (Downey and Vogt Yuan 2005). Owing to the dearth of available research and methodological inconsistencies (including age of population, incomparable sample sizes and the various approaches to measuring school grades), the summary presented above cannot be considered conclusive.

Theoretical summary

Conclusions from studies on the relationship between academic self-concept and academic achievement in primary school children were—as has been shown—varied. Some offered support for the skill development model and others for the reciprocal effects model. Such lack of consistency supports the replication of research. Investigation should not only use satisfactory statistical models (Marsh and Martin 2011) but should also test the underlying premises, related to time and group invariance (Little 2013; Little et al. 2007).

It is noteworthy that the majority of analyses in this respect did not examine mean level changes in constructs between subsequent study waves, most often being limited to the determination of auto-regressive and cross-lagged relations. In effect, little is known as to whether gender differences in levels of academic achievement stem from differences in the dynamics of change in self-concept or vice versa. Such knowledge would help in recommending gender-differentiated interventions.

Another limitation of previous studies on causal relation between academic self-concept and school achievement is reliance primarily on research from Western countries, particularly English-speaking students in Australia, Canada and the USA (Marsh et al. 2002). This overrepresentation may neglect potential cross-cultural differences, especially between individualistic and collectivistic societies (Marsh and Köller 2004). So, the reciprocal effects model requires confirmation in cross-cultural settings to prove its generalizability (Marsh et al. 2015; Marsh and Craven 2006; Wang 2006).

In the present study, we explored the relationship between academic self-concept and academic achievement and its potential differentiation with respect to gender from data in two waves of a large, longitudinal study conducted on a representative Polish primary school sample. Data was collected from children in grades 3 and 5 (aged 10 and 12). Following Marsh and Martin (2011), our analysis used multiple indicator structural equation models (SEM) (Little 2013) accounting for measurement error and so was expected to yield more accurate parameter estimates (Marsh et al. 1998).

Problems and hypotheses

The study was guided by a series of research questions, supported by hypotheses drawn from reviews of earlier work, as described above.

  1. What is the relationship between prior academic self-concept and later academic achievement?

  2. How do mean levels of academic self-concept and academic achievement change between grade 3 and grade 5?

  3. Are levels of academic self-concept and school grades in grade 3 and grade 5 gender related?

  4. Does gender influence change in mean levels of academic self-concept and academic achievement between grades 3 and 5?

  5. Are gender effects related to the longitudinal relationship between academic self-concept and achievement (i.e. autoregressive and cross-lagged coefficients)?

The following hypotheses were set in relation to the research questions:

  1. A reciprocal relationship should exist between academic self-concept and academic achievement. It was anticipated that baseline academic self-concept should allow prediction of achievement at 2-year follow-up (and vice versa). However, in line with the developmental perspective, the impact of previous academic achievement on later academic self-concept could be the stronger of the two.

  2. Both academic self-concept and achievement would tend to decline.

  3. In both waves of the study, girls would demonstrate higher academic self-concept and school grades.

  4. With regard to questions 4 and 5, due to the paucity of previous related research and its lack of conclusions, no working hypotheses could be presented for gender influence on (a) change in mean levels of academic self-concept and achievement between grades 3 and 5 and (b) the longitudinal relationship between academic self-concept and academic achievement (i.e. autoregressive and cross-lagged coefficients). In this context, our study was exploratory rather than confirmatory.

Material and methods

Data source and sample

Data was drawn from the larger longitudinal study ‘School Effectiveness Research’ (Dolata 2014), investigating a representative sample of pupils from Polish elementary schools. The study used a stratified two-stage cluster sampling procedure. The strata were determined by type of urbanisation and the number of class units in a school. Within strata, schools were sampled with a probability proportional to size (number of pupils).

At time 1 (T1), third grade pupils (N = 4646; 49.7 % female; ~10-year-olds) completed a general academic self-concept questionnaire and were evaluated by teachers for academic achievement. Two years later (T2), when pupils were in the fifth grade (~12-year-olds), the same children completed the questionnaire and were re-evaluated by teachers. For this study, only participants who had both completed an academic self-concept questionnaire and been evaluated by teachers for academic performance in both grade 3 and 5 were included. The final sample for the current study was 4226 pupils from 270 class units in 157 Polish elementary schools (49.7 % female).

Instruments

Academic self-concept

To measure general academic self-concept (GAS-C), a 15-item Study Motivation (SM) subscale of the Fragebogen zur Erfassung von Dimensionen der Integration von Schülern was used (FDI 4–6; Haeberlin et al. 1989). This subscale contains nine positively and six negatively worded items (e.g. ‘I am a talented student’ or ‘Learning is difficult for me’). No item refers to specific school subjects or skills; all require assessment of one’s own general academic performance and thus the subscale fits the operational definition of school self-concept. Participants indicated their response on a 4-point scale with anchors of 1 (not true) and 4 (completely true).

The measurement of academic self-concept with only the dimension of school self-concept is aligned with the structure of the curriculum for the first three grades of the Polish school system in which there is no formal division into school subjects. In such a setting, it is reasonable to assume that at least some children might have difficulties differentiating between more specific aspects of their academic self-concept (Szumski and Karwowski 2015).

In this study, Cronbach’s alpha coefficient for the SM subscale of the FDI was 0.868 for grade 3 and 0.910 for grade 5.
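As an illustration of how such a reliability coefficient is obtained, the following minimal sketch computes Cronbach's alpha from a respondents-by-items matrix. The toy responses are hypothetical and are not the study data; the sketch also assumes that negatively worded items have already been reverse-coded.

```python
import numpy as np

def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    # Sum of the sample variances of the individual items.
    item_variances = scores.var(axis=0, ddof=1).sum()
    # Sample variance of each respondent's total score.
    total_variance = scores.sum(axis=1).var(ddof=1)
    return n_items / (n_items - 1) * (1.0 - item_variances / total_variance)

# Hypothetical responses (5 pupils x 4 items) on the 1-4 response scale.
# Assumption: negatively worded items were reverse-coded beforehand (5 - x).
toy = np.array([
    [4, 3, 4, 4],
    [2, 2, 1, 2],
    [3, 3, 3, 4],
    [1, 2, 1, 1],
    [3, 4, 3, 3],
])
alpha = cronbach_alpha(toy)
print(f"alpha = {alpha:.3f}")
```

With the real data, the matrix would have one row per pupil and 15 columns, one per item of the subscale.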

Teacher-assigned marks

During the first stage of education, i.e. grades 1–3 in Polish schools, there is no formal division of schooling into subjects and end of term assessment is only descriptive. There are no formal requirements for these descriptions, but in general, they are intended to provide an individual summary of pupils’ achievement in all subjects. However, it should not be inferred that pupils at this stage are not given formal grades during the year: such a decision and any rationale for formal grading are at the discretion of the school authority. To obtain data not otherwise available, teachers were requested to rate each student on a 4-point scale for Polish language and mathematics.

Teachers classified each pupil as (1) weak: has a weak grasp of the material, makes numerous mistakes and requires systematic assistance; (2) average: deals with requirements but not independently, makes mistakes and needs assistance; (3) performs well, rarely makes mistakes and requires little help; or (4) excels in all required skills and demonstrates independence in performance of activities.

After third grade at Polish primary schools, each subject is evaluated separately and pupils obtain a final subject grade at the end of each term. Grading is on a criterion-referenced six-degree grade scale: inadequate (1), adequate (2), satisfactory (3), good (4), very good (5) and excellent (6). Grades were obtained from school records at the end of each school year. For comparison of means between the evaluations in grade 3 and grade 5, grades from grade 5 were transformed from six to four levels by combining the scale extrema (1 and 2; 5 and 6).

In further analysis, academic achievement in grade 3 and grade 5 was operationalized as an average grade for language (Polish) and mathematics, with a higher score indicating greater achievement.
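The recoding and averaging described above can be sketched as follows. The text only states that the scale extrema were combined (1 with 2; 5 with 6); mapping the middle grades 3 and 4 onto levels 2 and 3 is an assumption of this sketch, and all function names are hypothetical.

```python
# Assumed mapping from the six-degree scale to four levels: the extrema are
# merged as described in the text (1 and 2 -> 1; 5 and 6 -> 4); placing
# grades 3 and 4 on levels 2 and 3 is an assumption of this sketch.
SIX_TO_FOUR = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 4}

def recode_grade(grade: int) -> int:
    """Collapse a grade 5 mark (1-6) onto the four-level scale used in grade 3."""
    return SIX_TO_FOUR[grade]

def achievement_score(polish: int, maths: int, six_point: bool) -> float:
    """Average of the Polish and mathematics marks on the four-level scale."""
    if six_point:
        polish, maths = recode_grade(polish), recode_grade(maths)
    return (polish + maths) / 2

# Hypothetical pupil: teacher ratings in grade 3, school-record grades in grade 5.
grade3_score = achievement_score(polish=3, maths=4, six_point=False)
grade5_score = achievement_score(polish=6, maths=4, six_point=True)
print(grade3_score, grade5_score)
```

Higher scores indicate greater achievement in both waves, so the two averages can be compared directly once the grade 5 marks are collapsed.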

Analysis

Preliminary factor analysis of the GAS-C scale

Since the factor structure of the GAS-C subscale of the FDI has not yet been established, a series of confirmatory factor analyses (CFA) based on the polychoric correlation matrix was conducted, using the mean- and variance-adjusted weighted least squares estimator (WLSMV) to account for the ordinal responses to the GAS-C items.

Three CFA models were tested: unidimensional (M1), two correlated factors (in which all positively worded items defined the first factor and the negatively worded items defined the second factor) (M2) and bifactor models (general academic self-concept factor underlying all variables, as well as two specific dimensions: one for positively worded and the other for negatively worded items; M3). In the bifactor model, all factors were assumed orthogonal, so that domain-specific factors should have captured unique variance of items over and above the general factor, that is, variance not explained by the common factor (Fig. 1).

Fig. 1 a–c Graphical representation of models used to test GAS-C scale

Model fit was assessed by several commonly used fit indices (Byrne 2011): (1) root mean square error of approximation, RMSEA (Steiger 1990), (2) Tucker-Lewis index, TLI (Tucker and Lewis 1973) and (3) comparative fit index, CFI (Bentler 1990). A model was considered acceptable if RMSEA was less than or equal to 0.06, and figures for CFI and TLI were close to 0.9 or greater (Marsh et al. 2005a, b; Yu 2002). It was assumed that smaller values of RMSEA and larger values of CFI and TLI would indicate a better fit in comparison of models.
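For orientation, the three indices can be computed from the chi-square statistics of the tested model and of the baseline (independence) model using their textbook definitions, as in the sketch below. This is an assumption-laden illustration: software using WLSMV works with adjusted chi-square values and may apply multi-group corrections, so the sketch will not necessarily reproduce reported figures. All input values are hypothetical, and treating 0.90 as a hard threshold for CFI and TLI is a simplification of "close to 0.9 or greater".

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (single-group textbook form)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index: tested model (m) against baseline model (b)."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b

def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis index based on chi-square to degrees-of-freedom ratios."""
    ratio_b, ratio_m = chi2_b / df_b, chi2_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)

def is_acceptable(rmsea_value: float, cfi_value: float, tli_value: float) -> bool:
    """Acceptance rule from the text; the hard 0.90 threshold is an assumption."""
    return rmsea_value <= 0.06 and cfi_value >= 0.90 and tli_value >= 0.90

# Hypothetical chi-square values; n matches the final sample size of 4226.
fit = (
    rmsea(250.0, 75, 4226),
    cfi(250.0, 75, 20000.0, 105),
    tli(250.0, 75, 20000.0, 105),
)
print([round(value, 3) for value in fit], is_acceptable(*fit))
```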

Measurement invariance of the GAS-C scale

Measurement invariance is critical for any longitudinal or multi-group comparison (Millsap 2011). If measurement invariance cannot be established, then a between-time and/or between-group difference cannot be interpreted unambiguously (Horn and McArdle 1992). Following the general procedures proposed by Vandenberg and Lance (2000), this study used a sequential testing procedure (forward approach) to test measurement invariance. This procedure is based on estimation of a series of hierarchically nested models with increasing constraint of the measurement parameters under test. It starts with the least constrained solution (total lack of invariance) and subsequent restrictions for equality of specific parameters between times and groups are successively imposed. Resulting nested models are then tested against each other and changes to the fit indices analysed (Dimitrov 2010).

The first to be tested was a configural invariance model (M1), i.e. a model with no invariance of parameter estimates for gender and time (i.e. all parameters freely estimated). In the second step, metric invariance (M2) was estimated, constraining factor loadings to be equal across gender and waves (weak invariance). In the third step, scalar (strong) invariance (M3) was estimated to include restrictions from M2 with the additional constraint of equal thresholds across gender and waves. Residuals for the same items at different study times were correlated. Thus, the longitudinal invariance routine was performed according to the general recommendations of Little (2013).

In order to determine invariance for constrained parameters, following the recommendations of Meade et al. (2008), the change in two fit indices between nested models was investigated: CFI and RMSEA. The relatively liberal rule was adopted that the hypothesis of measurement invariance would be rejected when the difference (Δ) between the more and the less restricted model was lower than −0.002 for CFI and above 0.007 for RMSEA (Meade et al. 2008).
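The decision rule can be written compactly, as in the sketch below. Treating a violation of either cutoff as sufficient for rejection is an assumption of this sketch; it is consistent with the Results section, where the fully scalar model is deemed non-invariant on the basis of ΔCFI = −0.004 alone. The function name is hypothetical.

```python
def invariance_rejected(delta_cfi: float, delta_rmsea: float,
                        cfi_cutoff: float = -0.002,
                        rmsea_cutoff: float = 0.007) -> bool:
    """Return True when a more restricted model fits noticeably worse.

    Deltas are (more restricted model) minus (less restricted model).
    Assumption: exceeding either cutoff is enough to reject invariance.
    """
    return delta_cfi < cfi_cutoff or delta_rmsea > rmsea_cutoff

# Fully scalar model: the text reports delta CFI = -0.004; the RMSEA change is
# not reported there, so 0.0 is only a placeholder.
print(invariance_rejected(delta_cfi=-0.004, delta_rmsea=0.0))

# Partially scalar model: reported delta RMSEA = 0.002 and delta CFI = -0.002.
print(invariance_rejected(delta_cfi=-0.002, delta_rmsea=0.002))
```

Passing the reported deltas directly, rather than subtracting rounded fit indices, avoids floating-point artefacts at the boundary values.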

Cross-lagged panel model

In order to examine longitudinal relationships between academic self-concept and academic achievement by gender, auto-regressive cross-lagged (ARC-L) modelling with latent variables and multiple indicators was used. The ARC-L model identifies the relationship between variables over time, allowing better understanding of their development. Results from ARC-L models could be used, in tandem with theory, as one element of a larger argument in favour of a causal relationship (Selig and Little 2012).

In this model, auto-regressive and cross-lagged effects can be analysed (Geiser 2013). The auto-regressive component provides information about relative stability (continuity) of a specified construct over time or—more precisely—about how much variance of the variable X at time T is explained by the same variable measured previously. This implies stability for inter-individual but not necessarily intra-individual differences (Hertzog and Nesselroade 1987). Cross-lagged effects represent the longitudinal prediction of a chosen construct at time T based on another at time T − 1, controlling for the auto-regressive component.

In the context of the planned analysis, it was particularly important that results from ARC-L could be used to determine whether cross-lagged effects occurred in both directions (i.e. whether X1 predicted Y2 and Y1 predicted X2) or in only one direction (i.e. X1 predicted Y2, but Y1 did not predict X2) and to assess the relative strength of cross-lagged effects (i.e. X1 predicting Y2 and Y1 predicting X2, but impact of X1 on Y2 stronger than of Y1 on X2). This model allowed direct testing of the hypothesis about the relationship between prior academic self-concept and academic achievement at a later stage.
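The logic of the two-wave cross-lagged design can be illustrated with a simplified sketch on simulated manifest variables. This is not the latent-variable, multi-group WLSMV model estimated in the study: it uses ordinary least squares, and all coefficients in the data-generating step are hypothetical values chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical wave-1 scores: x = self-concept, y = achievement (correlated).
x1 = rng.standard_normal(n)
y1 = 0.4 * x1 + np.sqrt(1 - 0.4**2) * rng.standard_normal(n)

# Hypothetical data-generating coefficients (not the study's estimates):
# each wave-2 score depends on its own past (auto-regressive path) and on
# the other construct's past (cross-lagged path).
x2 = 0.45 * x1 + 0.30 * y1 + 0.7 * rng.standard_normal(n)
y2 = 0.10 * x1 + 0.65 * y1 + 0.6 * rng.standard_normal(n)

def fit_wave2(outcome, x_prev, y_prev):
    """Regress a wave-2 score on both wave-1 scores; return the two slopes."""
    design = np.column_stack([np.ones_like(x_prev), x_prev, y_prev])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs[1], coefs[2]

x_auto, y_to_x = fit_wave2(x2, x1, y1)  # X1 -> X2 and Y1 -> X2
x_to_y, y_auto = fit_wave2(y2, x1, y1)  # X1 -> Y2 and Y1 -> Y2

print(f"auto-regressive: X {x_auto:.2f}, Y {y_auto:.2f}")
print(f"cross-lagged: Y1->X2 {y_to_x:.2f}, X1->Y2 {x_to_y:.2f}")
```

Because each cross-lagged slope is estimated while controlling for the outcome's own earlier score, comparing the two slopes shows which direction of prediction is stronger in the simulated data.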

Software and estimation methods

Analysis was performed using Mplus 7.4 (Muthén and Muthén 1998–2015). Models were estimated using weighted least squares mean and variance adjusted (WLSMV) estimators to account for the ordinal responses of the GAS-C items. Since analysed data were hierarchical—children were nested in classes—the complex sample option in Mplus was used to avoid bias of standard errors and test statistics.

Results

Teacher assessments by gender and grades

Consistent with expectations, girls obtained higher average marks than boys in grade 3 (see Fig. 2). The average for boys was 2.81 (SE = 0.03; p < 0.001); for girls, it was higher at 3.02 (SE = 0.03; p < 0.01). The difference was statistically significant (ΔGPA(girls − boys) = 0.21; SE = 0.03; p < 0.01). Results also showed a significant female advantage in grade 5. For girls, mean GPA in grade 5 was 2.84 (SE = 0.03) and for boys 2.44 (SE = 0.03; ΔGPA(girls − boys) = 0.40, SE = 0.03, p < 0.01). Although average marks declined over time for both girls (ΔGPA(grade 5 − grade 3) = −0.18, SE = 0.03, p < 0.01) and boys (ΔGPA(grade 5 − grade 3) = −0.37, SE = 0.03, p < 0.01), the decrease was larger for boys. The between-gender difference in this change was significant (difference in ΔGPA(grade 5 − grade 3) = 0.19, SE = 0.02, p < 0.01).
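The reported gaps and changes follow directly from the four group means, as the short check below shows. It uses only the means quoted in the text; the variable names are hypothetical, and rounding to two decimals mirrors the reported precision.

```python
# Mean teacher-assigned marks reported in the text, keyed by (gender, grade).
means = {
    ("girls", 3): 3.02, ("boys", 3): 2.81,
    ("girls", 5): 2.84, ("boys", 5): 2.44,
}

# Gender gaps within each grade (girls minus boys).
gap_grade3 = round(means["girls", 3] - means["boys", 3], 2)
gap_grade5 = round(means["girls", 5] - means["boys", 5], 2)

# Change over time within each gender (grade 5 minus grade 3).
change_girls = round(means["girls", 5] - means["girls", 3], 2)
change_boys = round(means["boys", 5] - means["boys", 3], 2)

# Difference between the two changes (girls minus boys).
change_gap = round(change_girls - change_boys, 2)

print(gap_grade3, gap_grade5, change_girls, change_boys, change_gap)
```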

Fig. 2 Violin plot showing mean marks given by teacher across gender and grade

Preliminary factor analysis of the GAS-C scale

The comparison (see Table 1) of the unidimensional, two correlated factors (in which all positively worded items defined the first factor and the negatively worded items defined the second factor) and bifactor models (general academic self-concept factor underlying all variables, as well as two specific dimensions: one for positively worded items and one for negatively worded ones) showed superiority of the bifactor model for both grade 3 and grade 5. These analyses revealed strong support for the ‘essential unidimensionality’ in Stout’s (1987) sense.

Table 1 Goodness-of-fit of the different models for grade 3 and grade 5

Time and gender invariance of general academic self-concept

Model M3 (bifactor) was used as the baseline model for testing multi-group longitudinal measurement invariance (see Table 2). The model with configural invariance showed good model fit (RMSEA = 0.024; CFI = 0.990; TLI = 0.988). The same applied to the model with weak (metric) factorial invariance (model 2 in Table 2). When thresholds were constrained (i.e. assessment of strong/scalar invariance), the change in CFI fell below the cutoff value (ΔCFI = −0.004), and the model was deemed non-invariant. Examination of the modification indices suggested that model fit would be improved by freeing the third threshold parameter for item 45 in grade 3 for girls (model 3A in Table 2), and the third (model 3B) and second (model 3C) threshold parameters in grade 5 for boys (also for item 45). After freeing this last threshold, changes in all model fit indices stayed within the cutoff values (ΔRMSEA = 0.002; ΔCFI = −0.002). By virtue of the acceptable fit of the model with partial strong/scalar factorial invariance, this model was suitable for use in subsequent analysis.

Table 2 Summary of goodness of fit for models of general academic self-concept—longitudinal measurement invariance for gender

Longitudinal change for general academic self-concept by gender

Based on the partially invariant bifactor GAS-C model, the latent mean analysis approach estimated the differences between general academic self-concept in boys and girls over time (see Fig. 3). The level of academic self-concept in grade 3 for girls was estimated at 0.02 (SE = 0.03; p = 0.64), compared to 0 for boys (the reference group, in grade 3). From this, it can be inferred that general academic self-concept in grade 3 did not vary significantly between boys and girls. In grade 5, the average GAS-C for boys was −0.15 (SE = 0.03, p < 0.01) and for girls −0.22 (SE = 0.04, p < 0.01). For both gender groups, self-concept showed a general downward tendency. At the same time, grade 5 girls were characterised by a slightly lower level of general academic self-concept (ΔGAS-C(girls − boys) = −0.07, SE = 0.03, p = 0.03). This result means that although mean GAS-C tended to decline with time for both girls and boys, the decrease was larger for girls.

Fig. 3 Path diagram (standardised values) for the auto-regressive cross-lagged panel model assuming partially strict measurement invariance for gender and time (see model 3C in Table 2)

Cross-lagged model for academic achievement and general academic self-concept

The multi-group cross-lagged model with autoregressive effects based on the partially scalar invariant bifactor model (see model 3C in Table 2) fitted the data well (RMSEA = 0.026; CFI = 0.984; TLI = 0.984).

Tests of parameter differences between girls and boys (see last column in Table 3) failed to demonstrate a difference in unstandardised values for the auto-regressive, cross-lagged and between-time correlation coefficients (for all, p > 0.05). Therefore, all the ‘structural’ parameters in the ARC-L model appear similar for both genders.

Table 3 Selected parameters of the ARCL model

Analysis indicated that between grade 3 and grade 5, there was a demonstrable, significant impact from earlier academic self-concept on later academic self-concept (girls β = 0.41; boys β = 0.48). Also, academic achievement as measured by teachers in school marks assigned in grade 3 significantly influenced marks in grade 5 (for girls β = 0.70; for boys β = 0.65). A stronger auto-regressive effect was observed for grades than for self-concept. This means that school marks are more stable over time than academic self-concept (see Table 3 and Fig. 3).

Marks obtained in grade 3 significantly influenced self-concept, as measured in grade 5 (girls β = 0.34; boys β = 0.29). The positive value of the coefficient indicates that better marks were followed by higher self-concept. Also, earlier self-concept showed a relationship with later academic achievement (for girls β = 0.09; for boys β = 0.14). The higher the self-concept was in grade 3, the greater the academic achievement 2 years later. However, cross-lagged effects were clearly greater for the impact of marks on academic self-concept than vice versa. It is important to note that none of the relationships described so far depended on gender.

Discussion

Interpretation of results

The first objective of this study was to investigate the causal longitudinal relationship hypothesised between general academic self-concept and academic achievement. Consistent with prior studies and expectations (hypothesis 1), both constructs, after controlling for significant auto-regressive effects, were found to influence each other reciprocally over time. Further, the results extended prior research by demonstrating that cross-lagged paths from prior academic achievement to subsequent self-concept were stronger than the paths from academic self-concept to academic achievement (e.g. Chapman and Tunmer 1997; Hoge et al. 1995; Muijs 1997).

The relationship between prior achievement and subsequent academic self-concept requires comment in the context of the grading system at the early stages of education in Poland. In general, two mechanisms are offered to explain the influence of school grades on academic self-concept. One is that grades form the basis for pupils’ social comparisons. The second treats grades as a source of feedback with regard to ability (Marsh 1984). However, teacher ratings in grade 3 were not available to students and, as such, could not directly fulfil either role required by these explanations. To demonstrate any influence on academic self-concept, the ratings must correlate with something that does assume that role. Some hypotheses are proposed here for this relationship.

It is clear that teachers based their ratings on an image projected by overall academic performance. This image might be formed with the help of the formal grade system employed by teachers during the year. In this case, the grade system might perform the role demanded by the mechanisms mentioned previously. Since some type of formal grade system is applied in most schools, this might suffice as an explanation for the observed relationships. An alternative view, however, can be postulated.

In line with the Shavelson model of self-concept, grades might form just one source of feedback for students (Shavelson et al. 1976). Research shows that instructions or feedback, similar to marks, influence pupil self-concept and thereby motivation to study and form part of the learning process (cf. Lazarides and Ittel 2012; Wentzel et al. 2010).

It is then possible that academic self-concept is dynamically reinforced through the experience of daily school life. In this case, feedback during lessons, as well as the dynamics of student–teacher interaction, might play some formative role for academic self-concept. The observed relationship between teacher ratings (and, more generally, grades) and academic self-concept would then complement the mechanism described. The extent to which ratings and grades are effective in this role would influence their validity as predictors of academic self-concept. Unfortunately, the data analysed here do not allow verification of this hypothesis.

The results of the present study also support the hypothesis that both academic self-concept and achievement decline with time (hypothesis 2). The decline in academic self-concept is consistent with studies indicating that children in the first grades of primary school are characteristically overoptimistic in terms of academic self-concept (Wigfield et al. 1997). This is related to their limited ability to discriminate between qualities they actually possess and those considered desirable (Lindberg et al. 2013), in addition to egocentrism (Marsh 1985), which hinders their capacity to exploit social comparison mechanisms for accurate self-evaluation (Anderman and Maehr 1994; Harter 1999).

During academic development, academic self-concept becomes more accurate (Wigfield et al. 2006). It is increasingly performance-oriented and based on frequent competence-related feedback. At the same time, however, children become increasingly aware of their abilities and can also evaluate the quality of their performance without external feedback (Dweck 2002). At the age of 10–12, many children start to identify their skills with the grades they obtain and to identify failure with a low level of skill (for a research review, see Dweck 2002).

Only limited support, however, was found for the hypothesised influence of gender on changes in mean levels of academic self-concept or achievement between grades 3 and 5 (hypothesis 3). It was predicted that at both ages, girls would demonstrate higher academic self-concept and school grades than boys. In line with the assumptions, girls characteristically obtained higher mean grades in both grades 3 and 5. Unexpectedly, though, we found that the mean level of girls’ academic self-concept did not differ from boys’ in grade 3 and that it decreased more rapidly with time. This finding is even more surprising, since no gender effects were observed in the longitudinal relationship between academic self-concept and academic achievement (i.e. in the auto-regressive or cross-lagged coefficients). This points to the active presence of gender-related determinants for academic self-concept.

It is possible that, for boys, academic self-concept is more strongly related to marks for subjects outside those analysed and that this relationship performed a compensatory role. In particular, it might be expected that there are school subjects in which boys are rewarded with higher grades than girls and which preferentially influence their academic self-concept (e.g. sport). At the same time, the extent of the decrease in academic self-concept in girls may be influenced by factors similar to those responsible for a greater decrease in general female self-concept at this age (e.g. changes related to puberty; Robins and Trzesniewski 2005). Alternatively, marks obtained by boys and girls might be accompanied by differentiated feedback from teachers. In order to observe such an effect, boys would have to receive information that boosts their academic self-concept relative to girls.

Research implications

In different countries, formal evaluation of students (i.e. grading systems) is introduced at various stages. Assuming that grades awarded by teachers shape academic self-concept, it would be relevant to check whether earlier introduction of a grading system encourages more rapid self-concept crystallisation.

If confirmed, this hypothesis would explain why results of certain studies in countries where introduction of school grading is ‘delayed’ resonate more with the skill development model than the reciprocal effects or the self-enhancement models (e.g. Skaalvik 1997; Skaalvik and Valås 1999). In this context, the stronger impact of previous school grades on subsequent self-concept rather than vice versa, as found in our study, might be explained as symptomatic of education system structure.

Further, collection of observational data is recommended to explain the mechanism of translation of grades to self-concept as well as to address the lack of gender differences with respect to average levels of self-concept. Observation should focus on detailed logging of instructions and feedback from teachers.

Practical implications

Although the impact of grading on student self-concept is indirect, it indicates that pupils evaluate their own skills on the basis of other information transmitted by teachers during classes. In practical terms, therefore, the significance of feedback in shaping schoolchildren’s academic self-concept should be strongly emphasised.

The primary task for teachers should be to supply feedback to pupils in a manner that does not damage self-concept. Strengthening self-concept potentially improves motivation to learn and involvement with the learning process. As emphasised by Hattie and Timperley (2007), efficient feedback implies providing students with specific information about what they do right and wrong and about what requires improvement or has been achieved differently than before. It is necessary to avoid negative social comparison of accomplishment and to emphasise achievement over time, as well as to reinforce positive peer relations and pro-social behaviour in class (Manning 2007).

Limitations

The current study was subject to certain limitations:

Firstly, grades were the only measure of academic achievement referenced by the study. The present research did not investigate how more objective measures of academic achievement, such as standardised test scores, might influence academic self-concept. Secondly, academic achievement in grades 3 and 5 was only measured for pupils’ native Polish language and mathematics, by grade average. In future studies, it would be desirable to increase the number of domains evaluated for academic achievement.

Thirdly, the study measured general academic self-concept rather than subject-specific academic self-concept. Future work might focus on academic self-concept for specific school subjects.

Fourthly, the results relied on only two study waves. Future research should include data from more stages of education. Consideration of more than two waves of study would allow more complex statistical methods (Reinecke and Seddig 2011), e.g. the latent growth curve model (Duncan and Duncan 2004; Bollen and Curran 2006). Finally, our study did not consider potential moderators or mediators of the relation between academic self-concept and academic achievement, e.g. motivation to learn, self-efficacy, locus of control, attribution of successes and failures, implicit intelligence theories (cf. Wigfield et al. 2006) or level of optimism (Wesson and Derrer-Rendall 2011).

Despite these limitations, we anticipate that this study will shed more light on the longitudinal relationship between general academic self-concept and academic achievement with particular regard to gender-specific effects.

Final remarks

Analysis revealed that between grades 3 and 5 in Polish primary schools, pupil attainment, as measured by marks awarded by teachers, and academic self-concept both declined. This effect was observed for both boys and girls. Further, academic achievement and academic self-concept were reciprocally influenced over time, but influence from prior achievement on self-concept was stronger.

Such a set of relations between the two constructs suggests that delaying formal assessment, in the form of teacher-assigned marks, might lead to slower crystallisation of academic self-concept. Whether such an effect is more beneficial in the long or short term, either for academic achievement or for personal development, remains a separate issue.

The present study suggests that assessment influences academic self-concept even if pupils are unaware of their scores, which therefore cannot serve as a basis for social comparison. This implies that information about pupil performance is transmitted by the teacher through alternative channels, e.g. feedback. This highlights the importance of the pupil–teacher interaction in the context of academic self-concept research. From the teacher’s perspective, such interaction might establish the approach adopted for assessment of skills, which then translates into a mark. From the pupil’s perspective, feedback is both a foundation and an important factor for the development of academic self-concept.

This study highlights the need for special focus on the role and quality of teacher feedback, and its importance as an issue which should be covered during teacher training. This should respect feedback’s potentially diverse influence on academic self-concept for boys and girls. The ultimate goal should be pupil–teacher interaction to enhance each pupil’s academic self-concept and perseverance, providing motivation to embrace active learning and greater educational achievement.