4.3.1 Sample
We included all countries (n = 38) that participated in both TIMSS 2007 (n = 170,803 students in grade eight) and TIMSS 2011 (n = 217,427 students in grade eight).
4.3.2 Constructs
School Emphasis on Academic Success (SEAS)
Teachers’ ratings formed the basis for measuring SEAS (Mullis et al. 2012). In the teacher questionnaire, teachers were asked to characterize five aspects of their school: teachers’ understanding of the school’s curriculum, teachers’ degree of success in implementing the school’s curriculum, teachers’ expectations for student achievement, parental support for student achievement, and students’ desire to do well in school. TIMSS used a five-point Likert scale for these questions, ranging from “very low” to “very high”. Both the scale and the questions were identical in TIMSS 2007 and TIMSS 2011.
Mathematics Education
Teachers were asked to indicate their major or main area of study by selecting one or more areas from a list that included, for instance, mathematics, physics and biology. We included a variable indicating whether or not a teacher’s main area of study was mathematics (Major).
Educational Level
The teachers were asked to indicate their highest level of formal education. Responses were coded according to the International Standard Classification of Education (ISCED), ranging from “Did not complete ISCED level 3” to “Finished ISCED 5A, second degree or higher”.
Professional Development
The teachers were asked: “In the past two years, have you participated in professional development in any of the following? (a) Mathematics content, (b) Mathematics pedagogy/instruction, (c) Mathematics curriculum, (d) Improving students’ critical thinking or problem solving skills, and (e) Mathematics assessment.” Responses were either yes or no.
Teacher Self-efficacy
The teachers were asked: “How well prepared do you feel you are to teach the following topics?” They rated a number of topics within the domains Number, Algebra, Geometry, and Data and Chance on a three-point Likert scale, ranging from “Not well prepared” to “Very well prepared”.
Teacher Experience
The teachers were asked: “By the end of this school year, how many years will you have been teaching altogether?” This was an open-ended item, and the responses (number of years) were treated as a continuous variable.
4.3.3 Method of Analysis
We analyzed data at the country level (n = 38). The IDB (International Database) Analyzer (IEA 2012) was used to merge the micro-data for TIMSS 2007 and TIMSS 2011, and all variables were then aggregated to the country level by computing means. The aggregation took the individual sampling weights (MATWGT) into account and was conducted using SPSS 22. Differences between corresponding variables for 2011 and 2007 (2011 minus 2007) were also computed.
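The aggregation step can be illustrated with a minimal sketch in plain Python. It is not the IDB Analyzer/SPSS procedure actually used; it assumes hypothetical flat records of the form (country, year, value, weight), where the weight plays the role of MATWGT, and it simply skips missing values. The function names are illustrative only.

```python
from collections import defaultdict


def weighted_country_means(records):
    """Weighted mean of one variable per (country, year).

    records: iterable of (country, year, value, weight) tuples (hypothetical layout);
    records with a missing value (None) are skipped.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for country, year, value, weight in records:
        if value is None:
            continue
        num[(country, year)] += weight * value
        den[(country, year)] += weight
    # Drop cells whose weights sum to zero to avoid division by zero
    return {key: num[key] / den[key] for key in num if den[key] > 0}


def change_scores(means, early=2007, late=2011):
    """Country-level differences (late minus early) for countries observed in both years."""
    countries = {country for country, _ in means}
    return {
        c: means[(c, late)] - means[(c, early)]
        for c in countries
        if (c, early) in means and (c, late) in means
    }


if __name__ == "__main__":
    demo = [("A", 2007, 3.0, 1.0), ("A", 2007, 5.0, 3.0), ("A", 2011, 4.0, 2.0)]
    means = weighted_country_means(demo)
    print(means, change_scores(means))
```

Applied to each variable in turn, change_scores yields the kind of 2011 minus 2007 differences that serve as ΔX and ΔY in a difference model.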
Numerous analytical techniques have been devised to aid causal inference from longitudinal data. They go under different labels, such as difference-in-differences analysis (Murnane and Willett 2010) and fixed effects regression analysis. The basic idea underlying all of them is to remove the effect of all country characteristics that remain constant over time. Such characteristics are often omitted variables, and unless their effect is removed, they bias the estimated relations between determinants and outcomes. One way to remove this effect is to take differences between measures at different points in time. With measurements of determinants (X) and outcomes (Y) at two points in time, a very simple technique is to compute the difference between the two outcome measures (ΔY = Y2 − Y1) and the difference between the two measures of the determinant (ΔX = X2 − X1), and then to regress ΔY on ΔX. This regression coefficient is not influenced by country characteristics that are constant over the two time points, and it typically differs considerably from the coefficients obtained from regression analyses of the two cross-sections. We use such an approach, but implement it in a more general and flexible form using structural equation modeling (SEM) (Bollen and Brand 2008).
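The difference logic can be checked with a small simulation. This is a minimal sketch with hypothetical data-generating values (true β = 0.5, a time-constant country trait com that raises both X and Y with weight 1.0); it is meant to show why the ΔY-on-ΔX regression removes the fixed effect while a cross-sectional regression does not, not to reproduce the study's data.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (simple regression with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def first_difference_slope(x1, x2, y1, y2):
    """Regress dY = Y2 - Y1 on dX = X2 - X1."""
    dx = [b - a for a, b in zip(x1, x2)]
    dy = [b - a for a, b in zip(y1, y2)]
    return ols_slope(dx, dy)


def simulate(n_countries=38, beta=0.5, seed=1):
    """Hypothetical two-wave data in which a time-constant country trait (com)
    raises both the determinant X and the outcome Y."""
    rng = random.Random(seed)
    x1, x2, y1, y2 = [], [], [], []
    for _ in range(n_countries):
        com = rng.gauss(0.0, 1.0)          # unobserved, identical at both waves
        xa = com + rng.gauss(0.0, 1.0)     # X correlates with com: omitted-variable bias
        xb = com + rng.gauss(0.0, 1.0)
        x1.append(xa)
        x2.append(xb)
        y1.append(beta * xa + com + rng.gauss(0.0, 0.5))  # com enters Y with weight 1.0
        y2.append(beta * xb + com + rng.gauss(0.0, 0.5))
    return x1, x2, y1, y2


if __name__ == "__main__":
    x1, x2, y1, y2 = simulate(n_countries=5000)
    print("cross-sectional slope, wave 1:", round(ols_slope(x1, y1), 2))
    print("cross-sectional slope, wave 2:", round(ols_slope(x2, y2), 2))
    print("first-difference slope:", round(first_difference_slope(x1, x2, y1, y2), 2))
```

With these hypothetical settings the cross-sectional slopes should be close to 1.0 (the true β of 0.5 plus an omitted-variable bias of 0.5), whereas the first-difference slope should be close to 0.5. A large simulated sample is used only to make the contrast visible; with 38 units the same estimators are much noisier.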
In our analytical approach, we assume measurements of X and Y at two time points. Y1 is regressed on X1 and Y2 is regressed on X2, and the two regression coefficients are constrained to be equal (Fig. 4.1). The model also includes a latent variable (Com) that influences Y1 and Y2 by the same fixed amount (both paths fixed at 1.0). Com captures the effect of the fixed country characteristics at the two time points, so the regressions of Y on X estimate the effect of the determinant on the outcome while controlling for these characteristics. In the model in Fig. 4.1, Com is assumed to be uncorrelated with X1 and X2; this model is referred to as the random effects model for longitudinal data. This assumption need not be correct, however, and if there are reasons to believe that Com is correlated with X1 and X2, these correlations can be added to the model. If the two correlations are constrained to be equal, the resulting model is referred to as the fixed effects model for longitudinal data. If the correlation between Com and X1 is allowed to differ from the correlation between Com and X2, the resulting model is equivalent to the simple ΔX, ΔY difference model described above.
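The model and its three variants can be summarized in equation form. This is a sketch that assumes centered variables (so intercepts are omitted) and time-specific residuals ε1 and ε2, neither of which is spelled out in the text.

```latex
\begin{aligned}
Y_1 &= \beta X_1 + 1.0\cdot\mathit{Com} + \varepsilon_1\\
Y_2 &= \beta X_2 + 1.0\cdot\mathit{Com} + \varepsilon_2\\[6pt]
\text{random effects:}\quad & \operatorname{Corr}(\mathit{Com}, X_1) = \operatorname{Corr}(\mathit{Com}, X_2) = 0\\
\text{fixed effects:}\quad & \operatorname{Corr}(\mathit{Com}, X_1) = \operatorname{Corr}(\mathit{Com}, X_2)\ \text{(estimated)}\\
\text{unconstrained:}\quad & \operatorname{Corr}(\mathit{Com}, X_1)\ \text{and}\ \operatorname{Corr}(\mathit{Com}, X_2)\ \text{estimated freely}\\[6pt]
\Delta Y = Y_2 - Y_1 &= \beta\,(X_2 - X_1) + (\varepsilon_2 - \varepsilon_1) = \beta\,\Delta X + \Delta\varepsilon
\end{aligned}
```

Subtracting the first equation from the second removes Com whatever its correlation with X1 and X2, which is the intuition behind the equivalence between the unconstrained model and the ΔX, ΔY regression.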
The terminology is, regrettably, somewhat confusing: the distinction between random effects and fixed effects concerns different model assumptions within fixed effects regression analysis, and all of these models belong to the same difference-in-differences family.
These alternative models can easily be specified and estimated with SEM software such as Mplus (Muthén and Muthén 1998–2014). One major advantage of this technique is that it provides information about how well the model fits the data. If the restrictive random effects model is found not to fit the data, one of the less restrictive models should be used instead. However, because a less restrictive model has less statistical power than a more restrictive one, the more restrictive model is preferred if it fits the data.
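Because the three models are nested (random effects fixes both Com–X correlations at zero, fixed effects estimates one common correlation, and the unconstrained model estimates two), each step frees one parameter, and a restrictive model can be compared with a less restrictive one through a chi-square difference (likelihood-ratio) test. The sketch below is a stdlib-only Python illustration that assumes ordinary maximum likelihood estimation (robust estimators such as Mplus MLR require a scaled difference); the function names and the fit statistics in the example are hypothetical, not results from this study.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi2_df > x) for integer df >= 1 (stdlib only)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from df = 1 (erfc form) or df = 2 (exponential form) ...
    s = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(y)) if df % 2 else math.exp(-y)
    # ... then raise df in steps of 2 using Q(s + 1, y) = Q(s, y) + y**s * e**(-y) / Gamma(s + 1)
    while s < df / 2.0:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def chi2_difference_test(chi2_restricted, df_restricted, chi2_free, df_free):
    """Compare a restricted model with a nested, less restricted one."""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    if d_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


if __name__ == "__main__":
    # Hypothetical fit statistics (e.g. random effects vs fixed effects), NOT study results
    d_chi2, d_df, p = chi2_difference_test(6.3, 3, 2.1, 2)
    print(f"delta chi2 = {d_chi2:.2f}, delta df = {d_df}, p = {p:.3f}")
```

A small p-value indicates that the added restriction worsens fit significantly, so the less restrictive model would be retained; otherwise the more powerful restrictive model is kept.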
The SEM approach also offers several other advantages. It makes it possible to impose equality constraints on other model parameters as well, such as variances, covariances and residual variances. It also allows for extensions such as latent variables, which can be used to investigate both the construct underlying a set of items and the individual items. SEM also supports multiple-group modeling, which makes it possible to investigate whether relations between determinants and outcomes differ across subsets of countries; we use this to investigate our second research question.
However, because the number of observations is necessarily limited in country-level analyses, model complexity is restricted. It is not possible to estimate models with more free parameters than the number of observations, and for reasons of power, models need to be kept simple. A small sample size does not necessarily imply low power, however, because in SEM the strength of the correlations among the variables is another important determinant of power, and in country-level longitudinal models correlations tend to be high.
Another problem associated with the use of SEM on aggregated country-level data is that the rules of thumb developed for goodness-of-fit indices do not always apply (Bollen and Brand 2008). We therefore rely mainly on the chi-square statistic when evaluating model fit.