Abstract
Research related to the “teacher characteristics” dimension of teacher quality has proven inconclusive and weakly related to student success, and addressing the teaching contexts may be crucial for furthering this line of inquiry. International large-scale assessments are well positioned to undertake such questions due to their systematic sampling of students, schools, and education systems. However, researchers are frequently prevented from answering such questions by issues related to measurement invariance. This study uses the traditional multiple group confirmatory factor analysis (MGCFA) and an alignment optimization method to examine measurement invariance in several constructs from the teacher questionnaires in the Trends in International Mathematics and Science Study (TIMSS) 2015 across 46 education systems. Constructs included mathematics teacher’s Job satisfaction, School emphasis on academic success, School condition and resources, Safe and orderly school, and teacher’s Self-efficacy. The MGCFA results show that just three constructs achieve invariance at the metric level. However, when an alignment optimization method is applied, results show that all five constructs fall within the threshold of acceptable measurement noninvariance. This study therefore presents an argument that they can be validly compared across education systems, and a subsequent comparison of latent factor means examines differences across the groups. Future research may utilize the estimated factor means from the aligned models in order to further investigate the role of teacher characteristics and contexts in student outcomes.
Introduction
Teacher quality: context and comparability
Internationally, teachers have been cited as the most important school-level determinant of academic success (Darling-Hammond 2000; Hattie 2003; Rivkin et al. 2005; Kyriakides et al. 2013; Nilsen and Gustafsson 2016a, 2016b). However, despite decades of research, there is still considerable debate over the importance of particular teacher characteristics. Research on teacher characteristics varies widely, and ranges from beliefs about intelligence and learning, self-efficacy, job satisfaction and motivation, to workload and stress (Goe 2007). This study will operate from the theoretical framework defining teacher characteristics within the teacher quality construct outlined by Goe (2007). According to this review, changeable characteristics or teacher “attributes and attitudes” form part of the input dimension of teacher quality. While teachers are considered crucial for student outcomes, evidence on the importance of teacher characteristics is weak or conflicting. A myriad of studies conducted using international large-scale assessment data have found mixed results (Goe 2007; Nilsen and Gustafsson 2016a, 2016b; Toropova et al. 2019). With this in mind, Goe (2007) recommends that more research on teacher characteristics be conducted with a particular focus on the teaching context.
International large-scale assessments (ILSAs) such as the Trends in International Mathematics and Science Study (TIMSS) are well positioned to answer such questions through information collected in the contextual questionnaires for students, teachers, and principals. While such studies have advanced global educational accountability and contributed valuable knowledge regarding determinants of student outcomes, they have also sparked questioning over the validity of cross-national comparison (Oliveri and von Davier 2011; Biemer and Lyberg 2003). Underlying the contextual questionnaires is the much-debated assumption of scale score equivalence or measurement invariance (MI). Issues related to MI often prevent researchers from answering important substantive questions, which entail comparing latent factor means and the relationships among latent variables across countries or time. The main reason for the concern over measurement invariance in cross-national comparison is the difficulty involved with measuring psychological traits or constructs across cultures, as cultural factors may influence how respondents interpret and answer such questions. Several scholars have argued that TIMSS is superior to other ILSAs regarding the potential to examine teacher characteristics due to the systematic collection of data directly from teachers. TIMSS is also the only ILSA to link students and teachers directly. Despite this, research on teachers in international large-scale assessments is often limited to comparisons of relationships between variables because of the failure to reach scalar invariance across countries (Nilsen and Gustafsson 2016a, 2016b). Questions that have not yet been investigated include comparisons of latent construct means in teacher questionnaires across education systems or their subgroups. Certain teacher characteristics may matter in some contexts and not in others (Strong 2011).
For instance, teachers have been shown to be especially important for low-achieving and socioeconomically disadvantaged students, and especially in mathematics (Goe 2007; Darling-Hammond 2000; Rivkin et al. 2005). Equally, the context (i.e., school, country, or educational system) may predict the teacher characteristics themselves due to differences in system-level characteristics or educational policies. Taken together, mean comparisons and their subsequent connection to student outcomes may offer important insight into teacher-related policies which researchers have been largely unable to investigate.
As will be discussed in the following section, the alignment optimization method outlined by Asparouhov and Muthén (2014) provides one possible resolution to this problem as well as an empirical basis for investigating such contextual questions. This study will utilize this method and examine measurement invariance in five scales of the teacher background questionnaires in TIMSS 2015. These constructs fall under the category of “teacher characteristics, beliefs, and attributes” according to Goe’s (2007) framework, but vary in their scope. Job satisfaction (JS) refers to how satisfied teachers are with their employment and their plans for continuing to teach in the future. School emphasis on academic success (SEAS) refers to teachers’ perceptions of the academic climate and emphasis on academics of other teachers at their school, and Safe and orderly school (SOS) refers to the teacher’s general feelings of safety and organization at their workplace. School condition and resources (SCR) refers to the teacher’s perceptions of their access to teaching resources and how well the school is maintained. Last, teacher’s Self-efficacy (TSE) refers to the teachers’ perceptions of their confidence and ability to teach mathematics (for more on TSE, see Raudenbush et al. 1992).
The present study applies the alignment method as an exploratory tool to examine measurement invariance in the latent constructs from the teacher questionnaires in TIMSS 2015 across educational systems. Our paper is focused on both content and method. Our intention is to provide researchers in comparative education, particularly those interested in teacher effectiveness, with one possible starting point for tackling questions which remain unanswered due to issues surrounding measurement invariance and cross-national comparison. The paper seeks to answer the following research questions:

(1) What is the level of configural, metric, and scalar invariance of the teacher-related constructs in the teacher background questionnaires of TIMSS 2015 across educational systems?

(2) Within these constructs, which indicators display the highest level of noninvariance? Is there a statistical basis for making comparisons of these constructs across educational systems?

(3) Based on the newly constructed group mean values, which education systems have the lowest and highest levels of the teacher-related constructs?
Approaches to measurement invariance and a review of past literature
MI (Jöreskog 1971; Mellenbergh 1989; Meredith 1993) refers to the assumption that latent constructs and their relations should be unrelated to group membership, and is one of the main challenges of working with ILSA data (Gustafsson 2018). Within the traditional multiple group confirmatory factor analysis (MGCFA) approach, several levels of MI are tested, beginning with the configural or baseline model. In order to confirm configural invariance, factors must be equally configured under a similar variance-covariance structure across groups. Next, factor loadings (regression slopes) are compared; if loadings are similar across groups, metric invariance is achieved. This implies that each indicator is related to its underlying latent variable with a similar gradient. Scalar invariance is the most restrictive form of MI and requires regression intercepts to be equivalent, in addition to latent structures and factor loadings. Under scalar invariance, the same regression line should be able to estimate the relationship between an indicator and the latent variable for all groups. The three forms of MI build successively upon each other, representing a growing degree of invariance. Violating the assumption of MI results in constraints that inherently limit how researchers may interpret and relay their findings in a comparative context. As meeting the scalar MI assumption is very rare, occasionally, “researchers just ignore MI issues and compare latent factor means across groups or measurement occasions even though the psychometric basis for such a practice does not hold” (van de Schoot et al. 2015, p. 1). More cautious approaches avoid comparing constructs altogether. Either scenario may be problematic in the context of ILSA research, given its relevance and potential for educational policy and reform.
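To make the stakes of scalar invariance concrete, the following minimal sketch computes the indicator mean implied by a one-factor model (intercept plus loading times factor mean) for two groups. All parameter values are hypothetical and are not estimates from TIMSS; the function name is likewise only illustrative. The two groups share the same loading and the same latent mean, yet differ in their intercepts, so the observed indicator means differ although the latent means do not.

```python
def expected_indicator_mean(intercept, loading, factor_mean):
    """Model-implied mean of one indicator in a one-factor model."""
    return intercept + loading * factor_mean


# Hypothetical values chosen only for illustration.
loading = 0.8          # equal in both groups, so metric invariance holds
latent_mean_a = 0.0
latent_mean_b = 0.0    # identical latent means in both groups
intercept_a = 2.0
intercept_b = 2.3      # intercepts differ, so scalar invariance fails

mean_a = expected_indicator_mean(intercept_a, loading, latent_mean_a)
mean_b = expected_indicator_mean(intercept_b, loading, latent_mean_b)

observed_gap = mean_b - mean_a              # gap visible in the raw item means
latent_gap = latent_mean_b - latent_mean_a  # true gap on the latent variable

print(f"observed gap: {observed_gap:.2f}, latent gap: {latent_gap:.2f}")
```

In this illustration the whole observed gap stems from the intercept difference, which is why comparing means without scalar (or approximate) invariance can mislead.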
There are several conceptual and methodological recommendations for managing MI. Rutkowski and Rutkowski (2010, 2013, 2017) propose the possibility that “one size might not fit all” and that scales be constructed with differing cultural conceptions in mind. A more moderate and earlier solution comes from Byrne et al. (1989) in the form of partial measurement invariance, which allows intercepts and loadings of individual items to be tested. Following this approach, the majority of scholars recommend basing the types of comparisons on the level of invariance confirmed (i.e., configural, metric, or scalar), and this undoubtedly leads to a smaller number of constructs being investigated due to their failure to reach full invariance. Schulz (2016) argues that focusing only on constructs and variables that are highly similar in terms of measurement may lead to a narrowing in the scope of international studies. Generally, partial measurement invariance is a practical assumption in ILSA research, where invariance at the scalar level is rarely confirmed. However, scholars have debated whether the traditional MGCFA approach to partial measurement invariance is the most “simple or interpretable” solution (for more detail, see Marsh et al. 2018 and Asparouhov and Muthén 2014).
A more recent approach, an alignment optimization method, has been proposed (Asparouhov and Muthén 2014). Alignment optimization allows for invariance of individual items to be tested, for scales to be reformulated in order to take noninvariance into consideration, and for a more flexible threshold for measurement invariance to be set. Schulz (2016) writes, “the question is also at what point lack of measurement invariance becomes problematic and leads to problematic bias in cross-national surveys” (p. 15). The alignment method (Asparouhov and Muthén 2014) undertakes this question. This method has certain advantages over other approaches to MI. Traditionally, MI is tested using MGCFA at each constraint of the latent factor model, with groups defined by unordered categorical variables (van de Schoot et al. 2015). This approach requires that invariance levels be tested sequentially and for each item, which can result in hundreds of tests. Moreover, such tests can yield inaccurate results if many groups are present or if sample sizes are large (Asparouhov and Muthén 2014; Rutkowski and Svetina 2014). The traditional approach to MI also assumes that full measurement invariance can be achieved, which may be an “unachievable ideal” when the number of groups is large (Marsh et al. 2018; Asparouhov and Muthén 2014). Unlike MGCFA, alignment as outlined by Asparouhov and Muthén (2014) does not assume MI, but identifies a solution which minimizes parameter noninvariance across groups through an iterative process analogous to the rotation in an exploratory factor analysis. Several studies have investigated measurement invariance using the alignment method, with promising results for its use as an alternative to MGCFA. Munck et al. (2018) investigated MI across 92 groups by country, cycle, and gender using civic education data and found that despite significant noninvariance in some groups, comparison of group mean scores had a statistical basis, and that attitudes toward civic engagement across countries and time could be validly compared. Similarly, both Marsh et al. (2018) and Lomazzi (2018) employ the alignment method to test MI of gender role attitudes across countries.
Much attention has been paid to the phenomenon of MI in the student background questionnaires, but much less in teacher-related constructs (Caro et al. 2014; Schulz 2016; Segeritz and Pant 2013; He et al. 2018; Rutkowski and Svetina 2014). Nevertheless, some studies have investigated measurement invariance in teacher background questionnaires using traditional approaches. Examining teacher self-efficacy, Vieluf et al. (2013) find evidence supporting metric equivalence, while Scherer et al. (2016) also find evidence for metric but not scalar invariance. Taking a different approach, Zieger et al. (2019) apply multiple pairwise mean comparisons to teacher job satisfaction in TALIS, whereby they identify the comparability of countries based on such pairs. Similarly to MGCFA, this approach becomes more cumbersome as the number of groups in focus grows. Despite a growing awareness of the potential of the alignment method, application of this approach in investigating the measurement invariance of latent constructs related to teachers and teacher quality is still rare. Our search was able to produce a single study published just this year. Zakariya et al. (2020) examined teacher job satisfaction in TALIS, and also found no evidence for scalar invariance. Extending their analysis to include an alignment optimization approach, they found that teachers in Austria, Spain, Canada, and Chile had the highest mean job satisfaction compared to the other countries in the sample. Our analysis does not use the same sampling procedure as TALIS, as TIMSS focuses on teachers as they represent students in a country. Additionally, our results apply only to mathematics teachers, unlike the results of studies looking at all teachers using TALIS data. As such, it will be especially interesting to compare our results to those of Zakariya et al. (2020) and other past studies.
Methods
Data and measurement
TIMSS is a curriculum-based survey, which tests mathematics and science achievement for students in grades 4 and 8 around the world. TIMSS employs a two-stage stratified sampling procedure and samples whole classrooms as well as schools. Additionally, responding to the teacher context questionnaires is mandatory. Student data can therefore be aggregated to the teacher level (Eriksson et al. 2019). TIMSS uses a cross-sectional design and is conducted every 4 years. This study included 46 education systems from the TIMSS 2015 survey. There was a total sample size of 13,508 grade 8 (or equivalent) mathematics teachers. In the total sample, 36.8% of teachers were male and 56.6% female; while 2.7% were under 25, 12.5% were between 25 and 29, 29.9% were between 30 and 39, 24.7% were between 40 and 49, 18.9% were between 50 and 59, and 4.7% were above the age of 60 (6.6% had no response). In total, 42 separate countries participated, but in some cases, subregions of countries were included, such as Buenos Aires in Argentina, Ontario and Quebec in Canada, and Dubai and Abu Dhabi in the United Arab Emirates (UAE). For this reason, the term “education system” will be used interchangeably with country, system, or group. Norway included cohorts from two grades. However, aside from the regions previously listed, the majority of the groups are representative of countries. Table 1 describes each education system and its respective sample size.
Several teacher-related constructs from the teacher questionnaire were included in the analysis: Teacher Job satisfaction and Self-efficacy, teacher perception of School emphasis on academic success, School condition and resources, and Safe and orderly school. Indicators and coding for each construct can be seen in Table 2.
Each of the constructs included a varying number of indicators. For School emphasis on academic success, only 5 out of a total of 17 indicators were used; as the remaining indicators did not relate to teachers, they were excluded. All indicators were included for each of the other 4 constructs. Coding varied from frequency dimensions (e.g., “Very often” to “Never or almost never”) to agreement (e.g., “Agree a lot” to “Disagree a lot”) and more general ratings (e.g., “Very high” to “Low”).
Alignment optimization
As we have previously discussed, there are three levels of measurement invariance: configural, metric, and scalar. In order to compare latent variable means and variances across subgroups, scalar invariance is required (Millsap 2011). However, this assumption (i.e., equal factor loadings and indicator intercepts across subgroups) often fails. Moreover, the likelihood ratio chi-square testing for each parameter very quickly becomes cumbersome, especially when many subgroups are being compared. The alignment approach does not assume MI and “can estimate the factor mean and variance parameters in each group while discovering the most optimal measurement invariance pattern. The method incorporates a simplicity function similar to the rotation criteria used with exploratory factor analysis” (Asparouhov and Muthén 2014, p. 496). It estimates a factor score for all individuals despite the presence of significant noninvariance in some groups. Alignment starts by estimating a configural model with group-varying factor loadings and intercepts for the latent variable indicators, together with the factor mean and variance. Consider a configural MGCFA model, written as:
\[
y_{pj} = \nu_{pj} + \lambda_{pj}\,\eta_{j} + \varepsilon_{pj} \tag{1}
\]
Here, ν_{pj} is the intercept of indicator p in group j, λ_{pj} is the factor loading of indicator p in group j, η_{j} is the latent variable for group j, and ε_{pj} is the residual for indicator p in group j. In this model, the latent variable mean is fixed to zero and the latent variable variance to 1:
\[
E(\eta_{j}) = 0, \qquad \mathrm{Var}(\eta_{j}) = 1 \tag{2}
\]
As a second step, the fixed factor mean and variance are set free. Normally, this model would be unidentified. The alignment method, however, constrains the parameter estimation by imposing restrictions that optimize the simplicity function F. As is shown in Eq. 3, the sum of the component loss function for the factor loadings and intercepts of every latent variable indicator p between any pair of groups, weighted by their group sizes, should be minimal.
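For reference, the simplicity function can be written out as follows. This is a sketch following the formulation in Asparouhov and Muthén (2014); the symbols α_{j} and ψ_{j} (factor mean and variance of group j), N_{j} (size of group j), the subscripts 0 and 1 (configural and aligned parameters), and the small positive constant ε are notation assumed from that source.

```latex
\[
F = \sum_{p} \sum_{j_1 < j_2} w_{j_1, j_2}\, f\!\left(\lambda_{p j_1, 1} - \lambda_{p j_2, 1}\right)
  + \sum_{p} \sum_{j_1 < j_2} w_{j_1, j_2}\, f\!\left(\nu_{p j_1, 1} - \nu_{p j_2, 1}\right) \tag{3}
\]
\[
\lambda_{pj,1} = \frac{\lambda_{pj,0}}{\sqrt{\psi_{j}}}, \qquad
\nu_{pj,1} = \nu_{pj,0} - \alpha_{j}\,\frac{\lambda_{pj,0}}{\sqrt{\psi_{j}}}, \qquad
w_{j_1, j_2} = \sqrt{N_{j_1} N_{j_2}}, \qquad
f(x) = \sqrt{\sqrt{x^{2} + \epsilon}}
\]
```

The aligned parameters reproduce the configural model exactly, which is why the aligned model keeps the fit of the configural model while the factor means and variances are free to differ.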
The alignment approach estimates the latent variable mean and variance for each pair of groups in such a way that the parameter estimates are optimized to produce the minimal total amount of noninvariance across groups. This procedure leads to a great number of parameters that have no significant noninvariance across groups and a few being largely noninvariant. Significant differences are tested by z-statistics (for a more detailed description of the algorithm, see Asparouhov and Muthén 2014; Muthén and Asparouhov 2018).
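The logic of the loss can be illustrated with a minimal Python sketch. It is not the Mplus implementation: the function names, the value of `eps`, and all parameter values are assumptions made for illustration. The sketch transforms each group's configural loadings and intercepts using candidate factor means and variances, then sums the weighted component losses over all pairs of groups, following the simplicity function described by Asparouhov and Muthén (2014).

```python
import math


def component_loss(x, eps=0.01):
    """Component loss sqrt(sqrt(x^2 + eps)); eps is an assumed small constant."""
    return math.sqrt(math.sqrt(x * x + eps))


def aligned_parameters(loadings0, intercepts0, alpha, psi):
    """Rescale one group's configural parameters for factor mean alpha, variance psi."""
    sd = math.sqrt(psi)
    loadings = [lam / sd for lam in loadings0]
    intercepts = [nu - alpha * lam / sd for nu, lam in zip(intercepts0, loadings0)]
    return loadings, intercepts


def simplicity(groups, sizes, alphas, psis):
    """Total weighted loss over all pairs of groups and all indicators."""
    aligned = [
        aligned_parameters(lam0, nu0, a, p)
        for (lam0, nu0), a, p in zip(groups, alphas, psis)
    ]
    total = 0.0
    for j1 in range(len(groups)):
        for j2 in range(j1 + 1, len(groups)):
            weight = math.sqrt(sizes[j1] * sizes[j2])
            for p in range(len(groups[0][0])):
                total += weight * component_loss(aligned[j1][0][p] - aligned[j2][0][p])
                total += weight * component_loss(aligned[j1][1][p] - aligned[j2][1][p])
    return total


# Hypothetical configural estimates (loadings, intercepts) for two groups.
# Group 2 was constructed from group 1 with factor mean 0.5 and variance 4,
# so the measurement parameters are truly invariant.
groups = [
    ([1.0, 0.8, 1.2], [2.0, 2.5, 3.0]),
    ([2.0, 1.6, 2.4], [2.5, 2.9, 3.6]),
]
sizes = [500, 500]

naive_value = simplicity(groups, sizes, alphas=[0.0, 0.0], psis=[1.0, 1.0])
aligned_value = simplicity(groups, sizes, alphas=[0.0, 0.5], psis=[1.0, 4.0])
print(f"naive: {naive_value:.1f}, aligned: {aligned_value:.1f}")
```

In this constructed case the loss is smaller at the true factor means and variances than when both groups are forced to mean 0 and variance 1. An actual alignment run searches for the minimizing values numerically instead of being handed them.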
The aligned model produces an alignment optimization metric (A-metric) with some useful statistical information for determining measurement invariance of the latent variable across groups. The first important piece of information is the number of groups that have no significant differences in each intercept and factor loading. The ordering of the factor means and the groups that hold the minimum and maximum intercept and factor loading for each factor indicator are also given in the alignment results. In addition, an R-square, measuring the degree of invariance of the intercept and factor loading of each factor indicator, is estimated in the model.
As is shown in Eqs. 4 and 5, ν_{0} and λ_{0} are the intercept and factor loading estimates from the configural model, and ν and λ are the average intercept and factor loading estimated from the aligned model. The R^{2} “tells us how much of the configural parameter variation across groups can be explained by variation in the factor means and factor variances” (Muthén and Asparouhov 2018, p. 643). An R^{2} value close to one indicates a high degree of measurement invariance, and a value close to zero indicates high noninvariance.
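Written out, the two measures take the following form. This is a sketch following Muthén and Asparouhov (2018); α_{j} and ψ_{j} denote the aligned factor mean and variance of group j, and the variances are taken across groups. The subscripts on R^{2} are added here only to distinguish the intercept and loading versions.

```latex
\[
R^{2}_{\nu} = 1 - \frac{\mathrm{Var}\!\left(\nu_{0} - \nu - \alpha_{j}\lambda\right)}{\mathrm{Var}\!\left(\nu_{0}\right)} \tag{4}
\]
\[
R^{2}_{\lambda} = 1 - \frac{\mathrm{Var}\!\left(\lambda_{0} - \sqrt{\psi_{j}}\,\lambda\right)}{\mathrm{Var}\!\left(\lambda_{0}\right)} \tag{5}
\]
```

When the group differences in the configural intercepts and loadings are fully accounted for by differences in factor means and variances, the numerators vanish and both measures equal one.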
Mplus detects missing data patterns in the data sets and provides full information maximum likelihood (FIML) estimates in the presence of missing data through the EM algorithm. It should also be noted that all models in the study were estimated with the COMPLEX option implemented in Mplus to account for the non-independence of the students and teachers caused by the cluster sampling design in TIMSS (Muthén and Muthén 1998-2017).
Analytical process
The current analysis was done stepwise. All analyses were conducted using Mplus software 8.3 (Muthén and Muthén 1998-2017). In the first step, a single-factor measurement model was estimated for each of the teacher-related constructs with pooled data. These single-factor measurement models were modified by adding correlated residual terms suggested by the modification indices to get acceptable model fit. The significantly correlated residuals indicate that there are common variances between the pairs of residuals, suggesting some narrow dimensions in addition to the single latent factor. In the current study, we are only interested in precisely measuring the general factor, so no narrow residual factors were specified. With the pooled model structure as the point of departure, the conventional MGCFA models of the different teacher-related factors were estimated, and model fit indices of the configural, metric, and scalar invariance models were compared for each of the constructs. Based on these comparisons, conclusions about MI were reached. In the next step, the alignment approach was applied to examine the degree of measurement invariance of the teacher-related constructs, as mentioned above. The results of the two MI approaches are compared, and the advantages and disadvantages of the two are discussed. In order to check reliability, a Monte Carlo simulation was done to further test whether the conclusion about measurement invariance based on the aligned model results of the constructs is trustworthy.
Results
Results from the MGCFA approach
A single-factor measurement model was fitted to each of the teacher-related constructs with the pooled data of all 46 education systems. These single-factor measurement models, however, did not fit the data well. Modification indices suggested the inclusion of one or more correlated residuals to improve the model fit. These modified single-factor model structures were used to test the measurement invariance across the 46 groups in the conventional approach. Table 3 presents the model fit indices of the configural, metric, and scalar MI models for all teacher-related constructs.
The configural models of all the latent constructs in Table 3 show acceptable or close model fit, with the Root Mean Square Error of Approximation (RMSEA) and Standardized Root Mean Square Residual (SRMR) being below .08, and the comparative fit index (CFI) and Tucker-Lewis index (TLI) being greater than .95 (see, e.g., Hu and Bentler 1999). Three out of the five teacher-related factors (teacher perception of School emphasis on academic success, School condition and resources, and teacher’s Self-efficacy) reached metric invariance, which implied that the factor loadings of each of the three latent constructs were equal across all educational systems, but not the intercepts of the latent construct indicators. It may also be observed that none of the scalar MI models fits the data, indicating that the assumption that both intercepts and factor loadings are equal across the 46 systems cannot be upheld.
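The cutoffs named above can be expressed as a small check. The sketch below is only an illustration: the function name and the example fit values are hypothetical and are not taken from Table 3.

```python
def fit_is_acceptable(rmsea, srmr, cfi, tli):
    """Apply the cutoffs used in the text: RMSEA and SRMR below .08, CFI and TLI above .95."""
    return rmsea < 0.08 and srmr < 0.08 and cfi > 0.95 and tli > 0.95


# Hypothetical fit indices for two models of the same construct.
configural_like = {"rmsea": 0.045, "srmr": 0.030, "cfi": 0.985, "tli": 0.970}
scalar_like = {"rmsea": 0.110, "srmr": 0.095, "cfi": 0.880, "tli": 0.870}

configural_ok = fit_is_acceptable(**configural_like)
scalar_ok = fit_is_acceptable(**scalar_like)
print(configural_ok, scalar_ok)
```

In practice, invariance decisions also weigh the change in fit between nested models, so a check like this covers only the absolute cutoffs mentioned here.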
With the traditional measurement invariance approach, the most restrictive MI assumption (scalar invariance) is not supported. Additionally, metric invariance was only found in three latent constructs. Consequently, cross-country comparisons cannot be made of the latent variable means or of the relationships among the latent variables. Given these results, the next section will aim for an approximate partial measurement invariance (e.g., Millsap and Kwok 2004) by using the alignment approach (Muthén and Asparouhov 2014).
Results from alignment optimization
Alignment optimization explores partial (approximate) measurement invariance by starting out with a well-fitting configural model. It then adjusts the factor loadings and intercepts of the factor indicators in such a way that these parameter estimates are as similar as possible across groups without compromising the model fit. Essentially, the fit of the aligned model stays the same as that of the configural invariance model. In this section, the aligned model results for each of the five teacher-related factors will be presented.
Job satisfaction
Table 4 presents the results from the aligned modeling approach for the latent construct JS. The highest R-square of the intercept estimate is observed for the variable My work inspires me. About 87% of the variation in the intercept observed in the configural model can be explained by the variation in latent variable mean and variance in the aligned model, indicating a high degree of invariance. Morocco is the only noninvariant country in the intercept estimate of the indicator I am proud of the work I do. This variable, together with the indicator I am enthusiastic about my job, also displayed a rather high R-square. I am content with my profession as a teacher and My work inspires me hold completely invariant factor loading estimates across all systems. For the variables I am enthusiastic about my job and I find my work full of meaning and purpose, a large number of groups with invariance in the intercept estimates are also observed, ranging from 44 to 46 educational systems. The variable I am going to continue teaching as long as I can holds the least invariant intercept, with the lowest R-square (44%). For the factor loadings, the indicator I am proud of the work I do is the least invariant, with an R-square of 23%.
Countries with extreme parameter estimates can be found in columns 4 to 7. For example, South Korea holds the lowest intercept estimate for My work inspires me, while Canada-Ontario has the lowest factor loading estimate. In general, the overall degree of invariance of the construct JS is rather high, with few education systems showing measurement noninvariance in the factor loadings, in line with the close fit of the metric invariance model in Table 3. The average invariance index is 58% for JS. The percentage of significantly noninvariant groups is 8.9%, much lower than the limit of 25% suggested by Muthén and Asparouhov (2014). A higher number of groups show invariance in the factor loadings of each of the indicators as compared to the intercepts.
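One way to read the percentage of noninvariance is as the share of group-by-parameter cells flagged as significantly noninvariant, compared against the 25% limit. The sketch below assumes this reading; the function name and the counts are hypothetical and are not the values from Table 4.

```python
def noninvariance_share(noninvariant_counts, n_groups):
    """Share of group-by-parameter cells flagged as significantly noninvariant.

    noninvariant_counts holds, for each intercept or loading, the number of
    groups in which that parameter was flagged.
    """
    if n_groups <= 0 or not noninvariant_counts:
        raise ValueError("need at least one parameter and one group")
    return sum(noninvariant_counts) / (len(noninvariant_counts) * n_groups)


# Hypothetical counts for four intercepts followed by four loadings.
hypothetical_counts = [1, 0, 2, 13, 0, 0, 5, 4]
share = noninvariance_share(hypothetical_counts, n_groups=46)
within_limit = share < 0.25  # limit suggested by Muthén and Asparouhov (2014)

print(f"{share:.1%} noninvariant, within limit: {within_limit}")
```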
Teacher perception of school emphasis on academic success
Five indicators are used to identify the latent construct of school emphasis on academic success, and the results from the aligned model of SEAS are presented in Table 5.
For factor loading estimates, all five indicators of the construct School emphasis on academic success showed complete invariance over the 46 countries. This agrees with the model fit indices for the metric invariance model in Table 3. For the intercepts, only two countries are noninvariant for the indicator Teachers’ degree of success in implementing the school’s curriculum, corresponding with the high R-square estimate of 73%. The intercept of Teachers’ expectations for student achievement holds the most variation, with only half of the countries being invariant. The minimum and maximum estimates of the intercept and factor loadings can be found in columns 4 to 7. Only 7.8% of groups have been observed with significant noninvariance. In general, the high degree of confidence indicated by the average invariance index of .65 implies that the mean of the construct SEAS can be compared meaningfully across the different groups.
Teacher perception of school conditions and resources
Table 6 shows the results of approximate invariance from the aligned model of the school condition and resources.
As revealed in Table 6, four indicators (The school building needs significant repair, Teachers do not have adequate instructional materials and supplies, The school classroom needs maintenance work, and Teachers do not have adequate support for using technology) have invariant factor loadings across all education systems. Only Lithuania is noninvariant in the factor loadings for the variables Teachers do not have adequate workplace and Teachers do not have adequate technological resources. The R-square for these indicators also showed a high degree of invariance, being above 60%. However, one exception can be observed for the variable The school building needs significant repair, for which the R-square is 29%, despite showing complete invariance across all groups. For the intercept estimates, the number of noninvariant systems in each indicator ranges from 4 for the variable The school classroom needs maintenance work (R-square = 82%) to 10 for the variable Teachers do not have adequate workplace (R-square = 57%). These results were also confirmed by the conventional measurement invariance results, where metric invariance was achieved for the SCR construct but not scalar invariance (see Table 3).
The average invariance index for the construct SCR was 62%, indicating 62% confidence to carry out trustworthy cross-system comparisons. The total noninvariance measure is 8.39%, below the limit of 25%.
Teacher perception of safe and orderly school
Among the 8 indicators of the latent construct Safe and orderly school (Table 7), The students behave in an orderly manner, The students respect school property, and The students are respectful of the teachers are completely invariant in the factor loadings over the 46 countries. The R-square estimate for the factor loading of these three variables is around or above 70%, implying that approximately 70% or more of the variation in the factor loadings estimated in the configural model can be explained by the factor mean and variance across the groups. For these three variables, the standard deviation of the parameter mean is also smaller compared to those of other indicators. The lowest R-square for the factor loading is observed in the indicator The school is located in a safe neighborhood (29%), relating to a larger variation (see column 3 under SD).
The students respect school property holds the highest R-square (i.e., 83%) for its intercept estimate; only Lebanon is noninvariant. The lowest R-square is found in the indicator The school’s rules are enforced in a fair and consistent manner (35%). The number of countries with a noninvariant intercept ranges from 1 to 13. From the model fit indices of the conventional measurement invariance model, metric invariance is supported, and this is confirmed by the aligned model.
In sum, the parameter estimates of the latent variable model reached 58% confidence to make reliable cross-country comparisons, and the percentage of significant noninvariance for education systems is only 9.8% over all estimated parameters.
Teacher’s selfefficacy
Aligned model results for selfefficacy can be seen in Table 8. The intercept estimates show the indicator Developing students’ higherorder thinking skills as the most invariant, with an Rsquare of about 90%. Here, only four educational systems show measurement noninvariance and the variance in the estimated mean intercept is rather small. The intercept estimate for indicator Making mathematics relevant to students also holds a high Rsquare (86%). Improving the understanding of struggling students and Assessing student comprehension of mathematics show the lowest Rsquare values, implying a high degree of noninvariance. This is also confirmed by the higher standard deviations in column 3. Over ten educational systems show noninvariance for these two indicators. Columns 4 to 7 present the education system with the minimum or maximum estimate of the intercepts.
The number of educational systems with invariant factor loadings for the TSE construct is higher than that of the intercepts. Developing students’ higherorder thinking skills, Improving the understanding of struggling students, Providing challenging tasks for the highest achieving students, and Adapting my teaching to engage students’ interest are completely invariant over all 46 education systems. The factor loading estimate for Inspiring students to learn mathematics has the highest number of noninvariant systems (5).
In general, the average invariance index was rather high for all estimated parameters in the aligned model, and the proportion of significantly noninvariant groups was low. We, therefore, have 57% confidence to make meaningful comparisons of the means and variances of teacher selfefficacy.
Monte Carlo simulation
As recommended by Asparouhov and Muthén (2014), Monte Carlo simulations were conducted in order to check the quality of the alignment results of the five teacherrelated factors. These simulations used parameter estimates from the alignment models as data-generating population values. For each of the teacherrelated factors, two sets of simulations were run with 100 replications, 46 groups, and two different group sample sizes (500 vs. 1000). Table 9 shows the correlation between the generated population values and estimated parameters.
The correlations in Table 9 are the average of the correlation between the population factor mean (or factor variance) and model estimated factor mean (or factor variance) of the 100 replications. These correlations are generally very high, most being .98 or above, with the correlations for the factor means higher than those for the factor variances. However, relatively low correlations are also observed for the simulations based on a group sample size of 500, for example, .95 for the average correlation of the factor variance in Job satisfaction and .96 in teacher perception of School emphasis on academic success. These correlations tend to get higher when the group sample size is increased to 1000. Asparouhov and Muthén (2014) suggested a level of .98 for these correlations to be able to confirm reliable alignment estimates, and a correlation below .95 may be cause for concern. The current simulations therefore suggest that the aligned results for the teacherrelated constructs are largely reliable for crosscountry comparison, despite some noninvariance among education systems. It can be noted that the aligned models work better when the group sample size is higher, implying an asymptotic accuracy in the alignment results under maximum likelihood estimation.
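The summary statistic reported in Table 9 can be sketched as follows. This is an illustration of the averaging step only: the "estimated" values are stand-ins formed by adding random noise to hypothetical population values, not output from an alignment model, and the seed, noise scale, and function name are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2015)  # arbitrary seed for reproducibility
n_groups, n_replications = 46, 100

# Hypothetical population factor means for the 46 groups.
population_means = rng.normal(0.0, 1.0, size=n_groups)

def average_recovery_correlation(population, noise_sd):
    """Average Pearson correlation between population and stand-in estimates."""
    correlations = []
    for _ in range(n_replications):
        # Stand-in for model estimates: population value plus sampling noise.
        estimated = population + rng.normal(0.0, noise_sd, size=population.size)
        correlations.append(np.corrcoef(population, estimated)[0, 1])
    return float(np.mean(correlations))

# Assumed noise scale shrinking with the square root of the group sample size.
noise_scale = 3.0
corr_n500 = average_recovery_correlation(population_means, noise_scale / np.sqrt(500))
corr_n1000 = average_recovery_correlation(population_means, noise_scale / np.sqrt(1000))
```

In this toy setting the average correlation rises with the group sample size, the same direction as the pattern described above. The actual values in Table 9 come from refitting the alignment model in every replication.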
Average estimates of intercepts and factor loadings across invariant groups
Table 10 presents the weighted average estimates of factor loadings and intercepts across all invariant groups in each teacherrelated construct. These weighted mean values are common to the invariant education systems, and only apply to those invariant systems. The number of such systems can be found in the column next to the weighted mean of intercepts and factor loadings.
As is shown in Table 10, the highest average intercept for teacher’s Selfefficacy, for example, is observed for its indicator Providing challenging tasks for the highest achieving students (v = 1.616), and the lowest for Helping students appreciate the value of learning mathematics (v = 1.375). The average factor loading was highest for Developing students’ higherorder thinking skills (λ = .495), indicating that this indicator forms an important part of the construct of selfefficacy in teaching mathematics.
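The weighted averages in Table 10 can be illustrated with a minimal sketch. It assumes that the weights are the group sample sizes, which is not stated explicitly here. The function name and the numbers are hypothetical.

```python
import numpy as np

def weighted_invariant_mean(estimates, sample_sizes, invariant):
    """Sample-size-weighted mean of a parameter over the invariant groups only."""
    estimates = np.asarray(estimates, dtype=float)
    sample_sizes = np.asarray(sample_sizes, dtype=float)
    invariant = np.asarray(invariant, dtype=bool)
    mean = np.average(estimates[invariant], weights=sample_sizes[invariant])
    return float(mean), int(invariant.sum())

# Three hypothetical groups; the third is noninvariant and therefore excluded.
mean_intercept, n_invariant = weighted_invariant_mean(
    estimates=[1.0, 2.0, 3.0],
    sample_sizes=[100, 300, 500],
    invariant=[True, True, False],
)
```

The second return value corresponds to the count of invariant systems reported next to each weighted mean.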
Comparing estimated latent variable means of the teacherrelated constructs
Latent variable means of all teacherrelated latent constructs were estimated for the 46 education systems by the aligned model (see Appendix Table 11). Groups can be compared based on these factor means.
Teacher job satisfaction
The latent variable mean of teacher job satisfaction is based on indicators concerning teachers’ feelings of contentment with the profession as a whole, their current school, their enthusiasm and pride in their work, and their intention to continue teaching. According to the estimated mean of JS in Fig. 1, students in Japan, Singapore, England, Hong Kong, and Hungary have mathematics teachers with the highest level of job satisfaction as compared to other education systems in TIMSS 2015. Students in Italy, Lithuania, Sweden, South Korea, and New Zealand also have mathematics teachers with relatively high levels of job satisfaction. By contrast, in Chile, Qatar, Thailand, Argentina (Buenos Aires), Kuwait, Oman, Israel, Lebanon, Malaysia, and the United Arab Emirates, students have mathematics teachers who are the least satisfied with their job.
Teacher perception of safe and orderly school
Broadly, SOS refers to whether teachers feel the schools are located in a safe neighborhood and feel the students are respectful. The latent variable mean of SOS is shown in Fig. 2. The results indicated that students in Botswana, South Africa, Morocco, Turkey, Japan, Italy, Slovenia, South Korea, Sweden, and Jordan had mathematics teachers with the highest levels of perceived school safety. In Argentina (Buenos Aires), Ireland, Kazakhstan, Norway, UAE, Lebanon, Qatar, Singapore, Hong Kong, and Lithuania, students had mathematics teachers with the lowest levels of feeling as though the school was orderly and safe.
Teacher perception of school conditions and resources
SCR refers to school infrastructure, whether teachers have adequate workspace and instructional materials, and whether the school environment is well taken care of. Results for latent mean comparisons can be found in Fig. 3. Students’ mathematics teachers in Botswana, South Africa, Turkey, Morocco, Saudi Arabia, Egypt, Jordan, Armenia, Malaysia, and Iran reported the highest levels of satisfaction with school conditions and resources. In UAE, Singapore, and Bahrain, students’ mathematics teachers reported the lowest perceptions of SCR.
Teacher perception of school emphasis on academic success
SEAS is indicated by teachers’ perceptions of whether teachers understand schools’ curricular goals, their success in implementing the curriculum, their expectations for student achievement, and their ability to inspire students. Latent variable means are presented in Fig. 4. Recall that SEAS is reverse coded so countries with the lowest levels show the highest mathematics teacher perceptions of SEAS. Students in Italy, Japan, Russia, Hong Kong, Chile, Hungary, Sweden, Norway, Turkey, and Thailand have mathematics teachers who report the highest levels of SEAS. In Qatar, Malaysia, Oman, Ireland, Canada, South Korea, UAE, Bahrain, and Kazakhstan, students generally have mathematics teachers who report the lowest levels of school emphasis on academic success.
Teacher selfefficacy
Latent variable means for TSE are found in Fig. 5. Teacher selfefficacy is measured by teachers’ feelings of capacity to inspire students in mathematics, show students a variety of problemsolving strategies, adapt their teaching to engage students, make mathematics relevant, and develop higherorder thinking skills. In Japan, Hong Kong, Singapore, Chinese Taipei, Thailand, Iran, Morocco, New Zealand, Sweden, and England, students have mathematics teachers who report the highest levels of selfefficacy in teaching mathematics. In Qatar, UAE, Bahrain, Lebanon, Oman, Argentina (Buenos Aires), Slovenia, Kazakhstan, and Botswana, students have mathematics teachers with the lowest levels of selfefficacy to teach mathematics.
Discussion and concluding remarks
Seeking an optimal alternative to assess measurement invariance of the teacherrelated constructs across multiple countries, the current study compared the more restricted traditional MI approach with an alignment optimization method. With TIMSS 2015 data from 46 countries as the empirical basis, the results confirm the initial position of this study. In the traditional MI approach, the level of metric invariance was only reached for three constructs, namely, teacher perception of School emphasis on academic success, School condition and resources, and teacher Selfefficacy. This result implied a limited comparability across countries restricted to the associations between these constructs and other variables being studied. The quest for furthering crossnational comparability is a worthwhile and essential endeavor in the largescale international studies.
In this study, the purpose of the alignment optimization method is to address previously unanswerable questions related to group mean comparisons. Scalar invariance was not reached for any of the teacherrelated constructs, signifying that under the traditional MI framework, latent factor means could not be validly compared in any case. The results from the alignment optimization approach, however, paint a different picture, since the method takes into account the partial invariance in the parameters of each latent variable indicator and identifies the optimal measurement invariance pattern when assessing comparability (Asparouhov and Muthén 2014). Departing from the configural invariance models, the current study found significant noninvariance for a number of indicators and countries in each construct. Despite this, all five constructs fell below the noninvariance threshold of 25% suggested by Asparouhov and Muthén (2014). In general, the Monte Carlo simulations confirm the reliability of the majority of the alignment results, with some caution around Job satisfaction and School emphasis on academic success. These results give valuable information about the specifics of what contributes most to scalar noninvariance. Indeed, the indicatorbyindicator results may be more informative of cultural and societal differences across the constructs than traditional MI approaches.
It was noteworthy that the teacher Selfefficacy construct in particular reached an acceptable invariance level, as the cultural comparability of selfefficacy has long been the subject of inquiry in teacher quality literature (see Scherer et al. 2016; Vieluf et al. 2013). The current findings support those of Scherer et al. (2016) in suggesting that teacher Selfefficacy is a construct that can be generalized across cultures. The results for teacher Job satisfaction are more difficult to compare with previous research, as the construct for teacher job satisfaction in TIMSS greatly differs from that in TALIS. In TALIS (2013), the construct includes regretting becoming a teacher, whether teachers would make the same decision if they could decide again, whether they wonder if it would have been better to choose another profession, and whether the advantages of being a teacher outweigh the disadvantages (Zakariya et al. 2020). By contrast, the TIMSS teacher job satisfaction construct includes pride and enthusiasm for the job, ability to feel inspired, intention to continue teaching, and satisfaction with the profession as a whole and with working at the current school. However, both Zakariya et al. (2020) and Zieger et al. (2019) found statistical grounds to compare the construct across some countries. Zieger et al. (2019) present a more conservative approach, however, cautioning that comparisons with Chile, Shanghai, Mexico, and Portugal were unreliable. In the current study, only Chile overlaps as an education system with these countries. Interestingly, this is the country in our research which differs the most from previous research. Zakariya et al. (2020) found that teachers in Chile reported among the highest levels of job satisfaction compared to other countries, while we find that students in Chile have mathematics teachers with some of the lowest levels of job satisfaction.
Perhaps this is a reflection of mathematics teachers differing from other teachers, or perhaps it reflects a more serious issue of comparability. As mentioned, JS was a construct that displayed some reliability concerns in the Monte Carlo simulation. This concern echoes other investigations recommending caution around crosscultural comparisons of teacher JS (Pepe et al. 2017; Zieger et al. 2019). There is little empirical research on MI and the other constructs, including teacher’s perceptions of School emphasis on academic success, Safe and orderly school, and School conditions/resources. The results of this study, therefore, provide the first evidence for the potential of comparability for the majority of these constructs.
Several insights came out of simple observations of the resulting factor mean scores. First, it was possible to detect which countries are on the higher or lower ends of the constructs. As mentioned, Japan, Singapore, England, Hong Kong, and Hungary had the highest levels of mathematics teacher job satisfaction, with Qatar, Chile, Kuwait, Thailand, and Argentina (Buenos Aires) reporting the lowest level of mathematics teacher JS. Interestingly, countries with students with mathematics teachers who reported the highest levels of job satisfaction also tend to be among the top performers in mathematics in 2015 (for Singapore, Japan, and Hong Kong in particular). However, more research needs to be done to investigate the relationship between these newly constructed means and student outcomes. Our results for teacher job satisfaction differ vastly from those of Zakariya et al. (2020). However, given the differing sample (we focus on mathematics teachers only), the entirely different indicators of job satisfaction, and the different countries included, this is not so surprising. In addition, recall that TIMSS samples teachers as representative of students in a country, while TALIS samples teachers as representative of teachers in a country. We are more interested in the former for this paper, as our ultimate interest in crossnational comparison is the comparison of educational contexts of students. For TSE, similar patterns emerged, with top mathematics performers Japan, Singapore, Hong Kong, and Chinese Taipei taking the top ranking positions. Middle Eastern countries such as Qatar, UAE, and Bahrain reported the lowest levels of TSE. The Japanese sample also displayed the highest level of selfefficacy, contradicting the oft discussed cultural tendency in Japan to avoid selfenhancement (Takata 2003).
The other constructs did not have such similarly evident clusterings of countries, such as the contrast between East Asian countries (who tended to report higher levels of job satisfaction and selfefficacy) and those situated in the Middle East (who tended to report lower levels of most constructs). It was possible to detect a small group of African countries (Botswana, Morocco, and South Africa) which tended to report high levels of both satisfaction with school conditions and resources as well as perceptions of safety and orderliness in the school. Future research can investigate these differences in more detail and investigate potential hypotheses as to why they exist.
This study has some limitations. As mentioned by Munck et al. (2018), differentiating sources of bias from each other (i.e., method bias related to the instrument versus construct bias, see Schulz 2016) is not possible with this method. Next, interpreting the importance of the noninvariance of individual indicators (as compared to the final average invariance index) is not straightforward. Determining the ultimate degree of comparability rests on the total alignment score. In our study, the Monte Carlo results for JS and SEAS fell below the recommended threshold when the N was reduced to 500, indicating potentially unresolvable issues with the comparability of these constructs. Last, there are some important potential limitations of the alignment optimization method itself which call into question its usefulness as an alternative to the traditional MGCFA approach. Svetina et al. (2016) write that “this sort of latent variable standardization implies that the latent variables are not on the same scale, and as a result, cannot be compared” (p. 128). They and other authors argue that it should be used primarily as an exploratory approach. We believe such an exploratory approach is extremely useful in the context of international comparison, particularly in the case of research on teacher characteristics, where certain questions continue to be ignored because of obstacles related to the issue of MI.
We believe the significance of the present study outweighs its limitations. First, it demonstrates and supports the possibilities of applying the proposed method to the field of comparative psychological and educational research. Next, as mentioned extensively throughout this paper, it presents ways for ILSA researchers to investigate previously unanswered questions related to group mean comparisons of latent constructs. Alignment can be applied to assess the comparability of a myriad of other studentrelated or schoolrelated constructs. It has implications for policyrelated research, given that system level factors may be related to group mean scores. Last, it has important implications for future research investigating the importance of teacher characteristics for student outcomes.
We have several recommendations for future research regarding this method. First, as mentioned, differences in group mean scores and those in the individual indicators can give us important information about cultural differences which should not necessarily exclude their comparison. Future research can investigate such differences with potential cultural conceptions in mind. Next, policymakers should pay attention to countries which consistently score high on constructs reflecting teacher job satisfaction, selfefficacy, and their working environments. Such countries include Japan, Singapore, Hong Kong, and Chinese Taipei. Similarly, there is much to be learned about countries which consistently score low, such as many countries in the Middle East. Such differences may be attributable to differences in teacher resources and teacherfocused policies. We are also interested in particular in the question of the role of teacher characteristics in student outcomes. Researchers may also use this method to examine first whether JS, SEAS, SCR, SOS, and TSE are comparable across TIMSS surveys, and then to examine changes in teacher characteristics across the last two decades for countries. We can also recommend more in depth comparisons of teacher characteristics across subgroups in participating countries, such as students from disadvantaged socioeconomic backgrounds. Ultimately, further investigations of such questions would yield more insight into the potentially contextdependent aspect of teacher characteristics as they relate to student achievement.
The purpose of international largescale assessments is to examine differences in educational systems across countries. However, as noted by Scherer et al. (2016) in much public policy research “there is a preoccupation with crosscultural differences rather than of crosscultural generalizability” (p. 4). Herein lies the paradox of research with international largescale assessments. ILSA and comparative education research necessitate that education systems have differences—but their differences almost never comply with the restrictive statistical rules necessary for crosscountry comparison. Although not without limitation, the method outlined in this paper provides one way forward. The growing number of studies using this method suggests possible changes in the future of largescale assessment research, and scholars are extending its capacity (Marsh et al. 2018). According to Munck et al. (2018), the alignment optimization method can “update existing databases for more efficient further secondary analysis and with metainformation concerning measurement invariance” (p. 687). Measurement invariance has become a problem that all comparative education researchers must eventually face, either by making illfounded comparisons or avoiding latent factor mean comparisons altogether. This method exists as one promising way that largescale assessment research may reach its full potential for influencing policy and educational reform.
Notes
\( w_{j_1,j_2} = \sqrt{N_{j_1} N_{j_2}} \), where \( N_{j_1} \) and \( N_{j_2} \) are the sample sizes of groups \( j_1 \) and \( j_2 \).
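For context, this weight enters the total loss (simplicity) function that the alignment method minimizes. As a sketch based on our reading of Asparouhov and Muthén (2014), with \( p \) indexing indicators, \( \lambda \) and \( \nu \) denoting group-specific factor loadings and intercepts, and \( \epsilon \) a small positive constant (these symbols are not defined in this note and are assumptions made here for illustration):

\[ F = \sum_{p} \sum_{j_1 < j_2} w_{j_1,j_2} \, f\left(\lambda_{p j_1} - \lambda_{p j_2}\right) + \sum_{p} \sum_{j_1 < j_2} w_{j_1,j_2} \, f\left(\nu_{p j_1} - \nu_{p j_2}\right), \qquad f(x) = \sqrt{\sqrt{x^2 + \epsilon}} \]

Because the weight grows with both group sizes, parameter differences between pairs of large groups contribute more to the loss than differences involving small groups.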
References
Asparouhov, T., & Muthén, B. (2014). Multiplegroup factor analysis alignment. Structural Equation Modeling: A Multidisciplinary Journal, 21, 495–508. https://doi.org/10.1080/10705511.2014.919210.
Biemer, P. P., & Lyberg, L. E. (2003). Introduction to survey quality. Hoboken, NJ: John Wiley & Sons.
Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor covariance and mean structures: the issue of partial measurement invariance. Psychological Bulletin, 105, 456–466. https://doi.org/10.1037/0033-2909.105.3.456.
Caro, D., SandovalHernandez, A., & Lüdtke, O. (2014). Cultural, social and economic capital constructs in international assessments: an evaluation using structural equation modelling. School Effectiveness and School Improvement, 25, 433–450.
Eriksson, K., Helenius, O., & Ryve, A. (2019). Using TIMSS items to evaluate the effectiveness of different instructional practices. Instructional Science, 47, 1–18.
Goe, L. (2007). The link between teacher quality and student outcomes: a research synthesis. National comprehensive center for the teacher quality.
Gustafsson, J. E. (2018). International largescale assessments: current status and ways forward. Scandinavian Journal of Educational Research, 62, 328–332.
Hattie, J. (2003). Teachers make a difference: what is the research evidence? In Paper presented at the Australian Council for Educational Research Annual Conference on Building Teacher Quality, Melbourne.
He, J., BarreraPedemonte, F., & Bucholz, J. (2018). Crosscultural comparability of noncognitive constructs in TIMSS and PISA. Assessment in Education: Principles, Policy & Practice, 26, 369–385.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55. https://doi.org/10.1080/10705519909540118.
Jöreskog, K. G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109–133.
Kyriakides, L., Christoforou, C., & Charalambous, C. Y. (2013). What matters for student learning outcomes: a meta analysis of studies exploring factors of effective teaching. Teaching and Teacher Education, 36, 143–152.
Lomazzi, V. (2018). Using alignment optimization to test the measurement invariance of gender role attitudes in 59 countries. Methods, Data, Analyses, 12, 77–104.
Marsh, H., et al. (2018). What to do when scalar invariance fails: the extended alignment method for multigroup factor analysis comparison of latent means across many groups. Psychological Methods, 23, 524–545.
Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127–143.
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525–543. https://doi.org/10.1007/BF02294825.
Millsap, R. E. (2011). Statistical approaches to measurement invariance. New York, NY: Taylor & Francis Group.
Millsap, R. E., & Kwok, O. (2004). Evaluating the impact of partial factorial invariance on selection in two populations. Psychological Methods, 9(1), 93–115. https://doi.org/10.1037/1082-989X.9.1.93.
Munck, I. M., Barber, C. H., & TorneyPurta, J. V. (2018). Measurement invariance in comparing attitudes toward immigrants among youth across Europe in 1999 and 2009: The alignment method applied to IEA CIVED and ICCS. Sociological Methods & Research, 47, 687–728.
Muthén, B., & Asparouhov, T. (2014). IRT studies of many groups: the alignment method. Frontiers in Psychology, 5, 978. https://doi.org/10.3389/fpsyg.2014.00978.
Muthén, B., & Asparouhov, T. (2018). Recent methods for the study of measurement invariance with many groups: alignment and random effects. Sociological Methods & Research, 47, 637–664. https://doi.org/10.1177/0049124117701488.
Muthén, L. K., & Muthén, B. O. (1998–2017). Mplus user’s guide (8th ed.). Los Angeles, CA: Muthén & Muthén.
Nilsen, T., & Gustafsson, J. E. (2016a). The impact of school climate and teacher quality on mathematics achievements: a differenceindifferences approach. In Teacher quality, instructional quality and student outcomes: relationships across countries, cohorts and time (pp. 81–95). Springer International Publishing.
Nilsen, T., & Gustafsson, J. E. (2016b). Teacher quality, instructional quality and student outcomes: relationships across countries, cohorts and time. Springer International Publishing.
Oliveri, M. E., & von Davier, M. (2011). Investigation of model fit and score scale comparability in international assessments. Psychological Test and Assessment Modeling, 53, 315–333.
Pepe, A., Addimando, L., & Veronese, G. (2017). Measuring teacher jobsatisfaction: assessing invariance in the teacher job satisfaction scale (TJSS) across six countries. Europe’s Journal of Psychology, 13, 396–416.
Raudenbush, S. W., Rowan, B., & Fai Cheong, Y. (1992). Contextual effects on the selfperceived efficacy of high school teachers. Sociology of Education, 65, 160–167.
Rivkin, S. G., Hanushek, E. A., & Kain, J. (2005). Teachers, schools and academic achievement. Econometrica, 73, 417–458.
Rutkowski, L., & Rutkowski, D. (2010). Getting it ‘better’: the importance of improving background questionnaires in international largescale assessment. Journal of Curriculum Studies, 42, 411–430.
Rutkowski, D., & Rutkowski, L. (2013). Measuring socioeconomic background in PISA: one size might not fit all. Research in Comparative and International Education, 8, 259–278.
Rutkowski, D., & Rutkowski, L. (2017). Improving the comparability and local usefulness of international assessments: a look back and a way forward. Scandinavian Journal of Educational Research, 62, 354–367.
Rutkowski, L., & Svetina, D. (2014). Assessing the hypothesis of measurement invariance in the context of large scale international surveys. Educational and Psychological Measurement, 74, 31–57.
Scherer, R., Jansen, M., Nilsen, T., Areepattamannil, S., & Marsh, H. W. (2016). The quest for comparability: studying the invariance of the teacher’s sense of selfefficacy (TSES) measure across countries. PLoS One, 11, 1–29.
Schulz, W. (2016). Reviewing measurement invariance of questionnaire constructs in crossnational research: examples from ICCS 2016. Australian Council for Educational Research. Paper prepared for the Annual Meeting of the American Educational Research Association, Washington D.C.
Segeritz, M., & Pant, H. A. (2013). Do they feel the same way about math? Testing measurement invariance of the PISA “Students’ Approaches to Learning” instrument across immigrant groups within Germany. Educational and Psychological Measurement, 73, 601–630.
Strong, M. (2011). The highly qualified teacher: what is teacher quality and how do we measure it? New York, NY: Teachers College Press.
Svetina, D., Rutkowski, L., & Rutkowski, D. (2016). Multiple group invariance with categorical outcomes using updated guidelines: an illustration using Mplus and the lavaan/semTools packages. Teacher’s Corner, 111–130.
Takata, T. (2003). Selfenhancement and selfcriticism in Japanese culture: an experimental analysis. Journal of CrossCultural Psychology, 34, 542–551.
Teaching and Learning International Survey (TALIS). (2013). Technical report. Paris: OECD Publishing.
Toropova, A., Johansson, S., & Myrberg, E. (2019). The role of teacher characteristics for student achievement in mathematics and student perceptions of instructional quality. Education Inquiry, 10, 1–25.
van de Schoot, R., Schmidt, P., De Beuckelaer, A., Lek, K., & ZondervanZwijnenburg, M. (2015). Editorial: measurement invariance. Frontiers in Psychology, 6, 1–5. http://dx.doi.org/10.3389/fpsyg.2015.01064.
Vieluf, S., Kunter, M., & van de Vijver, F. J. (2013). Teacher selfefficacy in crossnational perspective. Teaching and Teacher Education, 35, 92–103.
Zakariya, Y. F., Bjorkestol, K., & Nilsen, H. K. (2020). Teacher job satisfaction across 38 countries and economies: an alignment optimization approach to a crosscultural mean comparison. International Journal of Educational Research, 101, 1–10.
Zieger, L., Sims, S., & Jerrim, J. P. (2019). Comparing teachers’ job satisfaction across countries: multiple pairwise measurement approach. Educational Measurement: Issues and Practice, 38, 75–85.
Funding
Open Access funding provided by University of Gothenburg. This project has received funding from the European Union’s Framework Programme for Research and Innovation Horizon 2020 (2014–2020) under the Marie Skłodowska-Curie Grant Agreement No. 765400.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Glassow, L.N., Rolfe, V. & Hansen, K.Y. Assessing the comparability of teacherrelated constructs in TIMSS 2015 across 46 education systems: an alignment optimization approach. Educ Asse Eval Acc 33, 105–137 (2021). https://doi.org/10.1007/s11092-020-09348-2
DOI: https://doi.org/10.1007/s11092-020-09348-2
Keywords
 TIMSS
 Measurement invariance
 Alignment optimization
 Teacher characteristics