Measures of equity and efficacy
Inequality among students remains one of the most important topics in contemporary educational research and a persistent challenge in many regions. In this regard, Andrés Strello, Rolf Strietholt, Isa Steinmann, and Charlotte Siepmann identify three types of inequality, namely dispersion inequality, social inequality and educational adequacy, and show how early tracking affects each of them. Interestingly, the authors point out that “inequalities” usually connote injustice, yet they argue that dispersion inequality may be regarded as an acceptable outcome. Their conclusions, as well as their identification of these three types of inequality, contribute not only to further research but also to policy evaluation: a key finding, for example, is that early tracking contributes substantially to social inequality.
How would you measure academic resilience?
The issue of inequality in education is closely connected to academic resilience, which refers to students’ capacity for high performance despite their disadvantaged backgrounds. Although most researchers using data from international large-scale assessments (ILSAs) define academic resilience with two criteria – student background and achievement – their conceptualisations and operationalisations vary substantially. In their systematic review, Wangqiong Ye, Sigrid Blömeke, and Rolf Strietholt identify 20 ILSA studies that apply different measures of socioeconomic status and achievement, different approaches to setting thresholds and, consequently, different classifications of individual students as resilient or non-resilient. They discuss the validity of these definitions and show that whether a student is classified as resilient depends heavily on the economic context in which the student grows up. Moreover, significant interactions with gender and language background call for further research. The authors conclude that close attention should be paid to how academic resilience is operationalised in order to avoid misleading inferences, and they suggest using relative, country-specific thresholds to define students as disadvantaged or high-achieving, so that the classification does not depend mainly on a country’s developmental state.
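A minimal sketch (with hypothetical data and quartile cut-offs, not the authors’ measures) illustrates how the choice between pooled international thresholds and relative, country-specific thresholds changes which students count as resilient:

```python
# Hypothetical data: country B is poorer and lower-achieving on the
# international scale. All thresholds here are illustrative quartiles.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "country": rng.choice(["A", "B"], size=n),
    "ses": rng.normal(size=n),
    "score": rng.normal(500.0, 100.0, size=n),
})
df.loc[df.country == "B", "ses"] -= 1.0
df.loc[df.country == "B", "score"] -= 80.0

# Pooled international thresholds: disadvantaged = bottom SES quartile of
# the pooled sample, high-achieving = top quartile of pooled achievement.
intl = (df.ses < df.ses.quantile(0.25)) & (df.score > df.score.quantile(0.75))

# Relative thresholds: the same quartiles computed within each country.
grouped = df.groupby("country")
within = (df.ses < grouped.ses.transform(lambda s: s.quantile(0.25))) & \
         (df.score > grouped.score.transform(lambda s: s.quantile(0.75)))

print(df.assign(resilient_intl=intl, resilient_within=within)
        .groupby("country")[["resilient_intl", "resilient_within"]].mean())
```

Under the pooled thresholds, the share of resilient students in the poorer country is driven largely by that country’s position on the international scale; the within-country thresholds remove this mechanical dependence.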
Can students be good in all subjects?
Previous research analysing student performance in ILSAs concluded that students vary in their overall performance levels across all subject domains but not in their individual performance profiles (e.g., Bergold et al. 2017; Wendt and Kasper 2016). In contrast, Olesya Gladushyna, Rolf Strietholt, and Isa Steinmann demonstrate that some students possess subject-specific strengths and weaknesses, while others show similar scholastic performance across subject domains. The authors argue that traditional confirmatory factor analysis (CFA) and latent profile analysis (LPA) approaches have certain methodological limitations and propose a factor mixture analysis (FMA) model that combines the advantages of both. Indeed, the FMA model fits the data better than the CFA and LPA models. The authors conclude that the choice of methodology for analysing student performance is crucial, because different methods lead to different results. The main finding of their study is that student performance is more than just general intelligence.
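To make the contrast concrete, the following sketch fits the two constituent model families on simulated domain scores. It is an illustration only, not the authors’ analysis; the FMA itself, which nests both models, is typically estimated in specialised software such as Mplus rather than scikit-learn.

```python
# Simulated scores in three domains: a general factor for everyone, plus a
# subject-specific strength in domain 2 for a subgroup of students.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n = 1000
g = rng.normal(size=n)
scores = np.column_stack([g + rng.normal(scale=0.5, size=n) for _ in range(3)])
subgroup = rng.random(n) < 0.3
scores[subgroup, 2] += 1.0  # subject-specific strength

# Latent profile analysis: discrete classes (mixture of multivariate normals).
lpa = GaussianMixture(n_components=2, random_state=0).fit(scores)
print("LPA log-likelihood:", lpa.score(scores) * n)

# Single-factor model: one continuous general-performance dimension.
fa = FactorAnalysis(n_components=1).fit(scores)
print("One-factor log-likelihood:", fa.score(scores) * n)

# A fair comparison would penalise the differing numbers of parameters
# (e.g., via BIC); an FMA combines the continuous factor with the classes.
```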
DIF and what it means for ILSA
The comparability of educational achievement measures across countries is the focus of the paper by Edwin Cuellar Caicedo, Ivailo Partchev, Robert Zwiser, and Timo Bechger. They argue that measurement non-invariance (i.e., differential item functioning, DIF) in ILSAs should be regarded not only as a problem but also as a potential source of interpretable information in the analysis of differences among educational systems. The authors propose methods to investigate and visualize measurement invariance when a large number of groups is involved (in their illustrative example, the countries participating in PISA 2012), and they suggest a form of residual analysis after the dominant component has been removed. Their multivariate techniques are easy to replicate, and the analytical approach supports identifying biclusters of countries and items that may reveal interesting structures, connecting this exploratory perspective with classical methods of detecting DIF. Their paper thus provides a methodological contribution, motivated by the belief that proper analysis of DIF may lead to more actionable insights in education.
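The following sketch (with simulated difficulties and arbitrary dimensions, not the authors’ data or code) shows the general idea of removing a dominant component from a country-by-item matrix and inspecting the residuals:

```python
# Placeholder for a matrix of estimated item difficulties:
# 40 countries (rows) by 60 items (columns).
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(40, 60))

# Centre the matrix and remove the dominant (rank-1) component via SVD.
Bc = B - B.mean(axis=0)
U, s, Vt = np.linalg.svd(Bc, full_matrices=False)
residuals = Bc - s[0] * np.outer(U[:, 0], Vt[0])

# Countries and items with large residuals are candidates for interpretable
# DIF structure, e.g., biclusters of countries and items.
country_dev = np.abs(residuals).mean(axis=1)
item_dev = np.abs(residuals).mean(axis=0)
print("Most deviant countries:", np.argsort(country_dev)[-5:])
print("Most deviant items:", np.argsort(item_dev)[-5:])
```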
Race for rankings or a wild-goose chase?
Country rankings are probably the most popular outcome of international assessments. For these rankings to be meaningful, the parameters of test items must be equivalent across participating countries. However, drawing on theories of the relationship between culture and human cognition, Hüseyin H. Yıldırım argues that item parameter equivalence is very difficult, if not impossible, to achieve in international assessments. It is a well-established finding that test items in international assessments may function differentially across countries. The general belief, however, is that such problems arise from only a few items and in only a few countries, and that they can be avoided if test items are adapted appropriately. Yıldırım’s analysis of the TIMSS 2015 data set shows that this may not be the case: the non-equivalence of item parameters may be a general and inevitable consequence of cultural differences among countries, which calls for further research (see the previous contribution by Edwin Cuellar Caicedo and colleagues). On this evidence, Yıldırım suggests that the current attention to country rankings in international reports should be redirected to more informative and more useful outcomes for improving educational systems.
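In Rasch-type shorthand (our notation for exposition, not necessarily the model used in the paper), item parameter equivalence requires the difficulty of item \(j\) to be identical in every country \(c\):

\[
P\bigl(X_{pj} = 1 \mid \theta_p\bigr) = \frac{\exp(\theta_p - b_{jc})}{1 + \exp(\theta_p - b_{jc})}, \qquad b_{jc} = b_j \quad \text{for all } c.
\]

Whenever \(b_{jc} = b_j + \delta_{jc}\) with \(\delta_{jc} \neq 0\) for some countries, the item functions differentially; Yıldırım’s point is that such deviations may be pervasive rather than exceptional.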
Learning by doing: Practice effect in language tests
In contrast to the assumptions of the standard measurement models used in large-scale assessments, student performance may change during test administration. Andrés Christiansen and Rianne Janssen use an explanatory item response theory framework to analyse item position effects in the 2012 European Survey on Language Competences. Their analysis reveals consistent item position effects for listening but not for reading. More specifically, for a large subset of items, item difficulty decreases with item position, which is known as a practice effect. This practice effect differs among regions but is not related to the mode of test administration. As the practice effects are substantial, it seems advisable to include them in the measurement model. Notably, few educational measurement studies have found practice effects; fatigue effects, by contrast, are commonly found throughout ILSAs. The authors contribute ideas for further research on position effects and their possible consequences for researchers’ and policymakers’ understanding of achievement scores in ILSAs.
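One simple way to write such a model (a sketch with a linear position term, not necessarily the exact specification used by Christiansen and Janssen) extends the Rasch model with the position \(k\) at which person \(p\) encounters item \(i\):

\[
\operatorname{logit} P\bigl(X_{pik} = 1\bigr) = \theta_p - (\beta_i + \delta k),
\]

where a negative \(\delta\) corresponds to a practice effect (items become easier the later they appear) and a positive \(\delta\) to a fatigue effect.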
Tracking half-century trends in mathematics achievement
In a series of studies on the relation between education and economic growth, Hanushek and Woessmann (see, e.g., 2011, 2012, 2015) based their cognitive outcomes on achievement measures from large-scale assessments by calculating standardized scores for all countries on all assessments. They used the US National Assessment of Educational Progress (NAEP) to link the various ILSAs to the same scale using the mean-sigma method. This approach assumes that the samples within educational systems are comparable across studies and over time. Erika Majoros, Monica Rosén, Stefan Johansson, and Jan-Eric Gustafsson make no such assumption; instead, their analysis accounts for variations both in the indicators of mathematics achievement and in the comparability of the samples from the participating countries over time. The authors apply a more rigorous linking approach based on item response theory, in which the trait score estimates and their corresponding standard errors are independent of population distributions (Embretson and Reise 2000; Strietholt and Rosén 2016). They are thus able to link mathematics achievement for the populations of eighth-grade students in the four countries (England, Israel, Japan and the United States) that participated in all assessments from 1964 to 2015, achieving comparable scores over a 50-year period. Their study contributes not only a better-founded trend scale but also a valuable time perspective. Both should be used in further research to include more countries and to better address issues of stability and change in educational achievement.
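For reference, mean-sigma linking maps scores from scale \(X\) onto scale \(Y\) through a linear transformation whose coefficients come from the means and standard deviations of a common anchor (stated here in generic form):

\[
A = \frac{\sigma_Y}{\sigma_X}, \qquad B = \mu_Y - A\,\mu_X, \qquad \theta^{Y} = A\,\theta^{X} + B.
\]

The linked scores thus inherit any incomparability in the anchor distributions, which is precisely the assumption that the IRT-based approach of Majoros and colleagues avoids.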
Which countries have the happiest teachers?
Research on the “characteristics” dimension of teacher quality (including self-efficacy, job satisfaction and perceptions of work environments) has found this dimension’s relation to student achievement to be weak or inconclusive (Goe 2007; Nilsen and Gustafsson 2016). Using data from TIMSS 2015 and multiple group confirmatory factor analysis (MGCFA) with the alignment optimization approach outlined by Asparouhov and Muthén (2014), Leah Natasha Glassow, Victoria Rolfe, and Kajsa Yang Hansen investigate teacher characteristics and perceptions of work contexts across countries, using newly constructed latent means of mathematics teachers’ job satisfaction, self-efficacy, perceptions of school academic climate, perceptions of school conditions and resources, and perceptions of school safety and organization. Particularly interesting results emerge for teacher job satisfaction and self-efficacy, where clear geographical patterns appear in some cases: both tend to be higher in East and Southeast Asian countries, such as Japan, Singapore, Chinese Taipei and Hong Kong, and lower in Middle Eastern countries at the bottom of the achievement rankings, such as Qatar, Oman, the UAE, Lebanon and Kuwait. Ultimately, the authors demonstrate how new methodological advances allow educational researchers to tackle previously unanswerable substantive questions. Future research can use the newly constructed means for further secondary analysis or build on this work to examine means of teacher characteristics across subgroups of students within countries.
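In outline, alignment optimization retains the fit of the configural model while choosing group-specific factor means and variances so that total measurement non-invariance is minimized. Following Asparouhov and Muthén (2014), the loss has roughly the form

\[
F = \sum_{p} \sum_{g_1 < g_2} w_{g_1, g_2} \Bigl[ f\bigl(\lambda_{p g_1} - \lambda_{p g_2}\bigr) + f\bigl(\nu_{p g_1} - \nu_{p g_2}\bigr) \Bigr], \qquad f(x) = \sqrt{\sqrt{x^2 + \epsilon}},
\]

where \(\lambda_{pg}\) and \(\nu_{pg}\) are the loading and intercept of indicator \(p\) in group \(g\), \(\epsilon\) is a small positive constant, and the weights \(w_{g_1, g_2}\) reflect group sizes. The component loss \(f\) favours solutions with many exactly invariant parameters and a few large deviations over solutions with many small deviations.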
Muddy waters: Non-participation in international assessments
One of the leading ILSAs in education is PISA, which claims to put robust measures in place to ensure that the final sample from each participating nation is truly representative of its 15-year-old population. Jake Anders, Silvan Has, John Jerrim, Nikki Shure, and Laura Zieger provide a case study of one “educational superpower” (Canada), discussing how various issues with the quality of its PISA 2015 data call into question its status as one of the highest-performing educational systems worldwide. The authors show how various biases can enter the PISA sample and how the Canadian data fail to meet some of the key quality criteria set by the Organisation for Economic Co-operation and Development (OECD), for example by vastly exceeding the permissible number of student exclusions. They conclude by offering constructive suggestions on how this element of the PISA study could be improved. It should be noted that Canada is not the only country that has failed to meet the sampling quality standards; the USA, for example, has not met them in any PISA cycle so far. Nevertheless, the data from Canada, the USA and other countries with high non-response rates are repeatedly used, cited and drawn upon for far-reaching recommendations (“superpower” and so forth). Even before the publication of this special issue (i.e., after its first publication online), there was heated debate involving the authors, another stakeholder and the editors. We look forward to continuing such discussions to expand our knowledge of the integrity of PISA and other international studies.