Intersectionality is a theoretical and analytic framework that is based on two central assumptions. First, social categories such as gender, race/ethnicity/immigrant background, and social class are interwoven and form the “social location” that shapes lived experience at the individual level. Second, these lived experiences directly reflect the complex interaction of various interconnected systems of power, privilege, and oppression at the structural level (Collins, 1990; Crenshaw, 1989; McCall, 2005; see also Beccia et al., 2021). Importantly, social categories are, thus, potentially associated with distinct (but overlapping) privileges or disadvantages. For example, Black female students from socioeconomically disadvantaged families in the USA may face oppression and discrimination due to sexism, racism, and classism (Irwin, 2020; Morris, 2007).

In recent years, there has been a call for a greater application of intersectional approaches in quantitative psychological and educational research that acknowledge the multidimensional and multiplicative impact of social categories on educational inequalities (Bowleg, 2008; Codiroli Mcmaster & Cook, 2019; Cole, 2009; Else-Quest & Hyde, 2016a; Gross et al., 2016a; McCall, 2005). Quantitative methods can contribute to the current intersectional research by providing insights into patterns and structures of educational inequalities in the population (Bauer, 2014), enabling analyses of how educational inequalities evolve over time (Scott, 2010), and providing information that can be used to develop interventions to address these inequalities (Bauer, 2014). Quantitative psychological and educational studies that have examined educational inequalities from an intersectional perspective have mainly used traditional regression models that contain interaction effects (e.g., gender by social class by ethnic identity; see, for example, Berrington et al., 2016; Strand, 2014a, b). However, these conventional approaches have several methodological limitations, including issues of scalability, model parsimony, and sample size (Evans et al., 2018).

In this paper, we introduce the Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy (MAIHDA) approach, which provides solutions to some of the key methodological strains in quantitative intersectional research. Using a multilevel analysis framework, the approach has been developed and successfully applied in epidemiology to explore inequalities in health outcomes (e.g., Axelsson Fisk et al., 2018; Beccia et al., 2021; Evans et al., 2018; Evans & Erickson, 2019; Merlo, 2018; see also Jones et al., 2016). The present article aims to introduce this approach to the field of educational psychology by building on and integrating contributions aimed at disseminating the MAIHDA approach in epidemiology (Axelsson Fisk et al., 2018; Evans et al., 2018; Merlo, 2018). Going beyond these previous contributions, our goals are to (1) explain how the MAIHDA approach can be used to study intersectional inequalities in psychological and educational research, (2) show which educational psychological research questions can be answered with this approach, and (3) demonstrate its application with the help of a hands-on tutorial. We therefore first elaborate on where the MAIHDA approach is situated within the broader framework of research on intersectionality. In particular, we focus on how the statistical features of MAIHDA can help researchers to implement key concepts of this line of research. We also conduct a systematic review to show how this approach has been applied in previous research (e.g., in the health sciences). To demonstrate the application of MAIHDA, we use the empirical example of inequalities in 15-year-old students’ reading achievement at the intersection of students’ gender, immigrant background, parental education, and parental occupational status in Germany with data from the 2018 cycle of the Programme for International Student Assessment (PISA). We provide a thoroughly annotated R syntax to facilitate the replication and future application of MAIHDA analyses.

Intersectionality and Educational Inequalities

Origins of Intersectionality

Intersectionality emerged as “a heuristic term to focus on the vexed dynamics of difference and the solidarities of sameness in the context of antidiscrimination and social movement politics” (Cho et al., 2013, p. 787). The term intersectionality was first introduced in 1989 by the American Black feminist and legal scholar Kimberlé Crenshaw, who drew attention to the particular disadvantages faced by African American women in two seminal articles (Crenshaw, 1989, 1991). She emphasized that the experiences of women of color can only be understood when race and gender are considered simultaneously (Crenshaw, 1991). The origins of intersectionality can be traced back to the reflections of Black feminists such as Sojourner Truth (1851) and the Combahee River Collective (1982). In 1982, the Combahee River Collective formulated a Black feminist statement based on the understanding that “the major systems of oppression are interlocking” (p. 13). Originally applied primarily in gender studies, the concept has gradually entered other disciplines, such as health sciences (Bowleg, 2012), sociology (Choo & Ferree, 2010), psychology (Else-Quest & Hyde, 2016a), and education (Codiroli Mcmaster & Cook, 2019).

Examining Educational Inequalities Within an Intersectional Framework

Educational inequalities are one of the most persistent social problems in contemporary societies and have been widely researched in psychology, education, sociology, economics, and beyond (e.g., Codiroli Mcmaster & Cook, 2019). Educational inequalities can be defined as systematic variations in several aspects of educational attainment (e.g., access to education, experiences and learning processes, and educational outcomes) between individuals based on their social group membership, including but not limited to gender, race/ethnicity/immigrant background, and social class (Gross et al., 2016b; Jacobs, 1996).

Inequalities based on gender, race/ethnicity/immigrant background, and social class are the most researched and widespread forms of inequality in education (Codiroli Mcmaster & Cook, 2019). Gender inequalities in education are defined as differences in educational outcomes between females and males.Footnote 1 Research on race/ethnicity/immigrant background inequalities in education often focuses on disadvantages experienced by minorities (e.g., Heath et al., 2008). Social class inequalities in education are differences in educational outcomes between those who have more financial, cultural, and/or family resources and those who have fewer of those resources (Codiroli Mcmaster & Cook, 2019).

Although all of these social categories contribute to educational inequalities, educational inequalities cannot be explained by simply summing the effects associated with each individual social category or by prioritizing one above all others. Rather, social categories operate simultaneously and are mutually constitutive (e.g., Gillborn & Mirza, 2000; Strand, 2014a, b). In the present paper, we will examine this interplay by exploring how students’ gender, immigrant background, parental education, and parental occupational status (as proxies for students’ socioeconomic status [SES]) contribute to inequalities in reading achievement. Reading literacy is of particular importance because it is a key determinant for lifelong learning and educational success (e.g., in school, vocational training, higher education), and a prerequisite for successful participation in most aspects of adult life (Cunningham & Stanovich, 1997; OECD, 2017; Smith et al., 2000).

The literature has consistently documented inequalities in adolescent reading achievement that are related to the intersectional effects of multiple social categories, involving students’ SES, immigrant background, and gender. There are several mechanisms through which these social categories influence students’ reading achievement. First, SES can have both direct effects on children’s (reading) development, for example, through limited material resources and risk for a range of health and developmental problems (Chaudry & Wimer, 2016; Currie & Goodman, 2020) and indirect effects through parental education, expectations, and aspirations (Davis-Kean, 2005). For instance, higher-SES parents have higher educational aspirations and higher expectations for their children’s achievement at school and are more likely to provide rich language and reading environments compared with lower-SES parents, which likely translates into better reading achievement for students from higher-SES families (Davis-Kean, 2005; Niklas & Schneider, 2013; Pace et al., 2017).

Second, in many countries, immigrant students perform worse on standardized reading tests than their non-immigrant peers (e.g., OECD, 2019c). However, it may be misleading to consider this difference in reading achievement in isolation. In many Western countries, families who have immigrated have a lower SES than non-immigrant families (OECD, 2019b). For example, the proportion of children between the ages of 6 and 15 who were at risk of poverty in Germany (i.e., with a household income below the threshold of 60% of the national median income) in 2019 was almost three times higher for children from immigrant families (33.3%) than for children from non-immigrant families (11.9%; Statistisches Bundesamt, 2020). Accordingly, the meaning of an immigrant background and its effects on reading achievement may vary depending on students’ SES.

Third, gender has an impact on reading achievement in that male students perform worse than female students on standardized reading tests in many countries (e.g., OECD, 2019c). One explanation for these gender differences is that the verbal domain is stereotyped as feminine (e.g., Li & McLellan, 2021; McGeown & Warhurst, 2020), and this stereotyping undermines male students’ engagement in this domain (see, for example, social role theory; Eagly, 1987). However, the meaning of being a male student and the effects on reading achievement may differ by SES. Cultural meanings and processes constituting gender and being male may therefore play out differently in low- and high-SES families. High-SES families have, for example, less traditional gender role expectations than low-SES families (e.g., Marks et al., 2009), which may differentially affect female and male students’ reading achievement in low- and high-SES families (e.g., Entwisle et al., 2007).

To conclude, previous research has shown that the social categories of SES, immigrant background, and gender operate simultaneously and are mutually constitutive in explaining educational inequalities in reading achievement. This suggests that the interplay of these social categories can better be understood in its complexity through an intersectional lens (Codiroli Mcmaster & Cook, 2019; Gross et al., 2016b).

Quantitative Approaches to Model Educational Inequalities Within an Intersectional Framework

McCall (2005) identified three distinct methodological approaches for applying the concept of intersectionality: the anticategorical, intracategorical, and intercategorical approach. These approaches differ in how they understand and use social categories to analyze intersectionality. The anticategorical approach engages in the critique and deconstruction of categories. In this approach, intersectionality is not studied with social categories because social life is considered too complex for fixed categories to adequately represent it. The intracategorical approach acknowledges the relevance of social categories for studying intersectionality, but maintains a critical stance toward them by emphasizing the heterogeneity within social categories. Intersectional research, which takes an intracategorical approach, typically focuses on specific social groups at neglected intersections to reveal the complexity of lived experiences within those groups. Finally, in the intercategorical approach, analytical categories are provisionally adopted to analyze inequalities between social groups as well as interactions between different social categories. The intercategorical approach addresses the complexity of social life with multigroup studies and systematic comparisons (McCall, 2005).

Intersectional research has been strongly associated with qualitative methods (e.g., Shields, 2008; Syed, 2010), which usually take an intracategorical approach to intersectionality (McCall, 2005). However, it has been argued that quantitative methods are also suitable for conducting intersectionality research (Bauer, 2014; Bowleg, 2008; Codiroli Mcmaster & Cook, 2019; Cole, 2009; Else-Quest & Hyde, 2016a; Gross et al., 2016a; McCall, 2005). In quantitative psychological and educational research, intersectional inequality is often investigated using an intercategorical approach, as this line of research is based on the idea that social categories exist and investigates inequalities between social groups.

Based on the mathematical/statistical language often used in intersectional scholarship, it could be concluded that intersectional research is intuitively linked to quantitative methods. For example, in her seminal work, Crenshaw (1989) refers to “interactions” between gender and race, and influential intersectional publications are titled “When multiplication does not equal quick addition: Examining intersectionality as a research paradigm” (Hancock, 2007) or “When Black + lesbian + woman ≠ Black lesbian woman: The methodological challenges of qualitative and quantitative intersectionality research” (Bowleg, 2008). However, it is important to note that whereas quantitative (educational) research distinguishes between additive effects and interaction effects, with interaction effects referring to effects that go beyond additive effects (i.e., the observed effect is greater than the sum of the additive components), the meaning of these terms is different in the intersectional theoretical framework. The meaning of “interaction” in this context is similar to the meaning of “intersection,” and examining the multiplicative interaction of gender and race, for example, does not imply a multiplicative statistical interaction model. This terminological distinction is important for implementing intersectional research with quantitative methods sensibly (Bauer, 2014).

The Traditional Regression Approach

Adopting an intercategorical approach, most quantitative research on educational inequalities has used a traditional regression approach in which main effectsFootnote 2 (\({\beta }_{1}\)\({\beta }_{4}\) in Eq. 1) and a full set of interaction terms (\({\beta }_{5}\)\({\beta }_{15}\) in Eq. 1) are specified to capture the interplay between different social categories (Codiroli Mcmaster & Cook, 2019; Else-Quest & Hyde, 2016b; Hancock, 2007). Equation 1 shows the traditional regression model for examining intersectional effects for each student i on their reading achievement (y) accounting for their gender (G), immigrant background (I), parental education (E), and parental occupational status (O). The residual for each student is represented by \({\varepsilon }_{i}\).

$${y}_{i}= {\beta }_{0}+ {\beta }_{1}{G}_{i}+ {\beta }_{2}{I}_{i}+ {\beta }_{3}{E}_{i}+{\beta }_{4}{O}_{i} + {\beta }_{5}{G}_{i}\cdot {I}_{i} + {\beta }_{6}{G}_{i}\cdot {E}_{i} + {\beta }_{7}{G}_{i}\cdot {O}_{i}+ {\beta }_{8}{I}_{i}\cdot {E}_{i} + {\beta }_{9}{I}_{i}\cdot {O}_{i} + {\beta }_{10}{E}_{i}\cdot {O}_{i} + {\beta }_{11}{G}_{i}\cdot {I}_{i}\cdot {E}_{i} + {\beta }_{12}{G}_{i}\cdot {I}_{i}\cdot {O}_{i} + {\beta }_{13}{G}_{i}\cdot {E}_{i}\cdot {O}_{i} + {\beta }_{14}{I}_{i}\cdot {E}_{i}\cdot {O}_{i}+ {\beta }_{15}{G}_{i}\cdot {I}_{i}\cdot {E}_{i}\cdot {O}_{i} + {\varepsilon }_{i}$$
(1)

However, the traditional regression approach comes with several methodological limitations. As discussed by Evans et al. (2018), one limitation concerns scalability. The more social categories are considered, the higher the number of parameters that would have to be accounted for in the traditional regression approach to represent all possible interaction effects (Figure S1). The geometrically increasing number of interaction terms also causes problems for model parsimony. Importantly, the sample size for certain combinations of different social categories can be very small when many social categories are considered. If this is the case, no robust (statistical) conclusions can be drawn and results are difficult to interpret. It is then at the discretion of the researcher whether or not to exclude the particular combinations of social categories (Evans et al., 2018).

The Intersectional MAIHDA Approach

A recent innovation in epidemiology to overcome the methodological challenges described for the traditional regression approach are intersectional analyses based on multilevel models (Axelsson Fisk et al., 2018; Evans, 2015; Evans et al., 2018; Merlo, 2018; Wemrell et al., 2017). Multilevel models are well established in psychological and educational research (e.g., Goldstein, 2003, 2010; Raudenbush & Bryk, 2002; Snijders & Bosker, 2011) and are typically used to account for the nesting of students, for example, within classrooms, schools, or countries, allowing for an examination of heterogeneity within and across contexts (e.g., classrooms, schools, or countries). Following this idea, the MAIHDA approach models individuals at the first level of analysis and combinations of multiple social categories at the second level of analysis. Thus, a multilevel structure results in which individuals (e.g., students) are nested within intersectional strata (e.g., Evans et al., 2018). These strata can represent, for example, female students without an immigrant background whose parents have low occupational status and do not have a university entrance certificate (see Fig. 1). Importantly, the MAIHDA approach disentangles variance between and within intersectional strata to assess the extent to which (a) intersectionality is relevant overall, taking into account heterogeneity within and between social strata (i.e., providing global measures of intersectionality), (b) specific strata have higher or lower educational outcomes than expected based on individuals’ simultaneous membership in multiple social categories (i.e., providing a specific measure of intersectionality; Bell et al., 2019), and (c) strata membership predicts individuals’ educational outcomes (i.e., the discriminatory accuracy of the strata). In addition, MAIHDA can be used to predict educational outcomes for each intersectional stratum, providing another specific measure of intersectionality.

Fig. 1
figure 1

Data structure in the MAIHDA approach. Note. UEC = university entrance certificate; OS = occupational status; IB = immigrant background

Due to its multilevel structure, the MAIHDA approach is more scalable and parsimonious than the traditional regression approach (Figure S1). In our example with 11 potential memberships across four social categories, 42 parameters (i.e., 40 dummy-coded variables for the social categories, an intercept term, and the variance of the residuals) would have to be modelled in the traditional regression approach, whereas only ten parameters (i.e., seven fixed effects for the social categories, the grand mean, the variance of the residuals at Level 1, and the variance of the random effects at Level 2) would have to be modelled in the MAIHDA approach to account for all main effects and all possible interactions between the social categories.

Based on numerous studies on the heterogeneity between and within contexts (e.g., Merlo, 2003, 2014; Merlo et al., 2005, 2009, 2017), Merlo (2018) introduced the term MAIHDA to distinguish MAIHDA analyses from traditional multilevel analyses in epidemiology that focus on the analysis and interpretation of fixed effects. MAIHDA analyses have been conducted to investigate heterogeneity and discriminatory accuracy in health disparities (a) related to the simultaneous membership in multiple social categories (intersectional MAIHDA; e.g., Axelsson Fisk et al., 2018), (b) across different geographical areas (geographical MAIHDA; e.g., Merlo et al., 2019), and (c) related to both the simultaneous membership in multiple social categories and the geographical area (e.g., Khalaf et al., 2020). Notably, the intersectional MAIHDA approach is considered the “new gold standard” for studying social inequalities in epidemiology (Merlo, 2018).

Systematic Review of Studies That Applied the Intersectional MAIHDA Approach

To learn how the intersectional MAIHDA approach has been applied in previous research and to determine the range of statistical values in previous MAIHDA analyses, we carried out a systematic review. To do this, we searched the databases PsycInfo, Web of Science, PubMed, Medline, ERIC, FIS Bildung, as well as the web search engine Google Scholar using the search terms ‘“MAIHDA” OR “Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy”’ for articles published through February 1, 2022. After deleting 75 duplicates, we screened 205 records. We excluded 52 records that were corrections, conference abstracts, posters, workshop materials, or other non-scientific documents, yielding 153 records that were assessed for eligibility. Our hierarchical exclusion criteria comprised: (a) publications that were published in a language other than English, (b) publications that did not include empirical results, (c) publications in which an analytical approach other than a MAIHDA analysis was used, (d) publications in which no intersectional MAIHDA analysis was conducted, and (e) publications that used the same data and study design as studies that were already included. The screening was conducted by two independent raters (i.e., the first author and a trained research assistant). A hierarchical evaluation implies that if a rater decided that an exclusion criterion was fulfilled, the record was ineligible, and screening stopped (Polanin et al., 2019). Following the best practice guidelines for screening by Polanin et al. (2019), the raters reconciled disagreements throughout the evaluation process. Traditional reliability statistics were therefore not computed because they require independent ratings throughout the entire eligibility evaluation process (Polanin et al., 2019). After evaluation, 21 publications were identified that applied an intersectional MAIHDA analysis (see Fig. 2).

Fig. 2
figure 2

PRISMA flow diagram. Note. Adapted from “The PRISMA 2020 statement: An updated guideline for reporting systematic reviews,” by M. J. Page, J. E. McKenzie, P. M. Bossuyt, I. Boutron, T. C. Hoffmann, C. D. Mulrow, L. Shamseer, J. M. Tetzlaff, E. A. Akl, S. E. Brennan, R. Chou, J. Glanville, J. M. Grimshaw, A. Hróbjartsson, M. M. Lalu, T. Li, E. W. Loder, E. Mayo-Wilson, S. McDonald, … D. Moher, 2021, BMJ, 10(89), 1–11. (10.1186/s13643-021–01,626-4). Creative Commons Attribution 4.0 International License

The results of the systematic review are presented in Tables S1 and S2 in the supplementary materials. The review showed that intersectional MAIHDA has been successfully applied to examine inequalities in various health outcomes, such as chronic obstructive pulmonary disease (Axelsson Fisk et al., 2018), body mass index (Evans, 2019a; Evans et al., 2020; Hernández-Yumar et al., 2018), eating-related pathology (Beccia et al., 2021), and depression (Evans, 2019a; Evans & Erickson, 2019; see Table S1). Table S2 provides information on the methodological characteristics of these previous studies (e.g., type of data and estimation method used). In the following, we present the analytical steps taken in the intersectional MAIHDA approach and show how it can be applied to study educational inequalities in psychological and educational research.

Forming Intersectional Strata

The first step in the approach is to assign each individual i (i = 1, …, N) to a certain intersectional stratum j (j = 1, …, J; see Fig. 1). The intersectional strata are defined by forming all possible combinations from a set of social categories for a group of individuals that share intersecting social categories (e.g., Evans et al., 2018). In our example, the following social categories were considered: gender (male vs. female students), immigrant background (students with vs. without immigrant background), parental education (parents without vs. with at least a university entrance certificate), and parental occupational status (differentiated in five distinct levels of occupational status). This implies that each student is assigned to one of J = 40 strata (i.e., 2 × 2 × 2 × 5 = 40), each of which is characterized by a unique combination of students’ gender, immigrant background, parental education, and parental occupational status.

Importantly, very large and diverse samples are required to apply the intersectional MAIHDA approach for reliably estimating effects. The larger the sample, the more likely it is that more strata with sufficient sample sizes can be created. In addition, it is important to find meaningful cut points for the creation of social categories when (a) continuous variables need to be transformed into categorical ones or (b) categories need to be collapsed to reduce the number of resulting strata, if the sample size would not be sufficient for all possible strata. For instance, in our empirical example, we collapsed the variable “parental education” from seven categories into the categories “below university entrance certificate” and “at least university entrance certificate.” We chose these categories because they reflect socially significant distinctions in educational achievement in Germany (Fend, 2013; McElvany et al., 2009) and partitioned the sample into sufficiently large groups. In other research contexts, however, a different categorization of this variable might be more meaningful. For example, when examining inequalities in tertiary educational attainment, a distinction between “having parents with at least a Bachelor’s degree” and “having parents without a Bachelor’s degree” would be more meaningful (Stifterverband, 2021).

To transform continuous variables into categorical variables, they can also be divided into quantiles. In our empirical example, we divided the continuous variable “parental occupational status” into five equal segments (quintiles). The transformation of a continuous variable into quantiles has the advantage that the resulting categories have balanced sample sizes. Whenever possible, however, transformation into categorical variables should be based on substantive criteria.

Model 1: Simple Intersectional Model (Null Model)

For analyses within the intersectional MAIHDA framework, two different multilevel models are estimated: a null model and a main effects model. The first model, the null model (or simple intersectional model), is an unadjusted, random intercept model, in which individuals are nested within intersectional strata. The simple intersectional model decomposes the total variance in the outcome y (i.e., reading achievement) into variance that can be attributed to (a) mean-level differences between intersectional strata (\({\sigma }_{\mathrm{strata}}^{2}\)) and (b) interindividual differences within intersectional strata (\({\sigma }_{e0}^{2}\)). Using the notation by Raudenbush and Bryk (2002), the simple intersectional model is specified as a two-level model that contains a residual term \({e}_{0ij}\) to represent the within-strata random residual at the individual level (Eq. 2) and a random-coefficient \({\mu }_{0j}\) that represents how the mean level of a certain stratum deviates from the grand mean \({\gamma }_{00}\) of the outcome (Eq. 3). The residual terms \({e}_{0ij}\) and the random effects \({\mu }_{0j}\) are assumed to be uncorrelated and to follow a normal distribution with variances \({\sigma }_{e0}^{2}\) and \({\sigma }_{\mathrm{strata}}^{2}\), respectively. The simple intersectional model depicts the important intersectional idea that social categories may interact simultaneously and multidirectionally, given that the reading achievement mean levels of intersectional strata \({\mu }_{0j}\) may freely vary around the grand mean \({\gamma }_{00}\). Equation 4 shows the multilevel model when combining Eqs. 2 and 3.

$${y}_{ij}={\beta }_{0j}+ {e}_{0ij}$$
(2)
$${\beta }_{0j}= {\gamma }_{00}+{\mu }_{0j}$$
(3)
$${y}_{ij}={\gamma }_{00}+{\mu }_{0j}+ {e}_{0ij}$$
(4)
$$\begin{array}{cc}\left[{\mu }_{0j}\right] \sim N(0, {\sigma }_{\mathrm{strata}}^{2})& (\mathrm{Level\;}2)\end{array}$$
$$\begin{array}{cc}\left[{e}_{0ij}\right] \sim N(0, {\sigma }_{e0}^{2})& (\mathrm{Level\;}1)\end{array}$$

The purpose of the simple intersectional model is to (a) calculate the variance partition coefficient (VPC; a global measure of intersectionality) and (b) map the averages of the intersectional strata to identify inequalities in education (as specific measures of intersectionality). The VPC represents the proportion of the total variance in educational outcomes (\({\sigma }_{\mathrm{strata}}^{2}+ {\sigma }_{e0}^{2}\)) that is located at the strata level (Merlo, 2018; Eq. 5) and thus serves as a global measure of intersectionality (together with the VPC from the main effects model and the proportional change in the between-strata variance [PCV]).

$$\mathrm{VPC}= \frac{{\sigma }_{\mathrm{strata}}^{2}}{{\sigma }_{\mathrm{strata}}^{2}+ {\sigma }_{e0}^{2}}$$
(5)

In the simple intersectional model, the VPC is identical to the intra-class correlation coefficient (ICC), which can be interpreted as the correlation between two randomly selected individuals from the same intersectional stratum (Goldstein, 2010). As a result, the VPC from the simple intersectional model is also a measure of similarity or clustering, which is analogous to a measure of discriminatory accuracy.Footnote 3 The discriminatory accuracy indicates how useful intersectional strata are for classifying individuals with regard to their educational outcome. In sum, the VPC from the simple intersectional model quantifies the intersectional general contextual effect (GCE), that is, the influence of being a member of an intersectional stratum on educational outcomes alone (Merlo, 2018).

A high VPC indicates that intersectional strata are useful for understanding individual differences in, for example, students’ reading achievement. By contrast, a VPC of 0 would indicate that intersectional strata resemble random samples from the student body and are not relevant for understanding differences in educational inequalities (Merlo, 2018). To conclude, the VPC indicates how much intersectional strata may influence a particular outcome overall. Thus, it provides important information about the legitimacy of the defined intersectional strata as valid contextual constructs.

In addition, the simple intersectional model provides a detailed map of educational inequalities by predicting the average educational outcomes for each intersectional stratum. The predicted average strata-specific educational outcomes can therefore be understood as a specific measure of intersectionality. In sum, this analysis helps to identify specific strata with particularly positive or negative educational outcomes. In our example, the strata-specific predicted values for reading achievement help to improve our understanding of the gendered, socioeconomic, and immigration-related patterned distribution of reading achievement.

Model 2: Intersectional Interaction Model (Main Effects Model)

The purpose of the second model, the main effects model (or intersectional interaction model), is threefold: to calculate two additional global measures of intersectionality (i.e., the adjusted VPC and the PCV) and one additional specific measure of intersectionality (i.e., the strata-level residuals). The adjusted VPC indicates what percentage of the total variance is explained by the interaction effects at the strata level after controlling for the additive main effects that are associated with the social categories. To calculate the adjusted VPC, the simple intersectional model is extended, now containing m = 1,…, M predictors Wmj to represent the main effects of the social categories that form the intersectional strata (see Eq. 7). The parameters \({\gamma }_{0m}\) (m = 1, …, M) represent their effects as fixed effects (main effects) to control for the “additive” aspects of the social categories.

$${y}_{ij}={\beta }_{0j.\mathrm{adj}}+ {e}_{0ij.\mathrm{adj}}$$
(6)
$${\beta }_{0j.\mathrm{adj}}= {\gamma }_{00.\mathrm{adj}}+ {\gamma }_{01}\cdot {W}_{1j}+ ...+ {\gamma }_{0M}\cdot {W}_{Mj}{ + \mu }_{0j.\mathrm{adj}}$$
(7)
$${y}_{ij}={\gamma }_{00.\mathrm{adj}}+ {\gamma }_{01}\cdot {W}_{1j} + ... + {\gamma }_{0M}\cdot {W}_{Mj}{+ \mu }_{0j.\mathrm{adj}}+ {e}_{0ij.\mathrm{adj}}$$
(8)
$$\begin{array}{cc}\left[{\mu }_{0j.\mathrm{adj}}\right] \sim N(0, {\sigma }_{\mathrm{strata}.\mathrm{res}}^{2})& (\mathrm{Level \;}2)\end{array}$$
$$\begin{array}{cc}\left[{e}_{0ij.\mathrm{adj}}\right] \sim N(0, {\sigma }_{e0.\mathrm{adj}}^{2})& (\mathrm{Level\;}1)\end{array}$$

In our example, a total of M = 7 main effect predictor variables are included. Specifically, the regression model includes three indicator variables as predictors to depict students’ gender (0 = female, 1 = male), immigrant background (0 = native, 1 = immigrant background), parental education (0 = below university entrance certificate, 1 = at least university entrance certificate), and four indicator variables to depict the five categories related to parental occupational status with low SES (i.e., students in the lowest SES quintile) forming the reference category.

In the intersectional interaction model, the strata-level residual \({\mu }_{0j.adj}\) (i.e., the random effect) now represents the difference between the average reading achievement y in stratum j and the reading achievement y that is expected by the main effects of the social categories for that particular stratum. The values of \({\mu }_{0j.\mathrm{adj}}\) are assumed to be normally distributed with mean 0 and variance \({\sigma }_{\mathrm{strata}.\mathrm{res}}^{2}\). As in the simple intersectional model, the adjusted reading achievement mean levels of the intersectional strata \({\mu }_{0j.\mathrm{adj}}\) may freely vary around the grand mean \({\gamma }_{00.\mathrm{adj}}\). Thus, the extended model also reflects the intersectional idea that social categories may interact simultaneously and multidirectionally. Finally, \({e}_{0ij.\mathrm{adj}}\) in Eq. 6 is the individual level residual and is assumed to be normally distributed with mean 0 and variance \({\sigma }_{e0.\mathrm{adj}}^{2}\). Equation 8 shows the intersectional interaction model when combining Eqs. 6 and 7.

While the VPC of the simple intersectional model represents the upper bound of the explanatory power of the intersectional strata and includes both the additive and potential interaction effects of the variables that define the strata, the adjusted VPC in the intersectional interaction model represents the strata-level variance that is due to interaction effects only, at least in relation to the included variables (Axelsson Fisk et al., 2018). Thus, the adjusted VPC can be understood as a global measure of intersectionality. To calculate the adjusted VPC, the between-strata variance is divided by the total variance (i.e., the sum of the between- and within-strata variance) in the intersectional interaction model.

$${\mathrm{VPC}}_{\mathrm{adj}}= \frac{{\sigma }_{\mathrm{strata}.\mathrm{res}}^{2}}{{\sigma }_{\mathrm{strata}.\mathrm{res}}^{2}+ {\sigma }_{e0.\mathrm{adj}}^{2}}$$
(9)

The third global measure of intersectionality is the PCV. The PCV measures how much between-strata variance observed in the simple intersectional model is explained by additive main effects vs. interaction effects (Axelsson Fisk et al., 2018; Evans, 2019b). To calculate the PCV, the strata-level variance in the intersectional interaction model is first subtracted from the strata-level variance in the simple intersectional model and then divided by the strata-level variance in the simple intersectional model (see Eq. 10). PCVs are usually multiplied by 100 and reported as percentages. Subtracting the PCV from 100% yields the percentage of the strata-level variance that cannot be explained by the main effects, and is therefore likely due to the interaction effects.

$$\mathrm{PCV}= \frac{ {{\sigma }_{\mathrm{strata}}^{2}- \sigma }_{\mathrm{strata}.\mathrm{res}}^{2}}{{\sigma }_{\mathrm{strata}}^{2}}$$
(10)

Low PCV values imply that the main effects cannot fully explain the variation in the outcome and that the remaining between-strata variance is due to the existence of interaction effects between the social categories defining the intersectional strata. In contrast, high PCV values indicate that the main effects explain a large proportion of the mean-level differences between intersectional strata in the outcome. In sum, by calculating the adjusted VPC and PCV, the intersectional interaction model decomposes the intersectional GCE into an additive and an interaction component, which allows for the identification of the portion of the intersectional GCE attributable to each component.

A specific measure of intersectionality is the strata-level residual \({\mu }_{0j.\mathrm{adj}}\) (i.e., the random effects of the intersectional interaction model). It indicates the unique intersectional effect of each stratum after controlling for the main effects of being a member in a certain social category. The strata-level residual is the difference between the predicted reading achievement (due to main effects and interaction effects) and the expected reading achievement (due to main effects only) in the intersectional interaction model (see Table S3). A positive strata-level residual indicates that a stratum’s mean reading achievement is higher than expected from the additive main effects, whereas a negative strata-level residual indicates a lower mean reading achievement than expected for this stratum. When the 95% credible interval of a strata-level residual does not include zero, this points to a statistically significant interaction effect in that stratum. If there are no interactions, the inclusion of main effects would fully explain the variance between intersectional strata and all 40 random effects would be zero.

Precision-Weighting in the MAIHDA Approach

One important feature of multilevel models in general, and of the MAIHDA approach in particular, is that they provide precision-weighted estimates of outcomes for each stratum (Bell et al., 2019; Evans, 2019b; Evans et al., 2018). In multilevel models, a shrinkage factor is applied to obtain the estimated residuals \({\mu }_{0j.\mathrm{adj}}\) (see Eq. 11)

$${\mu }_{0j.\mathrm{adj}}={r}_{j}\cdot \frac{{\sigma }_{\mathrm{strata}.\mathrm{adj}}^{2}}{{\sigma }_{\mathrm{strata}.\mathrm{adj}}^{2}+({\sigma }_{e0.\mathrm{adj}}^{2}\,/\,{n}_{j})}$$
(11)

where \({r}_{j}\) is the raw residual (i.e., the difference between sample mean and predicted outcome for that stratum), and \({n}_{j}\) is the number of individuals in each intersectional stratum \(j\).

Thus, the residuals in the multilevel model are shrunk based on three factors. The shrinkage will be greater if (a) the between-strata variance \({\sigma }_{\mathrm{strata}.\mathrm{adj}}^{2}\) is low, (b) the within-strata variance \({\sigma }_{e0.\mathrm{adj}}^{2}\) is high, or (c) the sample size \({n}_{j}\) of a stratum is small (Bell et al., 2019). Ultimately, the shrinkage has two important advantages. One advantage is that multilevel models thereby provide reliable (albeit conservative) estimates for strata with small sample sizes (Evans, 2019b; Evans et al., 2018). The second advantage is that the shrinking helps to solve the problem of multiple testing when the between-strata variance is low (i.e., when there are no interaction effects). The problem of multiple testing implies that when many parameters (here, the effects of intersections) are tested, the chance of erroneously detecting interactions increases rapidly. This risk is particularly prevalent for strata with small sample sizes. One strategy to control the overall error rate in traditional single-level regression models is to inflate the confidence intervals through a Bonferroni correction. However, it has proven to be more efficient to shift estimates toward each other through this shrinkage process. Thus, automatic shrinkage in multilevel models leads to more appropriately conservative comparisons without reducing the power to detect true differences (Bell et al., 2019; Gelman et al., 2012; Jones et al., 2016).

Empirical Illustration: Educational Inequalities in Reading Achievement

Research Questions

In this section, we illustrate the empirical application of the MAIHDA approach by analyzing educational inequalities in reading achievement among 15-year-old students in Germany across intersections of students’ gender, immigrant background, parental education, and parental occupational status. In doing so, the following research questions will be answered:

  1. 1.

    How useful are intersectional strata for predicting students’ reading achievement?

  2. 2.

    How does the predicted reading achievement differ across intersectional strata?

  3. 3.

    To what extent do complex interactions of students’ gender, immigrant background, parental education, and parental occupational status contribute incrementally to explaining inequalities in reading achievement?

  4. 4.

    Are inequalities in reading achievement more or less pronounced in specific intersectional strata?

Method

Data

We used data from the German sample of the PISA 2018 cycle (N = 5451). PISA is a triennial international large-scale assessment launched in 2000 by the Organisation for Economic Co-operation and Development (OECD). It is aimed at evaluating education systems worldwide by testing the skills and knowledge of 15-year-old students at the end of compulsory education in the key domains of mathematics, reading, and science. Table 1 gives a more detailed description of the sample. For details on the sample size in each intersectional stratum, see Table S3 in the supplementary materials.

Table 1 Sample sizes, mean values (and 95% confidence intervals), and percentage of missing values for each level of the stratum variables gender, immigrant background, highest parental education, and highest parental occupational status

Measures

Outcome: Reading Achievement

In PISA 2018, reading literacy was defined as “understanding, using, evaluating, reflecting on and engaging with texts in order to achieve one’s goals, to develop one’s knowledge and potential and to participate in society” (OECD, 2019a, p. 28). Accordingly, reading literacy was assessed in three different categories: the ability to locate information, to understand texts, and to evaluate and reflect upon texts (OECD, 2019a). When the reading literacy scale was introduced in PISA 2000, reading achievement scores were scaled to have a mean of 500 and a standard deviation of 100 in the OECD student population. The higher students score on the scale, the better they perform in reading (OECD, 2012). To estimate students’ achievement scores, PISA uses plausible values. Applying plausible values offers the methodological advantage of unbiased estimates of population parameters (e.g., means and standard deviations). Overall, PISA provided ten plausible values for the reading achievement scale in the 2018 cycle (OECD, 2021).

Notably, to put intersectional effects into context, there are established benchmarks that help to interpret the substantive meaning of score points and score point differences on the PISA reading achievement scale (OECD, 2019b). In particular, to describe students’ reading proficiency in PISA 2018, the reading scale was divided into eight proficiency levels. Proficiency levels indicate what students at a given proficiency level typically know and are able to do. Approximately 73 score points mark one proficiency level in PISA 2018 (OECD, 2019b), which represents one benchmark to assess differences in reading achievement between intersectional strata and the magnitude of intersectional effects. Another established benchmark is the learning gain of one additional year of schooling, which is about 40 points between students attending grade 9 and grade 10 (OECD, 2010, p. 55).

Social Categories

In our empirical example, we investigated the intersectional effects of four social categories. First, students’ gender was assessed by self-categorization into one of two categories: female or male. Second, to assess students’ immigrant background, we used the index of immigrant background (IMMIG) in PISA 2018. The IMMIG index distinguishes native students (i.e., students with at least one parent born in Germany) from students who are first-generation (i.e., students themselves were born outside Germany) or second-generation immigrants (i.e., students were born in Germany but their parents were not) to Germany (OECD, 2021). Since the number of first-generation students was small, we combined the two latter groups to obtain more balanced group sizes. Third, to assess the level of parental education, we used the index of highest education level of parents (HISCED) in PISA 2018, which corresponds to the higher International Standard Classification of Education (ISCED) level of either parent (OECD, 2019a). The HISCED was measured with seven response categories that we condensed into two socially meaningful categories for Germany (Fend, 2013; McElvany et al., 2009): (0) below university entrance certificate and (1) at least university entrance certificate. Fourth, to assess parents’ highest occupational status, we used the highest international socio-economic index of occupational status (HISEI; Ganzeboom, 2010; Ganzeboom & Treiman, 2003). This index reflects the highest ISEI score of either mother or father if data for both parents were obtained, or the only available parent’s ISEI score (OECD, 2019a). The ISEI scores range from 11 (e.g., subsistence crop farmers) to 89 points (e.g., judges). The higher the ISEI score, the higher the occupational status of an individual (Ganzeboom, 2010). We partitioned the HISEI into five equal categories (quintiles) to facilitate comparisons between adolescents growing up in families with high occupational status (first quintile), medium to high occupational status (second quintile), medium occupational status (third quintile), low to medium occupational status (fourth quintile), and low occupational status (fifth quintile).

Notably, the sample size in our empirical example was sufficiently large and well-distributed across the strata to preclude substantial underestimation of between-strata variance (Table S5). The number of students per strata ranged from 3 to 512 (Table S3), with 85% of the strata having 20 or more students and 90% having ten or more students. The strata with the fewest students represent particularly rare combinations of social categories, such as minority students with low parental education, but high parental occupational status.

Data Analysis

All analyses were conducted in R (version 4.1.2; R Core Team, 2021).Footnote 4 Figures were produced using the R package “ggplot2” (version 3.3.5; Wickham, 2016). The supplemental materials and R code for reproducing the results and figures from the present study can be accessed via https://osf.io/qdcna/. An overview of the analytical steps and the corresponding R code can be found in Table 2.

Table 2 Overview of the analytical steps in MAIHDA

Treatment of Missing Data

Using the MAIHDA approach requires large samples. However, an unavoidable reality of any analysis of large samples is missing data. Before performing the intersectional MAIHDA analysis, we therefore imputed missing data by applying nested multiple imputation (e.g., Weirich et al., 2014) using the R package “miceadds” (version 3.11–6; Robitzsch & Grund, 2020). The percentage of incomplete cases was 22%. The percentage of missing values in specific variables can be found in Table 1. For each plausible value representing students’ reading achievement, we imputed the missing data on students’ immigrant background, parental education, and parental occupational status before category building three times, yielding a total of 3 × 10 = 30 complete data sets. In the imputation model, we included students’ age, students’ attendance in the academic track in high school (i.e., the Gymnasium), as well as two-way and three-way interactions among the variables reading achievement, gender, immigrant background, parental education, and parental occupational status as auxiliary variables. Furthermore, we used the PISA student weight as auxiliary variable, as it may include information not covered by other variables that could be helpful in the imputation (e.g., information about schools; Stapleton, 2013).

Descriptive Statistics

Sample means and standard deviations for all measures, as well as intercorrelations are reported in Table S4. To calculate these estimates, we used the R packages “survey” (version 4.1–1; Lumley, 2004) and “srvyr” (version 1.1.1; Freedman Ellis & Schneider, 2022) and followed the PISA guidelines for weighting data and accounting for clustering and stratification (OECD, 2021). We pooled the results for the descriptive statistics across the 30 multiple imputed data sets using the pool_mni() function from the “miceadds” package.

MAIHDA

In line with most previous research using the MAIHDA approach (see Table S2), we took advantage of a Bayesian multilevel framework. Specifically, the simple intersectional model and the intersectional interaction model (shown in Eqs. 2-4 and 6-8) were estimated using Bayesian Markov chain Monte Carlo (MCMC) estimation methods with the R package “brms” (version 2.16.3; Bürkner, 2017), which is an interface to fit Bayesian generalized (non)linear multivariate multilevel models in Stan (Stan Development Team, 2019). We specified non or very weakly informative priors for all parameters and ran the analyses with one chain, a warmup phase of 5000 iterations, and a total length of 10,000 iterations. We combined posterior draws from the analysis of each imputed dataset and used the combined draws to summarize the posterior distribution to obtain means of the respective MCMC chains as point estimates for each parameter (see Zhou & Reiter, 2010) using the brm_multiple() command. We checked posterior samples by formal assessment of the Gelman-Rubin convergence diagnostic, which indicated model convergence. As a further robustness check, we also conducted the analyses with the traditional maximum-likelihood estimation, REML, that is more commonly used in educational research (see Section S1, Tables S6 and S7, and Figure S2 in the supplemental materials for results).

Results and Discussion

How Useful Are Intersectional Strata for Predicting Students’ Reading Achievement?

To examine the usefulness of intersectional strata for predicting students’ reading achievement, we determined the discriminatory accuracy of the strata using the VPC of the simple intersectional model. We found that 15.9% of the total variance in students’ reading achievement resides at the intersectional strata level (Table 3). Our systematic review indicated that the magnitude of the VPCs in research that focused on health-related outcomes ranged between 0.5% and 41.9% with a median value of 5.5% (Table S1). Thus, the value of the VPC in the present empirical example is considerably higher than the VPC values typically found for health-related outcomes. Although the fields of epidemiology and educational psychology are not directly comparable, we use the VPCs (and PCVs) of health-related outcomes as preliminary benchmarks to assess the magnitude of VPCs (and PCVs) for education-related outcomes, because the MAIHDA approach has not yet been introduced in educational contexts. Thus, the evidence from epidemiology can provide a first orientation for assessing the magnitude of VPCs (and PCVs). However, further research in the field of education is needed to establish appropriate benchmarks for educational inequalities.

Table 3 Parameter estimates for the multilevel models of reading achievement in 15-year-old students

Furthermore, a VPC of 15.9% also suggests that social processes influence the distribution of reading outcomes, underscoring the merit of an intercategorical approach (McCall, 2005) in this context and the need to explore and address the social determinants of these inequalities. When designing interventions to improve reading outcomes for specific strata, however, one needs to consider the considerable heterogeneity in students’ reading achievement within strata and the overlap of distributions between strata. Targeting interventions to specific strata, especially when discriminatory accuracy is low, could lead to stigmatization of students in certain strata and ineffective interventions, because many students with low reading achievement also belong to strata with better average reading achievement, and students with high reading achievement may belong to strata with worse average reading achievement (Merlo, 2018).

How Does the Predicted Reading Achievement Differ Across Intersectional Strata?

The main effects indicated that being male and having an immigrant background, parents without a university entrance certificate, and parents with a lower occupational status were each associated with a substantially lower reading achievement in Germany (Table 3). Figure 3 and Table S3 show the predicted reading achievement in the simple intersectional model for each intersectional stratum (as specific measures of intersectionality). The stratum with the highest predicted reading achievement comprised native female students with more educated parents (i.e., at least one parent earned a university entrance certificate) who are also in the highest occupational status group (580.52, 95% CI [570.91, 590.19]). At the other side of the spectrum, the stratum with the lowest predicted reading achievement comprised male students with an immigrant background and lower educated parents (i.e., no parent earned a university entrance certificate) in the lowest occupational status group (407.99, 95% CI [391.11, 425.12]). Hence, the difference in PISA scores between the stratum with the highest and the stratum with the lowest predicted mean reading achievement is 172.53 points, a difference that corresponds to almost 2.5 proficiency levels or the expected learning gain of more than 4 years of schooling (OECD, 2010).

Fig. 3
figure 3

Predicted mean reading achievement with 95% credible intervals across intersectional strata and PISA proficiency levels in the simple intersectional model. Note. The lighter the color of the background, the higher the proficiency in reading. Low OS = low occupational status; L-m OS = low to middle occupational status; Mid OS = middle occupational status; M-h OS = middle to high occupational status; High OS = high occupational status

To What Extent Do Complex Interactions Between Social Categories Contribute Incrementally to Explaining Inequalities in Reading Achievement?

To examine the extent to which complex interactions of social categories contribute incrementally to explaining inequalities in reading achievement, we used the global measures of intersectionality: The variance in students’ reading achievement residing at the intersectional strata level was reduced from VPC = 15.9% in the simple intersectional model to VPCadj = 1.6% in the intersectional interaction model. This indicates that additive rather than interactive effects of gender, immigrant background, parental education, and parental occupational status explain most of the differences in reading achievement across intersectional strata. However, although an adjusted VPC of 1.6% seems to be low, it does not mean that intersectional effects are irrelevant. It means that the interaction component is smaller than the additive component of the intersectional effect. Based on our systematic review of previous research (Table S1), values between 0.0% and 9.9% (median = 0.6%) have been found for the adjusted VPC for health-related outcomes. Hence, the magnitude of the adjusted VPC value in our empirical example is comparable to those in previous studies on health-related outcomes that applied the intersectional MAIHDA approach.

The PCV indicated that 91.1% of the between-strata variance in reading achievement was explained by the additive main effects, whereas 8.9% of the variance was likely due to interaction effects. Our systematic review indicated that the PCVs ranged from −37.9%Footnote 5 to 99.9% with a median of 92.9% in previous research that applied the intersectional MAIHDA approach in health-related contexts (Table S1). Thus, the PCV from our empirical example is comparable in magnitude to the PCVs reported in those previous studies. In summary, these results suggest that complex interactions between social categories are incrementally relevant for explaining inequalities in reading achievement. In particular, factors operating both at the individual level (e.g., cognitive skills, reading-related motivation, language spoken at home) and at the intersectional strata level (i.e., social determinants rooted in interlocking systems of power, privilege, and oppression that are unique to particular social locations) may be relevant to understanding intersectional inequalities in reading achievement.

Are Educational Inequalities More or Less Pronounced in Specific Intersectional Strata?

To investigate whether educational inequalities are more or less pronounced in specific intersectional strata, we calculated interaction effects for each stratum (i.e., the strata-level residuals in the intersectional interaction model; see Fig. 4 and Table S3), which are considered as specific measures of intersectionality. We found one significant interaction effect (highlighted by the arrow in Fig. 4). Native female students with more educated parents in the highest occupational status group showed a higher reading achievement than expected. Their strata-level residual had a positive value of 18.27 points (95% CI [1.03, 39.64]). This interaction effect corresponds to a learning gain of almost half a school year in reading achievement, or about a quarter of a proficiency level (OECD, 2010, 2019b).

Fig. 4
figure 4

Interaction effects (strata-level residuals, μ0j) with 95% credible intervals from the intersectional interaction model. Note. Low OS = low occupational status; L-m OS = low to middle occupational status; Mid OS = middle occupational status; M-h OS = middle to high occupational status; High OS = high occupational status. Each marker indicates one stratum’s strata-level residual value. Strata are grouped by gender, immigrant background, parental education, and parental occupational status. Negative values indicate a lower reading achievement than expected based on the additive contributions of the main effects, while positive values indicate a higher reading achievement than expected for that stratum. Significant intersectional effects are marked with an arrow

This interaction effect is also particularly noteworthy given that simulation studies have shown that intersectional MAIHDA analyses are less likely to erroneously detect interactions of statistical significance due to chance alone than traditional regression analyses (Bell et al., 2019; Evans, 2019b). Notably, if we apply the REML estimation procedure (see section S1 and Tables S6 and S7 in the supplemental materials), instead of the Bayesian estimation procedure, we find three additional statistically significant interaction effects. This discrepancy can be attributed to the fact that the uncertainty in the estimation of the variance components is not taken into account in REML, resulting in narrower confidence intervals compared to the credible intervals of Bayesian estimation procedures (see Figure S2; Baldwin & Fellingham, 2013). All in all, this speaks to the robustness of the Bayesian result that native female students who have parents with at least a university entrance certificate and an occupation in the highest occupational status group performed even better in reading than would have been expected from the additive main effects.

General Discussion

The MAIHDA approach is a novel intersectional multilevel approach that has already been successfully applied in epidemiology to study health-related inequalities. In the present paper, we conducted a systematic review to show how this approach has been applied in this area of research. Building on and integrating the contributions aimed at disseminating the MAIHDA approach in epidemiology, we introduced MAIHDA to model educational inequalities at the intersection of multiple social categories. We also provided an empirical example that illustrates how this approach can be applied to answer key intersectional research questions in educational psychology. In the following sections, we discuss the contributions of quantitative methods to intersectionality research as well as the advantages, limitations, and future directions of the MAIHDA approach in educational psychology.

Contributions of Quantitative Intersectionality Research

As mentioned in the introduction, the concept of intersectionality has been used much more frequently in qualitative than in quantitative research. One reason why qualitative methods are seen as particularly suited to intersectional research is that they focus on analyzing and representing the complexity and richness of individuals’ experiences (McCall, 2005; Scott, 2010; Shields, 2008; Syed, 2010). Qualitative intersectional studies offer the opportunity to reveal the experienced inequality and discrimination of individuals in specific contexts and to capture their own understanding of their social location as well as their own interpretations of their individual realities. Furthermore, qualitative intersectional studies allow researchers to examine the impact of interconnected systems of power, privilege, and oppression on the opportunities and constraints in individuals’ lives (Scott, 2010).

Only in recent years has intersectionality been incorporated into quantitative research in the social and health sciences (Bauer et al., 2021). Importantly, interest in quantitative intersectional research methods is growing as they can contribute to current intersectionality research in multiple ways. First, quantitative intersectional studies can provide insights into patterns and structures of (educational) inequalities in the population (Bauer, 2014). For example, studies focusing on the unidimensional impact of social categories have indicated that male students typically perform worse in reading than female students (e.g., OECD, 2019c). Such results have led to continued public concern about the poor academic performance of male students (e.g., Kuper & Jacobs, 2018; Nemko, 2014; “Newsweek,” 2008; Roos, 2022; Tyre, 2006). MAIHDA, however, shows that educational inequalities in reading are more complex. For instance, our empirical example demonstrated that strata containing male students who were native and had a high SES were among those with the highest predicted reading achievement. The predicted reading achievement for native male students whose parents had at least a university entrance certificate and were in the highest or second-highest occupational status group were 551.57 points (95% CI [542.49, 560.51]) and 537.47 points (95% CI [528.29, 546.92]), respectively. At the same time, the stratum containing female students who had an immigrant background and a low SES was among the strata with the lowest predicted reading achievement (415.20 points, 95% CI [395.76, 434.68]). This suggests that the development of male and female students’ reading skills is also context dependent, and that both male and female students develop higher reading achievement in more favorable environments than in less favorable ones (although female students outperform male students in almost all strata when all other social categories are equal; see Fig. 3).

Second, quantitative intersectional methods can improve the documentation of inequalities at various intersectional positions and thereby provide policy-relevant information that can help identify starting points for addressing (educational) inequalities and injustice (Bauer, 2014). MAIHDA improves the documentation of educational inequalities by predicting educational outcomes for intersectional strata, allowing the identification of strata whose educational outcomes may require special attention from researchers and policymakers (in our empirical example, e.g., those strata that only just reached PISA proficiency level 2 in reading, see Fig. 3). Furthermore, the MAIHDA approach provides a measure of discriminatory accuracy that indicates how well intersectional strata classify individuals in terms of their educational outcomes and whether it is appropriate to design interventions for specific strata. In our empirical example, the discriminatory accuracy of the strata for reading achievement was relatively high (VPC = 15.9%) compared with the discriminatory accuracy typically found for health-related outcomes. However, benchmarks for educational inequalities need to be established to evaluate the magnitude of the discriminatory accuracy and the appropriateness of targeted interventions for specific intersectional strata to improve educational outcomes.

Third, by quantifying intersectional effects, quantitative intersectional methods allow researchers to analyze how educational inequalities develop over time (Scott, 2010) and thus enable researchers to identify moderating influences on these inequalities. Although our empirical example was based on cross-sectional data, the MAIHDA approach can also be applied to examine trajectories of educational inequalities. In a longitudinal multilevel framework, it would be possible to examine the development of continuous educational outcomes (e.g., academic achievement, achievement motivation, school belonging, attitudes toward education) across intersectional strata. Such an analysis could provide insight into the age at which intersectional effects increase or decrease and thus point to sensitive periods when interventions might be particularly effective.

Finally, quantitative methods advance intersectionality research because they allow new intersectional research questions to be answered and more diverse perspectives on educational phenomena to be provided (Else-Quest & Hyde, 2016a). It has been argued that combining quantitative and qualitative intersectional approaches in mixed method research could be especially fruitful to better understand the influence of interconnected systems of power, privilege, and oppression on (educational) inequalities (e.g., Bauer, 2014; Else-Quest & Hyde, 2016a; Gross et al., 2016a). For instance, the MAIHDA approach could be first used to identify strata in which educational inequalities are particularly pronounced, and these strata could then be further investigated with an intracategorical approach using qualitative methods. In the context of educational inequality in reading achievement, for example, interviews with parents and teachers could be conducted to examine whether there are similarities within strata in the way reading support is practiced in families and schools to generate hypotheses about lack of support, barriers to support, and differences between strata with strong and poor reading skills. To conclude, quantitative methods can make a substantial contribution to our understanding of the intersectionality of educational inequalities.

Advantages and Limitations of the MAIHDA Approach

The MAIHDA approach comes with distinct advantages over the traditional regression approach. These include better scalability for higher dimensions, model parsimony, precision-weighted estimates of strata with small sample sizes, and better handling of the multiple testing problem (Bell et al., 2019; Evans, 2019b; Evans et al., 2018). Although intersectional MAIHDA analyses have several clear strengths, they also have some limitations, many of which are related to the shortcomings of the available data. One limitation is that important aspects of inequality may be overlooked because of data gaps. MAIHDA analyses require large sample sizes. Even though the German PISA sample is not small with N = 5451 students, an even larger sample would have allowed us to create more strata and to investigate the inequalities in reading achievement in a more differentiated way. For example, with a larger sample, students’ ethnicity could have been considered in addition to their immigrant background. Furthermore, students’ gender was assessed in PISA 2018 by asking students whether they are female or male, with no further options. Accordingly, gender and sex may have been conflated in this item, which could have led to misclassification. In addition, as previously discussed, precision weighting in the MAIHDA analyses results in estimates for strata with small sample sizes being more stable but more conservative than in traditional regression models (Evans, 2019b; Evans et al., 2018). In our empirical example, most strata had a sufficient sample size. However, there were a small number of strata that contained few students. For example, the stratum with female immigrant students who had parents with high occupational status but no university entrance certificate included only three individuals. A larger sample (possibly with oversampling of rare social categories) may have resulted in statistically significant random effects (i.e., interaction effects) in these strata.

In addition, an inherent limitation is that the MAIHDA approach requires discrete social categories (Evans et al., 2018). As a result, some social categories (e.g., occupational status in the empirical example) cannot be treated as continuous, whereas this would be possible in other approaches, such as traditional regression analyses. Moreover, it is important to emphasize that due to a possible omitted variable bias, the estimated strata-level variance as well as the strata-level residuals should be interpreted as unexplained variance or residuals at the strata level, which may be due to interaction effects after controlling for additive effects (Evans et al., 2018).

Another limitation related to the analyses is that the standard errors are likely underestimated because the complex survey design (e.g., clustering at the school level) cannot yet be fully accounted for in the MAIHDA framework (see also the results of our systematic review in Table S2). Developing multilevel models that can accommodate complex survey designs in the MAIHDA framework would be an important next step in methodological research. Additionally, as in any cross-sectional observational study, we could only examine associations and no causal relations.

Finally, as with all (quantitative) methods, MAIHDA analyses cannot examine every aspect of intersectionality. MAIHDA analyses tell us how well students perform in reading dependent on their membership in an intersectional stratum and how much of the variation in students’ reading achievement can be attributed to the intersectional strata level. Yet, such analyses do not reflect, for example, how students experience disadvantages or privileges because of their social location with regard to their reading ability. Thus, qualitative research continues to be central to understanding how intersectionality affects these lived experiences.

Future Directions

Educational large-scale studies currently do not yet collect data on many experiences and characteristics that are relevant for intersectional analyses, or do not do so sufficiently. These include, for instance, whether children and adolescents have care responsibilities at home, if they or their parents have some form of disability, their legal status, whether their gender identity differs from that assigned at birth or reflects a gender identity beyond a binary gender definition, and their sexual identity (see also Codiroli Mcmaster & Cook, 2019). Expanding data collection to include these aspects would be highly desirable to depict the social categories that shape educational inequalities more precisely.

Furthermore, intersectional theorists emphasize that structural contexts (e.g., population, neighborhood, or school characteristics and policies) should be considered in intersectional analyses (e.g., Bauer, 2014; Bowleg & Bauer, 2016; Evans, 2019b). There are multiple structural levels at which intersectional influences can affect the individual. This consideration is consistent with social-ecological models of human development (i.e., ecological systems theory; Bronfenbrenner, 1979). Bronfenbrenner theorized that individual psychological development occurs in interactions within five concentric circles of mutually influencing social and ecological systems, ranging from the intrapsychic to the familial to increasingly broader societal contexts. Future research should extend the intersectional MAIHDA approach to include structural contexts situated as additional social categories to enhance our understanding of socio-structural power relations and their differential educational effects at different intersections. In line with ecological systems theory, such additional dimensions could be school or neighborhood SES, degree of urbanization, or gender and ethnicity ratios as school, neighborhood, or population characteristics. At the policy level, relevant structural contexts could be, for example, public funding for education, access to educational resources, or family support programs.

Finally, as in our empirical example, the MAIHDA approach has mainly been used to answer exploratory research questions (see the results of our systematic review in Table S2). Exploratory approaches are important to identify and describe social phenomena and to derive data-supported hypotheses (Loeb et al., 2017). However, the MAIHDA approach can also be used for hypothesis testing. For example, the multilevel models can be built stepwise by including increasing orders of interactions to test hypotheses about which social categories or combinations of social categories account for the most variance at the strata level.

Further Theoretical and Methodological Guidance

For a general introduction to the MAIHDA approach (in the context of epidemiology), we recommend the papers by Evans et al. (2018) and Merlo (2018). For Stata syntaxes to run MAIHDA analyses via MLwiN and in some cases even detailed methodological descriptions, see the articles by Axelsson Fisk et al. (2018), Evans (2019a), Holman et al. (2020), Khalaf et al. (2020), Ljungman et al. (2021), Persmark et al., (2019, 2020), and Zettermark et al., (2021; see also Table S2). For further theoretical and methodological reflections on the MAIHDA approach see, for example, Evans (2019a).

Conclusion

The intersectional MAIHDA approach is a novel quantitative approach to model educational inequalities at the intersection of multiple social categories and to test (complex) intersectional hypotheses. MAIHDA uses a multilevel framework to define intersectional strata that represent individuals’ membership in multiple social categories. By decomposing the variance between and within these intersectional strata, MAIHDA analyses can assess the extent to which (a) intersectionality is relevant overall, (b) educational inequalities are more or less pronounced in specific strata, and (c) intersectional strata are useful for predicting individuals’ educational outcomes (i.e., their discriminatory accuracy). In addition, MAIHDA analyses provide a detailed mapping of educational inequalities across intersectional strata. For quantitative intersectionality research, the intersectional MAIHDA approach holds great promise because it allows for a more sophisticated assessment of intersectionality in educational contexts and a deeper understanding of educational inequalities compared to the traditional regression approach. In the present article, we systematically reviewed the application of MAIHDA to study health inequalities in epidemiology, introduced this approach to educational psychology research and illustrated its use through the example of inequalities in reading achievement and the freely available statistical software R.