Penalized for Challenging Traditional Gender Roles: Why Heterosexual Relationships in Which Women Wear the Pants May Be More Precarious

There is growing evidence that heterosexual relationships in which traditional gender roles are reversed because women have attained higher societal status than their male partner are more precarious. We argue that this is the case because both partners in role-reversed relationships are evaluated more negatively than partners in more egalitarian or traditional gender role relationships. In two experimental studies conducted in the United States (N = 223) and the Netherlands (N = 269), we found that when encountering role-reversed relationships, participants perceive the woman as the more dominant and agentic one and the man as the weaker one in the relationship. They also perceive women in role-reversed relationships as less likeable, have less respect for men in role-reversed relationships, and expect that such relationships are less satisfying. In addition, in a third partner study (N = 94 heterosexual couples), we found that both male and female partners in role-reversed relationships considered the man to be the weaker one and the woman to be the more dominant one. Moreover, perceiving the man as the weaker one predicted lower relationship satisfaction in role-reversed couples. Overall, this research indicates that gender stereotypes about heterosexual relationships should be considered in efforts to achieve gender equity.

Despite changing gender dynamics in Western countries, gender stereotypes prescribing men to prioritize breadwinning and women to prioritize caregiving persist and are quite resistant to change (Haines et al., 2016; Morgenroth & Heilman, 2017; Park et al., 2010). Some studies show that couples in relationships with these traditional gender roles reversed (e.g., couples in which women are the main provider or in which men are the main caretaker) experience negative relationship outcomes (e.g., decreased marital satisfaction, increased chance of divorce, lower relationship quality; Bertrand et al., 2015; Vink et al., 2022a; Wilcox & Nock, 2006; Zhang, 2015). Furthermore, role-reversed couples are more likely to experience negative relationship outcomes (e.g., lower relationship satisfaction) in countries that strongly endorse traditional gender role expectations (Vink et al., 2022b). These findings suggest that men and women in role-reversed relationships are sensitive to the gender stereotypes in their environment, potentially affecting their perception of their own partner as well.
The current study aims to test a potential underlying mechanism that explains the negative relationship outcomes of partners in role-reversed relationships. Focusing on relationships in which the woman has surpassed her male partner in social status, we draw on the status incongruity hypothesis (Rudman et al., 2012) to argue that these relationships may be more precarious because of the negative perceptions and expectations that people have about the status divisions that run counter to traditional gender norms in role-reversed relationships. The extent to which women and men are penalized for status violations in role-reversed relationships may explain why role-reversed relationships are less socially accepted (Hettinger et al., 2014; MacInnis & Buliga, 2020) and experience more difficulties than traditional role relationships (Bertrand et al., 2015; Vink et al., 2022a; Wilcox & Nock, 2006; Zhang, 2015), and thus may operate to preserve the gender hierarchy.

Prescriptive Gender Stereotypes and the Status Incongruity Hypothesis
Gender stereotypes follow from frequent observations of men and women in gender-typical social roles, such as men who are the breadwinner of their families and have higher status roles in society and women who are homemakers and have lower status roles (Eagly, 1987; Eagly et al., 2000; Heilman, 2001; Prentice & Carranza, 2002). Men and women who violate these prescriptive stereotypes (e.g., by succeeding in an occupation that is dominated by the other gender) receive social and economic penalties (Heilman & Okimoto, 2007; Heilman & Wallen, 2010; Rudman et al., 2012) or "backlash" (Rudman, 1998; Rudman et al., 2012). Though both men and women risk backlash for violating gender norms, some violations are judged to be more severe than others (Eagly & Karau, 2002; Prentice & Carranza, 2002). For instance, competent and strong women receive favorable responses if they also show modesty and caring qualities (Rudman & Glick, 1999). Further, the status incongruity hypothesis (Rudman et al., 2012) states that stereotype incongruent behavior results in backlash only when the behavior is perceived to be threatening to the current gender hierarchy and men's higher status (Ridgeway, 2001; Rudman & Kilianski, 2000), as people tend to be motivated to justify and support the gender status quo (Jost et al., 2004). The status incongruity hypothesis implies that behaviors prescribed for men serve to increase or protect their status (e.g., being career-oriented, being dominant; Rudman et al., 2012), whereas behaviors proscribed for men would serve to reduce their status (e.g., being emotional or weak; Rudman et al., 2012). Similarly, behaviors prescribed for women serve to maintain their lower social status (e.g., being warm, caring, modest), whereas behaviors proscribed for women serve to increase their social status (e.g., being assertive or overly confident or placing themselves above men; Rudman et al., 2012).
Though both men and women experience backlash when they show stereotype incongruent behavior, the penalties they face when violating the gender hierarchy are different. Men who violate gender norms by succeeding in a traditionally feminine occupation or showing behaviors associated with low status are perceived to be ineffectual (being a 'loser') and respected less, even compared to women who show similar behaviors (i.e., the weakness penalty; Heilman & Wallen, 2010; Rudman et al., 2012). Women who violate gender norms by succeeding in a traditionally masculine occupation or showing behaviors that might increase status are perceived to be interpersonally hostile and disliked more (i.e., the dominance penalty; Heilman & Okimoto, 2007; Rudman et al., 2012).

Status Violations and Social Backlash Effects for Role-Reversed Couples
Studies on backlash effects have mainly focused on behaviors and outcomes within the work domain and evaluations of individual women and men. However, a growing literature shows that these mechanisms spill over into the relationship domain. To illustrate, cross-status couples in which either the woman earns more, is more highly educated, or has a higher status occupation than the man were viewed negatively by U.S. adults (MacInnis & Buliga, 2020), which the authors suggest may be because these role-reversed couples challenge the gender hierarchy (MacInnis & Buliga, 2020). In couples where the man has a lower status occupation than his female partner, university students in the U.S. predict the male partner to be less satisfied with the relationship and report less sympathy with the female partner compared to traditional couples, and these negative evaluations are driven by people's perceptions that a role-reversed man is not 'masculine' enough and a role-reversed woman is not 'feminine' enough (Hettinger et al., 2014). However, the authors do not make a distinction between positive (agency) and negative masculine traits (dominance), and between positive (communality) and negative feminine traits (weakness) in this study. We argue that especially the negative traits (i.e., dominance and weakness) may explain backlash for role-reversed couples.
In addition, one study of U.S. participants demonstrated that stay-at-home fathers are less respected than fathers who worked outside the home (Brescoll & Uhlmann, 2005). However, in this study, participants judged individual targets and a spouse was not mentioned, making it difficult to assess how these findings generalize to role violations of couples rather than individual men and women. Another study with U.S. adults, in which both partners were mentioned in the vignettes, found that husbands without an income who did most of the domestic chores were perceived to be weaker, less agentic, and less dominant than stay-at-home husbands who work successfully from home or carry out only part of the total domestic chores (Chaney et al., 2019). However, in this study too, the focus was on individual men who violate traditional gender roles rather than on couple dynamics, as the research question was whether men would receive backlash due to a loss of earnings or due to a large share in domestic tasks. When evaluating women and men in role-reversed relationships, we argue that the focus of people's evaluations shifts from the individual to the dyadic level. Thus, we propose that the negative evaluations of men may be driven by their lower status relative to their female partner rather than these men having lower status in general.

Do Backlash Mechanisms Also Operate Within Relationships?
In addition to how couples in role-reversed relationships are perceived and evaluated by others, we examine whether men and women within role-reversed relationships also evaluate the male partner as the weak one and the female partner as the dominant one in the relationship. Moreover, we test whether relative dominance and weakness perceptions predict negative relationship outcomes for role-reversed couples compared to couples in more egalitarian or traditional relationships. On the one hand, it is not self-evident that others' gendered perceptions of the couples' relationships would be endorsed by couples in role-reversed relationships. Partners have a much more detailed and complete mental representation of one another compared to strangers (Trope & Liberman, 2010). Furthermore, people are motivated to see their relationship in a positive light (Murray, 1999). On the other hand, gender role norms have a strong influence on people, and people often try to avoid gender role violations (Amanatullah & Morris, 2010; Cherry & Deaux, 1978; Rudman et al., 2012; Wallen et al., 2017). Theories on self-categorization and self-stereotyping suggest that men and women can internalize stereotypes about the self and close others under certain conditions and adjust their behavior in line with a negative stereotype when faced with that stereotype (i.e., stereotype threat; Cadinu et al., 2013). Research on the backlash avoidance model illustrates that both men and women are aware of backlash and how it can act to constrain roles within relationships, and how fear of backlash can shape behavior (Rudman et al., 2012).
However, the question remains whether backlash affects the men and women within role-reversed relationships. Based on the processes described above, it might be difficult for couples who break with traditional gender norms to avoid the negative consequences of backlash completely. In the current research, we will examine whether men and women's perceptions of their partner's relative dominance and weakness provide one explanation for the negative relationship outcomes that role-reversed couples experience.

Overview of Current Research
Drawing from the status incongruity hypothesis and applying it at the dyadic level, we examine whether heterosexual couples in a role-reversed relationship are at risk of facing backlash from others because status divisions within their relationship threaten the gender hierarchy (see Fig. 1).
Specifically, across three studies, we examine whether people would perceive role-reversed couples as violating gender roles through rating the women as more dominant than their male partner and rating the men as weaker than their female partner in these relationships, compared to traditional role couples; whether people would negatively evaluate the quality of role-reversed relationships and the individual men and women in role-reversed relationships (i.e., likeability of women and respect for men) as a function of these gender role violations; and whether men and women in role-reversed relationships have similar relative dominance and weakness perceptions of themselves and their partner, and whether these perceptions are related to lower relationship satisfaction.
Prior research suggests that although women who violate gender norms are perceived to be more agentic, it is their dominance rather than their agency that results in backlash (Rudman et al., 2012). Therefore, even if people indeed evaluate women as the agentic partners in the relationship, we did not anticipate that this relative agency would affect their judgment of the women and their relationships. Furthermore, we expect that these backlash mechanisms operate for heterosexual couples in which the women have higher status than the men across absolute status levels. This means that we did not expect men's absolute low status to cause backlash toward heterosexual couples, but rather men having lower status relative to their women partners.

Study 1
The aim of Study 1 was to examine the judgments of role-reversed couples (i.e., where the woman has higher status than the man) compared to traditional-role couples (i.e., where the man has higher status than the woman) and status-equal couples. Overall, for role-reversed couples, we expected that participants would perceive the woman as the more agentic and dominant one relative to the man (H1) and the man to be the weaker one relative to the woman (H2). At the relationship level, we predict that when the woman is perceived to be the more dominant one and the man to be the weaker one in the relationship, people will perceive this relationship as less satisfying for the couple (H3). On the individual level and consistent with the dominance penalty, we expect that when the woman is perceived as the more dominant one in the relationship, people will evaluate her as less likeable (H4). Consistent with the weakness penalty, we expect that people will have less respect for the man when they perceive him as the weaker one (H5; see Fig. 1).

Participants and Design
We aimed to recruit 250 participants. A priori calculation of a repeated-measures ANOVA with a within-between interaction, an estimated partial η² of .02, a power of .80, and a correlation of .03 between repeated measures in G*Power indicated that we would need a sample size of 228 (Faul et al., 2007). This sample size is also sufficient for our structural equation models, as a priori calculation of linear multiple regression with an R² increase of .10 (a small to medium effect), 13 tested predictors, and a total number of 33 predictors indicated a sample size of 174 (Faul et al., 2007). We aimed for a sample about 10% larger than required so that we could exclude participants who failed the attention check.
We conducted an experiment with a 2 (Partner Gender: male [Ryan]/female [Anna], within-participants) × 2 (Absolute Status Ryan: Low/Medium, between-participants) × 3 (Relative Status Anna: Lower/Equal/Higher than Ryan, between-participants) mixed design. We manipulated Ryan's absolute status to test whether backlash in the relationship domain is indeed predicted by the status of the woman relative to the man rather than by the absolute status of the man.
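To make the between-participants part of this design concrete, the following minimal Python sketch enumerates the six vignette conditions that result from crossing Ryan's absolute status with Anna's relative status. The variable names and the couple-type labels are illustrative assumptions that follow the terminology used in the text; they are not taken from the study materials.

```python
from itertools import product

# Between-participants factors as described in the design paragraph.
# (Partner gender is a within-participants factor: every participant rates both Ryan and Anna.)
ABSOLUTE_STATUS_RYAN = ["low", "medium"]
RELATIVE_STATUS_ANNA = ["lower", "equal", "higher"]

# Hypothetical labels mapping Anna's relative status to the couple types named in the text.
COUPLE_TYPE = {
    "lower": "traditional",
    "equal": "status-equal",
    "higher": "role-reversed",
}

# Crossing the two factors yields the six vignette conditions.
conditions = list(product(ABSOLUTE_STATUS_RYAN, RELATIVE_STATUS_ANNA))

for ryan_status, anna_relative in conditions:
    print(f"Ryan: {ryan_status:6s} | Anna: {anna_relative:6s} than/to Ryan | {COUPLE_TYPE[anna_relative]}")
```

Two of the six cells (one per level of Ryan's absolute status) are role-reversed, which is what allows relative and absolute status to be separated.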

Procedure
After providing informed consent, participants were told that we were interested in how the careers of dual-earning couples affected their relationships and they would earn $2.50 for completing the study and passing the attention check. We emphasized that participation was voluntary and anonymous. After filling out demographic background information, participants were randomly assigned to one of the six conditions where they were asked to read one vignette that included the manipulation. After completing questions related to our independent and dependent variables, we debriefed participants on our research goals.

Manipulation
The manipulation consisted of a description of Ryan and Anna as follows: "Now, we will describe the situation of Ryan and Anna. Ryan and Anna have been in a relationship for five years now. They met each other through mutual friends. Ryan works as [occupation] and Anna works as [occupation]." We chose to manipulate occupation as a proxy for status. Status can be derived from one's educational degree, income, and prestige of the job (Adler et al., 2000), which are all signaled by one's occupation. Using Glick and colleagues' occupations (1995), we conducted a pilot study (N = 31 of which 20 were men, M age = 34.48, SD age = 7.22) to find occupations that differed in prestige but were comparable in the perceived gender ratio of the job holders (see Table 1 and Supplement A in the online supplement). Participants responded to two items: "Please indicate what you expect is the ratio of current male and female job holders in the following occupations:" with responses given on a slider from 0 (only male) to 100 (only female) and "Please indicate the prestige you expect is associated with the following occupations:" with responses ranging from 1 (very low prestige) to 7 (very high prestige).
For every condition, we selected two occupations for Ryan and Anna that were similar in status. We did this to make sure that effects were driven by status rather than a specific, unforeseen characteristic of one occupation. These occupations were counterbalanced for each participant (see Table 1 for all selected occupations). To illustrate, in the condition where Ryan had low absolute status and Anna higher status than Ryan, we stated that Ryan worked either as a bookbinder or food store manager and Anna as a professor or dentist.

Measures
Items were all measured on a 7-point scale with options ranging from 1 (completely disagree) to 7 (completely agree), unless otherwise indicated. The order of the constructs was based on the hypothesized causal chain in the theorized model. We had no specific expectations about the perceptions of who is the communal one in the relationship, as communality is not associated with status (Rudman et al., 2012). Nevertheless, we included measures of communality to test whether this affected the outcomes. As this was not the case in either Study 1 or Study 2, we do not report on communality here.

Manipulation Check of Perceived Societal Status
Ryan and Anna's perceived status was measured with a subjective socioeconomic status ladder with ten different rungs (Adler et al., 2000). We described that people at the top of the ladder are best off in terms of income, education, and respected jobs, whereas people at the bottom are worst off. Participants were asked to indicate the rung they thought best represented Ryan and Anna's individual situation.

Agency
We adapted five items from traits in the Bem Sex Role Inventory (Bem, 1981) to assess how participants rated Ryan's and Anna's level of agency (5 items; e.g., "Ryan/Anna defends his/her own beliefs, is willing to take risks, has a strong personality, is independent, is ambitious"; α Ryan = .78; α Anna = .84).

Likeability
Based on Heilman and Wallen (2010), we assessed likeability with two questions: "How much do you think you would like Ryan/Anna?" and "How would you describe Anna/Ryan," on a scale ranging from 1 (not at all likeable) to 7 (very much likeable) (r Anna = .48, p < .001; r Ryan = .59, p < .001).

Respect
Based on Heilman and Wallen (2010), we included two items to measure perceived respect for Ryan and Anna: "How much do you think Ryan/Anna is someone who commands respect from others?" and "How would you describe Ryan/Anna?" on a scale ranging from 1 (respectable) to 7 (unrespectable) (r Anna = .33, p < .001; r Ryan = .55, p < .001). Given that the correlation between the two respect items was lower for Anna, we also conducted the analyses with only a single item for respect (i.e., "How much do you think Ryan/Anna is someone who commands respect from others?"). The results were similar whether the two-item or single-item measure was used. For this reason, and because we wanted our results to be comparable with those of Heilman and Wallen (2010), we use the two-item measure of perceived respect.

Relationship Satisfaction
Perceived satisfaction within Anna and Ryan's relationship was measured using five items from Rusbult et al. (1998) measure of relationship satisfaction (e.g., "Ryan and Anna feel satisfied with their relationship," "Ryan and Anna's relationship is close to ideal"; α = .90).

Preliminary Analyses
Correlations between the demographic variables and hypothesized variables showed that participants' education level, ethnicity (Asian vs. white ethnic origin), marital status (married vs. single), and employment status (wages vs. self-employed) were associated with several of the dependent variables (see Table 2), and therefore we conducted the analyses with and without these variables as covariates, which showed similar results (see Supplement B in the online supplement for results without the covariates). Moreover, we controlled for the average societal status of the couple (except in the analyses for the manipulation checks) to be sure that differences in the evaluations of Anna and Ryan and perceived relationship satisfaction were not due to the fact that as a couple, Ryan and Anna had higher average status in some conditions than in others. Q-Q plots showed that all hypothesized variables were normally distributed. There were no missing data in the final sample.

Manipulation Check
We conducted mixed repeated measures ANCOVAs to check that the experimental manipulation evoked the expected differences in dominance, weakness, and agency traits across the conditions. The findings indicated that the manipulation worked as intended, such that participants perceived Ryan to have higher absolute status in the conditions in which we assigned him a medium-status occupation compared to the conditions in which he had a low-status occupation. Also, participants perceived Anna's status relative to Ryan as intended across the conditions (see Supplement C in the online supplement for these analyses). Next, we examined whether perceptions of agency, dominance, and weakness were in line with our hypotheses (i.e., H1 and H2).

Perceptions of Agency, Dominance, and Weakness in the Role-Reversed Relationships
We conducted mixed repeated measures ANCOVAs to test whether the relative dominance and weakness perceptions of Ryan and Anna varied as predicted.
There was no interaction effect for partner gender and Ryan's absolute status, F (1,145) = 1.52, p = .220, but consistent with Hypothesis 1, there was an interaction effect for partner gender and Anna's relative status on perceived relative agency (see Table 3). Only in the conditions where Anna had higher status than Ryan did participants perceive her to be more agentic than Ryan. When Ryan had higher status, participants rated him to be more agentic than Anna, and when they had equal status, participants rated their agency similarly. We found no interaction effect for partner gender and Ryan's absolute status, F (1,145) = .04, p = .835, but supporting Hypothesis 1, there was an interaction effect for partner gender and Anna's relative status on perceived relative dominance (see Table 3). Ryan was perceived to be more dominant than Anna when Anna and Ryan had traditional or equal status division. In contrast, when Anna had a higher status than Ryan, participants rated her and Ryan similarly dominant.
There was no interaction effect for partner gender and Ryan's absolute status, F (1,145) = 1.15, p = .286, but consistent with Hypothesis 2, there was an interaction effect for partner gender and Anna's relative status on perceived relative weakness (see Table 3). Whereas Ryan and Anna were perceived to be equally weak when they had a traditional or equal status division, when Anna had higher status than Ryan, participants rated Ryan as significantly weaker than Anna.

Structural Equation Modelling to Test Our Theorized Model
To test our theorized model shown in Fig. 1, we built structural equation models in Mplus (Muthén & Muthén, 2008). These models provide fit indices and allow us to constrain paths of effects we did not anticipate to zero. This way, we can show that the penalties for men and women are different and that these models show a good fit to our data. We created a dummy variable for condition (i.e., Anna had lower or equal status vs. Anna had higher status than Ryan). We created difference scores for dominance, weakness, and agency between the partners. Covariances among the three mediator variables and the five dependent variables were estimated. The theoretical model showed a poor fit with the data, χ²(df = 27) = 77.67, p < .001, RMSEA = .11, CFI = .91, SRMR = .09. Based on the modification indices, we freed the paths of Anna's relative agency on Anna's likeability and respect, which significantly improved the fit of the previous model, Δχ²(Δdf = 2) = 34.45, p < .001, RMSEA = .03, CFI = .99, SRMR = .04. We tested several alternative models (e.g., a model in which mediator and dependent variables were switched) to confirm that the current model provided the best fit to the data (see Supplement D in the online supplement).
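The data preparation described above (a condition dummy plus partner difference scores) can be sketched as follows. This is a minimal illustration, not the authors' code: the field names, the example ratings, and the direction of each subtraction are assumptions. The directions are chosen so that higher scores mean "Anna more dominant/agentic than Ryan" and "Ryan weaker than Anna", which matches how the results are worded.

```python
from dataclasses import dataclass


@dataclass
class Ratings:
    """One participant's scale means for both partners (hypothetical field names)."""
    relative_status_anna: str  # "lower", "equal", or "higher" than Ryan
    dominance_anna: float
    dominance_ryan: float
    weakness_anna: float
    weakness_ryan: float
    agency_anna: float
    agency_ryan: float


def to_model_variables(r: Ratings) -> dict:
    """Build the condition dummy and the three difference-score mediators."""
    return {
        # 1 = role-reversed (Anna higher status), 0 = traditional or status-equal.
        "role_reversed": 1 if r.relative_status_anna == "higher" else 0,
        # Assumed direction: Anna minus Ryan for dominance and agency.
        "rel_dominance_anna": r.dominance_anna - r.dominance_ryan,
        "rel_agency_anna": r.agency_anna - r.agency_ryan,
        # Assumed direction: Ryan minus Anna for weakness.
        "rel_weakness_ryan": r.weakness_ryan - r.weakness_anna,
    }


# Made-up ratings on the 7-point scale, for illustration only.
example = Ratings("higher", 4.5, 4.0, 2.0, 3.0, 5.5, 4.5)
model_vars = to_model_variables(example)
print(model_vars)
```

Working with difference scores places the analysis at the dyadic level: a value of zero means the partners were rated equally, regardless of how high or low both ratings were.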

Perceived Relationship Satisfaction in the Role-Reversed Relationship
In line with Hypothesis 3 and as shown in Fig. 2, participants in the role-reversed condition rated Anna to be relatively more dominant than Ryan compared to the other conditions, b = .26, SE = .08, p < .001, and rated Ryan weaker than Anna, b = .26, SE = .08, p < .001. Participants rated the role-reversed relationship to be less satisfying when they rated Anna as more dominant than Ryan, b = -.17, SE = .07, p = .010, and rated Ryan as weaker than Anna, b = -.13, SE = .06, p = .036, compared to the other conditions. We found a significant indirect effect of condition on perceived relationship satisfaction via Anna's perceived dominance, b = -.04, SE = .02, p = .040, but not via Ryan's perceived weakness, b = -.03, SE = .02, p = .075. However, the overall indirect effect was significant, b = -.08, SE = .03, p = .006, indicating that participants perceived the role-reversed relationship as less satisfying due to their combined evaluation of Anna's relative dominance and Ryan's relative weakness.
The direct effect of condition on relationship satisfaction was significant, b = -.18, SE = .07, p = .011.
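In path models, an indirect effect is conventionally the product of the path from the predictor to the mediator and the path from the mediator to the outcome. The sketch below checks that the indirect effects reported above are consistent with the reported component paths under that assumption. It uses only the rounded coefficients from the text, so it reproduces point estimates approximately and does not reproduce the standard errors or p values, which come from the fitted model.

```python
# Reported paths from condition (role-reversed vs. other) to the mediators.
a_dominance = 0.26  # condition -> Anna's relative dominance
a_weakness = 0.26   # condition -> Ryan's relative weakness

# Reported paths from the mediators to perceived relationship satisfaction.
b_dominance = -0.17
b_weakness = -0.13

# Product-of-coefficients indirect effects (assumed to be how the reported values arise).
indirect_dominance = a_dominance * b_dominance
indirect_weakness = a_weakness * b_weakness
total_indirect = indirect_dominance + indirect_weakness

print(f"via dominance: {indirect_dominance:.2f}")  # reported: -.04
print(f"via weakness:  {indirect_weakness:.2f}")   # reported: -.03
print(f"overall:       {total_indirect:.2f}")      # reported: -.08
```

The rounded products match the reported indirect effects, which also shows why the overall indirect effect can be significant even when one of its two components is not: the two small negative paths add up.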

Perceived Likeability of Anna in the Role-Reversed Relationship
Consistent with Hypothesis 4, when participants rated Anna as more dominant than Ryan in the role-reversed condition (compared to the other conditions), they liked her less, b = -.32, SE = .06, p < .001, but they did not like Ryan less (see Fig. 2). The indirect effect of condition on how much Anna was liked via Anna's relative dominance was significant, b = -.08, SE = .03, p = .005, and the direct effect of condition on Anna's likeability was not significant, b = .01, SE = .07, p = .972. Unexpectedly, when participants rated Anna as more agentic than Ryan in the relationship, they liked her more, b = .28, SE = .07, p < .001. The indirect effect of condition on how much Anna was liked via Anna's relative agency was also significant, b = .11, SE = .03, p = .001. There was no overall indirect effect of condition on how much Anna was liked via Anna's relative dominance and agency, b = .02, SE = .04, p = .513. Thus, the negative effect via Anna's relative dominance and the positive effect via Anna's relative agency appear to cancel each other out.

Perceived Respect of Ryan in the Role-Reversed Relationship
Consistent with Hypothesis 5, when participants rated Ryan as weaker than Anna, they reported less respect for him, b = -.31, SE = .06, p < .001, but not for Anna (see Fig. 2). There was an indirect effect of condition on how much Ryan was respected via Ryan's relative weakness, b = -.08, SE = .03, p = .004, whereas there was no direct effect of condition on Ryan's respect, b = -.09, SE = .07, p = .201. Unexpectedly, when participants rated Anna as more agentic than Ryan, they had more respect for her, b = .39, SE = .06, p < .001. There was also an indirect effect of condition on how much Anna was respected via Anna's relative agency, b = .15, SE = .04, p < .001. Also, the direct effect of perceived relative status on respect for Anna was significant, b = .13, SE = .06, p = .002.

Study 1 Discussion
In Study 1, we provide initial evidence that backlash effects occur in heterosexual role-reversed relationships. Regardless of their gender, people rated the partner with the higher occupational status to be more agentic than the other. However, they rated Anna as similarly dominant to Ryan in the role-reversed relationship, whereas they rated Ryan as more dominant than Anna in the traditional and status-equal relationships. Also, people rated Ryan as weaker than Anna in the role-reversed relationship, whereas there was no difference between Anna and Ryan in perceived weakness in the traditional and status-equal relationships. Moreover, participants expected role-reversed relationships to be less satisfying compared to more traditional relationships. Women's relative dominance in role-reversed relationships was also associated with less liking of them, and men's relative weakness in these relationships was associated with less respect attributed to them. There were also positive individual outcomes for women in role-reversed relationships, such that women's relative agency perceptions were associated with more respect and increased liking of them. As we did not expect these two effects, we aimed to replicate our results in Study 2.

Study 2
In order to replicate the findings of Study 1 in a different national context, we conducted a similar experiment (N = 269) in the Netherlands. Although both the Netherlands and the  (Pew Research Center, 2013). By conducting a second study in a Western country that differs in how common it is for couples to reverse gender roles, we could examine the generalizability of these findings to other contexts (Swanborn, 2010). We expect that backlash for role-reversed couples will be comparable in both countries because research has shown that backlash effects are quite persistent across different contexts (Rudman & Fairchild, 2004). We will therefore test the same hypotheses in Study 2 as we did in Study 1.

Procedure
The procedure of Study 2 was identical to Study 1 except for the form of participant compensation. Depending on where participants were recruited, they were either entered into a lottery in which we raffled a 50-Euro gift voucher for every 50 participants, received 1.50 GBP, or 0.25 credit toward partial course requirement.

Manipulation
We used similar vignettes as in Study 1, except that we changed Ryan and Anna's occupations based on a pilot test that we ran in the Netherlands (see Table 4 and Supplement A in the online supplement). To illustrate, in the condition where Ryan had low absolute status and Anna higher status than Ryan, we stated that Ryan worked either as a bookbinder or tailor and Anna as a radiologist or architect.

Measures
We used identical measures as in Study 1 except for the measure of respect. Instead of two items in Study 1, we administered one item in Study 2 to assess respect ("How much do you think Ryan/Anna is someone who commands respect from others?") as the other item did not translate well into Dutch. Reliability analyses showed similar and satisfactory alphas for all other included measures: agency (α Ryan = .82; α Anna = .87), dominance (α Ryan = .80; α Anna = .79), weakness (α Ryan = .90, α Anna = .90), likeability (r Ryan = .47, p < .001; r Anna = .48, p < .001), relationship satisfaction (α = .82).

Preliminary Analyses
Correlations showed that participants' educational level, employment status (wages vs. students), and whether they filled out the survey via Prolific Academic were all correlated with several dependent variables (see Table 5). Therefore, we controlled for these variables. Again, we also conducted the analyses without these covariates, which did not show major differences (see Supplement B in the online supplement for the results without the covariates). Similar to Study 1, we controlled for the couple's average societal status in all analyses. Again, Ryan's absolute status did not affect how stereotypically Ryan and Anna were perceived, nor how their relationship was perceived. Therefore, we decided not to report these effects again in this study. The manipulation worked as intended: participants perceived Anna's status relative to Ryan's as intended across the conditions (see Supplement E in the online supplement). Q-Q plots showed that all hypothesized variables were normally distributed. There were no missing data in the final sample.

Perceptions of Agency, Dominance, and Weakness in the Role-Reversed Relationships
Consistent with Hypothesis 1, we found an interaction effect for partner gender and Anna's relative status on perceived relative agency (see Table 6). Participants rated the partner with the higher status in the relationship to be more agentic than the other partner. Consistent with Hypothesis 1, we found an interaction effect for partner gender and Anna's relative status on perceived relative dominance (see Table 6). Participants rated Anna and Ryan to be equally dominant when they had a traditional or equal status division. When Anna had higher social status than Ryan, participants rated her to be more dominant than her partner.
Contrary to Study 1 and Hypothesis 2, Ryan was not rated as weaker than Anna in conditions where Anna had higher status than Ryan. There were no significant main or interaction effects (see Table 6).

Structural Equation Modelling to Test Our Theorized Model
We started our analysis by building a structural equation model in Mplus identical to the final model in Study 1 (see Fig. 3). This model provided a good fit with the data, χ 2 (df = 22) = 53.84, p < .001, RMSEA = .08, CFI = .94, SRMR = .06. Similar to Study 1, this model showed better fit than our initial theoretical model (see Fig. 1), Δχ 2 (Δdf = 2) = 60.29, p < .001, RMSEA = .13, CFI = .82, SRMR = .16 (see Supplement F in the online supplement for alternative models). Furthermore, we conducted a multiple group comparison in SEM to compare whether there were significant differences between the American sample (Study 1) and the Dutch sample (Study 2) for the final model. The constrained model provided no worse fit than the unconstrained model, χ 2 diff (df = 17) = 23.45, p = .135, indicating that there were no significant differences between the two samples for the final models.

Perceived Relationship Satisfaction in the Role-Reversed Relationship
Similar to Study 1 and in line with Hypothesis 3, participants in the role-reversed condition rated Anna as more dominant than Ryan as compared to participants in the other conditions, b = .20, SE = .07, p = .003. Contrary to Study 1 and Hypothesis 3, Ryan was not rated as weaker than Anna in the role-reversed relationship, b = .12, SE = .07, p = .075 (see Fig. 3). Anna's perceived relative dominance predicted less perceived relationship satisfaction, b = -.21, SE = .06, p = .001, but this was not the case for perceived relative weakness, b = -.01, SE = .06, p = .829. Thus, contrary to Study 1 and Hypothesis 3, participants' perceptions of Ryan's relative weakness were not associated with perceived relationship satisfaction. We found a significant indirect effect of condition on perceived relationship satisfaction via Anna's perceived dominance, b = -.04, SE = .02, p = .028, but not via Ryan's perceived weakness, b = -.00, SE = .01, p = .830. The overall indirect effect was significant, b = -.04, SE = .02, p = .027, indicating that participants perceived the role-reversed relationship as less satisfying due to the combined perception that Anna is more dominant than Ryan and Ryan is weaker than Anna. There was no direct effect of condition on perceived relationship satisfaction, b = .07, SE = .07, p = .325.
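To make the link between the path coefficients and the reported indirect effect concrete, the sketch below multiplies the condition-to-dominance path by the dominance-to-satisfaction path. Reading the indirect effect as a simple product of coefficients is an assumption for illustration; the authors report Mplus output, and the helper name `indirect_effect` is hypothetical.

```python
def indirect_effect(a_path, b_path):
    """Point estimate of a simple mediated effect as the product of its two paths."""
    return a_path * b_path


# Coefficients reported above for Study 2.
a_dominance = 0.20   # role-reversed condition -> Anna's perceived relative dominance
b_dominance = -0.21  # relative dominance -> perceived relationship satisfaction

print(round(indirect_effect(a_dominance, b_dominance), 2))
```

The rounded product agrees with the reported indirect effect via dominance (b = -.04). The same arithmetic for the weakness path (.12 times -.01) gives a value of about -.001, in line with the reported b = -.00. The sketch does not reproduce the standard errors or p values.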

Perceived Likeability of Anna in the Role-Reversed Relationship
Consistent with Hypothesis 4, when participants rated Anna as more dominant than Ryan in the relationship, they liked her less, b = -.49, SE = .06, p < .001, but not Ryan (see Fig. 3). We found an indirect effect of condition on how much Anna was liked via Anna's perceived relative dominance, b = -.10, SE = .04, p = .006. When participants rated Anna as more agentic than Ryan, they liked her more, b = .17, SE = .07, p = .016. There was an indirect effect of condition on how much Anna was liked via Anna's perceived relative agency, b = .07, SE = .03, p = .023. Furthermore, we found no overall indirect effect, b = -.03, SE = .04, p = .526, indicating that the negative indirect effect of Anna's perceived relative dominance and the positive indirect effect of Anna's perceived relative agency on how much she was liked cancelled each other out. There was also no direct effect of condition on Anna's perceived likeability, b = .05, SE = .06, p = .478.

Perceived Respect of Ryan in the Role-Reversed Relationship
Consistent with Hypothesis 5, when participants rated Ryan as weaker than Anna in the role-reversed condition, they had less respect for him, b = -.28, SE = .06, p < .001, but not for Anna (see Fig. 3). Unexpectedly and contrary to Study 1, there was no indirect effect of condition on how much Ryan was respected via Ryan's perceived relative weakness, b = -.03, SE = .02, p = .099. There was also no direct effect of condition on Ryan's perceived respect, b = .00, SE = .07, p = .989. When participants rated Anna as more agentic than Ryan, they had more respect for her, b = .43, SE = .05, p < .001. There was also an indirect effect of condition on how much Anna was respected via Anna's perceived relative agency, b = .18, SE = .03, p < .001. Moreover, the direct effect of perceived relative status on how much Anna was respected was significant, b = .21, SE = .06, p < .001.

Study 2 Discussion
We again provide evidence that backlash effects occur at the level of the individual as well as the relationship in role-reversed relationships. Regardless of the partner's gender, people rated the partner with the higher occupational status to be more agentic than the other partner. Participants only rated Anna as more dominant in the relationship when she had a higher status occupation than her partner. Though we replicated the direction of this effect, in Study 1, participants rated Anna to be similarly dominant as Ryan when violating traditional gender norms in the relationship. Contrary to Study 1, we did not replicate the weakness penalty for men. We found that role-reversed couples face repercussions both on the relationship level (lower perceived relationship satisfaction) and the individual level (women's likeability). We replicated the finding that women were more respected and perceived as more likeable because of their relative agency in role-reversed relationships.

Study 3
Our next step was to examine how the backlash mechanisms affect the relationship quality within role-reversed relationships. Specifically, we investigated whether women in role-reversed relationships perceive their male partner to be the weaker one in the relationship, whether men in role-reversed relationships perceive their female partner to be the more dominant one, and whether these perceptions affect their relationship outcomes. In this way, we can examine how backlash mechanisms may operate among couples in role-reversed relationships. Furthermore, as we conducted a dyadic study, we can investigate the extent to which men and women agree about the status division in their relationship. Investigating couples' perceptions allows us to analyze the effects of relative status perceptions on men and women's own outcomes (actor-effects) as well as on their partner's outcomes (partner-effects; Cook & Kenny, 2005). Finally, we will explore whether men and women in role-reversed relationships have similar relative dominance and weakness perceptions of themselves and their partner, and whether these perceptions are related to lower relationship satisfaction.

Participants and Design
Participants were heterosexual couples in the Netherlands who had been in a relationship for at least one year and of which both partners were over 18 years of age and worked at least 12 h a week. Following Kenny and Cook (1999), we calculated power based on mediation models that were not treated as dyadic models and then translated the sample size for individuals to couples. A priori calculation of multiple linear regression, with an estimated partial R 2 of .10, a power of .80, and six predictors (i.e., two independent variables and four mediators) in G*Power showed that we would need a sample size of 130 (Faul et al., 2007). Therefore, we aimed to recruit at least 65 couples. In total, 94 heterosexual couples (N = 188) met the requirements. Participants' demographics were similar for both men and women (see Table 7). This study had a dyadic design as we recruited heterosexual partners. The ethics committee of the first author's faculty approved the study (FETC17-043).
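The arithmetic behind the recruitment target can be written out. Assuming the conventional Cohen's f² effect size for multiple regression (the text does not state which G*Power test family was selected, so this conversion is an assumption), the estimated partial R² of .10 translates as follows, and the individual-level sample size is then halved to obtain the number of couples:

```latex
f^2 = \frac{R^2}{1 - R^2} = \frac{.10}{.90} \approx .11,
\qquad
N_{\text{couples}} = \frac{N_{\text{individuals}}}{2} = \frac{130}{2} = 65
```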

Procedure
Participants were recruited through convenience sampling. We advertised our study on social media, in supermarkets, and we approached eligible couples in our personal network. Potential participants left their contact information, which we coupled to a unique number for each dyad. Upon receiving the link to the survey, each couple received their own code that they filled out at the start of the survey. We asked each participant to share the survey with their partner if they had not already completed it. After collecting the data, we deleted the file with the personal information of participants to assure their anonymity, whereas we could still identify couples through their unique codes. At the start of the survey, participants gave their informed consent and then completed a survey, including questions regarding their background information (i.e., reporting sex, age in years, highest degree of education, hours working per week, their relationship duration in years, whether they cohabited with their partner, whether they were married to their partner, number of children, age of their youngest child in years, organizational tenure in years, and their monthly net income in euros), social status, and relationship outcomes. We distributed three €100 dinner vouchers through a lottery to couples. The survey took, on average, 15 min to complete.

Measures
All items were measured on 7-point Likert scales with response options ranging from 1 (completely disagree) to 7 (completely agree), unless otherwise indicated.

Perceived Relative Status
We used the same subjective socioeconomic status ladder measure as the previous studies (Adler et al., 2000). In this study, we asked women and men to think about their own situation and to indicate the rung where they would place themselves (M women = 7.20, SD women = 1.15; M men = 7.44, SD men = .90) and their partners (M women = 7.50, SD women = 1.32; M men = 7.27, SD men = 1.04). We then measured participants' own perceived relative status by subtracting the man's perceived status from the woman's perceived status.

Perceived Relative Dominance
To measure relative dominance, we used an adjusted version of Heilman and Wallen (2010) dominance traits by attenuating the tone of the traits. We asked participants the extent to which they felt themselves to be ruthless, dominant and whether they hold the reins in the relationship (α women = .69 and α men = .70). We asked participants to indicate the extent to which they felt their partners possess these traits in the relationship (α women = .67 and α men = .76). In the original scale, we also included the extent to which participants felt themselves and their partner to be 'firm' in the relationship. Reliability analyses showed that the alphas for the 4-item scale were lower than the alphas for the 3-item scale (α womenself = .64; α menself = .73; α womenpartner = .65; α menpartner = .81). We calculated women's perceived relative dominance by subtracting women's own perception of their dominance from women's perception of their partner's dominance in the relationship. We calculated men's perceived relative dominance by subtracting men's perception of their partner's dominance from men's own perception of dominance in the relationship.
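The two subtraction rules can be easy to misread, so the sketch below spells them out. It is a minimal illustration with hypothetical scale means on the 7-point response format; the function and variable names are assumptions, not the authors' code.

```python
def womans_report_relative_dominance(own_dominance, partner_dominance):
    """Woman's report: her rating of her partner minus her rating of herself."""
    return partner_dominance - own_dominance


def mans_report_relative_dominance(own_dominance, partner_dominance):
    """Man's report: his rating of himself minus his rating of his partner."""
    return own_dominance - partner_dominance


# Hypothetical couple; each value is a scale mean on the 1-7 response format.
her_score = womans_report_relative_dominance(own_dominance=5.0, partner_dominance=3.0)
his_score = mans_report_relative_dominance(own_dominance=3.5, partner_dominance=4.5)
print(her_score, his_score)
```

Read literally, both rules yield the man's dominance minus the woman's, so the two partners' scores share one direction and can be compared directly. In this toy couple both scores are negative, meaning both partners rate the woman as the more dominant one.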

Perceived Relative Weakness
To measure relative weakness, we used an adjusted version of Heilman and Wallen (2010) weakness traits by again attenuating the tone of the traits. In the original scale, we asked participants to indicate the extent to which they felt themselves to be passive, insecure, compliant, and a push-over in the relationship (α women = .38 and α men = .34). Next, we asked participants the extent to which they felt their partners possessed these traits in the relationship (α women = .41 and α men = .47). As the reliability of this scale was insufficient, we decided to include two items 'passive' and 'being a push-over' in the final scale (r womenself = .22, p = .016; r womenpartner = .36, p < .001; r menself = .28, p = .004; r menpartner = .28, p = .005). We measured women's perceived relative weakness by subtracting women's own perception of weakness in the relationship from women's perception of their partners' weakness. We calculated men's perceived relative weakness by subtracting men's perception of their partner's weakness from men's own perception of weakness in the relationship. Because of the four weakness items' reduced reliability, we further examined whether dominance and weakness are two different constructs. We conducted four principal component analyses (PCA) with orthogonal rotation (varimax): 1) on the eight items for women's own evaluation of their dominance and weakness in the relationship, 2) on the eight items for women's evaluation of their partner's dominance and weakness, 3) on the eight items for men's own evaluation of their dominance and weakness in the relationship and 4) on the eight items for men's evaluation of their partner's dominance and weakness. The Kaiser-Meyer-Olkin (KMO) measures revealed that each PCA's sampling adequacy was mediocre to good for the analysis, KMO > .63. Bartlett's test of sphericity indicated that correlations between items were sufficiently large for all four PCAs, χ 2 (28) > 124.02, all ps < .001.
After rotation, all factor loadings showed that the items we included in our final analyses represented two different components (see Supplement G in the online supplement for all rotated factor loadings).

Perceived Relative Agency
We asked participants to rate both themselves and their partner on perceived agency in the relationship. We adapted four items from traits used in the Bem Sex Role Inventory (Bem, 1981) to assess to what extent participants felt themselves to be competitive and independent in their relationship, whether they think they defend their own beliefs and whether it is easy for them to make decisions in their relationship (α women = .52 and α men = .51). Next, we used the same four items to assess the extent to which participants felt their partner is agentic in the relationship (α women = .53 and α men = .39). Scale reliability remained insufficient, and we, therefore, decided to exclude relative agency from our analyses.

Relationship Quality
We measured participants' relationship quality using one item of the time competition survey (Van der Lippe & Glebbeek, 2003). The item ("In general, how satisfied are you with your relationship?") was rated on a 10-point scale ranging from 1 (very unsatisfied) to 10 (very satisfied). Previous studies established that relationship quality is a construct that can be reliably measured with a single item (see, e.g., Blom & Hewitt, 2019;Hardie et al., 2014).

Data Analysis Strategy
To test our theoretical model, we conducted a series of Actor-Partner Interdependence Mediation Model analyses (APIMeM; Ledermann et al., 2011) using structural equation modeling in Mplus (Muthén & Muthén, 2008). We treated dyad members as distinguishable as they were in heterosexual relationships. Following Ledermann et al. (2011), we first estimated saturated models and tested all effects, including control variables. Then, we used the step-wise modeling procedure to find the parsimonious model. We excluded control variables that did not significantly predict any variable. We also investigated gender effects by comparing the saturated model with a model in which the actor and partner effects were constrained across gender. This model did not fit worse than the saturated model, χ 2 diff (df = 10) = 8.81, p = .551, RMSEA = .00, CFI = 1.00, SRMR = .02.

Preliminary Analyses
Correlations between background, predictor, and outcome variables were analyzed to identify potential covariates (see Table 8). Participants' age, relationship duration, whether they had children, were in a cohabiting relationship, or were married were associated with our most important predictor and outcome variables. For instance, participants' age was associated with women's perception that the men were more dominant in the relationship compared to the women.

Also, being married was associated with higher relationship quality among women, whereas having children was associated with lower levels of relationship quality among women (see Table 8). Interestingly, men and women strongly agreed on the status division within their relationship (r = .75, p < .001). There was also convergence in their perception of who is the more dominant one in the relationship (r = .56, p < .001), their perception of who is the weaker one in the relationship (r = .45, p < .001) and their relationship quality (r = .34, p = .001). Not surprisingly, correlations of relevant background variables between men and women (e.g., women reporting about whether they were cohabiting with their partner and their partner reporting about whether they were cohabiting with their partner) were almost completely overlapping (r > .94, p < .001).
Q-Q plots showed that all variables were normally distributed. Missing data ranged from 11 cases (8.6%, for relative weakness and dominance according to female partners) to 26 cases (20.3%, for relative weakness and dominance according to male partners and men's perceived relationship quality). Missing values were handled based on full information maximum likelihood (Muthén & Muthén, 2008).

Actor-Partner Interdependence Mediation Models
First, we tested a saturated APIMeM model including control variables based on the correlational analysis. These were women's age, relationship duration, having children (yes/no), having a cohabiting relationship (yes/no), and being married (yes/no). Furthermore, in this model, we also included the absolute status of the couple as a covariate. We calculated the couple's absolute status by calculating the mean of women's and men's own perceptions of their status (M = 7.25, SD = .90). This saturated APIMeM showed that only women's age and whether couples cohabited affected predictor and outcome variables. Therefore, we dropped the other covariates from further analyses to make the models as parsimonious as possible. A model in which we constrained gender effects showed no worse fit compared to the saturated model, χ 2 diff (df = 10) = 8.81, p = .551, RMSEA = .00, CFI = 1.00, SRMR = .02. This model showed significant actor effects, but no partner effects. For this reason, we tested model fit for a model in which all partner effects were constrained. This model showed no worse fit compared to the model in which we only constrained gender effects, χ 2 diff (df = 5) = 2.65, p = .754, RMSEA = .00, CFI = 1.00, SRMR = .03.
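The nested-model comparisons above rest on a chi-square difference test. The sketch below recomputes the upper-tail probability for a given difference statistic and difference in degrees of freedom, using closed-form expressions for integer degrees of freedom and only the Python standard library. It assumes an ordinary (unscaled) difference test; the function name `chi2_sf` is illustrative and this is not the Mplus procedure itself.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^i / i!.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite correction sum
    # with terms x^(i - 1/2) / (1 * 3 * ... * (2i - 1)).
    total = 0.0
    term = math.sqrt(x)
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


# Difference statistics reported above.
print(round(chi2_sf(8.81, 10), 3))  # gender-constrained vs. saturated model
print(round(chi2_sf(2.65, 5), 3))   # partner effects constrained vs. gender-constrained
```

Both values land close to the reported p = .551 and p = .754; any small discrepancy would stem from the test statistics being rounded to two decimals. A non-significant result means the added constraints do not worsen fit, which supports keeping the more parsimonious model.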

Predicting Relationship Quality via Perceived Relative Dominance
Both men and women perceived the woman as the more dominant one in the relationship when they perceived the woman to have the higher status (see Table 9 and Fig. 4). However, perceived relative dominance was not related to men's and women's relationship satisfaction. The 95% bias-corrected bootstrapping confidence interval of the indirect effect also contained a zero, indicating no indirect effect. Thus, though men and women perceived the woman to be the more dominant one in the relationship when they were in a relationship in which the woman had the higher status, this was not related to their relationship quality (see Table 9 and Fig. 4).

Predicting Relationship Quality via Perceived Relative Weakness
As shown in Table 9 and Fig. 4, there were no direct effects of men's and women's perceived relative status on their perceived relationship quality. However, men and women who perceived the woman to have higher status in the relationship also rated the man as the weaker one. This perceived relative weakness negatively predicted relationship quality. The 95% bias-corrected bootstrapping confidence intervals (C.I.) showed that the indirect effect of men and women's perceived relative status on their relationship quality via their relative weakness perceptions was significant. Thus, for men and women in role-reversed relationships, perceiving the man to be the weak one in the relationship was related to lower relationship quality (see Table 9 and Fig. 4).
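For readers unfamiliar with bias-corrected bootstrap intervals, the sketch below shows the general technique for a single-mediator model on simulated data. It is a simplified illustration under stated assumptions: it ignores the dyadic APIMeM structure, the covariates, and the full information maximum likelihood handling of missing data used in the study, and every name and number in it is hypothetical.

```python
import random
from statistics import NormalDist, fmean


def paths(x, m, y):
    """Return (a, b): slope of M on X, and slope of Y on M controlling for X."""
    mx, mm, my = fmean(x), fmean(m), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)
    return a, b


def bc_bootstrap_indirect(x, m, y, n_boot=2000, level=0.95, seed=1):
    """Indirect effect a*b with a bias-corrected bootstrap confidence interval."""
    rng = random.Random(seed)
    n = len(x)
    a, b = paths(x, m, y)
    estimate = a * b
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        a_s, b_s = paths([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        draws.append(a_s * b_s)
    draws.sort()
    # Bias correction: z0 measures how far the bootstrap median sits from the estimate.
    prop_below = sum(d < estimate for d in draws) / n_boot
    prop_below = min(max(prop_below, 1 / n_boot), 1 - 1 / n_boot)
    normal = NormalDist()
    z0 = normal.inv_cdf(prop_below)
    z = normal.inv_cdf(1 - (1 - level) / 2)
    lower = draws[min(int(normal.cdf(2 * z0 - z) * n_boot), n_boot - 1)]
    upper = draws[min(int(normal.cdf(2 * z0 + z) * n_boot), n_boot - 1)]
    return estimate, lower, upper


# Simulated data: higher relative status -> more perceived weakness -> lower quality.
sim = random.Random(42)
status = [sim.gauss(0, 1) for _ in range(100)]
weakness = [0.6 * s + sim.gauss(0, 0.5) for s in status]
quality = [-0.6 * w + sim.gauss(0, 0.5) for w in weakness]

estimate, lower, upper = bc_bootstrap_indirect(status, weakness, quality)
print(f"indirect = {estimate:.2f}, 95% BC CI [{lower:.2f}, {upper:.2f}]")
```

An interval that excludes zero is read as evidence of an indirect effect, which is the decision rule applied to the weakness pathway above; an interval that contains zero, as for the dominance pathway, is read as no indirect effect.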

Study 3 Discussion
In Studies 1 and 2, we showed how backlash operates when observers evaluate a couple in a role-reversed relationship. In Study 3, we show that men and women in role-reversed relationships have similar perceived relative dominance and weakness. They perceived the woman to be the dominant one and the man to be the weak one in the relationship when they believed the woman had a higher social status than her male partner. Furthermore, perceived relative weakness predicted lower relationship quality for couples in role-reversed relationships. In Study 3, we demonstrated that when couples in role-reversed relationships perceived the man to be weaker than the woman in the relationship, they reported lower relationship quality compared to couples in traditional relationships.

General Discussion
Previous research has examined when and why women and men experience backlash when they show behaviors that are incongruent with their gender roles (Eagly & Karau, 2002;Heilman & Okimoto, 2007;Heilman & Wallen, 2010;Rudman et al., 2012). These penalties have mainly been examined in the work domain and for men and women individually. With three studies, we add to this work by showing that backlash also occurs in the relationship domain.
In this case, backlash in the form of perceived dominance of women and perceived weakness of men in role-reversed heterosexual relationships was associated with more negative evaluations of the couple and the individual members of the couple. Specifically, in two experimental studies, we found that a woman with higher status than her male partner risks a dominance penalty; she is perceived to be the more dominant one in the relationship and is therefore disliked more compared to a woman in a traditional or equal status relationship. Moreover, in one of the two experiments, we found that a man with lower status than his partner risks a weakness penalty; he is perceived as the weaker one in the relationship and consequently disrespected more compared to a man in a traditional or equal status relationship. Importantly, beyond these effects at the individual level, we add to the backlash literature by showing that, at the level of relationships, the dominance and weakness penalty also result in the perception that a role-reversed relationship must be less satisfying than a more traditional relationship.
We also demonstrated that this form of social backlash may affect the partners themselves within these relationships when they perceive status divisions that are incongruent with traditional gender roles. Previous research suggests that couples have more intimate knowledge of each other, making them less susceptible to stereotypical judgments than strangers (Trope & Liberman, 2010). However, our results suggest that couples are sensitive to gender norms in their environment and experience negative relationship consequences (e.g., lower relationship quality) when breaking with these norms.
Specifically, both men and women in role-reversed relationships perceive the man as the weaker one and the woman as the more dominant one in the relationship. The perception that the man is the weaker one in the relationship may explain, in part, why couples in role-reversed relationships experience lower relationship satisfaction compared to traditional couples. These findings are a first indication that at least some backlash may spill over to the couples themselves and that couples in role-reversed relationships experience the negative consequences of deviating from the gender hierarchy in which male partners have higher status than female partners.

Buffering Effect of Relative Agency for Women
The perception that the woman is the agentic one in a role-reversed relationship also led to more positive impressions of the woman, such that people liked her more and found her worthy of more respect compared to women in traditional or equal status relationships. In the past half-century, women entered male-dominated roles in larger numbers, making women's agency more common and accepted (Croft et al., 2015;Eagly et al., 2020). Also, studies show that over the years, women have increasingly described themselves as agentic (Twenge, 1997, 2009). The changing nature of agency's valence might have even led to situations where agency is desired for women (such as in relationships, which we find here). Although agency is usually seen as benefiting the self, in interpersonal relationships agency might also benefit the close other (Abele & Wojciszke, 2007). In a similar vein, it has been shown that the caregiving penalty for working women can be reduced by providing people with information that the woman is the breadwinner of the family in order to provide for her family rather than for pursuing a career for her own benefit (Bear & Glick, 2017). These findings suggest that having higher status because of family-oriented, communal goals can reduce backlash in the relationship domain for women.
The changing nature of prescriptive stereotypes for women may also explain why the perception that the woman is the more dominant one in the relationship did not cause role-reversed couples to report lower relationship satisfaction. Our results suggest that within role-reversed relationships, only men's lower status in the relationship has repercussions for couples' relationship satisfaction, which is consistent with a growing body of research showing that the violation of gendered norms is more penalizing for men than women (Croft et al., 2015;Vandello et al., 2008). An explanation that is often given for this discrepancy is that lower-status groups (i.e., women) aspire to move into higher status groups (i.e., men), but higher status groups may not be so willing to give up their status (Schmader et al., 2001). Also, according to precarious manhood theory, manhood is unstable and more easily lost than femininity, with harsher social penalties as a consequence (Vandello et al., 2008). Together with the literature showing greater social acceptance of agentic women, the current findings suggest that role-reversed couples may experience less relationship satisfaction due to violating the gender stereotypes prescribing men to have high status within the relationship.

Limitations and Future Research Directions
A limitation across the studies is that we created difference scores of relative status, dominance, weakness, and agency to test the perceived trait divisions within the relationship. It is argued that using difference scores is suboptimal because it causes absolute scores to be aggregated (Cronbach & Furby, 1970). However, in our case, we think this is less of an issue because Anna and Ryan's individual statuses were experimentally manipulated and, therefore, relatively fixed. Furthermore, although we used difference scores as predictors, we also controlled for the absolute average status of the couple to make sure that we investigated the relative effects of the woman in relation to her partner only, rather than effects caused by the absolute status of the man. In the dyadic study, we found that the outcomes were not affected by couples' average absolute status, only by their relative status differences. Another limitation is that in Study 3, we measured dominance and weakness traits of couples within the relationship, whereas we measured the perceived dominance and weakness of the fictitious couples in general (e.g., Anna is dominant) in Studies 1 and 2. We decided not to assess perceived dominance and weakness within the relationship in Studies 1 and 2 to keep the measures closely related to the original scale by Heilman and Wallen (2010). As couples have a more detailed construal of their partner than strangers (Trope & Liberman, 2010), we decided to provide more context for these items in Study 3 by asking participants to rate themselves and their partner on traits in the relationship rather than in general. Future research should investigate whether similar findings are found when asking strangers to rate couples' dominance and weakness within their relationship as we found here for ratings of dominance and weakness more generally.
Additionally, as we only investigated explicit reports of relationship satisfaction, more research is needed to investigate the effects of self and partner evaluations on a nonconscious or implicit level. A growing body of research shows that explicit partner evaluations are not associated with implicit partner evaluations and that these implicit partner evaluations are predictive of marital satisfaction over time (McNulty et al., 2013). An often-given explanation for the discrepancy between explicit and implicit relational evaluations is that individuals are motivated to see their relationship in a positive light (McNulty et al., 2014;Murray, 1999;Olson et al., 2007). Although these implicit evaluations of self and relationship mainly include valence (i.e., having a positive versus negative implicit association of one's partner), the findings might indicate a potential drawback of our study. It could be that the explicit evaluations of relative dominance and weakness and experienced relationship satisfaction might not have accurately captured individuals' implicit attitudes of relative dominance, weakness, and relationships satisfaction (e.g., Joel et al., 2017). Future research could examine whether backlash effects also operate at the implicit level.
Furthermore, we have investigated backlash effects in heterosexual relationships only, which raises the question to what extent the findings can be translated to same-gender relationships. It may be the case that men and women in same-gender relationships are less susceptible to gender stereotypes within our society as these relationships have less clear gender role prescriptions. On the other hand, there are also some indications that same-gender couples do not differ that much from heterosexual couples in the extent to which they value status in their partners (Ha et al., 2012;Lippa, 2007). Furthermore, same-gender couples are also susceptible to the influences of gender stereotypes in how others view them (i.e., gay men and lesbians are stereotyped as 'feminine' and 'masculine' by virtue of their respective orientations towards male and female partners; Hegarty & Pratto, 2001;Kite & Deaux, 1987). Future research may for instance investigate whether women who have surpassed their female partner in status have fewer negative evaluations of their partner compared to women who have surpassed a male partner. By including gay men and women in future research on the influence of gender stereotypes for couples who break with traditional gender stereotypes, we will develop a better understanding of when and how gender stereotypes affect the relationship outcomes of non-traditional couples more broadly.

Practice Implications
The findings of these studies are in line with other work showing the social penalties that men and women may face when their relationship does not meet gendered expectations for the division of status (e.g., Bertrand et al., 2015; Pierce et al., 2013; Wilcox & Nock, 2006; Zhang, 2015). We would like to stress that these findings do not imply that traditional relationships are the most desirable or optimal relationships that couples should strive for. There is strong evidence suggesting that couples who adhere to traditional gender role divisions also experience negative relationship outcomes and may be perceived negatively (Hammond & Overall, 2013; Helms et al., 2006; Marshall, 2010; Sanchez et al., 2005). For example, while both men and women desire warmth, affection, and understanding when they devote themselves to intimate relationships (Reis & Gable, 2003), individuals low in feminine traits are less likely to experience these outcomes (Miller et al., 2003). In addition, women who perceive their partner to be a feminist report better relationship outcomes (Rudman & Phelan, 2007). Scholars have argued that the most adjusted and happiest individuals in life possess both agentic and communal qualities (Stake & Eisele, 2010). Applied to romantic relationships, individuals who possess both these traits are likely to be seen as desirable spouses and to have satisfied partners (Marshall, 2010).
Past research findings on interpersonal outcomes within couples highlight how processes within couples affect their outcomes, whereas our findings suggest that group-related processes (i.e., gender-stereotypical expectations) may affect relationships as well, such that role-reversed couples partially internalize backlash effects. Future research should continue to explore how the negative evaluations of others outside the relationship directly affect the experiences and behaviors of role-reversed couples. Couples are not passive bystanders and may also be influenced by the extent to which they themselves internalize and endorse traditional gender role divisions. For example, egalitarian individuals have more favorable perceptions of role-reversed couples (Gaunt, 2013). Within couples, young women with high ambitions seek a communal and family-oriented male partner (Meeussen et al., 2019). Men who endorse hostile sexist attitudes (i.e., perceiving women who challenge men's status as manipulative and subversive) behave more negatively toward their romantic partners and experience lower relationship satisfaction compared to men who do not endorse these attitudes (Hammond & Overall, 2013).

Note. [...] (2011), all variables are standardized. The final results reveal no gender differences or partner effects. Therefore, only actor effects are reported. Significant coefficients are in bold. We controlled for women's age and whether couples cohabited. APIMeM = actor-partner interdependence mediation model; RS = perceived relative status of the woman compared to the man; DOM = perceived dominance of the woman compared to the man; WEAK = perceived weakness of the man compared to the woman; RQ = relationship quality. Table columns: Effects, Estimate, SE, p, 95% bias-corrected C.I.

The current findings suggest that the gender and sexist stereotypes in couples' environments may have a negative impact on role-reversed couples regardless of couples' own egalitarian views. This is supported by past work showing that role-reversed couples experience more difficulties when they live in countries that still endorse traditional gender norms (Vink et al., 2022b). To help couples circumvent the negative consequences of performing non-traditional gender roles, it may not be enough to focus solely on couples' own experiences; the systemic context should be considered as well. This perspective applies not only to researchers investigating role-reversed couples' experiences, but also to relationship therapists and to employers or coaches who aim to achieve gender equality.

Conclusion
Not only do men and women risk penalties from others when they violate traditional gender norms of close relationships, men and women who perceive the woman to have higher status than the man in the relationship also internalize and may exact, at least in part, this backlash in their own relationship. Backlash occurs for couples who challenge the gender status quo by occupying a social status that runs counter to traditional gender norms. The gender norms that proscribe dominance for women, especially relative to their male partner, and proscribe weakness for men, especially relative to their female partner, lead to less liking of women and less respect for men in those relationships, as well as to evaluations of role-reversed relationships as less satisfying for the partners. Moreover, men's and women's own subjective experience of breaking from traditional status divisions was also related to lower relationship satisfaction, which was explained by perceiving the male partner to be the weaker one in the relationship. Overall, these findings suggest that backlash effects for role-reversed, heterosexual relationships are another way in which the gender hierarchy is protected and another reason why traditional gender roles are persistent and difficult to change.
Fig. 4 Results of Final APIMeM model
Funding This study was supported by the Netherlands Organization for Scientific Research (NWO) and the Dutch Ministry of Education, Culture and Science (OCW; 2017 Gravitation Program, grant number 024.003.025).
Availability of Data and Materials (Anonymized) datasets and materials for all three studies can be made available upon request to the first author.
Code Availability Coding reports/syntaxes for the data analysis of all three studies can be made available upon request to the first author.

Compliance with Ethical Standards
Ethical Approval The studies involved human subjects, and all procedures performed were in accordance with the ethical standards of the institutional and/or national research committee. For the dyadic study, we requested and received ethical approval from the Faculty's Ethical Board, as this study required participants to answer questions about their own romantic relationship and required participants' partners to fill out the survey as well. For the experiments, we requested and received post-hoc ethical approval, as requesting ethical approval was less commonplace at the time of data collection.

Informed Consent
Each participant in all three studies completed the informed consent process.

Consent for Publication
We confirm that this manuscript has not been published elsewhere and is not under consideration by another journal.

Conflicts of Interest
The authors declare that there are no potential conflicts of interest with regard to ethical standards.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.