Making Things Happen (Un)Expectedly: Interactive Effects of Age, Gender, and Motives on Evaluations of Proactive Behavior

Proactive behavior entails self-starting actions oriented toward change in the future. Other people’s perceptions of an employee’s proactive behavior are likely shaped by personal characteristics of the employee and related expectations. We hypothesized that the intersectionality of age, gender, and two motives (i.e., achievement and benevolence) influences others’ evaluations of proactive behavior. Consistent with the social role theory and the notion of a lack of fit, results of a first experimental vignette methodology study with an employee sample (N = 101; 1818 ratings) showed that proactive behavior was rated as more effective for older men compared to younger men motivated by achievement, whereas proactive behavior was rated as more effective for younger men compared to older men motivated by benevolence. Younger women compared to older women received higher effectiveness ratings for proactive behavior independent of their motive. In a second experimental vignette methodology study with a sample of participants in supervisory roles (N = 164; 1205 ratings), we partially replicated the results of the first study: proactive behavior was rated as more effective for older men compared to younger men motivated by achievement, and proactive behavior was rated as more effective for younger women compared to older women motivated by achievement. In contrast, effectiveness ratings of proactive behavior of younger and older men as well as younger and older women motivated by benevolence did not differ. Overall, by investigating the intersectionality of age, gender, and motives, these findings advance research on influences of person characteristics on others’ evaluations of proactive behavior.


Introduction
In today's work context, shaped by volatility, uncertainty, complexity, and ambiguity, employees are increasingly expected to engage in proactive behavior, "a set of self-starting, action-oriented behaviors aimed at modifying the situation or oneself to achieve greater personal or organizational effectiveness" (Unsworth & Parker, 2003, p. 177). These behaviors extend beyond the scope of usual job roles and job descriptions (Grant & Ashford, 2008) and help employees to achieve their goals (Parker & Bindl, 2017). According to Cangiano and Parker (2016), proactive behavior may expose employees to both praise and criticism from others. One reason for such reactions may be coworkers and supervisors assigning meaning to behavior based on socially constructed aspects, such as stereotypes. Consequently, proactive behavior may be over- or underappreciated. These positive and negative reactions to proactive behavior could, in turn, impact the likelihood that employees behave proactively in the future (Cangiano & Parker, 2016). Given the importance of proactive behavior in modern organizations, a better understanding of the interplay of characteristics that influence others' evaluations of this type of behavior is needed.
Accordingly, we investigate the interactive effects of age, gender, and ascribed motives on evaluations of the effectiveness of proactive behavior based on the multiplicative approach to intersectionality. More specifically, we conceptualize multiple identities as distinctive and not necessarily divisible into their components. We assume that the effect of a certain personal characteristic on others' evaluations of the effectiveness of proactive behavior depends on the interplay of salient person characteristics (i.e., age, gender, and motives in this study). However, to learn about the relative impact of a certain person characteristic when assessing the interplay of characteristics, we compare a single characteristic within a certain identity holding everything else constant (e.g., evaluations of an older woman with the benevolence motive are compared to evaluations of a younger woman with the benevolence motive). Our hypotheses are visualized in Table 1.
The link between proactive behavior and effectiveness was highlighted by De Stobbeleir, Ashford, and Luque (2010), who found that supervisors are more likely to attribute the feedback seeking of superior performers, compared to that of average performers, to performance-enhancement motives. It is important to note that, while proactive behavior may be evaluated as part of performance appraisals, it is not part of traditional conceptualizations of job performance (i.e., completing the tasks specified in one's job description), as it is self-initiated (Bindl & Parker, 2011). Although proactive behavior involves "taking action" to change the status quo, individuals may differ in their understanding of what changing the status quo involves. That is, due to stereotypes, people may expect different kinds of behaviors from men and women, or from younger and older employees. Thus, they may also apply different expectations when evaluating proactive behavior as more or less effective. Stereotypical expectations used to evaluate employee behaviors are often based on overt personal characteristics such as age and gender, as these characteristics are visible and frequently used to categorize people (Marcus & Fritzsche, 2015). Ultimately, such expectations could result in different or biased effectiveness ratings, depending on the fit between raters' stereotypical beliefs and the observed behavior. We focus on ratings of effectiveness because proactivity is defined as behavior aimed at achieving greater personal or organizational effectiveness in the work context (Unsworth & Parker, 2003).
Building on intersectionality theory (Cole, 2009), we assume that the interplay of person characteristics (i.e., age, gender, and the motives ascribed to a proactive behavior), rather than one characteristic alone, determines whose proactive behavior is evaluated as more effective and whose is evaluated as less effective, even when the same behavior is displayed (Bodenhausen, 1990). In accordance with the social role theory (Eagly & Steffen, 1984), we postulate that violating social roles based on the combination of personal characteristics will evoke a perceived lack of fit (Eagly & Karau, 2002; Heilman, 1983). A lack of fit tends to be penalized (Cialdini & Trost, 1998) and therefore results in less favorable evaluations. The present study aims to gain a more accurate and complete understanding of how intersectionality shapes evaluations of proactive behavior.
We add to the literature on biased effectiveness ratings of proactive behavior by explicitly addressing intersectionality in this context. First, previous studies have focused on the impact of a single personal characteristic and on the resulting lack of fit between stereotypical expectations and actual behavior. Consequently, little is known about how intersectionality may affect evaluations of proactive behavior (T. E. Duncan, Duncan, & Strycker, 2013). However, considering the interaction of different personal characteristics (e.g., age, gender, and ascribed motives) that are perceived simultaneously is important for gaining a more realistic impression of what could evoke stereotypic beliefs (see Marcus & Fritzsche, 2015, for a review) and for gaining insight into how intersecting identities are judged. We therefore investigate the intersectionality of age, gender, and ascribed motives using an experimental vignette methodology design (Aguinis & Bradley, 2014; Rotundo & Sackett, 2002). We specifically focus on age and gender in this study, as these are (a) directly visible when perceiving another person at work, (b) among the strongest characteristics used to categorize others (e.g., A. P. Fiske, Haslam, & Fiske, 1991; North & Fiske, 2012; Posthuma & Campion, 2009), and (c) an essential part of the stereotype content model, which is based on the social role theory (Bürkner, 2012). Besides personal characteristics such as age and gender, another important factor may be an employee's motives for engaging in proactive behavior. Because proactivity is self-starting, the employee's motive for displaying the behavior (e.g., to achieve a higher position or to strengthen interpersonal relations) is crucial, as employees often base their decisions about when and how to engage in proactive behaviors on what is important to them (Grant & Ashford, 2008).
In the work context, motives are observable "through verbal statements and behavior patterns" (Grant, Parker, & Collins, 2009, p. 35).
Second, few studies have investigated evaluations of proactive behavior; research on how others evaluate behavior beyond formal job requirements has mainly focused on organizational citizenship behavior. Organizational citizenship behavior also goes beyond contractual obligations (Organ, 1988), with the aim of contributing "to the maintenance and enhancement of the social and psychological context that supports task performance" (Organ, 1997, p. 91), for instance by helping others or the organization. In contrast, proactive behavior is more oriented toward change (Bindl & Parker, 2011; Li, Frese, & Haidar, 2017). This makes it a more versatile behavior that will become increasingly important for organizations (Morrison & Milliken, 2000) with regard to performance and well-being (Cangiano & Parker, 2016; Thomas, Whitman, & Viswesvaran, 2010). By focusing on proactive behavior, the present study can generate specific insights into how effectiveness evaluations of proactive behavior are shaped by the interplay of personal characteristics and the stereotypical expectations related to these characteristics. These insights can advance current knowledge about perceptions of proactive behavior and help explain why some employees are proactive but not recognized for it. Third, self-report measures, which are often used in the stereotyping literature, require participants to acknowledge their own, potentially biased perceptions. Given an increasing focus on equality and legislation to prevent discrimination, participants are unlikely to disclose their prejudices (Riach, 2007). To assess the impact of stereotyping, it is therefore important to employ a more objective approach, such as the experimental vignette methodology design used in this study. Such designs allow for an assessment of how individuals "weigh, combine, or integrate information" to form a judgment (Zedeck, 1977, p. 77), thereby helping to reveal implicit decision processes (Aguinis & Bradley, 2014).

Stereotype Content Model and Social Role Theory
The interplay of characteristics such as age, gender, and motives can be understood from the perspective of the stereotype content model (S. T. Fiske, Cuddy, Glick, & Xu, 2002), which mainly focuses on the characteristics of age and gender. In this model, competence (i.e., the perceived ability of the target group to be successful at tasks seen as high in status and prestige) and warmth (i.e., the perceived socio-emotional orientation of the target group toward others) are the core dimensions of stereotype content (S. T. Fiske et al., 2002). Moreover, many stereotypes combine low ratings on one dimension with high ratings on the other. Men, for example, are often described as achievement-oriented (i.e., as having an achievement motive), which translates into a high score on competence combined with significantly lower scores on warmth (i.e., the dimension corresponding to the benevolence motive). Similarly, younger professionals in a business context are usually seen as intelligent, interested, and courageous (Hummert, 1990), which would translate into high competence and low warmth as well. Conversely, women and older people are often seen as benevolent (i.e., as having a benevolence motive), as manifested in low scores on competence and significantly higher scores on warmth (Eagly, 1987; Eagly & Karau, 2002). These typical motives form stereotypes, which then evoke certain expectations of how someone will (i.e., descriptive aspect) and should (i.e., prescriptive aspect) behave (i.e., a social role; Eagly, 1987).
According to the social role theory, perceivers of a certain behavior expect a match between their own beliefs (i.e., stereotypes as outlined by the stereotype content model) and the enacted behavior (Eagly & Karau, 2002). Related to the social role theory, other social psychological theories commonly used to explain differences between men and women or between younger and older employees at work (i.e., role congruity theory, Eagly & Karau, 2002, and the stereotype fit model, Heilman, 1983) assume that individuals base their job performance expectations on the fit between key characteristics of the respective person and the respective job. Thus, according to these theories, evaluators should compare the behaviors of employees with the stereotypes of the groups these employees belong to. This process often results in biased performance ratings when there is a mismatch between personal characteristics (e.g., age and gender) and ascribed motives, as the ratee is perceived as most competent when personal characteristics match the skills deemed necessary to succeed on a given task (Dipboye, 1985; Finkelstein, Higgins, & Clancy, 2000). Related to these two theories is the stereotype backlash effect, which builds on the core stereotype dimensions of the stereotype content model and occurs when individuals' behavior deviates from prescriptive stereotypes (Heilman & Wallen, 2010; Moss-Racusin, Phelan, & Rudman, 2010; Rudman, 1998; Rudman & Phelan, 2008). For example, women exhibiting masculine behaviors at work are likely to be perceived as less warm but more competent (S. T. Fiske et al., 2002), with the latter being a concept related to effectiveness.
In sum, because age, gender, and ascribed motives ultimately relate to stereotypes, their intersectionality might influence evaluations of behavior in work settings. This may occur through perceptions of a fit between social roles and displayed behavior. For example, if ascribed motives to engage in proactive behavior do not match the characteristics of the person performing the behavior, the person may be seen as a poorer performer. As stated by Kidder and Parks (2001), "the observed employee who engages in behaviors which are seen as going above and beyond the call of duty may be rewarded. Whether or not these behaviors are 'objectively' required is beside the point: they are expected by the observer as part of the employee's role" (p. 940).

Intersectionality of Age and Gender
Investigating how the intersectionality of personal characteristics is judged by others may be especially important with regard to performance ratings. Although this interplay has not been investigated in terms of proactive behavior or organizational citizenship behavior, research on superior-subordinate dyads by Tsui and O'Reilly (1989) has shown that demographic variables of subordinates, such as age, gender, race, education, and company and job tenure, influence supervisors' ratings of their subordinates' performance and how much they like them. Although Tsui and O'Reilly did not examine interaction effects, they did consider several demographic variables. Moreover, Griffeth and Bedeian (1989) emphasized the importance of interactions between age and gender. They investigated main and interaction effects of ratees' age and gender, as well as of raters' age and gender, on performance evaluations. Using 464 supervisor-subordinate dyads, they found, in addition to several main effects, a significant interaction between rater age and ratee gender. Follow-up analyses showed that raters between 30 and 39 years old gave lower ratings to men than to women, whereas no such differences emerged for raters between 40 and 49 years old or above 50 years old. Based on these findings, the authors recommended that future research investigate these interactions further to enrich the current understanding of complex employee performance ratings.
Generally, research has mainly focused on stereotypes regarding either age or gender and has neglected other group memberships. As a result, an accurate understanding of stereotypical biases at work cannot be established, because no worker is merely "younger" or "older," "man" or "woman." Instead, workers are, for example, "younger men" or "older women." In their stereotype content model, for instance, S. T. Fiske et al. (2002) state that stereotype content does not merely reflect evaluative antipathy but also illustrates different degrees of (dis)like or (dis)respect depending on personal characteristics, as stereotypes often consist of both more and less socially desirable traits. Mixed stereotypes that combine high warmth with low competence are called paternalistic and apply to race, age, dialect, and gender prejudice. An example is ageism, in which the dominant view is that older people are kind but incompetent, whereas younger people are eager to learn. Furthermore, when women in general are rated, traditional homemakers (warm but not competent) serve as the norm, whereas business women are rated similarly to men (cold but competent; S. T. Fiske et al., 2002).
The importance of intersectionality of several social group memberships is further highlighted by the intersectional view (Cole, 2009). This perspective developed out of social role theory (Bürkner, 2012), as occupying a social role also means occupying a certain identity. It suggests that individual experiences should be interpreted in terms of one's unique group memberships, as only focusing on one category neglects the complex nature of stereotyping and the interplay of various social categories (Özbilgin, Beauregard, Tatli, & Bell, 2011).
Building on the complexity of social categories and associated stereotypes as defined in the social role theory, the intersectional view describes category memberships as processes, with different combinations of multiple group memberships yielding different consequences (Cole, 2009). This conceptualization is in line with social psychological literature on multiple categorizations, according to which categorization leads to stereotyping of group members by oneself and other group members (Crisp & Turner, 2011). Within intersectionality, there are two approaches: the unitary, or additive, approach and the multiplicative approach. The additive approach stresses that demographic aspects of individuals are distinct from one another and that each category itself is the best predictor of a given outcome (Hancock, 2007; Weldon, 2006). Under this approach, for example, the combined effect of age and gender would be seen as the sum of the effects of the two separate characteristics (Dubrow, 2008). In contrast, the multiplicative approach states that the effect of a certain personal characteristic depends on the interplay of all assessed characteristics. In its strictest form, the multiplicative approach entails that characteristics are inseparable, as no category has an autonomous effect (McCall, 2008; Weldon, 2006). An extreme example of this prediction is status. As status hierarchies often cut across different parts of society, a person can be advantaged because of membership in one social group while being disadvantaged (to varying degrees) because of membership in other social groups (Steinbugler, Press, & Dias, 2006).
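To make the distinction concrete, the two approaches can be written as regression models of an effectiveness rating y. This formalization is our own illustration rather than one given by the cited authors, and the dummy codes are assumptions: A = 1 for an older and 0 for a younger employee, G = 1 for a woman and 0 for a man, and M = 1 for a benevolence and 0 for an achievement motive. The last line gives the age difference implied by the multiplicative model.

```latex
\begin{align}
\text{Additive:}\quad y &= \beta_0 + \beta_1 A + \beta_2 G + \beta_3 M + \varepsilon \\
\text{Multiplicative:}\quad y &= \beta_0 + \beta_1 A + \beta_2 G + \beta_3 M + \beta_4 AG + \beta_5 AM + \beta_6 GM + \beta_7 AGM + \varepsilon \\
\Delta_A(G, M) &= \mathbb{E}[y \mid A = 1, G, M] - \mathbb{E}[y \mid A = 0, G, M] = \beta_1 + \beta_4 G + \beta_5 M + \beta_7 GM
\end{align}
```

Under the additive model, the age difference equals \beta_1 for every combination of gender and motive; under the multiplicative model it can take a different value in each of the four gender-by-motive cells. A regression with interaction terms is only a statistical analogue of the strict multiplicative view, in which categories have no autonomous effect at all.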
As part of the intersectional view, several theories have emerged (e.g., double jeopardy hypothesis, double advantage hypothesis, ethnic prominence, subordinate male hypothesis, tokenism; see Marcus & Fritzsche, 2015, for a review). Although these theories make contradictory predictions about the consequences of multiple subordinate group memberships, all have received at least some support in the psychological literature. This pattern may be due to contextual factors that influence the effects of multiple subgroup memberships, such that the stereotypical effects of these memberships vary across contexts (Cole, 2009). While context can be an important factor in how stereotypes arise, we held the organizational context constant in our study to isolate how perceptions of age, gender, and motives shape evaluations of proactive behavior through the assumption of social roles.

Gender and Age Stereotypes at Work
In the work context, observers first perceive overt characteristics such as age and gender of the person showing a given behavior, and interpret these in terms of their own stereotypes (Stajkovic & Luthans, 2003). Those beliefs specifying how a person typically behaves are called descriptive stereotypes (Heilman, 2012).
The content of gender stereotypes has been frequently studied (see Heilman, 2012, for a review). Within these studies, men are often characterized as achievement-oriented (e.g., competent, task-focused, and ambitious), while women are seen as benevolence-oriented (e.g., warm and friendly). These conceptualizations of men and women not only differ from each other but are often seen as mutually exclusive: Characteristics attributed to women are thought to be lacking in men, and vice versa. Business women are an exception to this rule, as they are characterized similarly to men (e.g., both men and business women would be characterized as achievement-oriented; S. T. Fiske et al., 2002). Moreover, the characterizations of men and women in the literature are consistent across cultures and contexts (see Heilman, 2012, for a review). Regarding employment settings, a series of studies by Heilman (1983, 1995, 2001) showed that women are evaluated worse in typically male gender-typed jobs due to a perceived lack of fit between the stereotypical attributes of men, women, and the organizational context. Moreover, Luksyte, Unsworth, and Avery (2017) found that, because innovative work behavior is stereotypically ascribed to men, women who innovated did not receive better performance ratings than women who did not. In contrast, engaging in innovative behaviors was perceived positively for men.
While gender is an individual characteristic that provides a strong basis for categorizing people into groups (A. P. Fiske et al., 1991; Knippenberg, Twuyver, & Pepels, 1994; Stangor, Lynch, Duan, & Glas, 1992), stereotypes about older workers are prevalent in the workplace as well (North & Fiske, 2012; Posthuma & Campion, 2009). Age stereotypes in the workplace describe beliefs and expectations about workers based on their age (Hamilton & Sherman, 1994). More specifically, people are likely to infer social and cognitive competencies, as well as physical abilities, from an individual's age (e.g., older workers are generally perceived as lacking flexibility, innovativeness, and an orientation toward change, and as being less energetic and motivated about their job than their younger counterparts; Posthuma & Campion, 2009). For example, in a literature review of 117 research articles, Posthuma and Campion (2009) found that older workers are seen as having lower ability, motivation, and productivity than younger workers, who are seen as more adaptable, more flexible, and less resistant to change. Moreover, older workers are often ascribed a lower ability to learn and assumed to have a shorter remaining job tenure, meaning that training or proactive behavior is expected to yield less benefit than for younger workers.
Various studies have shown that these stereotypes influence employment-related decisions at work, for example in performance appraisals (see Posthuma & Campion, 2009, for a review). Thus, they may also affect evaluations of related behaviors such as proactivity. Prejudice and discrimination, however, seem to be most prevalent for employees older than 45 or younger than 25 (C. Duncan & Loretto, 2004).

The Role of Motives for Effectiveness Evaluations of Proactive Behavior
With regard to proactive behavior, an important factor next to person characteristics such as age and gender may be ascribed motives or intentions for engaging in proactive behavior.
Because proactive behavior is self-starting, the ascribed motive (e.g., a motive to either achieve a higher position or to strengthen interpersonal relations) of the employee for displaying the behavior is crucial, as employees often base their decisions about when and how to engage in proactive behaviors on what is important to them (Grant & Ashford, 2008). Therefore, motives are an essential personal characteristic to consider when investigating intersectionality of age and gender for evaluations of the effectiveness of proactive behavior.
We argue that if there is congruence between motives and stereotypical social roles, then more "credit" is given (i.e., proactive behavior is judged as more effective). The reasoning is supported by a number of studies. Grant et al. (2009) found that employees having benevolent intentions when being proactive received higher ratings of overall job performance compared to employees focusing on self-serving motives. However, the authors neglected possible effects of other personal characteristics such as age and gender. Therefore, the impact of motives on evaluations of proactive behaviors in the context of age and gender remains to be investigated. Similarly, Nguyen, Johnson, Collins, and Parker (2017) investigated proactive behavior in an uncertain and unpredictable environment (i.e., in a hospital emergency department). They found that people who expressed greater confidence were judged by their supervisors as more proactive. Lastly, Fuller, Marler, Hester, and Otondo (2015) found that, consistent with self-determination theory (Deci & Ryan, 1985), followers who engage in taking charge behavior receive higher performance evaluations when leaders feel responsible for constructive change.
Other studies relating to organizational citizenship behavior have noted that attributions for this behavior toward either self-serving or other-serving motives may depend on the quality of the relationship between the person showing the behavior and the person rating it. More specifically, Bowler, Halbesleben, and Paul (2010) found that high-quality relationships are related to attributions of prosocial motives by the leader, but self-serving motives by coworkers. Furthermore, Halbesleben, Bowler, Bolino, and Turnley (2010) found that these ratings also relate to a supervisor's emotional reactions to such behaviors.

Hypothesis Development
The stereotypes described above function as heuristics that allow people to form impressions of an individual based on overtly perceived person characteristics while saving cognitive energy and responding more quickly in complex situations (Macrae, Milne, & Bodenhausen, 1994). Because these stereotypes are activated automatically, individuals often do not recognize the influence of descriptive stereotypes on their own impressions and judgments (Banaji & Hardin, 1996; Banaji, Hardin, & Rothman, 1993). The impact of stereotypes becomes problematic when a "lack of fit" is perceived between the assumed characteristics of a person and a given behavior. In the context of proactive behavior, this means that employees exhibiting behaviors congruent with their ascribed stereotypes should receive higher effectiveness ratings (see Table 1 for an overview of relevant comparisons).
Because this study focuses on age (i.e., younger, older), gender (i.e., men, women), and motive (i.e., achievement, benevolence), eight combinations of these characteristics emerge: younger men, younger women, older men, and older women, each with either an achievement motive or a benevolence motive. Based on intersectionality theory and the notion of multiple group memberships, expectations of typical behaviors should differ between these eight constellations. These differences can be explained by theory and research on category and stereotype formation (see Marcus & Fritzsche, 2015, for a review). According to these theories, age is one of the most dominant categories one can be a member of. When age is not salient, however, more visible categories such as gender become more influential. Although some situations evoke more age salience than others (e.g., explicit bias toward younger or older workers), it is not the strength but the existence of age salience at work that matters. In the workplace, younger age is generally associated with a higher motivation to learn, adaptability, and flexibility (Posthuma & Campion, 2009). As proactive behavior is aimed at change, younger individuals should generally be evaluated more favorably when being proactive, as this type of behavior fits the stereotype.
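The fully crossed 2 (age) x 2 (gender) x 2 (motive) structure described above can be sketched in a few lines of Python. This is a minimal illustration that assumes only the three two-level factors named in the text; the level labels and the helper name `vignette_conditions` are hypothetical and not taken from the study materials.

```python
from itertools import product

# Two levels per factor, as named in the text; the labels are illustrative.
AGES = ("younger", "older")
GENDERS = ("man", "woman")
MOTIVES = ("achievement", "benevolence")


def vignette_conditions():
    """Return every age x gender x motive combination (a fully crossed design)."""
    return [
        {"age": age, "gender": gender, "motive": motive}
        for age, gender, motive in product(AGES, GENDERS, MOTIVES)
    ]


conditions = vignette_conditions()
print(len(conditions))  # 8 constellations
for c in conditions:
    print(f"{c['age']} {c['gender']}, {c['motive']} motive")
```

Crossing the factors with `itertools.product` guarantees that each constellation appears exactly once, so any two cells that differ in a single factor can be compared while the other two factors are held constant.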
According to Marcus and Fritzsche (2015), who used intersectionality theory to understand ageism at work, younger White men are archetyped as the "norm." In contrast, older White men are associated with leadership and being a "gentleman." The association of men with leadership has been supported by research. For example, Dennis and Kunkel (2004), as well as Powell, Butterfield, and Parent (2002), found that successful leadership is often described with stereotypically male traits. Moreover, in a meta-analysis, Koenig, Eagly, Mitchell, and Ristikari (2011) showed that leaders are cross-culturally perceived as masculine. According to Heilman (2012), these stereotypes fuel a positive evaluation of male supervisors and a more negative evaluation of female supervisors.
When combining the association of men and leadership with age, Marcus and Fritzsche (2015) state that older White men are the dominant societal group in the Western world. For example, the mean age of members of the 115th United States Congress (Senate and House of Representatives) exceeds 57 years, with nearly 80% of these individuals being White men (Manning, 2018). Similar numbers can be found in Europe, Asia, and Australia, where most political leaders are White men (Marcus & Fritzsche, 2015).
In sum, individuals should expect older White men to be leaders (i.e., high achievers), whereas younger men should be expected to be dominant but not yet leaders. This may mark an exception to the assumption that younger individuals are seen as more proactive and would imply that proactive behavior reflecting an achievement motive is evaluated more favorably for older than for younger men. For proactive behavior associated with a benevolence motive, however, younger men should be evaluated as more effective than older men. As the benevolence motive is generally not stereotypically expected of men, proactive behavior conveying a benevolence motive is incongruent with the perception of men as leaders. Since the stereotype of men as achievement-oriented can be especially strong for older men (Koenig et al., 2011), proactive behavior associated with a benevolence motive should be more acceptable for younger than for older men. Therefore, when comparing within categories and holding everything else constant, we hypothesize that:
Hypothesis 1: Older men receive higher effectiveness ratings than younger men for proactive behavior associated with an achievement motive.
Hypothesis 2: Younger men receive higher effectiveness ratings than older men for proactive behavior associated with a benevolence motive.
According to Marcus and Fritzsche (2015), younger White women are characterized as "sweethearts," while older White women are often stereotyped as "grandmas." However, according to S. T. Fiske et al. (2002), women in the business context may also be characterized as more agentic. Thus, a woman may be stereotypically characterized as benevolent when gender attributes are most salient, but as more achievement-oriented when the business context is salient. Ultimately, these two extremes may cancel each other out, resulting in a neutral categorization of women at the group level. This assumption is supported by research from S. T. Fiske et al. (2002), in which opposing ratings for Black professionals (i.e., high competence) and poor Blacks (i.e., low competence) canceled each other out, "leaving only the generic group in the middle" (p. 889). As a result, Blacks were characterized not on the basis of their race but on the basis of status, the remaining defining characteristic.
In the context of this study, the extremes of "typical" women and businesswomen may lead to an ambivalent categorization of the women in the vignettes as neither benevolent nor achievement-oriented. Age would therefore be the defining characteristic and should have a stronger impact than gender in this case. More specifically, as younger individuals are stereotypically more likely to be expected to initiate change, younger women should receive higher effectiveness ratings than older women for proactive behavior associated with both achievement and benevolence motives. Moreover, as both motives could be attributed to women, the effectiveness ratings for younger and older women should not differ between an achievement and a benevolence motive but should instead reflect age-related expectations. We therefore expect that, when comparing within categories and holding everything else constant:
Hypothesis 3: Younger women receive higher effectiveness ratings than older women for proactive behavior associated with an achievement motive.
Hypothesis 4: Younger women receive higher effectiveness ratings than older women for proactive behavior associated with a benevolence motive.

The Present Studies
We tested our hypotheses in two studies using an experimental vignette methodology design. Experimental vignette methodology studies attempt to reveal individuals' implicit decision rules, such as those underlying overall effectiveness ratings of proactive behaviors. In both studies, the independent variables (i.e., age, gender, and ascribed motives) were systematically manipulated to assess their influence on raters' evaluations of the dependent variable (here, the effectiveness of proactive behavior; Aiman-Smith, Scullen, & Barr, 2002). This systematic variation allows an analysis of the importance of the manipulated factors, or independent variables (Aiman-Smith et al., 2002). More specifically, regression coefficients indicate the relative importance of the presented cues.
In experimental vignette methodology studies, the manipulations are clear and explicit, so researchers often do not conduct a manipulation check. Instead, it is common to conduct a priori pilot studies to test the cues' effectiveness. To check whether participants rate the same conditions consistently, a duplicate vignette is often included to determine test-retest reliability. In addition, because we employed a within-person design with an almost balanced gender distribution among participants and a broad age range, possible effects of internal factors such as self-confidence and self-efficacy are equally distributed across the ratings of all vignettes and should therefore not affect the results of the study. In the following sections, we first report the results of a pilot study designed to test the selected proactive behaviors and motives for use in the main studies, followed by a second pilot study testing stereotypical expectations regarding achievement and benevolence for both genders (i.e., man, woman) and ages (i.e., younger [20-25 years] and older [60-65 years]). This confirmation is important, as current conceptualizations of stereotypes are general and do not specifically address proactive behaviors.
Finally, we describe the results of studies 1 and 2. Based on previous research outlined in the Introduction (C. Duncan & Loretto, 2004), we assume that age stereotypes are of concern only for employees under 25 and over 45 years. We further focused on achievement and benevolence as typical attributes of men and women, respectively (i.e., men as stereotypically achievement-oriented and women as either benevolent or achievement-oriented). To validate the stereotypical association of gender with benevolence and achievement, as well as the ages most susceptible to stereotyping, we added "middle age" as a neutral age category and "stimulation" as a gender-neutral motive in the first study (Prince-Gibson & Schwartz, 1998; Schwartz & Rubel-Lifschitz, 2009).

Pilot Study 1
Method
Based on the recommendations of Rotundo and Sackett (2002), we conducted a pilot study to validate vignettes for the main studies. We selected 9 of the 12 vignettes (i.e., items 1, 2, 3, 5, 6, 7, 8, 10, and 12) for proactive behavior from a situational judgment test on personal initiative by Bledow and Frese (2009). This test can be used to assess proactive behavior because personal initiative describes a general, relatively broad form of proactivity (Frese, 2006; Frese & Fay, 2001; Rank, Pace, & Frese, 2004) and is defined as anticipatory action that employees take to impact their environments (Parker, Williams, & Turner, 2006). It therefore follows a process that can be applied to any set of actions, defined by anticipating, planning, and striving to have an impact (Grant & Ashford, 2008).
We excluded the fourth (i.e., appreciating the department's practice of leaving doors open) and ninth (i.e., talking about opportunities for promotions) items, as these already hinted at certain motives. We also excluded item 11, as it concerned solving a problem rather than allowing for a self-starting, future-oriented action, which is essential to proactive behavior (Grant & Ashford, 2008; Unsworth & Parker, 2003). An example vignette is: "Person A works in a medium-sized organization. A new computer program was installed in the department. No detailed training was provided to save time and money. Person A and some colleagues feel insecure in dealing with this new program. Errors frequently happen which leads to a loss of time." To assess both proactive and non-proactive behaviors, we used the response options of the test yielding the highest (for proactive behavior) and lowest (for non-proactive behavior) scores. Within this vignette, high proactive behavior was specified as: "(…) Person A organizes an internal training in which more experienced colleagues share their knowledge." Low proactive behavior was specified as: "(…) Person A does not get upset about it because with more practice Person A will stop making errors." We further designed three statements for each ascribed motive (in this study: achievement and benevolence, with stimulation as a control condition), resulting in a total of nine motive statements. The statements were based on Schwartz's (2012) value theory, which defines values as guiding principles in life and identifies ten basic motivations or value types of both individuals and cultures. In addition to achievement and benevolence, stimulation was used as a neutral category, as no theories or studies render it part of social roles. As there is nevertheless research showing that men and women may rate stimulation differently (Schwartz & Rubel, 2005), we controlled for rater gender in our analyses.
We recruited 30 participants through our social and professional networks for the first pilot study. The sample consisted of 17 women, 12 men, and 1 person of another gender, aged between 23 and 58 years (M age = 28.17, SD = 7.17). The online survey consisted of two parts. In the first part, participants were given the definition of proactive behavior to establish a common understanding of the behaviors to be assessed. They were then asked to read all 18 previously developed vignettes in random order and to rate each in terms of proactivity (1 = not at all proactive, 4 = moderately proactive, 7 = very proactive) and effectiveness (1 = not at all effective, 4 = moderately effective, 7 = very effective). In the second part, participants were given the definitions of the achievement, benevolence, and stimulation motives. They were then asked to sort each of the nine randomly presented items into the respective category (i.e., achievement, benevolence, stimulation).

Results
Results for the proactive behavior ratings from the first part of the pilot study are presented in Table 2. The ratings suggest that participants successfully differentiated between levels of proactive behavior, as mean ratings align with the level of the presented item and differ significantly between the low and high conditions. Moreover, the statements rated as proactive were also rated as more effective (for proactive statements: M total = 5.38, SD = 0.75, range = 4.20-6.57; for non-proactive statements: M total = 2.86, SD = 0.52, range = 2.10-3.60). Table 3 displays the results from the second part of the pilot study, in which participants categorized statements into motive categories. All items were correctly classified in at least 80% of cases, suggesting that the statements match their ascribed motives well.

Pilot Study 2
Method
In our hypotheses, we assumed that in the work context, proactive behaviors are mostly expected of younger employees. Furthermore, we assumed that proactive behaviors aimed at achievement are most likely to be expected of men, whereas both achievement and benevolence motives may be expected of younger women. To examine these crucial assumptions, we conducted a second pilot study. In total, 33 participants (14 women, 18 men, and 1 person of another gender) completed the study. Participants were aged between 20 and 69 years (M = 31.42, SD = 13.07). We ensured that participants of the first pilot study did not take part in the second pilot study by not inviting them to it.
In the first part of the study, we validated the motive items (i.e., achievement and benevolence) from the first pilot study. This was done to ensure that the items were correctly perceived in the remainder of the pilot study, as the vignettes already included an attributed motive for the respective behavior. Thereafter, we presented participants with the nine high proactive behavior vignettes validated in the first pilot study. For each vignette, we used a neutral label to refer to the individual (e.g., Person A) and added either an achievement or a benevolence motive using the motive descriptions validated in the first pilot study, yielding a total of 18 vignettes. Participants were then asked to indicate whether they would expect the respective behavior from a man or a woman, and from a younger (20-25 years) or an older (60-65 years) worker.

Results
Both achievement and benevolence items were correctly classified in more than 90% of cases. Furthermore, proactive behaviors with an associated achievement motive were mostly attributed to younger (100%) men (77.8%), whereas proactive behaviors associated with a benevolence motive were mainly attributed to younger (55.56%) women (77.8%). These results confirm our assumption that proactive behaviors aimed at achievement are most likely to be expected of men. They further show that benevolence is more likely to be associated with women in this sample. However, achievement is not always expected of men, pointing toward possible ratings of women as achievement-oriented. With regard to age, proactive behavior motivated by achievement was attributed to younger individuals only. For proactive behavior associated with a benevolence motive, this age division is less clear but still present.

Vignette Development
In the vignettes, we used the nine proactive behaviors from the first pilot study that were categorized as proactive. To each vignette, we added a male or female German name to signal gender, as well as the age of the person engaging in the proactive behavior. We chose traditional German names that made no reference to ethnicity or religion. We further chose the middle-age range based on the mean age of the German working population, which is currently 44 years (Statistisches Bundesamt, 2018). Within the vignettes, the middle-age category and the stimulation motive served as neutral categories. The purpose of these control categories was to assess the quality of the manipulations within the vignettes. More specifically, past theory and research have focused on achievement- and benevolence-related motives when it comes to gender differences (e.g., gender studies using social role theory or the underlying stereotype content model). Furthermore, researchers have concluded that the ages most susceptible to prejudice are under 25 and above 45 years (C. Duncan & Loretto, 2004), which is why we chose the middle-age category as a control condition. Ultimately, if our manipulations worked, there should be no significant differences for middle-aged employees or for employees motivated by stimulation.
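The Study 1 design crosses three vignette ages, two genders, and three motives, which yields the 18 conditions each participant rated. A minimal Python sketch of that condition grid follows; the level labels are illustrative shorthand, and because the text does not specify how the nine behaviors were paired with conditions, the sketch builds only the grid itself, not the authors' assignment procedure.

```python
from itertools import product

# Vignette factor levels for Study 1 (middle-aged and stimulation are the
# neutral control levels). Labels are illustrative shorthand.
AGES = ("younger", "middle-aged", "older")
GENDERS = ("man", "woman")
MOTIVES = ("achievement", "benevolence", "stimulation")


def vignette_conditions(ages=AGES, genders=GENDERS, motives=MOTIVES):
    """Return every age x gender x motive combination (full factorial grid)."""
    return list(product(ages, genders, motives))


study1 = vignette_conditions()  # 3 x 2 x 3 = 18 conditions

# Dropping the control levels leaves a 2 x 2 x 2 grid of 8 conditions.
study2 = vignette_conditions(
    ages=("younger", "older"), motives=("achievement", "benevolence")
)

for age, gender, motive in study1[:3]:
    print(age, gender, motive)
```

Building the grid with `itertools.product` guarantees that every combination appears exactly once, which is what makes the within-person comparisons "holding everything else constant" possible.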
During the study, vignettes were presented to each participant in random order to avoid primacy or recency effects that could confound the ratings. An example vignette (younger, woman, achievement) is: "Johanna is 20 years old and works in an open-plan office. The workstations are badly arranged. There is not sufficient space to store everything needed on the desk. Furthermore, Johanna has to walk far. The problem will be resolved for Johanna in a couple of months because she will change jobs within the company. She nevertheless rearranges the office furniture together with her colleagues in order to have more space, because it is important to her to always perform well." The motive items were adapted from the Schwartz Value Survey (SVS; Schwartz, 2003). Another vignette (older, man, benevolence) is: "Dieter is 65 years old and works in a middle-sized organization. Due to a conflict among colleagues, the climate in his department is rather tense. Dieter is not involved in the conflict. However, he feels disturbed in his work. The attempt of one of his colleagues to reconcile the conflict was not appreciated. Even if they react negatively in the beginning, Dieter takes charge of mediating among his colleagues to keep a good team climate." Each participant was asked to rate 18 of these randomly generated vignettes in terms of effectiveness directly after reading each one. Furthermore, to assess test-retest reliability, two vignettes were included twice, resulting in a total of 20 vignettes per participant. Test-retest reliability for effectiveness ratings was sufficiently high (r = 0.72 for vignette 6; r = 0.78 for vignette 15).
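The presentation procedure (18 vignettes in random order plus two re-presented duplicates) and the test-retest check can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the function names, vignette IDs, seed, and ratings are hypothetical, and computing test-retest reliability as a Pearson correlation between first and second ratings across participants is an assumption about how the reported r values were obtained.

```python
import random
from math import sqrt


def presentation_order(vignette_ids, duplicate_ids, seed=None):
    """Shuffle one participant's vignettes together with the re-presented duplicates."""
    rng = random.Random(seed)
    order = list(vignette_ids) + list(duplicate_ids)
    rng.shuffle(order)
    return order


def pearson_r(first, second):
    """Pearson correlation between first and repeated ratings of the same vignette."""
    n = len(first)
    mean_1, mean_2 = sum(first) / n, sum(second) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    return cov / sqrt(ss_1 * ss_2)


# Hypothetical example: 18 vignette IDs, with vignettes 6 and 15 shown twice.
order = presentation_order(range(1, 19), [6, 15], seed=42)
print(len(order))  # 20 presentations per participant

# Hypothetical first and second ratings of one duplicated vignette
# (one pair per participant) on the 7-point scale.
first_ratings = [5, 6, 4, 7, 3, 5]
second_ratings = [5, 5, 4, 6, 3, 6]
print(round(pearson_r(first_ratings, second_ratings), 2))
```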

Participants and Procedure
Participants were recruited via professional contacts and acquaintances to foster heterogeneity and thereby increase external validity (Demerouti & Rispens, 2014). While sampling, we ensured that the convenience sample was heterogeneous, as not only supervisors but also coworkers may evaluate employees' proactive behaviors and react positively or negatively to such behavior (e.g., Cangiano & Parker, 2016). Although convenience samples are often not representative in terms of demographics and absolute values of the central variables (i.e., statistical generalizability), they can still be used to test theoretically derived hypotheses about relationships between variables (i.e., theoretical generalizability; see Highhouse & Gillespie, 2009; Landers & Behrend, 2015).
Only participants with work experience were admitted to the study. Moreover, individuals invited to either of the two pilot studies were excluded by not inviting them to the present study. Given this recruitment strategy, the sample is clearly a convenience sample that includes employees from different industries, occupations, and organizations. This diversity is also reflected in the participant demographics. In total, 110 individuals clicked on the link to the study and provided some or all demographics at the beginning of the study. Of these, a final sample of 101 participants, 51 women (50.5%) and 50 men (49.5%), completed the online study, resulting in a total of 1818 ratings. In the German employment context (see Statistisches Bundesamt, 2020, for an overview), 53% of employees are men and 47% women, which differs slightly from our sample. The average age of the participants was 34.00 years (SD = 13.04), ranging between 20 and 67 years. The average age of employees in Germany is currently 44 years (Statistisches Bundesamt, 2018), which differs from our sample as well. Regarding education, 27 participants had a higher education school certificate, 31 had a Bachelor's or equivalent degree, and 22 had a Master's degree or a PhD. The remaining participants had either completed secondary school or an apprenticeship or indicated "other." On average, participants had been employed for 9.12 years (SD = 15.62, range = 2 weeks-40 years). Job descriptions were mainly in the service sector and included, for example, dentist, personnel manager, sales representative, media designer, teacher, product manager, and administrative assistant.
Because each participant rated 18 vignettes (excluding the duplicates), a moderately large sample offers sufficient power to test our hypotheses (Scherbaum & Ferreter, 2009). For example, Maas and Hox (2005) conducted a simulation study with varying numbers of clusters (level 2 units; N = 30, 50, 100), cluster sizes (n = 5, 30, 50), and intraclass correlations (ICC = 0.1, 0.2, 0.3) to investigate the effect of sample characteristics on parameter estimates and their sampling errors. Results showed that even with small samples, complex regression models (e.g., with random slopes) were estimated correctly.
Participants were asked to complete an online survey of approximately 30 min that consisted of three parts. In the first part, participants read and evaluated each of the 20 randomly presented vignettes (including two duplicate vignettes to assess test-retest reliability; Rotundo & Sackett, 2002). Randomizing the order of the vignettes distributes potential fatigue effects from rating 20 vignettes across conditions. In the second part, participants completed different surveys on their personality and attitudes, which are not the focus of this study. In the third part, participants provided demographic information (e.g., age, gender, education, job tenure).

Measures
Effectiveness
Past research has not yet offered a clear definition of effectiveness at work. So far, it has been broadly described as productivity (Kofodimos, 1993), success at work (Caligiuri & Lazarova, 2005), meeting one's own performance standards (Greenhaus & Ten Brummelhuis, 2013), or accomplishing negotiated role expectations between an individual and relevant parties in work and family roles (Grzywacz & Carlson, 2007). In this study, the perceived effectiveness of a proactive behavior was measured with the question "How effective was the behavior of [the employee]?", answered on a 7-point scale ranging from 1 = not at all effective to 7 = extremely effective.
Demographics
We controlled for the chronological age and gender (i.e., man, woman, none of the above) of the participants (i.e., raters), as age has been shown to differentially affect stereotypical ratings (Jackson & Sullivan, 1988). Similarly, we assumed that participants' gender might influence the content of stereotypical expectations. The respective variables were assessed by asking "What is your current age?" and "What is your gender?"

Statistical Analysis
We tested our hypotheses at the within-person level with a fixed effects model (Meinck & Vandenplas, 2012; Snijders, 2005). We modeled our data using random coefficient modeling, the recommended approach for experimental vignette methodology studies (Aguinis & Bradley, 2014). This method is a multilevel model in which group differences are modeled using coefficients that can be either fixed or random (i.e., varying across groups). A model with random slopes did not offer a better model fit than a model with a random intercept only (χ²(20) = 38.15, p = 0.010) after applying a Bonferroni correction for multiple model comparisons (corrected α = α/N; in this case, corrected α < 0.001); we therefore used a model with fixed slopes and a random intercept. Following Tabachnick and Fidell (2001), we computed this model to predict participants' effectiveness ratings across vignettes, $y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + e_{i}$, where $\gamma_{00}$ denotes the grand mean of the outcome variable when all groups are fixed at the reference group, $\gamma_{10}$ the slope between a level 1 predictor and the outcome variable, $x_{ij}$ the level 1 predictor for a given rating by person $i$ of vignette $j$ [level 2], and $e_{i}$ the error term for variance within person $i$. We fitted regressions using three nominal code variables representing age (coded young, middle-aged, old), gender (coded man, woman), and motive (coded achievement, benevolence, stimulation). We added the control variables (i.e., rater age and gender) and their interactions with the relevant vignette variables (i.e., rater age with vignette age; rater gender with vignette gender) to the model. Rater age was centered at the grand mean.
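The nominal (dummy) coding, grand-mean centering, and rater-by-vignette interaction terms described above can be illustrated with a short standard-library Python sketch. The reference levels chosen here (e.g., "young" for vignette age) are hypothetical, since the text does not state which level served as the reference group, and the data are invented; the analysis itself was run in R, not with this code.

```python
def dummy_code(values, levels, reference):
    """Treatment (dummy) coding: one 0/1 column per non-reference level."""
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


def grand_mean_center(values):
    """Subtract the grand mean across rows (equals the rater mean when every
    rater contributes the same number of ratings)."""
    grand_mean = sum(values) / len(values)
    return [v - grand_mean for v in values]


def interaction(column_a, column_b):
    """Element-wise product of two predictor columns (e.g., rater age x vignette age dummy)."""
    return [a * b for a, b in zip(column_a, column_b)]


# Hypothetical rows: vignette age of each rating and the age of the rater who gave it.
vignette_age = ["young", "old", "middle-aged", "young"]
rater_age = [24, 52, 31, 45]

age_dummies = dummy_code(vignette_age, ["young", "middle-aged", "old"], reference="young")
rater_age_c = grand_mean_center(rater_age)
rater_x_old = interaction(rater_age_c, age_dummies["old"])
print(age_dummies)
print(rater_age_c, rater_x_old)
```

With treatment coding, each dummy coefficient is the difference from the reference level, which is why $\gamma_{00}$ is interpreted at the reference group; centering rater age makes that intercept refer to a rater of average age.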
We analyzed the data using RStudio 1.1.463. We first checked whether the three-way interactions of age, gender, and motive were significant and followed up with comparisons to test the hypotheses. Because we performed multiple comparisons, we employed Tukey's HSD (honestly significant difference) correction, which involves a modified t test to account for chance capitalization (i.e., an inflated type I error rate). We chose this adjustment because it keeps the type I error rate at α when sample sizes are equal.

Preliminary Analyses
We first checked whether the middle-age category and the stimulation motive could be seen as neutral categories, meaning that there were no significant differences in ratings for these conditions. The middle-age category received ratings similar to the younger age category across all three motives (i.e., achievement, benevolence, stimulation). The exceptions were men ascribed a stimulation motive and women ascribed a benevolence or a stimulation motive; for these groups, middle-aged individuals received ratings similar to those of older individuals (Table 4). Regarding benevolence, one reason for this exception might be that women can be seen either as generally benevolent or as achievement-oriented in the employment context (S. T. Fiske et al., 2002). Being middle-aged may strengthen the age-related expectation that higher age is associated with both warmth and incompetence, and thus lead to ratings more consistent with these negative age stereotypes.
Second, individuals ascribed a stimulation motive also received similar ratings across age groups. Although middle-aged men and older women received slightly higher effectiveness ratings than other age groups of the same gender (all p's < 0.05), ratings were mostly similar and confirm that the stimulation motive can be seen as a neutral category (see Fig. 1 and Table 4).

Hypothesis Tests
To test our hypotheses, we ran a multilevel analysis predicting effectiveness ratings. Partitioning the variance into between- and within-person components using a null model showed that 23% of the variance in effectiveness resided at the between-person level and 77% within persons. Because the three-way interaction between age (i.e., young vs. old), gender (i.e., man vs. woman), and ascribed motive (i.e., achievement vs. benevolence) was significant (β = −3.79, t = −10.95, p < 0.001), we followed up with multiple comparisons within the same categories, holding everything else constant. In doing so, we focused on our variables of interest, namely age (younger, older), gender (man, woman), and motive (benevolence, achievement). The results of the complete analysis are shown in Table 5 as well as in Fig. 2 (for achievement) and Fig. 3 (for benevolence).
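The between-person share of variance from the null model is the intraclass correlation (ICC). The Python sketch below shows two ways to obtain it: directly from estimated variance components, and, as a simpler swapped-in technique, the ANOVA-based ICC(1) moment estimator for balanced data. Neither is claimed to reproduce the authors' R output; the function names and the example ratings are hypothetical.

```python
def icc_from_components(between_var, within_var):
    """ICC: share of total variance located between persons (random-intercept variance)."""
    return between_var / (between_var + within_var)


def icc1_anova(groups):
    """ANOVA moment estimator ICC(1) for balanced data.

    groups: one list of ratings per rater, all of equal length k.
    ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW)
    """
    n_groups, k = len(groups), len(groups[0])
    grand_mean = sum(sum(g) for g in groups) / (n_groups * k)
    means = [sum(g) / k for g in groups]
    ss_between = k * sum((m - grand_mean) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_groups * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


print(icc_from_components(0.23, 0.77))  # Study 1 partition
print(icc_from_components(0.33, 0.67))  # Study 2 partition

# Hypothetical ratings from three raters on four vignettes each.
print(round(icc1_anova([[5, 6, 4, 5], [3, 4, 3, 2], [6, 7, 6, 5]]), 2))
```

With the Study 1 partition, the ICC is 0.23: about a quarter of the rating variance reflects stable differences between raters, and the remaining variance is what the vignette manipulations can explain.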
The first hypothesis stated that proactive behavior motivated by achievement is evaluated as more effective for older men than for younger men. This hypothesis was supported, as younger men with an achievement motive received lower ratings than older men (β = −1.89, t = −10.93, p < 0.001). The second hypothesis stated that proactive behavior motivated by benevolence is evaluated as more effective for younger men than for older men. This hypothesis was also supported, as younger men received significantly higher effectiveness ratings than older men for proactive behavior motivated by benevolence (β = 0.92, t = 5.35, p < 0.001). The third and fourth hypotheses stated that younger women receive higher effectiveness ratings than older women for proactive behavior motivated by achievement (Hypothesis 3) or benevolence (Hypothesis 4). Both hypotheses were supported, as younger women received higher effectiveness ratings than older women for both achievement (β = 1.25, t = 7.21, p < 0.001) and benevolence (β = 0.76, t = 4.40, p = 0.002).

Discussion
In this study, we investigated the effects of personal characteristics on effectiveness ratings of proactive behaviors. The results showed that the interplay of age and gender, as well as the motives associated with proactive behavior, can influence effectiveness ratings of proactive behavior in the work context. More specifically, in accordance with the first hypothesis, proactive behavior motivated by achievement was evaluated as more effective for older men than for younger men. The second hypothesis was supported as well, because younger men's proactive behavior motivated by benevolence was rated as more effective than the same behavior of older men. Focusing on women's proactive behavior, we further found that younger women's proactive behavior was evaluated more favorably than older women's for both an associated achievement and an associated benevolence motive. These findings supported the third and fourth hypotheses.
In the second study, we aimed to replicate the findings in a sample of supervisors, who routinely conduct performance evaluations of employees and thus provide a different context for evaluating proactive behavior. Moreover, to allow an estimate of internal consistency for the effectiveness measure, we used a multi-item measure of effectiveness in the second study.

Vignette Development
We used the same vignettes as in study 1, except for those with control conditions (i.e., middle age and stimulation motive), resulting in a 2 (age: younger, older) × 2 (gender: man, woman) × 2 (motive: achievement, benevolence) design with 8 vignettes in total. In one vignette, we changed the age from middle-aged to younger to keep the same vignettes as in study 1. As in the first study, we duplicated one vignette to assess test-retest reliability. Because online panel data sometimes suffer from careless respondents (Hays, Liu, & Kapteyn, 2015), we also asked participants to briefly describe the content of each vignette after responding to the effectiveness items.
Fig. 1 Mean effectiveness ratings of proactive behavior for different ages and genders, for proactive behavior motivated by stimulation (study 1)

Participants and Procedure
We collected our data using a German online panel company, which randomly approached a subsample of its participant pool to take part in the study for compensation. We aimed to recruit 200 participants in supervisory roles, so some oversampling was necessary. In total, 461 individuals clicked on the link to the study, and 418 provided some or all demographics at the beginning of the study. A sample of 208 participants completed the study by responding to all vignettes. As participants had to rate all vignettes to receive compensation, there were no incomplete data. Two independent raters coded the open-ended responses after each vignette as valid or invalid (e.g., invalid answers consisted of random number and/or letter combinations). Interrater reliability was 0.91, and disagreements were resolved via discussion. Excluding participants with multiple invalid responses resulted in a final sample of 164 participants. As in study 1, the sample is clearly a convenience sample. Of the participants, 104 were men (63.4%) and 60 women (36.6%). Their age ranged from 19 to 69 years, with a mean age of 44.9 years (SD = 11.47). In the German employment context, 53% of employees are men and 47% women (see Statistisches Bundesamt, 2020), and the average age of employees is 44 years (Statistisches Bundesamt, 2018). However, the sample for the second study consists of supervisors only, for whom there are no comparable statistics in the German employment context. On average, participants were responsible for approximately 11 employees (M = 10.93, SD = 9.71) and had a job tenure of almost 16 years (M = 15.76, SD = 10.38). All of them were responsible for at least one employee. Moreover, 9 participants had little (5.5%), 34 some (20.7%), 87 much (56.5%), and 35 very much (21.3%) experience with rating employees. The sectors in which participants worked were mostly service-oriented (35.4%), technical (7.9%), or academic (7.3%).
Moreover, the most common positions were supervisor (36.6%), CEO (17.7%), IT manager (9.8%), and HR manager (7.3%). The most common educational qualifications were a university degree (45.7%), secondary school (20.7%), high school (15.9%), or other (6.1%). As in study 1, the convenience sample is not representative of the German working population in terms of demographics and absolute values of the central variables (i.e., statistical generalizability), yet it can still be used to test theoretically derived hypotheses about relationships between variables (i.e., theoretical generalizability; see Highhouse & Gillespie, 2009; Landers & Behrend, 2015).
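Study 2 reports an interrater reliability of 0.91 for the valid/invalid coding of open-ended answers without naming the index. Assuming Cohen's kappa for two coders (an assumption, not stated in the source), the computation and the exclusion step can be sketched in Python as below. The cutoff for "multiple invalid responses" is also an assumption (here, more than one), and all names and codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders assigning nominal codes to the same responses."""
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n**2
    if expected == 1.0:  # both coders used one identical category throughout
        return 1.0
    return (observed - expected) / (1 - expected)


def keep_participant(invalid_flags, max_invalid=1):
    """Hypothetical exclusion rule: keep raters with at most one invalid open answer."""
    return sum(invalid_flags) <= max_invalid


# Hypothetical codes for eight open-ended answers.
coder_1 = ["valid", "valid", "invalid", "valid", "valid", "invalid", "valid", "valid"]
coder_2 = ["valid", "valid", "invalid", "valid", "invalid", "invalid", "valid", "valid"]
print(round(cohens_kappa(coder_1, coder_2), 2))
print(keep_participant([0, 0, 1, 0, 0, 0, 0, 0, 0]))  # True: one invalid answer
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because most answers are valid and two coders would agree often even at random.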

Measures
Effectiveness
Effectiveness was measured using four items developed for this study. The first item was the same as in the first study: "How effective is the behavior of [name of employee]?", rated on a 7-point scale ranging from 1 = not at all effective to 7 = extremely effective. The remaining three items were also answered on 7-point scales. The second item was "Please rate the performance of [name of employee]," with responses on a scale adapted from Welbourne, Johnson, and Erez (1998), ranging from 1 = has to be strongly improved to 7 = excellent. The third item was "How successful was the behavior of [name of employee]?", rated on a scale ranging from 1 = not at all successful to 7 = very successful. The last item was "How much does [name of employee]'s behavior contribute to attaining organizational goals?", answered on a 7-point scale ranging from 1 = not at all to 7 = very strongly. The reliability of the 4-item scale ranged from α = 0.90 to α = 0.96 across vignettes.
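Cronbach's α for the four effectiveness items can be computed per vignette from the item and total-score variances, α = k/(k − 1) · (1 − Σ item variances / total-score variance). A minimal Python sketch follows, assuming complete data and sample (n − 1) variances; the function names and ratings are hypothetical.

```python
def sample_variance(values):
    """Unbiased (n - 1) sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list per item, each holding one score per respondent
    (here, the four effectiveness items for a single vignette).
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(sample_variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / sample_variance(totals))


# Hypothetical 7-point ratings from five raters on the four items of one vignette.
effective = [6, 5, 7, 3, 4]
performance = [6, 4, 7, 3, 5]
successful = [5, 5, 6, 2, 4]
goals = [6, 4, 7, 3, 4]
print(round(cronbach_alpha([effective, performance, successful, goals]), 2))
```

Running the function once per vignette gives the range of α values across vignettes.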
Demographics
At the beginning of the study, participants were asked about their chronological age in years, their gender (1 = man, 2 = woman), whether they were employed (1 = no, 2 = yes), supervisory status (1 = yes, 2 = no, 3 = not employed, 4 = I do not wish to answer), responsibility for employees (1 = no, 2 = yes) and, if yes, for how many, as well as their education (ranging from 1 = no degree to 7 = Bachelor/Master degree), occupational description and position (using the international classification of occupations; International Labour Office, 1990), and job tenure (in years).
Fig. 3 Mean effectiveness ratings of proactive behavior for different ages and genders, for proactive behavior motivated by benevolence (study 1). ***p < 0.001

Hypothesis Tests
We used the same analytic strategy as in study 1. The mean effectiveness ratings are shown in Table 6. Partitioning the variance into between- and within-person components using a null model showed that 33% of the variance in effectiveness resided at the between-person level, which is similar to study 1 (i.e., 23%); accordingly, 67% resided at the within-person level. Results of the multilevel analysis are shown in Table 7. The three-way interaction between age (i.e., young vs. old), gender (i.e., woman vs. man), and ascribed motive (i.e., achievement vs. benevolence) was significant (β = 2.14, t = 8.71, p < 0.001). Thus, we followed up with multiple comparisons between conditions, again using Tukey corrections. As a model with random slopes did not offer a better fit than a model with a random intercept only (χ²(9) = 10.69, p = 0.30), we used a model with fixed slopes and a random intercept, as in the first study. Hypothesis 1 stated that proactive behavior associated with an achievement motive is evaluated more favorably for older men than for younger men. This hypothesis was supported, as younger men with an achievement motive received lower ratings than older men (β = −1.15, t = −9.38, p < 0.001), replicating the finding from study 1. Hypothesis 2 proposed that proactive behavior associated with a benevolence motive is evaluated as more effective for younger men than for older men; this was not replicated in this study (β = 0.18, t = 1.45, p = 0.836). According to Hypothesis 3, younger women should receive higher effectiveness ratings than older women for proactive behavior associated with an achievement motive, which was replicated in this study (β = 0.88, t = 7.19, p < 0.001). Hypothesis 4 suggested that younger women receive higher effectiveness ratings than older women for proactive behavior associated with a benevolence motive. Like Hypothesis 2, this hypothesis was not supported (β = 0.07, t = 0.57, p = 0.999).
The results are depicted in Fig. 4 (for achievement) and Fig. 5 (for benevolence).
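The variance partitioning and the model comparison reported above rest on two simple quantities. The minimal sketch below, an illustration rather than the analysis code used in the study, computes the between-person share of variance (the intraclass correlation, ICC) from the variance components of a null model and converts a likelihood-ratio χ² statistic into a p-value. The variance components passed in (0.66 and 1.34) are hypothetical values chosen only to reproduce the reported 33%/67% split; the study reports percentages, not raw components. Fitting the multilevel models themselves would require a mixed-model package and is not shown. The helper names are hypothetical; `chi2_sf_odd_df` uses the closed-form chi-square survival function, which holds only for an odd number of degrees of freedom, as in the reported χ²(9) test.

```python
import math


def icc(between_var: float, within_var: float) -> float:
    """Share of total variance located at the between-person level (ICC(1))."""
    return between_var / (between_var + within_var)


def chi2_sf_odd_df(x: float, df: int) -> float:
    """Upper-tail p-value P(X > x) for a chi-square statistic with odd df.

    Closed form for odd k:
    Q(x; k) = erfc(sqrt(x/2))
              + sqrt(2/pi) * exp(-x/2) * sum_{j=1}^{(k-1)/2} x^((2j-1)/2) / (1*3*...*(2j-1))
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("closed form requires an odd, positive number of degrees of freedom")
    tail = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(x)  # j = 1 term: x^(1/2) / 1
    series = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * j + 1)  # next term: multiply by x / (2j + 1)
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * series


if __name__ == "__main__":
    # Hypothetical null-model variance components (not reported in the paper).
    print(f"ICC = {icc(0.66, 1.34):.2f}")  # prints "ICC = 0.33"
    # Reported random-slope vs. random-intercept comparison: chi2(9) = 10.69.
    print(f"p = {chi2_sf_odd_df(10.69, 9):.2f}")  # prints "p = 0.30"
```

Plugging in the reported statistic, χ²(9) = 10.69, recovers p ≈ 0.30, consistent with the value reported for the random-slope comparison.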

Supplementary Analyses
We further conducted the same analysis with the full sample of participants, including those who provided invalid responses to the open questions after each vignette (N = 208), leading to a very similar pattern of results. We also repeated the analysis only with the single-item effectiveness outcome measure from study 1, and again, the results did not change substantially.

Discussion
Our second study aimed to replicate the findings of the first study in a sample of supervisors, most of whom had at least some performance appraisal experience. We found effects of age and gender only for proactive behavior associated with an achievement motive. One potential explanation for these findings lies in the nature of the sample. As supervisors often conduct performance evaluations and decide who will be promoted (i.e., who achieves something), they might pay more attention to the achievement motive. This assumption is supported by an experimental vignette methodology study by Rotundo and Sackett (2002), in which employees' organizational citizenship behavior (e.g., helping others) was the least important contributor to supervisors' ratings of employees' overall performance, compared with task and counterproductive performance. To shed light on possible differences between supervisory and non-supervisory ratings of employees' work behavior, future research could investigate the different priorities individuals place on the motive behind observed behavior.

General Discussion
The two main studies reported in this paper aimed to provide a better understanding of why the same proactive work behavior is perceived differently depending on who performs it. In support of the first hypothesis, both studies showed that older men received higher ratings than younger men for proactive behavior associated with an achievement motive. This result might be explained by the stereotypical attribution of leadership, and thus achievement, especially to older men (Dennis & Kunkel, 2004; Powell et al., 2002). Specifically, when the personal characteristics of the employee carrying out the proactive, achievement-oriented behavior in the vignettes matched participants' expectations (i.e., the behavior was performed by an older male employee), the congruence of these two aspects may have resulted in higher effectiveness ratings, as predicted by social role theory (Eagly & Steffen, 1984). Our second hypothesis, which stated that younger men would receive higher effectiveness ratings than older men for proactive behavior associated with a benevolence motive, was supported only in the first study, which used a heterogeneous employee sample. In this regard, it is important to highlight that although younger men are generally stereotyped as dominant, benevolence, which is generally not stereotypically expected of men, seems to be more acceptable for younger men than for older men because of the strong stereotype of men as leaders (Koenig et al., 2011). Support for our third and fourth hypotheses, which suggested that younger women would receive higher effectiveness ratings than older women for proactive behaviors associated with an achievement motive and a benevolence motive (the latter found only in the first study), respectively, is in line with the findings by S. T. Fiske et al. (2002).
That is, the two opposing stereotypes of "typical" women (i.e., benevolence-oriented) and businesswomen (i.e., achievement-oriented) seem to cancel each other out, such that age becomes the defining characteristic for the evaluation of proactive behavior regardless of the underlying motive. Consistent with Hypotheses 2, 3, and 4, younger employees described in the vignettes received higher effectiveness ratings for proactive behaviors motivated by both achievement (in both studies) and benevolence (in the first study). These results can be explained by a fit between the definition of proactivity (i.e., actively challenging the status quo; Unsworth & Parker, 2003) and the stereotype of younger individuals as more oriented toward change, flexibility, and innovation (Posthuma & Campion, 2009).
The results of our studies extend existing research on the effect of personal characteristics on evaluations of proactive behavior. While previous studies have mostly looked at gender differences in work behaviors (e.g., organizational citizenship behaviors; Heilman & Chen, 2005; Kidder & Parks, 2001), our results show that it is not a single characteristic but an interplay of personal characteristics that matters. Ignoring this interplay may bias conclusions because, for example, the social roles of men and women in general are likely to differ from those of younger and older men and women. Our research, therefore, is a first step toward a more differentiated account of how personal characteristics may influence the evaluation of proactive behaviors and, ultimately, performance evaluations (Thompson, 2005).
We further found that evaluations of proactive behavior associated with a benevolence motive differed between a sample of supervisors and a heterogeneous employee sample. It may be the case that in organizations, proactive behavior motivated by achievement is more important to supervisors who routinely conduct performance evaluations and is therefore likely to evoke stronger stereotypical reactions based on the age and gender of the employees carrying out the behavior. Nevertheless, in the social services sector or in close-knit teams, benevolence may be of greater importance as well. This needs to be investigated in future studies.

Theoretical Implications
The results of the present studies have a number of theoretical implications. First, our results suggest that when investigating proactive behavior, researchers need to consider the intersectionality of personal characteristics as a determinant of how proactive behavior is evaluated. Age and gender are key personal characteristics that form social role expectations (Sarbin & Allen, 1968), which can influence effectiveness evaluations (e.g., Kidder & Parks, 2001; Luksyte et al., 2017). Moreover, ascribed motives are crucial for proactive behaviors, as they hint at the reasons for engaging in such behaviors (Grant & Ashford, 2008). Yet, current models of proactivity often consider only personality characteristics that may influence proactive behavior (Parker, Bindl, & Strauss, 2010; Parker et al., 2006), whereas the role of other personal characteristics based on social roles (e.g., age, gender) has been neglected (e.g., Zacher & Kooij, 2017). However, our studies show that these characteristics influence others' perceptions of proactive behavior. For example, in addition to their combined impact on social roles and, ultimately, on evaluations of proactive behavior, as found in this study, age, gender, and ascribed motives may also act as moderators of the effects of proactive behavior. In that regard, based on the relational demography literature, Ferris, Judge, Chachere, and Liden (1991) found that supervisors who are similar in age to the team they lead give lower performance ratings. Moreover, Sturman (2003) showed that age was negatively related to job performance ratings when ratees were older, but positively related to job performance ratings when ratees were younger.
Additionally, the stereotypes based on age, gender, and ascribed motives prevalent in the work context might not only influence others' perceptions, but also negatively influence individual self-report ratings of proactive behavior due to self-stereotyping and the resulting behavior in accordance with these stereotypes (Chen & Bargh, 1997). Ultimately, a broader range of personal characteristics should be included in conceptual models of proactivity. A final theoretical implication of our study concerns our reliance on the multiplicative approach to intersectionality to investigate the combined impact of personal characteristics on evaluations of proactive work behavior. In line with this approach, which assumes that the effects of personal characteristics on evaluations of proactive behavior are relative to and dependent on one another, we did not formulate hypotheses about the unique contributions of each investigated characteristic. Theoretically, this suggests that the evaluations of employees in the vignettes cannot be attributed to one particular characteristic that has a relatively stronger influence than the others, as the additive approach would assume. Therefore, using the multiplicative approach, it is impossible to explain, for example, different ratings for older men with a benevolence motive and younger women with an achievement motive, because we cannot determine whether the ratings differ because of age, gender, or ascribed motive rather than their combined impact. Instead, by assuming that the impact of these characteristics is multiplicative, we can only vary one of the three characteristics at a time to determine its role for evaluations of proactive behavior within a particular constellation of the other two. An example would be comparing younger and older women showing proactive behavior motivated by benevolence.
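The difference between the additive and the multiplicative view can be illustrated with a toy 2 × 2 × 2 table of cell means. In the minimal sketch below, the cell means are invented for illustration (they are not the study's data), and the function names (`age_effect`, `three_way_contrast`, `additive_fit`) are hypothetical. A main-effects-only (additive) fit forces the age effect to be the same in every gender × motive cell, so its three-way contrast is zero; keeping the cell means themselves (the multiplicative view) allows the age effect to differ, and even reverse, across cells. The sketch works on cell means only and does not reproduce the multilevel models or Tukey-corrected comparisons used in the studies.

```python
from itertools import product

# Hypothetical cell means (invented for illustration, NOT the study's data).
# Keys: (age, gender, motive)
cell_means = {
    ("young", "man", "achievement"): 3.0,
    ("old", "man", "achievement"): 4.0,
    ("young", "man", "benevolence"): 4.0,
    ("old", "man", "benevolence"): 3.5,
    ("young", "woman", "achievement"): 4.0,
    ("old", "woman", "achievement"): 3.0,
    ("young", "woman", "benevolence"): 4.0,
    ("old", "woman", "benevolence"): 3.5,
}


def age_effect(means, gender, motive):
    """Simple effect of age (young minus old) within one gender x motive cell."""
    return means[("young", gender, motive)] - means[("old", gender, motive)]


def three_way_contrast(means):
    """Difference between genders in how the age effect changes across motives."""
    men = age_effect(means, "man", "achievement") - age_effect(means, "man", "benevolence")
    women = age_effect(means, "woman", "achievement") - age_effect(means, "woman", "benevolence")
    return men - women


def additive_fit(means):
    """Predicted cell means from main effects only (grand mean + marginal deviations)."""
    levels = {0: ("young", "old"), 1: ("man", "woman"), 2: ("achievement", "benevolence")}
    grand = sum(means.values()) / len(means)

    def marginal(pos, level):
        vals = [m for key, m in means.items() if key[pos] == level]
        return sum(vals) / len(vals) - grand

    return {key: grand + sum(marginal(pos, key[pos]) for pos in range(3))
            for key in product(*levels.values())}


if __name__ == "__main__":
    for g, m in product(("man", "woman"), ("achievement", "benevolence")):
        print(g, m, age_effect(cell_means, g, m))  # age effect differs by cell
    print(three_way_contrast(cell_means))  # prints -2.0
    fitted = additive_fit(cell_means)
    print(three_way_contrast(fitted))  # prints 0.0: additive fit has no interaction
```

Comparing younger and older employees within one fixed gender × motive cell, as in the example of women with a benevolence motive, is the kind of comparison that stays interpretable under the multiplicative view.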

Limitations and Future Research
Despite the high rigor of experimental designs, including experimental vignette methodology studies, the current studies have a number of limitations. A first limitation relates to the composition of the samples. As participants in study 1 mostly did not hold supervisory positions, they were rather distant from evaluating the effectiveness of others' behaviors. This is likely to be different in an actual organization, where the immediate supervisor usually provides appraisal ratings. This may be problematic, as ratings from a distance have been shown to differ from ratings by close others (Chen & Bargh, 1997). This is also suggested by the findings of study 2, which only partially replicated the findings of study 1 in a sample of supervisors.
Another limitation involves the motives for proactive behavior, which are less easily observed than age and gender. Thus, evaluators, be they coworkers or supervisors, often have to infer motives, which likely invites stereotyping and potentially distorts the inferred motives. For example, supervisors might perceive men to be more achievement-oriented than women, which may distort evaluations of effectiveness. While we explicitly told participants what the (true) motive was, future research should control for this potential confound by including raters' motives as a control variable to account for the processes by which evaluators form their judgments based on behavior patterns. Relatedly, the nature of the dependent variable (i.e., the rating of the effectiveness of proactive behavior) may be a limitation, as asking about effectiveness in general may fit achievement or competence better than benevolence and may therefore distort the results. A final limitation concerns the choice of stimulation as a neutral motive: although it was statistically shown to be neutral in this study, this choice may need to be revisited. For example, evidence by Schwartz and Rubel (2005) suggests that men rated stimulation (along with achievement, power, and hedonism values) consistently higher than women.
While proactive behavior is generally seen as something positive, it may also be negative in certain circumstances (e.g., Belschak, Den Hartog, & Fay, 2010; Parker & Collins, 2010). For example, Seibert, Kraimer, and Crant (2001) showed that high degrees of proactive voice behavior had a negative effect on employees' career success. To further extend the results of the present studies, future research could include negative proactive behaviors aimed at damaging, for example, the careers of others or relationships within the organization. A starting point might be the field of counterproductive work behaviors, which are related to proactive behaviors, as employees may also proactively engage in counterproductive or damaging actions to reach their desired goals. In this context, evaluations of those behaviors may likewise depend on the personal characteristics of the person executing them. Yet, it may be that potentially harmful or negative behaviors are more likely to be influenced by group dynamics than by personal characteristics such as age, gender, and ascribed motives. A possible determinant of effectiveness evaluations could be the black sheep effect (Marques, Yzerbyt, & Leyens, 1988), according to which negative behavior by members of one's own social group is evaluated more harshly and more negatively than the same behavior by members of other social groups. Future research could investigate these possibilities.
Moreover, in addition to ascribed motives, it might be interesting to examine evaluations of other specific proactive behaviors, such as voice or feedback seeking, that relate more explicitly to gender-stereotypic expectations. Feedback seeking, for example, entails seeking help and is thus stereotypically seen as a more female behavior (Kidder & Parks, 2001). Voice, on the other hand, entails speaking out and challenging the status quo (Van Dyne & LePine, 1998), which fits more with male stereotypes. By investigating these proactive behaviors, researchers could specifically address the importance of social roles with regard to specific types of behavior. For example, women exhibiting proactive helping behavior might not be evaluated as proactive, compared to women engaging in voice behaviors. Future research could also include context variables to further expand the scope of variables important to effectiveness evaluations of proactive behaviors at work. That is, organizational cultures with different masculinity or femininity orientations (Hofstede, 1980) might hold different stereotypical expectations, which could influence effectiveness evaluations.
Finally, it would be interesting to examine which personal characteristics are most influential for evaluations of a specific form of proactive behavior. Our results suggest that the evaluation of proactive behavior associated with achievement motives is shaped mostly by gender, whereas the evaluation of proactive behavior associated with the benevolence motive is influenced most by the age of the person performing the behavior. Future research should expand these findings by investigating possible mediators and moderators of these effects, such as the rater's age or own motives. This is especially important when investigating age effects, as age is only an umbrella variable that captures change over time (Bohlmann, Zacher, & Rudolph, 2018; Zacher & Kooij, 2017). With regard to the raters of the vignettes, future research could examine the effect of ethnically based rater stereotypes on effectiveness ratings of proactive behavior. While race or ethnicity may be of greater interest in multicultural countries, the German context also suffers from ethnically based hiring discrimination (Bartkoski, Lynch, Witt, & Rudolph, 2018; Dietz, Baltes, & Rudolph, 2010; Semyonov, Raijman, Tov, & Schmidt, 2004). In the present study, however, manipulating the race of the person within the vignettes might have diminished external validity, as it is not realistic to describe someone as, for example, "Maria, 25, Caucasian, works at a telephone company."

Practical Implications
Our results can help to further the understanding of careers of men and women who may be proactive in terms of exceeding task-specific expectations, but still act in accordance with their ascribed stereotypes. Moreover, they are a first step toward a more differentiated and complete conceptualization of the factors that influence evaluations of the same proactive behavior by other employees as well as supervisors. Practitioners can use the results of our studies to develop a better understanding of the impact of social factors and their interactions for evaluations of proactive work behavior and may even find ways to overcome the influence of stereotypical expectations on these evaluations.
For organizations, the results can help to establish a culture of transparency and fairness when it comes to employee evaluations. Currently, supervisors might perceive the same proactive behavior as more or less effective depending on who carries it out and, therefore, may not always reward it in performance evaluations or considerations for promotion. Resulting perceptions of injustice among employees may then affect the organization as a whole, for example, through withdrawal from work and decreased job performance, as well as lower organizational commitment or trust (Colquitt, Conlon, Wesson, Porter, & Ng, 2001).
Using the results of this research, organizations may decide to develop objective rating criteria that could help to overcome stereotypical influences on evaluations of the effectiveness of proactive behavior by focusing on defined criteria rather than subjective perceptions (see Hogue & Lord, 2007, for an example of overcoming gender bias in leadership). Moreover, it might be advisable to consider the ascribed motives separately by differentiating between, for example, benevolence (e.g., helping) and achievement (e.g., voice). Supervisors could then reach a more differentiated evaluation of employee behavior.

Conclusion
The present research investigated the combined effects of employees' age, gender, and ascribed motives on other employees' and supervisors' effectiveness evaluations of proactive behavior. Results revealed that proactive behavior aligned with common stereotypes generally led to the highest effectiveness ratings. Our findings point to the need to take the social context and the interplay between personal characteristics into account when investigating evaluations of proactive behavior at work. They also highlight the importance of using objective rating criteria in organizations, as proactive behavior will increasingly become part of performance evaluations in modern workplaces.

statistical analysis, as well as Justin Marcus and Antje Schmitt for their helpful comments on an earlier version of this manuscript.
Compliance with Ethical Standards
All procedures performed in this study that involved human participants were in accordance with the ethical standards of the institutional research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Conflict of Interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.