Introduction

The last few decades have seen a remarkable proliferation of hypotheses that attempt to use selectionist reasoning to account for many aspects of human relationships and gender differences. The domain of attraction and attractiveness seems particularly rife with such hypotheses, which typically propose hard-wired gender-specific psychological mechanisms. For example, some theorists have hypothesized that men are wired to prefer women who have a waist-to-hip ratio of approximately .7 (Singh 1993) and to experience greater jealousy over a mate’s sexual infidelity than a mate’s emotional infidelity (Daly and Wilson 1988). Others have advanced hypotheses suggesting various psychological mechanisms allegedly unique to women, such as a preference for high-status mates and a propensity towards greater emotional jealousy and less sexual jealousy (Buss 1989). The current work focuses on one particular hypothesis of this sort: the idea that women have an evolved mechanism that leads them to be attracted to more masculine faces during the most fertile point in their menstrual cycle, but to prefer less masculine faces during the rest of the cycle (Penton-Voak et al. 1999; Penton-Voak and Perrett 2000). According to the hypothesis under discussion, these supposed tendencies reflect evolved strategies used by women to arrange to be inseminated by the most genetically fit males, and to avoid insemination by relatively less fit males. This theory has received wide attention. The first presentation was in the influential journal Nature, and the two papers by Penton-Voak and colleagues have been cited at least 330 times in the scientific literature (May, 2009, ISI Web of Knowledge).

The purpose of the current paper is to examine this account at two different levels. First, the empirical soundness of the theory is assessed with new data that bear on the hypothesis. A large sample, primarily from the US and Canada, was asked to judge the attractiveness of male faces. Women participants also reported on their menstrual cycles. Second, the theoretical basis for the account is analyzed in some detail, with attention not only to what the theory states, but also to what it would imply about human prehistory, mating practices, and the like. Before turning to more detailed discussion of the menstrual cycle and face preference research in particular, I begin with a general overview of the theoretical approach taken by most contemporary evolutionary psychologists, which provides the context in which the work analyzed here lies.

Contemporary Evolutionary Psychology Approach

Theoretical accounts such as the proposal of women’s facial preferences discussed here are often referred to as the “evolutionary theory” of whatever topic is under discussion. However, such phrasing can be misleading in some respects. After all, few researchers, whatever their theoretical approach, would question the validity of the theory of evolution through natural selection. Typically, what debate centers on is the validity of the specific hypotheses regarding psychological mechanisms offered by modern day evolutionary psychologists.

In practice, theorists who call themselves “evolutionary psychologists” tend to propose that natural selection resulted in the evolution of highly specific psychological tendencies, many of which are assumed to be sexually dimorphic (see Tooby and Cosmides 2005, for a general exposition of the approach). Their general premise is that mechanisms evolved to deal with many distinct threats to inclusive fitness in the ancestral environment, and that each threat required the development of a unique psychological solution. Thus, in the view of most evolutionary psychologists, it is likely that many distinct innate mechanisms or modules evolved to counter these various threats (Symons 1979).

However, by itself, evolutionary theory does not require this conclusion. As several theorists have noted, the currently popular approach of proposing very domain-specific mechanisms is only one of a number of possible ways in which natural selection might have shaped human psychology (Caporael 2000; Harris 2003; Miller and Fishkin 1997; Wood and Eagly 2002). According to alternative perspectives, natural selection may instead have created psychological tendencies and structures that are substantially less specific and more malleable than those proposed by evolutionary psychologists (Harris 2003; Miller et al. 2002). Moreover, basic in-born psychological mechanisms might not be sexually dimorphic, with gender differences resulting from differential environmental and cultural factors (Harris and Christenfeld 1996; Wood and Eagly 2002). Thus, in and of itself, the theory of natural selection is as consistent with social learning and gender role theories of relationships as it is with sexually dimorphic and domain-specific arguments.

Theory of Female Preferences and Facial Masculinity

In many animal species, female ovulation is accompanied by striking behavioral changes (Alcock 2005). Recently, a number of evolutionary psychologists have claimed to observe relatively subtler but still meaningful variations in psychological propensities and reactions of women over the course of the menstrual cycle (see, e.g., Broder and Hohmann 2003; Fessler and Navarette 2003). These changes have often been interpreted as evidence of evolved strategies to arrange insemination by the most genetically fit men. The present article focuses on one such hypothesis, which claims that women are attracted to more masculine facial features during days in which conception is most likely.

This cycle preference hypothesis assumes that there is some type of fitness trade-off between choosing mates with more masculine faces relative to those with more feminized faces. Penton-Voak and colleagues theorize that men with more masculine faces carry better genes, but also have other traits that would make them less desirable as full-time partners and parents (e.g., they are perceived as less co-operative, warm, and honest, Perrett et al. 1998).

Based on these assumptions, they further argue that women in relationships (relative to those without relationships) should be particularly prone to a shift in preferences across their menstrual cycle: such women should choose a permanent mate who has a relatively more feminized face, but during high conception risk should desire a man with more masculine features. This supposedly would have provided inclusive fitness benefits to women: they would receive better genes for offspring from masculine-faced men while maintaining relationships with socially more desirable, but less genetically fit, men. The only empirical support Penton-Voak et al. offer for these contentions is the finding of a statistical trend towards women in relationships showing a stronger attraction to masculine faces during fertile phases than during nonfertile phases, an effect that was weaker in women who were not in relationships.

Although intriguing, the theory behind the hypotheses described above is based on several highly speculative suppositions. There is no direct evidence that masculinity in human male faces is associated with better genes. Instead, Penton-Voak and colleagues (Little et al. 2002; Penton-Voak et al. 1999; Penton-Voak and Perrett 2000) rely on complex arguments derived from controversial theories of immunocompetence and testosterone in nonhuman animal studies. Another implicit assumption is that women, in the ancestral environment, engaged in infidelity at considerable rates. The various propositions inherent in this theory and the empirical evidence offered for such propositions will be discussed in more detail and critically analyzed further in the discussion section.

Empirical Studies

The backdrop to the empirical studies that will be examined here is a study by Perrett et al. (1998), which used computer graphic techniques to create synthetic faces said to possess different levels of masculinity or femininity. They did this in several stages. The first was to create a composite male and a composite female Caucasian face by averaging a group of male and female Caucasian faces, respectively. The next step was to “morph” these composites toward or away from the opposite-sex composite, yielding faces that were masculinized or feminized to any chosen degree. Interestingly, Perrett et al. (1998) found that both men and women judged most attractive a male face that was feminized with respect to the average male face (the average degree of preferred feminization ranged from 9–20% depending on stimuli and participant ethnicity).

In two follow-up articles, Penton-Voak and his collaborators (Penton-Voak et al. 1999; Penton-Voak and Perrett 2000) used such synthetic face stimuli to explore effects of menstrual cycle on female preferences. They reported that women likely to be in the most fertile (follicular) phase of their menstrual cycles preferred more masculine faces than women in other phases (or at least preferred faces slightly less feminized than the original composite male face).

In the first paper, Penton-Voak and Perrett (Penton-Voak et al. 1999) had 39 Japanese women (mean age 21 years) pick the most physically attractive (‘miryoku-teki’) of a set of five Caucasian male faces (40% masculinized, 20% masculinized, average, 20% feminized, and 40% feminized) and of a corresponding continuum of five Japanese male faces. Participants were classified as being of “high conception risk” if they were between the end of menses and 14 days prior to the expected time of the next menses. These women preferred less feminized faces (approximately 8% feminized) than did women classified as “low conception risk” (who preferred approximately 17% feminized faces). Another group of British participants (n = 43) were instructed to judge the attractiveness of a face for either a “long-term relationship” or “short-term relationship”. However, in this study, there appeared to be no general effect of menstrual cycle on masculinization preferences. Instead, Penton-Voak et al. only found an effect in a subset of the participants (n = 23), those in the hypothetical short-term relationship condition. This apparent failure to replicate the findings from the earlier study often seems to be overlooked in the large literature that cites this work.

In another study (Penton-Voak and Perrett 2000), readers of a national UK magazine returned a survey judging the attractiveness of faces printed in the magazine. Participants were divided into “high conception risk” (defined as days 6–14 from the start of the previous menses; n = 55) and “low conception risk” (days 0–5 and 15–28; n = 84). It appears that Penton-Voak and colleagues used slightly different techniques for estimating ovulatory status in their two papers, although generally converging results were obtained with both. The high conception risk women were significantly more likely to choose a more masculine face from the options −50%, −30%, average, +30%, and +50%. They proposed that this preference shift reflects an evolved strategy that functions to make women prefer more genetically fit mates, on the assumption that facial masculinization is likely a clue to immunological, and perhaps other, aspects of fitness.

Present Investigation

One of the major goals of the present study was to attempt to replicate the findings of Penton-Voak and colleagues. To do so, the current work presented the same male face stimuli used by Penton-Voak et al. (1999) to a much larger sample—a demographically diverse group primarily from the U.S. and Canada. Participants were asked about face preferences as well as questions relating to ovulatory status and relationship status.

A replication and follow-up of the facial preferences and menstrual cycle work seemed warranted for several reasons. First, the study with the largest sample (Penton-Voak and Perrett 2000) was conducted through a science magazine, and it is not clear what participants knew before they completed the survey which conceivably might have influenced their responses. Second, as mentioned previously, some of the data in the Penton-Voak et al. (1999) paper failed to replicate their other findings. Finally, there are several different methods that can be used to calculate the phase of the menstrual cycle when women are most fertile from self-reports, and there are controversies in the literature on which is best. For example, peak fertility can be estimated by determining days since onset of first day of last menstrual cycle or by estimating first day of next menstrual cycle and counting backwards. Also, estimation methods differ in the number of days included in the peak fertility phase, ranging from three days (Macrae et al. 2002) to nine days (Penton-Voak and Perrett 2000). As noted above, Penton-Voak and colleagues used slightly different methods in their two papers.

The current work focused on several issues and hypotheses. Given the goal to replicate previous findings, the stimuli employed by Penton-Voak et al. (1999) were used. These included two sets of male faces that had been altered to range from 40% masculinized to 40% feminized. One set was of a Caucasian male and the other of an Asian (Japanese) male.

Preferences in the sample as a whole were examined first. Most of the studies by Penton-Voak and colleagues found an overall preference for greater feminization (Perrett et al. 1998; Penton-Voak et al. 1999; although see Penton-Voak and Perrett 2000, for a failure to find this effect). This preference may be due to more feminized faces being judged to be associated with more positive social characteristics such as warmth and honesty (Perrett et al. 1998). Therefore, the first hypothesis tested was that the sample would prefer faces that had been altered to be more feminized relative to those that had been altered to be more masculinized (Hypothesis 1).

The use of stimuli that included two ethnicities enabled us to examine whether a primarily Caucasian U.S. sample would differ in preferences for feminization across the two racial groups. It was predicted that this sample would tend to prefer greater feminization in the Caucasian faces relative to the Asian faces (Hypothesis 2). This prediction was based on several findings from the literature. Perrett et al. (1998) found that Japanese and Scottish participants tended to prefer greater feminization in photos of their own race. Therefore, one might expect that Caucasian North Americans would show a similar pattern to their Caucasian Scottish counterparts. Also, recent research on racial-gender stereotypes more generally suggests that US participants think of Asians as more feminine than Caucasians (Galinsky and Cuddy 2009, The overlap between racial and gender stereotypes: Towards an understanding of the gender composition of interracial marriages. Manuscript submitted for publication.). From this, one might reason that Asian male faces would need to be adjusted less towards femininity than Caucasian faces to reach an optimal level of attractiveness.

Participants of both genders were recruited for the study. This provided the opportunity to compare the degree of masculinity preferred by each gender in this predominantly North American sample. However, gender differences were not expected, given previous results suggesting that the two genders tend to agree on their perceptions of attractiveness in male faces (Perrett et al. 1998) and, more generally, in their overall racial-gender stereotypes (Galinsky and Cuddy 2009, The overlap between racial and gender stereotypes: Towards an understanding of the gender composition of interracial marriages. Manuscript submitted for publication.). The similarity between men’s and women’s judgments of attractiveness of faces, along with cross-cultural differences, has led Perrett and colleagues to hypothesize that learning plays a role in judgments of attractiveness. This point will be returned to in the discussion section. Men and post-menopausal women were also included to provide the opportunity to compare the preference from the sample as a whole to those of women at different points in their menstrual cycle, were an effect of menstrual cycle to be found.

The second part of the paper focused on hypotheses and data pertaining to the “target” sample, namely, women of child-bearing age who were not pregnant or taking birth control pills. Based on the work by Penton-Voak and colleagues, one hypothesis is that phase of menstrual cycle should affect the degree of masculinity that women prefer in male faces; specifically, women who are in the phase of high conception risk should prefer more masculine faces relative to women who are in the phase of low conception risk (Hypothesis 3). Given that there are controversies in the literature on how best to determine fertility risk based on self-report of day of menstrual cycle, the current work explored four different methods of doing so. H3 was tested using each of these in order to provide the best opportunity for an effect of menstrual cycle on facial preferences to be revealed, if there was indeed one. Hypothesis 4 predicts an interaction between women’s relationship status and menstrual cycle phase for preferences of masculine faces. The face preference shift predicted in H3 should be stronger for women in relationships, such that these women should prefer more masculine faces during the high-risk period relative to women not in relationships.

Method

Participants

A total of 853 participants, primarily from the U.S. and Canada, completed the study (598 female, 255 male). Demographics for the sample are presented in Table 1. The mean age was 36.6 (SD = 11.9; range = 18–78). The U.S. participants were primarily Caucasian and varied greatly in education and income.

Table 1 Demographic characteristics of sample.

Informed consent was obtained for each participant. Participants were recruited from an online research pool consisting of people recruited with raffle-type incentives (the Study Response Project, Stanton and Weiss 2002). This is a primarily Caucasian, but demographically diverse research panel composed of adults of all ages, residing primarily in the U.S. and Canada. Males were not excluded, although their data were analyzed separately.

Procedure

Participants were invited to participate in return for entry into a drawing for cash prizes. The study’s purpose was described generally as exploring factors associated with different motivations and attitudes. The webpage explained that participants would be asked about personal likes and dislikes, attitudes, experiences, and interests and that they might find some of the questions very personal. They also were informed that their answers were completely anonymous. After providing informed consent, participants were presented with a series of demographic questions, including age, gender, national origin, and education. They were also asked about their relationship status (single, single in steady relationship, living together, married, or divorced/widowed). U.S. residents were asked their race/ethnicity, and annual income in U.S. dollars.

Next, participants were told that they would see photographs of faces and make judgments about these photographs. All participants rated two sets of five pictures. The first showed a set of male Caucasian faces, the second showed a set of male Japanese faces. Pictures were high quality color images arrayed on the computer screen from left to right and were always presented in the following order: picture 1: 40% masculinized; picture 2: 20% masculinized; picture 3: average; picture 4: 20% feminized; picture 5: 40% feminized. The key instruction stated “Please indicate which one of the 5 you find most physically attractive by clicking on the button below that face”. Higher numbers indicated a preference for more feminization of faces while lower numbers indicated a preference for more masculinization. After completing the facial ratings, female participants were asked whether they had reached menopause. Those indicating that they had not were asked whether they were taking an oral contraceptive. These participants also indicated the date their last menstrual period began, rated the regularity of their menstrual periods on a 1–5 scale (1 = not at all; 5 = extremely regular), and the average duration of their menstrual cycles.

Results

Preferences for Whole Sample

The first set of analyses examined preferences for the sample as a whole, focusing on Hypotheses 1–2. To determine if there was an overall preference for feminized faces (H1), t-tests were performed comparing the sample’s mean preference for each of the two sets of faces to the expected mean of 3.0, which would be equivalent to preferring neither greater feminization nor greater masculinization (i.e., preference for the unaltered face). Preference for femininity was confirmed for both the Caucasian stimuli, t(851) = 10.66, p < .001, and the Asian stimuli, t(852) = 6.05, p < .001. Means are presented in Table 2. Participants’ preference among the Japanese faces was significantly correlated with their preference among the Caucasian faces, r(852) = .23, p < .001.
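As an illustration of the comparison against the neutral value of 3.0, the following is a minimal sketch of a one-sample t statistic. The function name `one_sample_t` and the six example responses are hypothetical and are not data from the study; the sketch only shows the form of the test, not the software actually used.

```python
import math
import statistics

def one_sample_t(values, mu=3.0):
    """Return (t, df) for a one-sample t-test of the mean of `values` against `mu`."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)          # sample SD (n - 1 in the denominator)
    standard_error = sd / math.sqrt(n)
    t = (mean - mu) / standard_error
    return t, n - 1

# Hypothetical responses on the 1-5 face scale (illustrative only).
example_choices = [3, 4, 4, 5, 3, 4]
t_value, df = one_sample_t(example_choices)
print(f"t({df}) = {t_value:.2f}")
```

A positive t here indicates a mean above 3.0, that is, a preference on the feminized side of the unaltered face.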

Table 2 Average (standard error) facial preference by participant gender and target ethnicity.

The next analyses examined possible gender differences in face preference as well as whether this primarily Caucasian sample would prefer greater feminization in Caucasian male faces than in Asian male faces. A mixed 2 (participant gender: male vs. female) X 2 (ethnicity of target face: Caucasian vs. Japanese) ANOVA was conducted. (See Table 2 for means and Table 3 for distribution of responses). The two genders did not significantly differ in the amount of feminization-masculinization that they found most attractive, F(1, 850) = 1.73, ns. As predicted (H2), there was a significant effect of ethnicity of target face: F(1, 850) = 8.72, p < .003. Participants preferred more feminization in Caucasian (M = 3.46) than Asian male faces (M = 3.29). These means would correspond to a preference for 9.2% and 5.8% feminization for the Caucasian and Japanese faces, respectively. There was no hint of an interaction between participant gender and ethnicity of target face, F(1, 850) = .00004, ns.
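The conversion from a mean scale score to a percentage of feminization can be sketched as follows. This assumes linear interpolation between adjacent face options (each scale step corresponding to 20% of morphing, with 3 as the unaltered face), which is consistent with the figures reported above; the function and constant names are illustrative.

```python
# Response scale: 1 = 40% masculinized, 2 = 20% masculinized, 3 = average,
# 4 = 20% feminized, 5 = 40% feminized.
NEUTRAL_SCORE = 3.0   # the unaltered composite face
STEP_PERCENT = 20.0   # morphing difference between adjacent options

def percent_feminization(mean_score: float) -> float:
    """Map a mean scale score to percent feminization (negative = masculinized).

    Assumes the scale can be interpolated linearly between options.
    """
    return (mean_score - NEUTRAL_SCORE) * STEP_PERCENT

for label, mean in (("Caucasian", 3.46), ("Japanese", 3.29)):
    print(f"{label}: {percent_feminization(mean):.1f}% feminization")
```

Applied to the reported means of 3.46 and 3.29, this reproduces the 9.2% and 5.8% values given in the text.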

Table 3 Distribution of female and male preferences for the different face options for Caucasians and Japanese photos.

The overall preference for feminization, and the preference of Caucasian participants for more feminization of Caucasian as compared to Japanese faces, mirror the findings of Perrett et al. (1998). However, the degree of feminization preferred by our sample appears somewhat smaller.

Menstrual Cycle and Preferences

To examine the effect of menstrual cycle on female preferences (H3), the subset of participants meeting the following criteria were extracted for further analysis: female, not reached menopause, not pregnant, not taking oral contraceptives, and reporting regular menstrual cycles (3 or greater on 1–5 scale). Women who failed to answer these questions, provided inconsistent information, or reported more than 40 days since start of last period were excluded. This yielded a set of 258 participants, referred to as target participants. Their demographic information (see Table 1) closely resembled that of the sample as a whole with the exception of a narrower age range (M = 33.3 years old; Range = 19–51 years old) due to specifically selecting for women who had not reached menopause.
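The inclusion criteria can be expressed as a simple filter. The sketch below is only an illustration: the `Respondent` record and its field names are hypothetical, unanswered questions are represented as `None`, and the check for inconsistent information is not modeled because the text does not specify how it was made.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Respondent:
    # Hypothetical field names; None means the question was not answered.
    gender: str
    menopause: Optional[bool]
    pregnant: Optional[bool]
    oral_contraceptive: Optional[bool]
    regularity: Optional[int]        # 1 = not at all regular ... 5 = extremely regular
    days_since_onset: Optional[int]  # days since start of last menstrual period

def is_target(r: Respondent) -> bool:
    """Return True if the respondent meets the target-sample criteria."""
    answers = (r.menopause, r.pregnant, r.oral_contraceptive,
               r.regularity, r.days_since_onset)
    if r.gender != "female" or any(a is None for a in answers):
        return False
    return (not r.menopause
            and not r.pregnant
            and not r.oral_contraceptive
            and r.regularity >= 3
            and 0 <= r.days_since_onset <= 40)

sample = [
    Respondent("female", False, False, False, 4, 12),
    Respondent("female", False, False, True, 5, 9),    # on oral contraceptives
    Respondent("female", False, False, False, 4, 45),  # more than 40 days
    Respondent("male", None, None, None, None, None),
]
targets = [r for r in sample if is_target(r)]
print(len(targets))
```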

Penton-Voak and Perrett Classification of Risk

Penton-Voak and Perrett (2000) defined high conception risk as including participants 6–14 days from the onset of their previous menses. Low conception risk was defined as days 0–5 and 15–28. They excluded women who reported that more than 28 days had elapsed since their last menstrual period began. This categorization was used with the present sample and resulted in 168 target participants being categorized as low risk, and 80 as high risk. The average Caucasian face preference was 3.27 for the low-risk group and 3.63 for the high-risk group (higher numbers represent greater preference for feminization). This difference was significant, but opposite in direction to the difference reported by Penton-Voak and Perrett (2000), t(246) = 2.17, p < .03. For the Japanese faces, the average preference was virtually identical (3.15) for low-risk and high-risk groups (p > .99). The 95% confidence interval on the difference between low- and high-risk participants’ preferred face ranged from −.68 to −.03 for the Caucasian faces, and from −.37 to +.36 for the Japanese faces. Negative difference scores reflect differences in the opposite direction of the cycle preference hypothesis. Figure 1 shows target participants’ mean preference for Caucasian and Asian faces as a function of days since beginning of last menstrual period, binned into 2-day periods.
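The forward-counting classification described above can be sketched as a small function. The function name, parameter names, and the use of `None` for excluded participants are illustrative choices rather than details taken from the original analyses; the high-risk window and the cutoff are left as parameters so that other forward-counting windows can be substituted.

```python
from typing import Optional, Tuple

def forward_risk(days_since_onset: int,
                 high_window: Tuple[int, int] = (6, 14),
                 max_day: int = 28) -> Optional[str]:
    """Classify conception risk by counting forward from onset of last menses.

    Returns "high" inside the window, "low" on other days up to `max_day`,
    and None (excluded) for negative values or more than `max_day` days.
    """
    if days_since_onset < 0 or days_since_onset > max_day:
        return None
    first, last = high_window
    return "high" if first <= days_since_onset <= last else "low"

for day in (3, 10, 15, 30):
    print(day, forward_risk(day))
```

Under the default window, days 6–14 are classified as high risk, days 0–5 and 15–28 as low risk, and later days are excluded.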

Fig. 1

Mean preference for Caucasian faces (top figure) and Asian faces (bottom figure) as a function of days since beginning of last menstrual period binned into two-day segments (bars show standard error of the mean). Values greater than 3 indicate mean preference for a more feminized face than the composite average male face; values lower than 3 indicate a preference for a more masculinized face. The days within the two vertical dashed lines represent the phase that Penton-Voak and Perrett (2000) counted as the fertile phase, during which women should prefer more masculinized faces.

Alternative Classifications of Ovulatory Risk

As mentioned in the introduction, there are several methods that can be used to determine when fertility risk is high. Therefore, to provide ample opportunity for an effect of menstrual cycle to be revealed, if one exists, the previous analyses were performed again using three different calculations for peak fertility phase.

The first recalculation used a later window for fertility than that employed in the previous section. A study by Wilcox et al. (2000) examined the timing of ovulation in 221 women from daily measurements of a metabolite of estradiol (oestrone-3-glucuronide). Figure 2 (reprinted from their study) shows the probability of a woman being in her fertile window based on day within menstrual cycle, separately for women with regular and irregular cycles. Based on these results, the definition used by Penton-Voak et al. (1999), classifying women as fertile between menses and 14 days prior to the expected time of the next menses, appears substantially overinclusive, because it counts as fertile the earliest days of the cycle when fertility is quite improbable (although not impossible). Days 6–14, as used in Penton-Voak and Perrett (2000), seem fairly reasonable in light of the data of Wilcox et al. (2000). However, both the 6- and 14-day cutoffs appear somewhat earlier in the cycle than would be optimal. Wilcox et al. state that women with regular 28-day cycles are most likely to be potentially fertile on days 8–15, and find that the fertile window occurs later in the cycle for women with longer cycle durations. Based on these data, a range of days 8–16 seems like a reasonable choice for the fertility phase. Using this classification, 80 women in our target sample were defined as high risk, and 178 were defined as low risk. For these two groups, the average preferences for Caucasian faces were 3.44 (high risk) and 3.34 (low risk), a nonsignificant difference, p > .56. The average preferences for Japanese faces were 3.04 (high risk) and 3.21 (low risk), also a nonsignificant difference, p > .35.

Fig. 2

The probability of a woman being in her fertile window based on day within menstrual cycle, separately for women with regular and irregular cycles, derived by and reprinted from Wilcox et al. (2000).

The next two analyses of H3 relied on a “backward” calculation of the fertile period, which is sometimes viewed as a more accurate method. Typically, this is accomplished by assuming that ovulation occurs 14 days before the anticipated date of the beginning of the next menses. Women who reported that their average cycle length was shorter than the amount of time that had already elapsed since their last period were excluded from these analyses.

Using a backward calculation, Macrae et al. (2002) used a 3-day window for high-conception-risk: the day of ovulation and the 2 preceding days. Based on this narrow classification, 33 women were classified as high risk and 207 as low risk in the current sample. For these two groups, the average preferences for Caucasian faces were 3.67 and 3.29, respectively. This produced a nonsignificant trend in the opposite direction from the prediction of Penton-Voak and colleagues, t(238) = 1.62, p = .106. The average preferences for Japanese faces were 3.42 (high risk) and 3.12 (low risk), a nonsignificant difference (p = .23).

A 3-day window is shorter than what would be suggested by the measurements of Wilcox et al. (2000), so a reasonable refinement of the Macrae et al. (2002) backward estimation would use a 5-day rather than a 3-day window. Therefore, a 5-day window was used to calculate high risk in the current sample, resulting in 54 women being classified as high risk (those who were 12–16 days from the predicted start of their next period) and 186 women as low risk. Their average Caucasian face preference was 3.48 for the high-risk group and 3.31 for the low-risk group. For Japanese faces, the averages were 3.33 for the high-risk group and 3.11 for the low-risk group. Both of these differences run in the opposite direction from the prediction, and fall far short of statistical significance (p > .35 and p > .28 for Caucasian and Japanese faces, respectively).
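The backward calculation can be sketched in the same spirit. The sketch assumes that the predicted start of the next period is the onset of the last period plus the self-reported average cycle length, and that ovulation falls 14 days before that date; the function names and the encoding of each window as a range of "days until next menses" are illustrative assumptions.

```python
from typing import Optional, Tuple

# Windows expressed as (min, max) days remaining until the predicted next menses.
THREE_DAY_WINDOW = (14, 16)  # assumed ovulation day (14) plus the 2 preceding days
FIVE_DAY_WINDOW = (12, 16)   # 12-16 days before the predicted start of next period

def days_until_next_menses(days_since_onset: int, cycle_length: int) -> Optional[int]:
    """Days remaining until the predicted next menses, or None if excluded.

    Women whose reported cycle length is shorter than the time already
    elapsed since their last period are excluded.
    """
    if cycle_length < days_since_onset:
        return None
    return cycle_length - days_since_onset

def backward_risk(days_since_onset: int, cycle_length: int,
                  window: Tuple[int, int] = THREE_DAY_WINDOW) -> Optional[str]:
    """Classify conception risk by counting backward from the predicted next menses."""
    remaining = days_until_next_menses(days_since_onset, cycle_length)
    if remaining is None:
        return None
    nearest, farthest = window
    return "high" if nearest <= remaining <= farthest else "low"

print(backward_risk(13, 28))                          # 15 days remaining
print(backward_risk(16, 28, window=FIVE_DAY_WINDOW))  # 12 days remaining
print(backward_risk(30, 28))                          # excluded
```

Counting backward ties the window to each woman's own reported cycle length, so a woman with a 32-day cycle is treated as fertile later after onset than a woman with a 28-day cycle.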

The criteria described in the previous paragraph are probably about as accurate a categorization of ovulatory status as one can achieve with self-report measures. Thus, the results seem to provide reasonably strong evidence that if there are any effects, they are minimal.

Relationship Status and Preferences

The next analyses examined whether women in relationships (single in steady relationships, cohabitating, and married) showed a different pattern of preferences across menstrual cycle compared to women who are not in relationships (single, divorced, widowed) (H4). An ANOVA was conducted with relationship status and ovulatory status as the independent variables and facial masculinity preference as the dependent variable. Contrary to the Penton-Voak et al. proposition, there was no hint of an interaction between relationship and ovulatory status on preferences for the Caucasian stimuli: F(1, 244) = .001, p = .97. There also was no significant main effect of relationship status, F(1, 244) = 2.14, ns. The main effect of ovulatory phase, as reported in a previous section, remained significant, F(1, 244) = 4.16, p < .05, but in a direction inconsistent with the Penton-Voak et al. hypothesis: Peak fertility was associated with finding less masculine faces more attractive. Analyses of the Asian face stimuli revealed no significant main effects or interactions (ps > .19). Similar analyses were also conducted for the Caucasian and Asian stimuli with fertility risk being calculated by using the various alternative methods described in the previous section. None of these analyses revealed any support for the Penton-Voak et al. hypothesis.

Discussion

The present study was conducted on-line, which affords a high degree of anonymity, and repeatedly has been found to elicit more candid responses to questions about socially undesirable behaviors and emotions than paper and pencil or interview methods (cf. Locke and Gilbert 1995; Musch et al. 2001). This would seem a particular advantage in the present work, where some questions are quite personal. The validity of internet testing also has been well supported in recent research (Gosling et al. 2004; Birnbaum 1999; Krantz and Dalal 2000; McGraw et al. 2000).

Despite involving a greater number of participants than the total participating in the studies by Penton-Voak and colleagues (Penton-Voak et al. 1999; Penton-Voak and Perrett 2000), the current study offers no support for the idea that women at risk of conception find more masculine male faces more attractive. This was the case regardless of the method used to calculate time of highest conception risk. The results not only fail to show significant differences in the predicted direction; in several cases, the data contained trends running in the opposite direction, casting strong doubt on the generalizability of the original findings.

What accounts for this failure to confirm the findings? The answer is not clear. The first paper (Penton-Voak et al. 1999) looked at small samples (i.e., 39 Japanese women in the first study, which used essentially identical stimuli and menstrual cycle measures to those reported here). Furthermore, in this original paper, the data from British women (n = 43) only partially replicated the effect of menstrual cycle on facial preferences (the effect was found only in the condition that judged facial preferences for hypothetical short-term sexual relationships, n = 23). The second paper (Penton-Voak and Perrett 2000) used a larger sample (n = 139) and again reported an overall effect of menstrual cycle on facial preference (here, short vs. long term context did not appear to be assessed). However, as noted above, it is not clear from the report whether the readers of the British science magazine who served as participants might have been clued in to the hypothesis under investigation by something they read in the magazine, possibly yielding some sort of demand effect.

Theoretical Interpretations

Having found reason to be less sure that the phenomenon reported by Penton-Voak and colleagues is real, we now turn to the rather intriguing ideas that were offered as theoretical explanations for why it might be true. The Penton-Voak group work within a general theoretical framework that contends that features people find attractive are generally honest signals of a person’s “good genes”. Citing chiefly studies from non-human species (Folstad and Karter 1992), Penton-Voak et al. (1999) theorize that “masculine features seem to signal immunological competence” in humans (p. 741) and thus greater facial masculinity should be associated with better genes. Based on this idea, they go on to argue that women should be attracted to more masculinized faces, at least when conception is likely, in order to maximize their inclusive fitness. However, Penton-Voak et al. also suggest that there can be a downside to men with masculinized features (e.g., they may have personality traits that might make them poor mates or fathers). Therefore, they contend, women have an advantage in Darwinian fitness if they are inclined to pair up with full-time mates who have less masculine features, but to seek out sexual relations with men who have more masculine features when conception is likely (the trade-off hypothesis).

The validity of the analysis proposed by Penton-Voak and colleagues, including the relevance of animal findings to human male facial characteristics, relies upon a number of inferential steps and conjectures, each of which needs to be considered on its merits. In order for their theory to be plausible, each of the separate propositions noted above also would need to be correct.

Analysis of Assumptions

Do Masculinized Facial Features Signal “Good Genes”?

Penton-Voak and colleagues suggest that by selecting mates with more masculine facial features, women obtain mates with “better genes”. In support of this view, they do not offer direct evidence that masculine facial features are associated with health or other beneficial traits with high heritability (Little and Hancock 2002). Rather, they rely upon a rather complex argument put forward by Folstad and Karter (1992) called the immunocompetence handicap hypothesis (IHH). Folstad and Karter contended, first of all, that testosterone, in addition to promoting secondary sexual characteristics, tends to suppress the immune system, rendering a male more vulnerable to parasitic illness. Why then should this be sought out as an index of good genes? Relying upon the Handicap Principle proposed by some behavioral ecologists (e.g., Zahavi 1975), Folstad and Karter suggested that by selecting a mate who appears to be thriving despite the immunological handicap posed by high testosterone levels, a female can obtain evidence of a genetic fitness sufficient to overcome the immunological handicap produced by the testosterone.

Several elements of this analysis are quite speculative, and have been questioned by biologists since Folstad and Karter’s paper. First, while high doses of testosterone can produce reduced measures of immune functioning, some studies have found the opposite (Ros et al. 1997). Other studies have found that parasite load is negatively, rather than positively, correlated with testosterone (e.g., Klein and Nelson 1998). Braude et al. (1999) challenge the general assumption that when investigators find fewer immune cells in animals that have been administered testosterone, this necessarily indicates that overall immune functioning has been dampened. Citing various kinds of evidence, they argue that testosterone produces a strategic migration of different kinds of immune cells from some body compartments to others (a sort of redeployment of immunity) rather than a wholesale suppression of immunity.

Moreover, if testosterone levels are heritable and do produce the sort of wholesale immune suppression that Folstad and Karter imply, then one would expect high levels of this hormone to carry costs that would also have to be borne by any offspring that might result from mating with a high-testosterone male. Thus, the immunosuppression would represent a reduction in Darwinian mate value. Indeed, in humans at least, testosterone levels do appear to have substantial heritability (Harris et al. 1998). It is one thing to suggest that females have evolved to select mates who thrive despite bearing temporary handicaps such as seasonal ornamentation, but this kind of Darwinian logic seems to be questionable in cases where the handicap is likely to be permanent and passed on to offspring.

Another limitation in the argument is that studies that shed direct light on the immunocompetence handicap hypothesis have typically looked at species very distant from humans, such as gulls and red jungle fowl. Applying it to human beings is a large leap. A further leap is the assumption that facial masculinization can be equated with the kinds of secondary sexual characteristics measured in birds. Neave et al. (2003) found no relationship between circulating testosterone levels and rated facial masculinity. This may be because masculine facial features reflect effects of testosterone at various earlier points in development, which are not well correlated with adult levels. By contrast, the effects of testosterone considered in the animal studies are usually under control of current circulating testosterone levels.

In summary, Penton-Voak’s argument that facial masculinity is a sign of “good genes” to which females are innately disposed to respond when they are most fertile is, to say the least, a conclusion that rests on a rather complex and speculative web of arguments. There appears to be only one study directly addressing the idea that facial masculinity serves as an index of any more general form of fitness, and it yielded results that are not particularly compelling. Rhodes et al. (2003) reported a small but significant (r = .17) correlation between facial masculinity and objective health in an adolescent sample; however, the facial masculinity was determined by raters, who may have considered variables such as pale complexion as well as facial morphology (consistent with this, rated masculinity was much more strongly associated [r = .37] with raters’ perceptions of health than it was with objective health).
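To put the two correlations just cited in perspective, it can help to convert them to proportions of shared variance (r squared). The short sketch below does only that arithmetic on the values quoted above. The variable and function names are illustrative, and treating r squared as "variance explained" assumes a simple linear relationship.

```python
# Correlations reported by Rhodes et al. (2003), as quoted in the text above.
r_objective_health = 0.17   # rated masculinity vs. objective health
r_perceived_health = 0.37   # rated masculinity vs. raters' perceptions of health

def shared_variance(r):
    """Proportion of variance shared by two variables correlated at r."""
    return r ** 2

for label, r in [
    ("objective health", r_objective_health),
    ("perceived health", r_perceived_health),
]:
    print(f"{label}: r = {r:.2f}, r^2 = {shared_variance(r):.3f}")
```

On these figures, rated masculinity shares roughly 3% of its variance with objective health, compared with roughly 14% for raters’ perceptions of health.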

Is there a Generalized Preference for Masculine Features?

From the “good genes signal” account discussed above, one might have expected to see a rather simple pattern: a generalized preference by women for more masculine male faces. However, such a preference is not consistently found. Keating (1985) reported that more masculine and dominant faces were found attractive, while Swaddle and Reierson (2002) have found that facial masculinity increased perceived dominance but not attractiveness. Moreover, in the current work, women actually preferred male faces that had been altered to be more feminine over unaltered or masculinized faces, as did women in Penton-Voak et al. (1999). It has also been found that women tend to judge some degree of neonatal features in male faces as being attractive (Berry and McArthur 1985; Cunningham et al. 1990). In sum, across studies, a general preference for masculinized faces does not seem to occur consistently.

Does a Cyclic Preference View Make Sense?

To account for a lack of an overall preference for masculine faces, Penton-Voak et al. have tried to argue for cyclic preferences that they suppose reflect a trade-off between good genes (accompanied by uncooperative social characteristics) vs. poorer genes (paired with better social qualities). The current work found no evidence for any shift in preferences across the menstrual cycle, which undercuts the empirical argument for this analysis. However, it is also of interest to reflect more closely on the trade-off view.

First, from a Darwinian perspective, one could argue that some of the positive social traits that women are supposedly avoiding in favor of “good genes” actually reflect good genes themselves. For example, it seems likely that characteristics such as warmth and cooperativeness may be due partly to personality, and may have some heritable aspects. Furthermore, positive social traits are beneficial to social living generally, which would seem to be a big benefit to survival (Wood and Eagly 2002 make a similar point, noting also that it is not clear that dominance is necessarily a net positive in social interactions).

Moreover, for the scenario envisioned by Penton-Voak and colleagues to make sense, a number of additional conditions would have to hold. First, there would have to be very high rates of infidelity. Second, it would have to be the case that when infidelities occurred, they were confined to a very short time period (the days when conception was likely). After all, if extra-relationship affairs lasted even a few weeks, then the exact point in the woman’s menstrual cycle at which they began would have little bearing on the probability of conception. So although they do not note this fact, the Penton-Voak and associates’ analysis essentially requires that human extra-pair mating tendencies must have evolved in the context of something like “one-night stands”. There does not appear to have been any discussion of whether this scenario is consistent with what is known about social arrangements in preliterate societies.

Relatedly, Penton-Voak and colleagues offer what they contend to be direct evidence that infidelity is mostly occurring in the follicular phase of a woman’s cycle. The evidence cited by Penton-Voak et al. (1999) for this comes exclusively from one study by Bellis and Baker (1990; see Moore et al. 1999; Birkhead et al. 1997, for criticisms). Using a large sample from a British magazine, Bellis and Baker inquired when women had last engaged in sexual intercourse and with whom. They reported that females were more likely to have sexual intercourse with their lovers during the follicular phase relative to the luteal phase, and that such a phase difference was not found with women reporting on sex with their primary partner. The authors report that this was a statistically significant difference. However, when the data reported by Bellis and Baker are examined in more detail, it is not so clear that they comport well with Penton-Voak and colleagues’ reasoning.

First, most sexual intercourse reported in the Bellis and Baker study was occurring within the primary relationship (94% of acts). Second, this pattern remains if one examines sexual relations just during the phase when pregnancy is likely (92% were with primary partner). Third, Bellis and Baker note that when the chance of conception was high, women who had lovers were more likely to “double-mate” (i.e., to have sex with a lover and with their mate within 5 days of each other) than when the chance of conception was low. (Bellis and Baker used this to argue for evolved mechanisms of sperm competition in humans.) Although Penton-Voak et al. rely heavily on this study to support one of their contentions, its results do not appear to fit well with their analysis. If women are mating with masculinized men to acquire good genes for offspring, then they ought to be wired up to find their supposedly genetically poorer-quality primary mates less desirable during high periods of fertility.

There are a number of other findings in the literature regarding relationships and infidelity that also do not fit very readily with the Penton-Voak hypothesis. It seems likely that decisions made throughout the menstrual cycle would affect which man a woman would choose as a mate and produce children with. Returning to the Bellis and Baker data, of those women who reported having sex with someone other than their primary partner, the majority (61%) reported that the last time they had sex with the other man was during a non-fertile period. If women are designed to want masculine men during the fertile phase, then who are they choosing as lovers during these other phases? Penton-Voak and colleagues’ proposal really would seem to require that women are choosing different types of lovers at different cycle phases, otherwise there would be no need for a preference “shift” during ovulation. Not only is this reasoning inherently a bit odd, but it also does not square well with the findings of research on motivations for infidelity in women.

For women, sexual aspects of romantic relationships tend to be tightly woven with the emotional aspects (e.g., Reiss 1967; Harris and Christenfeld 1996). This pattern occurs not only in primary relationships but also in women’s extramarital involvements (Glass and Wright 1985; Spanier and Margolis 1983; Thompson 1984). For example, in a middle-class Caucasian sample, Glass and Wright found that few women (11%) reported having an affair that included sexual intercourse but which had little or no emotional involvement. Furthermore, some studies have found a correlation between marital dissatisfaction and extramarital affairs, and there is some evidence that this relationship may be stronger for women than for men (Glass and Wright 1985; Hunt 1969). Although these data are correlational, one interpretation is that dissatisfaction in the primary relationship may be a primary factor that leads women to engage in extra-marital sexual intercourse.

Additional Findings Supporting Socio-cultural Accounts of Attractiveness

One point that is rarely discussed by evolutionary psychologists writing about attraction is that features that are considered attractive are often ones that could not possibly signal genetic fitness. For example, consider the fickle nature of male facial hair fashion over the years. What is considered to be appealing in facial hair has ranged from smooth skin to beards, goatees, and/or mustaches. The same is true for hair length and texture (shaved heads, short vs. long hair or straight vs. permed). It is not simply that norms change over time regarding what is stylish, but rather that these changes are also accompanied by strong feelings of attractiveness versus unattractiveness (e.g., consider current reactions to the long sideburns often seen in movies from the 1970s). The fact that such visceral reactions are as malleable as they are should tell us that it cannot be assumed that strongly held preferences necessarily signal genetically wired-in preferences that are keyed to a potential mate’s genetic fitness.

Indeed, several of the findings from the current study as well as from previous work on feminized and masculinized faces seem to suggest the importance of socio-cultural factors in judgments of attractiveness. First, women and men agreed in their choice that a feminized male face was considered the most attractive. Similar results were also reported in an earlier paper by Penton-Voak and his collaborators (Perrett et al. 1998). The broad agreement between the genders on what is attractive in male and female faces also seems much easier to explain if one invokes social learning mechanisms rather than specific adaptive mate-selection mechanisms in women that are tied to hormonal changes.

Second, ethnicity influenced attractiveness judgments. The present study found that a sample of predominantly Caucasian participants had a preference for more feminization of Caucasian male faces as compared to Japanese faces, which mirrors the findings of Perrett et al. (1998). Perrett et al. additionally found the opposite effect in Japanese participants, namely a preference for more feminization of Japanese faces relative to Caucasian faces. Perrett and colleagues actually acknowledge that this within-culture preference for more feminized faces indicates a role for learning in determining what people find attractive.

Broader Conceptual Issues

This brings up one of the major limitations, not only in the reasoning of Penton-Voak and colleagues, but also in much of the current work in evolutionary psychology. Writers working in this tradition seem often to assume that if an effect or difference of some kind is identified and some potential evolutionary analysis can be constructed, this provides overwhelming evidence that the effect is indeed innately wired in by evolution. This assumption is particularly apparent in studies of gender differences. In fact, uncovering a gender difference on a particular measure in and of itself says very little about the origin of that difference. For example, finding that men predict they would feel more sexual jealousy than women does not inform us whether this is some innate difference in jealousy mechanisms, or some effect of socio-cultural and gender roles or some more circuitous pathway (due to men’s general interest in sex or a reaction to the different cognitive implications that the genders draw about the meaning of sexual infidelity—see Harris 2003 for a discussion).

Moreover, even when effects are shown to occur consistently across several cultures, this does not necessarily imply that the effect is a result of evolutionary pressures selecting for it directly. This point is illustrated nicely in work by Eagly and Wood (1999) on mate preferences. In a reanalysis of Buss’ (1989) data from 37 cultures, Eagly and Wood found that the magnitude of several gender differences was correlated with societal factors across cultures. For example, as gender equality increased there was a decrease in women’s desire for good earning potential in a mate, suggesting the importance of social and economic factors in influencing mate preferences. Furthermore, Eagly and Wood point out that the gender difference in how much a “good cook and housekeeper” was desired in a mate “was of comparable magnitude to those obtained on the attributes most strongly emphasized by evolutionary psychologists” (p. 417). Yet, it can hardly be contended that the preference for a good cook is an innate preference. Such findings point to another serious limitation to the current practice of evolutionary psychology related to the one noted above, namely, the failure to attempt to test alternative hypotheses, including less domain-specific mechanisms of any type (including socio-cultural explanations).

In closing, the present article has considered the proposal that female preferences for facial masculinity represent an evolved (and menstrual-cycle dependent) adaptation. The theoretical basis for this account has been questioned, and new data have been presented that challenge its empirical solidity. It is clear that further validation with additional (and, ideally, even larger) samples is needed before the validity of this phenomenon can be determined. Even if the effect should turn out to be real, its connection to the theoretical analysis offered by Penton-Voak and colleagues is open to question, for the reasons described above. There is no doubt that Penton-Voak and colleagues deserve credit for offering a very intriguing hypothesis about facial attractiveness, and for devising clever methods of generating stimuli to examine it. However, the general assumption in the literature that their findings and theoretical analysis are well established seems to require serious reconsideration.