Human Genetics, Volume 116, Issue 4, pp 272–278

A genomewide scan of male sexual orientation


  • Brian S. Mustanski
    • Laboratory of Biochemistry, National Cancer Institute, National Institutes of Health
    • Institute for Juvenile Research, Department of Psychiatry, University of Illinois at Chicago (M/C 747)
  • Michael G. DuPree
    • Laboratory of Biochemistry, National Cancer Institute, National Institutes of Health
    • Department of Anthropology, Pennsylvania State University
  • Caroline M. Nievergelt
    • Department of Psychiatry, University of California
  • Sven Bocklandt
    • Laboratory of Biochemistry, National Cancer Institute, National Institutes of Health
    • Department of Human Genetics, David Geffen School of Medicine at UCLA
  • Nicholas J. Schork
    • Department of Psychiatry, University of California
  • Dean H. Hamer
    • Laboratory of Biochemistry, National Cancer Institute, National Institutes of Health
Original Investigation

DOI: 10.1007/s00439-004-1241-4

Cite this article as:
Mustanski, B.S., DuPree, M.G., Nievergelt, C.M. et al. Hum Genet (2005) 116: 272. doi:10.1007/s00439-004-1241-4

Abstract

This is the first report of a full genome scan of sexual orientation in men. A sample of 456 individuals from 146 families with two or more gay brothers was genotyped with 403 microsatellite markers at 10-cM intervals. Given that previously reported evidence of maternal loading of transmission of sexual orientation could indicate epigenetic factors acting on autosomal genes, maximum likelihood estimation (mlod) scores were calculated separately for maternal, paternal, and combined transmission. The highest mlod score was 3.45 at a position near D7S798 in 7q36 with approximately equivalent maternal and paternal contributions. The second highest mlod score of 1.96 was located near D8S505 in 8p12, again with equal maternal and paternal contributions. A maternal origin effect was found near marker D10S217 in 10q26, with an mlod score of 1.81 for maternal meioses and no paternal contribution. We did not find linkage to Xq28 in the full sample, but given the previously reported evidence of linkage in this region, we conducted supplemental analyses to clarify these findings. First, we re-analyzed our previously reported data and found an mlod of 6.47. We then re-analyzed our current data, after limiting the sample to those families previously reported, and found an mlod of 1.99. These Xq28 findings are discussed in detail. The results of this first genome screen for normal variation in the behavioral trait of sexual orientation in males should encourage efforts to replicate these findings in new samples with denser linkage maps in the suggested regions.

Introduction

Although most males report primarily heterosexual attractions, a significant minority (approximately 2%–6%) of males report predominantly homosexual attractions (Diamond 1993; Laumann et al. 1994; Wellings et al. 1994). Multiple lines of evidence suggest that biological factors play a role in explaining individual differences in male sexual orientation (MIM 306995). For example, the third interstitial nucleus of the human anterior hypothalamus (INAH3), which is significantly smaller in females, is also reported to be smaller in homosexual males (LeVay 1991). Byne and colleagues (2001) followed up on this finding by reporting a trend for INAH3 to occupy a smaller volume in homosexual men than in heterosexual men, with no significant difference in the number of neurons within the nucleus. Neuropsychological studies have reported differences in performance with respect to tasks that show sex differences, such as spatial processing (e.g., Rahman and Wilson 2003), which may indicate differences in relevant neural correlates (e.g., parietal cortex). The strong link between adult sexual orientation and childhood gender-related traits expressed at an early age (Bailey and Zucker 1995) suggests that such biological influences act early in development, possibly prenatally. Similarly, the correlation between sexual orientation and a variety of prenatally canalized anthropometric traits suggests that sexual orientation differentiation probably occurs before birth (for a review, see Mustanski et al. 2002). Despite this evidence, specific neurodevelopmental pathways have yet to be elucidated.

Family and twin studies have provided evidence for a genetic component to male sexual orientation. Family studies, using a variety of ascertainment strategies, document an elevation in the rate of homosexuality among relatives of homosexual probands (for a review, see Bailey and Pillard 1995). Several family studies report evidence of increased maternal transmission of male homosexuality (Hamer et al. 1993; Rice et al. 1999a), whereas others find no increase relative to paternal transmission (Bailey et al. 1999; McKnight and Malcolm 2000). Twin studies consistently show that male sexual orientation is moderately heritable (for a review, see Mustanski et al. 2002). For example, two recent twin studies in population-based samples both report moderate heritability estimates, with the remaining variance being explained by nonshared environmental influences (Kendler et al. 2000; Kirk et al. 2000). The results from family and twin studies demonstrate that sexual orientation is a complex (i.e., does not show simple Mendelian inheritance) and multifactorial phenotype.

A more limited number of studies have attempted to map specific genes contributing to variation in sexual orientation. Given the evidence for increased maternal transmission, initial efforts focused on the X chromosome. One study produced evidence of significant linkage, based on Lander and Kruglyak (1995) criteria, to markers on Xq28 (Hamer et al. 1993). Another study, from the same laboratory but with a new sample, reported a significant replication of these findings (Hu et al. 1995). An independent group produced inconclusive results regarding linkage to Xq28 (discussed in Sanders and Dawood 2003) but did not publish the findings in a peer-reviewed journal. All three of these studies excluded families showing evidence for non-maternal transmission. A fourth study from another independent group found no support for linkage, even when excluding cases with suggestive father-to-son transmission (Rice et al. 1999b). An analysis of the results across all four studies produced a statistically suggestive multiple scan probability (MSP) value of 0.00003 (Sanders and Dawood 2003). Two candidate gene studies have been conducted, both producing null results: one for the androgen receptor (AR; Macke et al. 1993) and another for aromatase (CYP19A1; Dupree et al. 2004), on Xq12 and 15q21.2, respectively.

Given the complexity of sexual orientation, numerous genes are likely to be involved, many of which are expected to be autosomal rather than sex-linked. Indeed, the modest levels of linkage that have been reported for the X chromosome can account for, at most, only a fraction of the overall heritability of male sexual orientation as deduced from twin studies. Therefore, we have undertaken a genomewide linkage scan to aid in the identification of genes contributing to variation in sexual orientation. As in previous studies, we diminished the probability of false positives (i.e., gay men who identify as heterosexual) by only studying self-identified gay men. Unlike previous studies that have focused solely on the X chromosome and thus excluded families showing evidence of non-maternal transmission, this study did not use transmission pattern as an exclusion criterion. To consider the possibility that previously reported evidence of maternal loading of transmission of sexual orientation was attributable to epigenetic factors acting on autosomal genes, we calculated maximum likelihood estimation (mlod) scores separately for maternal and paternal transmission, together with the combined statistic. Based on Lander and Kruglyak’s (1995) criteria, we found one region of near significance and two regions close to the criteria for suggestive linkage.

Materials and methods

Family ascertainment and assessment

The sample consisted of a total of 456 individuals from 146 unrelated families, of which 137 families had two gay brothers and 9 families had three gay brothers. Thirty of the families included one parent, and 30 of the families included both parents. Additionally, 46 of the families included at least one heterosexual male or female full sibling (up to 6 additional siblings per family). The sample included 40 families previously reported by Hamer et al. (1993), 33 families previously reported by Hu et al. (1995), and 73 previously unreported families. The 73 previously described families were selected for the presence of two gay brothers with no indication of non-maternal transmission by the criteria described previously (Hamer et al. 1993; Hu et al. 1995). For the 73 new families, the sole inclusion criterion was the presence of at least two self-acknowledged gay male siblings.

Subjects were recruited through advertisements in local and national homophile publications as described elsewhere (Hamer et al. 1993; Hu et al. 1995). The participants were predominantly white (94.5%), college educated (87.4%), and of middle to upper socioeconomic status. The mean (SD) age for the gay siblings was 36.98 (8.64). The protocol was approved by the NCI Institutional Review Board, and each participant signed an informed consent form prior to interview, questionnaire completion, and the donation of blood for DNA extraction.

Sexual orientation was assessed through a structured interview or a questionnaire that included a sexual history and the Kinsey scales of sexual attraction, fantasy, behavior, and self-identification (Kinsey et al. 1948). Each scale ranges from 0 (exclusively heterosexual) to 6 (exclusively homosexual). The mean (SD) of these four scales for the gay males in this study was 5.65 (0.46).


DNA was extracted from peripheral blood by a commercial service (Genetic Design, Greensboro, N.C., USA). A multiplex polymerase chain reaction (PCR) was conducted as described (Dupree et al. 2004), with 403 microsatellite markers from the ABI PRISM Linkage Mapping Set Version 2.5 with an average resolution of 10 cM. Following the manufacturer’s guidelines, products were analyzed on an ABI Prism 310 or 3100 and sized with the GeneScan version 3.1.2 program (PE Biosystems, Foster City, Calif., USA), and genotypes were assigned with the Genotyper version 3.6 program (PE Biosystems). A PCR product from a DNA reference sample (CEPH 1347-02) was used to monitor sizing conformity (PE Biosystems). Across the 403 markers, genotypes were ascertained on average for 95% of the 456 individuals. Mendelian incompatibilities (>0.05% of genotypes) were removed from the data prior to analyses by using the sib_clean routine from ASPEX version 2.4 (Hinds and Risch 1996). The computer program CERVUS 2.0 (Marshall et al. 1998) was employed to test for deviation from the Hardy-Weinberg equilibrium (HW) and to calculate polymorphism information contents (PICs) at all loci. We found that the markers had a mean (SD) PIC of 0.76 (0.08), and 1.31% of the markers deviated significantly from HW.
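For readers unfamiliar with the PIC statistic reported above, the following minimal Python sketch computes it from allele frequencies using the standard definition, PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j². It is an illustration only and not the CERVUS implementation; the function name and the example frequencies are hypothetical.

```python
from itertools import combinations


def polymorphism_information_content(freqs):
    """Return the PIC of a marker given its allele frequencies.

    PIC = 1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity under random mating
    homozygosity = sum(p * p for p in freqs)
    # Correction for matings that are uninformative despite heterozygosity
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


# Hypothetical marker with two equally frequent alleles
print(polymorphism_information_content([0.5, 0.5]))  # 0.375
```

Microsatellites with many alleles of similar frequency score much higher than this two-allele example, which is consistent with the mean PIC of 0.76 reported for the marker set.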

Statistical analyses

Nonparametric exclusion mapping of affected sib-pair data (ASP) was performed by using ASPEX version 2.4 (Hinds and Risch 1996). ASPEX calculates the percentage of identical by descent (%IBD) sharing and reports the proportion of shared alleles of paternal, maternal, and combined origin. The results for alleles of combined origin also include alleles where the parental origin is unknown. We calculated mlod with a linear model and assuming a multiplicative model. The ASPEX SIB_PHASE algorithm was applied; this uses allele frequency information to reconstruct and to phase missing parental information. Sex-specific recombination maps were used for the calculation of multipoint mlod scores. Marker order and map positions were determined by using an integrated map (Nievergelt et al. 2004) based on the deCODE genetic map and updated physical map information.
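To illustrate the kind of quantity involved, a one-parameter allele-sharing lod score can be computed from counts of alleles shared and not shared identical by descent at fully informative meioses. The sketch below is a simplification and not the ASPEX algorithm: it ignores multipoint information, reconstruction of missing parents, and sex-specific maps, and the function name and example counts are hypothetical.

```python
import math


def sharing_lod(shared, unshared):
    """Maximum lod score for excess IBD sharing in affected sib pairs.

    The sharing proportion is estimated by maximum likelihood but is
    constrained to be at least 0.5 (the value expected without linkage),
    so the score cannot be negative apart from rounding error.
    """
    n = shared + unshared
    if n == 0:
        return 0.0
    p = max(shared / n, 0.5)
    if p == 1.0:
        log_lik = 0.0  # every allele shared: likelihood is 1
    else:
        log_lik = shared * math.log10(p) + unshared * math.log10(1.0 - p)
    # Compare with the null hypothesis of 50% sharing
    return log_lik - n * math.log10(0.5)


# Hypothetical counts: 60 alleles shared and 40 not shared
print(round(sharing_lod(60, 40), 2))
```

Computing the same score separately from maternal and paternal meioses gives parent-of-origin-specific values analogous to those reported in this study.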

Results

Results from the multipoint analyses on chromosomes 1 through 22 are shown in Fig. 1 for paternal, maternal, and combined meioses. Our complete genome scan for male sexual orientation yielded three interesting peaks with mlod scores greater than 1.8, located on chromosomes 7, 8, and 10. Table 1 contains additional information concerning these peaks, including the nearest marker, the location, the mlod score, and allele sharing. Table 1 also gives the approximate boundaries of each linkage peak, reported as the cM positions at which the mlod score declines below 1.0. For chromosomes 7 and 8, the peak is a result of approximately equal contributions from maternal and paternal transmission, whereas a maternal-origin effect was found for the peak on chromosome 10.
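The Lander and Kruglyak (1995) criteria used to interpret these peaks are commonly quoted, for allele-sharing analyses in sib pairs, as a lod score of 2.2 for suggestive linkage and 3.6 for significant linkage. The sketch below assumes those two thresholds and classifies the three peak scores reported in this study; the constant and function names are illustrative.

```python
SUGGESTIVE_LOD = 2.2   # assumed sib-pair threshold for suggestive linkage
SIGNIFICANT_LOD = 3.6  # assumed sib-pair threshold for significant linkage


def classify_peak(mlod):
    """Label a peak score against the assumed genomewide thresholds."""
    if mlod >= SIGNIFICANT_LOD:
        return "significant"
    if mlod >= SUGGESTIVE_LOD:
        return "suggestive"
    return "below suggestive"


# Peak mlod scores taken from the text (10q26 is the maternal score)
peaks = {"7q36": 3.45, "8p12": 1.96, "10q26": 1.81}
for region, score in peaks.items():
    print(region, score, classify_peak(score))
```

Under these assumed thresholds the 7q36 peak is suggestive and close to significance, while the 8p12 and 10q26 peaks fall just below the suggestive level.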
Fig. 1

Genome scan results. The x-axis is the chromosome location (cM), and the y-axis is the mlod score. Graphics included for combined (a), maternal (b), and paternal (c) meioses

Table 1

Chromosomal locations with nominally significant linkage peaks. The cM positions in parentheses indicate the boundary at which the mlod score declines below 1.0. For chromosomes 7 and 8, the position is based on the combined map, but for chromosome 10, the position is based on the female map.

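Nearby marker | Location | Position in cM (interval with mlod above 1.0) | mlod
D7S798 | 7q36 | 169.9 (155.1–end) | 3.45
D8S505 | 8p12 | 54.2 (45.1–64.8) | 1.96
D10S217 | 10q26 | 208.1 (201.8–217.4) | 1.81 (maternal)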

Figure 2 shows the multipoint mlod plots for the X chromosome. Analyses of the full sample (dashed line) did not produce any chromosomal regions with mlod scores greater than 1.0. Given the previous evidence of linkage to Xq28 with a portion of the sample reported here (Hamer et al. 1993; Hu et al. 1995), we performed supplemental analyses to determine why we did not find linkage in the full sample. We began by re-analyzing the data from the previously reported 73 families, which had been selected for showing no evidence of paternal transmission, by using updated marker positions (dotted line). This produced a maximum mlod score of 6.47 for markers in the Xq28 region. We then performed a linkage analysis, with only the markers from the ABI linkage mapping set, on these same 73 families. This produced a maximum mlod score of 1.99 for markers in the Xq28 region. Although the mlod score is higher when using the current markers in the limited sample compared with the full sample (1.99 vs. 0.35), it is still markedly lower than the score obtained with the previously reported markers in the limited sample. We provide Table 2 in order to help clarify these results. Table 2 provides singlepoint and multipoint results for the 73 previously reported families on all markers ever reported from our group, starting with the most telomeric new Xq28 marker. Table 2 makes it clear that, although the multipoint results suggest a dramatic change in mlod score between the current markers and the previously reported markers (6.47 vs. 1.99 for markers 0.62 cM apart), the singlepoint results are not dramatically different (2.23 vs. 1.47). This difference is likely to be attributable to two factors. First, the previous reports focused on the X chromosome and contained many more markers in the Xq28 region; the previously reported markers had an average resolution of 1 marker every 1.12 cM, whereas the current markers had an average resolution of 1 marker every 6.97 cM in the Xq28 region. The higher concentration of previously reported markers surely allowed for the extraction of more multipoint linkage information. Second, there were more telomeric markers in the previously reported mapping sets than in the current one. The singlepoint results showed a trend for higher mlod scores closer to the telomere, with the exception of JXYQ28, which had a low PIC (0.28).
Fig. 2

Multipoint linkage analysis for the X chromosome. The x-axis is the chromosome location (cM), and the y-axis is the mlod score. —— Current markers with sample restricted to previously reported families. - - - - Current markers with full sample. ...... Previously reported markers with previously reported families

Table 2

Supplemental analyses comparing Xq28 results across markers reported on in 1993, 1995, and the current report. All analyses reported here are based on the sample restricted to those families previously reported. Current markers and previously reported markers were analyzed separately for the purpose of calculating multipoint mlod scores.


Study year | Location (cM) | Marker distance (cM) | Multipoint mlod (previous markers) | Multipoint mlod (current markers) | Singlepoint mlod

Discussion

This study reports results from the first full genome scan for male sexual orientation. Using 73 previously reported families and 73 new families with two or more gay male siblings, we found three new regions of genetic interest. Our strongest finding was on 7q36 with a combined mlod score of 3.45 and equal contribution from maternal and paternal allele transmission. This score falls just short of Lander and Kruglyak’s (1995) criteria for genomewide significance. Several interesting candidate genes map to this region of chromosome 7. Vasoactive intestinal peptide (VIP) receptor type 2 (VIPR2; MIM 601970) is a G protein-coupled receptor that activates adenylate cyclase in response to VIP (Metwali et al. 1996), which functions as a neurotransmitter and as a neuroendocrine hormone. VIPR2 is essential for the development of the hypothalamic suprachiasmatic nucleus in mice (Harmar et al. 2002), which makes it an interesting candidate gene for sexual orientation in view of earlier reports of an enlarged suprachiasmatic nucleus in homosexual men (Swaab and Hofman 1990). Sonic hedgehog (SHH; MIM 600725) plays an essential role in patterning the early embryo, including hemisphere separation (Roessler et al. 1996) and left to right asymmetry (Tsukui et al. 1999). Homosexual men and women show a significant increase in non-righthandedness, which is related to brain asymmetry (Lalumiere et al. 2000).

Two additional regions approached the criteria for suggestive linkage. The region near 8p12 contains several interesting candidate genes, given the hypothesized relationship between prenatal hormones and sexual orientation (Mustanski et al. 2002). Gonadotropin-releasing hormone 1 (GNRH1; MIM 152760) stimulates both the synthesis and release of luteinizing hormone and follicle-stimulating hormone, which are important regulators of steroidogenesis in the gonads, and inhibits the release of prolactin (Adelman et al. 1986). GnRH is synthesized in the arcuate nucleus and other nuclei of the hypothalamus (Kawakami et al. 1975). Steroidogenic acute regulatory protein (STAR; MIM 600617) mediates pregnenolone synthesis and is involved in the hypothalamic-pituitary regulation of adrenal steroid production (Sugawara et al. 1995), which in turn plays an important role in sexual development. Neuregulin1 (NRG1; MIM 142445) produces a variety of isoforms that regulate the growth and differentiation of neuronal and glial cells through interaction with ERBB receptors (Burden and Yarden 1997; Wen et al. 1994).

The 10q26 region is of special interest because it results from excess sharing of maternal but not paternal alleles. Previous studies have suggested that there is an excess of homosexual family members related to the proband through the mother, and we have proposed previously that this might result in part from genomic imprinting (Bocklandt and Hamer 2003). In support of a connection between 10q26 and imprinting, a germline differentially methylated region has been identified at this location by Strichman-Almashanu et al. (2002) who performed a genomewide screen for normally methylated CpG islands and found 12 regions to be differentially methylated in uniparental tissues of germline origin, i.e., hydatidiform moles (paternal origin) and complete ovarian teratomas (maternal origin). Such CpG islands can regulate the expression of imprinted genes over distances of several hundred kilobases. The region around the 10q26 CpG islands includes the brain-expressed gene Shadow of Prion Protein (SPRN), several transcription regulators (ZNF511, VENTX2; MIM 607158), neurotransmitter interacting proteins (DRD1IP; MIM 604647), and cell signaling pathway proteins (INPP5A; MIM 600106, GPR123).

Four previous linkage studies have been conducted on the X chromosome and together produce a statistically suggestive MSP in the Xq28 region (Sanders and Dawood 2003). Because the focus of this study was a full genome scan with the ABI linkage mapping set on a partially new set of families, we began by reporting results for these markers on the full sample. This analysis did not produce evidence of linkage in the Xq28 region; therefore, we conducted supplemental analyses to clarify this result given previous findings. Our first supplemental analysis combined results from the two previous reports from our group (Hamer et al. 1993; Hu et al. 1995) in order to determine the magnitude of the linkage signal in the 73 previously reported families that make up half of the current sample. This produced an mlod of 6.47. To determine whether the lack of linkage evidence in the full sample was attributable to the new markers or the additional families (who were not selected based on family transmission patterns), we then conducted analyses on the previously reported families by using the markers from the ABI linkage mapping set. This produced an mlod score of 1.99. Table 2, which provides a summary of the singlepoint and multipoint results for this comparison, suggests that the difference in mlod score between the restricted sample with the old and new markers is attributable to the non-optimal position and density of the new markers. The difference in mlod scores between the full sample and the sample restricted to families without evidence of paternal transmission (with the goal of enriching the sample for families showing maternal transmission) suggests the possibility of etiologic heterogeneity for the proposed Xq28 locus.

Several limitations of the current study should be noted. First, we were unable to calculate empirically derived significance levels for this project because none of the simulation programs that currently exist allow for the use of sex-specific maps with ASP data. The development of simulation programs that incorporate this important information would remove this limitation. Second, our marker set had an average resolution of 10 cM, which may have led to underestimated mlod scores. We discuss in detail above the likely negative effects that this had on our X chromosome results. Optimally, genome scans are followed up with dense markers placed in promising regions, but because of financial limitations, we were unable to do this. Future studies will undoubtedly employ more sophisticated and dense marker sets. Third, we analyzed only 146 independent families, which is a small sample for a complex trait such as sexual orientation. Approximately half of these families have previously been included in reports on the X chromosome (Hamer et al. 1993; Hu et al. 1995). Future research should be conducted on a new and larger sample of participants. Our linkage results should be interpreted with consideration of the fact that we only included families with at least two self-identified gay brothers. Our results may not extrapolate to individuals who do not meet our inclusion criteria, such as individuals who engage in same-sex behavior but do not identify as gay or individuals who identify as bisexual. The definition of homosexuality is complicated, and future genetic research would benefit from additional phenotype development or the identification of endophenotypes for sexual orientation (Mustanski et al. 2002). The identification of basic processes that underlie sexual orientation could increase the power of future genetic studies.
A related limitation is that we did not include females in our study because it is not yet clear if female sexual orientation is determined by the same factors as male sexual orientation (for a discussion, see Mustanski et al. 2002). Future research with mixed-sex samples should help to answer this question. Finally, we did not collect data on the number of older brothers, which shows a robust association with male sexual orientation (Blanchard 2004). Future studies should collect these data to allow for explorations of gene by environment interactions; this could increase the ability to identify genetic loci and also help to elucidate the process linking number of older brothers to sexual orientation.

In summary, we report the first genome scan for loci involved in the complex phenotype of male sexual orientation. We have also identified several chromosomal regions and candidate genes for future exploration. The molecular analysis of genes involved in sexual orientation could greatly advance our understanding of human variation, evolution, and brain development. In the absence of obvious animal models, genetic linkage and association studies provide the best opportunity for discovering these loci.

Acknowledgments

We thank all the individuals who participated in the project for their time and openness and Lynn Goldin and Danielle Dick for comments on the manuscript. B.S.M. was supported by an NSF Graduate Research Fellowship and an NIH Summer Research Fellowship. N.J.S. and C.M.N. were supported in part by the NHLBI Family Blood Pressure Program (FBPP; HL64777-01).

Copyright information

© Springer-Verlag 2005