Introduction

Male homosexuality runs in families, and twin studies have shown that genetic factors appear to account for a moderate proportion of the variation in male sexual orientation, with heritability estimated at ~32% (for review, see Bailey et al., 2016). Three genome-wide linkage studies (GWLS) of male sexual orientation have been conducted, all focusing on concordant sibling pairs (i.e., at least two homosexual brothers per family). We refer to these GWLS datasets as Hamer (Mustanski et al., 2005), MGSOSO (Molecular Genetic Study of Sexual Orientation) (Sanders et al., 2015), and Canadian (Ramagopalan et al., 2010). The Hamer GWLS combined samples from two earlier studies (Hamer et al., 1993; Hu et al., 1995) with newly collected families (Mustanski et al., 2005), for a total of 155 independent concordant sibling pairs in 145 families. Although linkage to chromosome Xq28 was prominent in the earlier studies, which focused on chromosome X (Hamer et al., 1993; Hu et al., 1995), the strongest finding of the Hamer GWLS was instead suggestive linkage at chromosome 7q36 (Mustanski et al., 2005). Another research group collected 55 families in Canada and performed a GWLS, reporting its strongest (albeit non-significant) linkage at chromosome 14q32 (Ramagopalan et al., 2010). The MGSOSO GWLS, on 409 independent concordant sibling pairs in 384 families, found significant linkage (by the criteria of Lander & Kruglyak, 1995) at pericentromeric chromosome 8 as its strongest result, and also detected suggestive linkage (by the same criteria) at chromosome Xq28, supporting previous findings (Sanders et al., 2015). To extract the maximal positional information from GWLS of currently available family resources, we jointly analyzed the Hamer and MGSOSO datasets and incorporated the Canadian dataset by meta-analyzing its published summary statistics.

Method

Joint Linkage Analyses

The two jointly analyzed datasets used very similar questionnaire-based phenotype definitions for homosexual men: Hamer required Kinsey 5–6 on several questions (attraction, fantasy, behavior, and self-identification) (Mustanski et al., 2005), and MGSOSO required Kinsey 5–6 for fantasy together with homosexual identity (Sanders et al., 2015). The Hamer dataset consisted of 441 individuals in 145 families genotyped with 408 short tandem repeat polymorphism markers (STRPs) (Mustanski et al., 2005), and the MGSOSO dataset consisted of 908 individuals in 384 families genotyped with 45,387 single-nucleotide polymorphism markers (SNPs) (Sanders et al., 2015). Quality control had already been performed in each GWLS, as previously detailed (Mustanski et al., 2005; Sanders et al., 2015). After obtaining collaborative access to the genotypes of each dataset, we conducted multipoint nonparametric linkage analyses with Merlin v1.1.2 (Abecasis et al., 2002) on the Hamer (Mustanski et al., 2005) and MGSOSO (Sanders et al., 2015) datasets separately, because they were genotyped with different marker types (STRPs vs. SNPs). To integrate the two, we obtained the genetic positions of each dataset's markers from the Rutgers Map v.3 (hg19 build) (Nato et al., 2018), ran multipoint linkage on both datasets using the nonparametric S-pairs statistic evaluated on a 1-cM grid, and then combined the LOD scores at each grid position across the two marker sets.
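The final step above, merging two scans evaluated on a common grid, can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: it assumes each dataset's Merlin output has already been parsed into a mapping from (chromosome, position in cM) to LOD on the shared 1-cM grid, and that "combining" means summing LOD scores position by position, which is appropriate only when the datasets contain no overlapping families. The Lander & Kruglyak (1995) sib-pair thresholds (LOD 2.2 suggestive, 3.6 significant) are used to label the resulting peaks. All function names and the toy numbers are hypothetical.

```python
# Sketch: combine per-position LOD scores from independent linkage scans.
# Assumptions (not from the source): scans are already parsed into dicts on a
# shared 1-cM grid, and "combining" means a position-wise sum of LODs.

# Lander & Kruglyak (1995) genome-wide thresholds for sib-pair allele sharing.
SUGGESTIVE_LOD = 2.2
SIGNIFICANT_LOD = 3.6


def combine_lods(*scans):
    """Sum LODs position by position; keys are (chromosome, position_cM).

    Only positions present in every scan are kept (nothing is imputed).
    Summation is appropriate only if the datasets share no families.
    """
    shared = set(scans[0]).intersection(*scans[1:])
    return {pos: sum(scan[pos] for scan in scans) for pos in sorted(shared)}


def peak_per_chromosome(combined):
    """Return {chromosome: (position_cM, LOD)} at each chromosome's maximum."""
    peaks = {}
    for (chrom, cm), lod in combined.items():
        if chrom not in peaks or lod > peaks[chrom][1]:
            peaks[chrom] = (cm, lod)
    return peaks


def classify(lod):
    """Label a LOD score against the Lander & Kruglyak sib-pair thresholds."""
    if lod >= SIGNIFICANT_LOD:
        return "significant"
    if lod >= SUGGESTIVE_LOD:
        return "suggestive"
    return "below suggestive"


# Hypothetical toy scans (not real data).
hamer = {("8", 50.0): 0.4, ("8", 51.0): 0.6, ("X", 150.0): 0.9}
mgsoso = {("8", 50.0): 2.8, ("8", 51.0): 3.1, ("X", 150.0): 1.5}
for chrom, (cm, lod) in peak_per_chromosome(combine_lods(hamer, mgsoso)).items():
    print(chrom, cm, round(lod, 2), classify(lod))
```

In this toy example, LODs of 0.6 and 3.1 at the same grid position combine to 3.7, which `classify` labels significant, while neither input alone reaches that threshold.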

Meta-Analyses of Summary Statistics

For its phenotype definition of homosexual men, the Canadian dataset used an interview approach based on self-identification corroborated by a sibling; in a sub-sample, all participants also had Kinsey 5–6 on several questions (attraction, fantasy, and behavior) (Rice et al., 1999a, b). Because we could not access genotypes for the Canadian dataset (which accounts for < 10% of the families in GWLS of this trait), we could incorporate it only by meta-analyzing summary statistics. We therefore took the plotted multipoint results from Fig. 1 of the Canadian GWLS (Ramagopalan et al., 2010) and interpolated them into cM bins, which enabled the use of GWLS meta-analytic methods that do not require genotypes: the multiple scan probability (MSP) approach, which uses regional p-values (Badner & Gershon, 2002), and the rank-based genome scan meta-analysis (GSMA) approach (Levinson et al., 2003; Wise & Lewis, 1999).
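Reading a published linkage plot into cM bins can be sketched as follows. We assume linear interpolation between points digitized from the figure (the text does not state the interpolation rule). Because MSP works from p-values, the sketch also shows a standard conversion of a one-sided allele-sharing LOD score to a pointwise p-value via χ² = 2 ln(10) × LOD. MSP's own regional correction is not reproduced here, and the function names and digitized points are hypothetical.

```python
import bisect
import math


def interpolate_to_grid(points, step_cm=1.0):
    """Linearly interpolate digitized (position_cM, LOD) points onto a grid.

    `points` must be sorted by position. Assumption: linear interpolation
    between digitized points; no extrapolation beyond the first/last point.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    grid = []
    first = math.ceil(xs[0] / step_cm)
    last = math.floor(xs[-1] / step_cm)
    for k in range(first, last + 1):
        pos = k * step_cm
        i = bisect.bisect_right(xs, pos)  # xs[i - 1] <= pos < xs[i]
        if i == len(xs):  # pos coincides with the last digitized point
            lod = ys[-1]
        else:
            x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
            lod = y0 + (y1 - y0) * (pos - x0) / (x1 - x0)
        grid.append((pos, lod))
    return grid


def lod_to_pointwise_p(lod):
    """One-sided pointwise p-value for an allele-sharing LOD score.

    chi2 = 2 ln(10) * LOD; p = P(Z > sqrt(chi2)) for standard normal Z.
    Non-positive LODs map to p = 0.5 by convention.
    """
    if lod <= 0:
        return 0.5
    z = math.sqrt(2 * math.log(10) * lod)
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical digitized points (not values read from the published figure).
digitized = [(0.0, 0.0), (2.0, 1.0), (5.0, 2.5)]
for pos, lod in interpolate_to_grid(digitized):
    print(f"{pos:4.0f} cM  LOD {lod:.2f}  p {lod_to_pointwise_p(lod):.3g}")
```

With these toy points, the 1-cM grid receives LODs of 0, 0.5, 1.0, 1.5, 2.0, and 2.5. For reference, a LOD of 3.6 corresponds to a one-sided pointwise p of roughly 2 × 10⁻⁵ under this conversion.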
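The rank-based GSMA cited above can also be sketched. Following the general summed-rank scheme, each study's genome is divided into fixed-width bins (30 cM here, an assumption), bins are ranked within each study by their maximum LOD, ranks are summed across studies, and each bin's summed rank is compared with a null distribution in which each study's rank for that bin is drawn at random. This is an unweighted sketch: the study weighting described by Levinson et al. (2003) is not shown, and all names and data are hypothetical.

```python
import random


def bin_maxima(scan, bin_width_cm=30.0):
    """Max LOD in each fixed-width bin per chromosome (30 cM is an assumed width).

    `scan` maps (chromosome, position_cM) -> LOD for a single study.
    """
    maxima = {}
    for (chrom, cm), lod in scan.items():
        key = (chrom, int(cm // bin_width_cm))
        maxima[key] = max(lod, maxima.get(key, float("-inf")))
    return maxima


def rank_bins(maxima, bins):
    """Rank bins within one study (1 = weakest); ties share their mean rank."""
    ordered = sorted(bins, key=lambda b: maxima[b])
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and maxima[ordered[j + 1]] == maxima[ordered[i]]:
            j += 1
        for b in ordered[i:j + 1]:
            ranks[b] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def gsma(studies, n_sim=20000, seed=1):
    """Unweighted summed-rank GSMA; returns {bin: (summed_rank, p_value)}.

    Null: each study's rank for a bin is a uniform draw from that study's ranks.
    """
    bins = sorted(set.intersection(*(set(s) for s in studies)))
    rank_maps = [rank_bins(s, bins) for s in studies]
    summed = {b: sum(r[b] for r in rank_maps) for b in bins}
    pools = [list(r.values()) for r in rank_maps]
    rng = random.Random(seed)
    null = [sum(rng.choice(pool) for pool in pools) for _ in range(n_sim)]
    return {b: (s, (1 + sum(x >= s for x in null)) / (n_sim + 1))
            for b, s in summed.items()}


# Hypothetical per-study scans (not real data).
scans = [
    {("8", 10.0): 3.0, ("8", 40.0): 1.0, ("X", 10.0): 0.5, ("X", 40.0): 0.1},
    {("8", 12.0): 2.5, ("8", 45.0): 0.8, ("X", 5.0): 0.6, ("X", 50.0): 0.2},
    {("8", 20.0): 1.9, ("8", 35.0): 1.2, ("X", 15.0): 0.4, ("X", 33.0): 0.0},
]
results = gsma([bin_maxima(s) for s in scans])
for b, (summed_rank, p) in results.items():
    print(b, summed_rank, round(p, 4))
```

Here the first chromosome 8 bin ranks highest in all three toy studies, so its summed rank is 12, and its simulated p-value approaches the exact value of (1/4)³ ≈ 0.016 for four bins and three studies. Ranking discards LOD magnitudes, which is what allows studies with different markers and statistics to be combined.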

Fig. 1

Joint multipoint linkage analysis of the combined Hamer and MGSOSO datasets. Multipoint LOD scores from the nonparametric linkage analysis are plotted against chromosomal position. Adjacent chromosomes are distinguished by alternating black and gray lines.

Results

The multipoint plots for the Hamer and MGSOSO datasets from the current analyses (Supplementary Figs. 1 and 2, respectively) agree closely with the multipoint plots in the original GWLS reports (Fig. 1a in Mustanski et al., 2005, and Fig. 1 in Sanders et al., 2015). This agreement held despite differences between the original reports and the current analyses in statistical software (Aspex vs. Merlin for the Hamer dataset) and genetic map (deCODE vs. Rutgers for both the Hamer and MGSOSO datasets). The joint analysis of the combined Hamer and MGSOSO datasets is shown in Fig. 1, and zoomed-in plots of its top two multipoint linkage peaks, on chromosomes 8 and X, are shown in Fig. 2. The results of the meta-analyses of summary statistics from the Hamer, MGSOSO, and Canadian GWLS datasets are presented in Supplementary Tables 1 (MSP) and 2 (GSMA).

Fig. 2

Individual chromosomal plots of multipoint linkage peaks on chromosomes 8 and X. Multipoint LOD scores from the nonparametric linkage analysis are plotted against chromosomal position: red, Hamer dataset; blue, MGSOSO dataset; black, joint analysis. Chromosome 8 (top) has its multipoint linkage peak in the pericentromeric region, and chromosome X (bottom) has its peak at Xq28.

Discussion

Our primary analysis was the joint multipoint linkage analysis of the Hamer and MGSOSO datasets (Mustanski et al., 2005; Sanders et al., 2015), to which each dataset contributed some peaks (Fig. 1, Supplementary Figs. 1 and 2). Overall, the maximum multipoint peaks increased little in height, though the pericentromeric chromosome 8 peak was broadened (Fig. 2). Chromosomes 8 and X retained the highest multipoint peaks genome-wide, arising mostly from the larger MGSOSO dataset (Fig. 2). The joint analysis gives a more comprehensive picture of shared and heterogeneous linkage regions: at pericentromeric chromosome 8, for example, the studies have overlapping peaks (possibly suggesting heterogeneity, perhaps with different genes involved in the different datasets), and the combined evidence broadens the search. The secondary analyses of summary statistics, which used MSP and GSMA to incorporate all three GWLS datasets (Hamer, MGSOSO, and Canadian), showed no genome-wide significant results, though suggestive findings remained. The joint multipoint linkage analysis (Fig. 1) extracted the available positional information from the collaborating GWLS, although previous GWLS findings were not substantially strengthened. Nevertheless, these results can complement other approaches, for example by helping to prioritize findings from genome-wide association studies (GWAS). Linkage and association studies measure different genetic properties (segregation of a chromosomal region within families vs. correlation between alleles and the trait in a population), and both provide clues about the genetics underlying a trait. Because of this difference, we could not directly combine any GWAS (e.g., Ganna et al., 2019) with the GWLS in our meta-analysis.
Limitations include those inherent to linkage analysis (as opposed to GWAS) of traits with complex genetics, for example its limited utility for phenotypes influenced by more than one or a few genes. On the other hand, linkage retains some advantages over association approaches, such as robustness to allelic heterogeneity (Lipner & Greenberg, 2018). Further genetic studies of the trait, such as much larger GWAS (e.g., Ganna et al., 2019), will be especially useful, given the success of that approach in studying other phenotypes with complex genetics (e.g., Fig. 3b in Sullivan et al., 2018).