Main text

Sex-linked markers (SLMs) are important in both theoretical and applied biological sciences, especially in conservation. For example, such markers can provide valuable insight into sex chromosome turnover rates (Charlesworth and Mank 2010; Kitano and Peichel 2012). Further, in the field of population genetics they allow the inference of sex-specific demographic events due to their sex-specific inheritance, for instance the comparison of mitochondrial and Y chromosome markers can reveal sex-specific migration rates (Petit et al. 2002; Wilson Sayres 2018). Lastly, sex ratio is a key component ecological and demographic studies, particularly when females and males differ in life-history traits (e.g. Tsai et al. 2014; Pillans et al. 2021). In species with a heterogametic sex, markers on sex chromosomes provide a way to identify sex from DNA samples when sexual dimorphism is absent and destructive sampling is not an option, e.g. for Threatened species (Stovall et al. 2018; Suda et al. 2019) or when sampling in no-take zones.

Despite the importance of SLMs, only a few methods and tools exist to identify them in non-model species. Most approaches are focussed on presence-absence and heterozygosity patterns of Restriction-site-Associated DNA sequencing (RADseq) data to differentiate the heterogametic (XY or ZW) from the homogametic (XX or ZZ) sex (Gamble and Zarkower 2014; Fowler and Buonaccorsi 2016; Gamble 2016; Hill et al. 2018). Most of the workflows following this approach use demultiplexed FASTQ files and are based in Stacks or RADtools (see Gamble and Zarkower 2014; Fowler and Buonaccorsi 2016), which can be computationally intensive, or the faster RADSex software (based in C++; Feron et al. 2021). Currently, the only methods that can identify SLMs from genotyped single nucleotide polymorphism (SNP) data are outliers detection methods (e.g. BayeScan; Foll and Gaggiotti 2008) where the data is partitioned by sex (e.g. Benestan et al. 2017; Trenkel et al. 2020).

Many elasmobranchs (sharks and rays) are threatened with extinction (Dulvy et al. 2014, 2021). Their slow life history, low fitness and low connectivity between populations, which is often male biased (see Phillips et al. 2021), has instigated many conservation genomic studies in elasmobranchs (see Ovenden et al. 2018; Green et al. 2022; Devloo-Delva et al. 2023a). However, to date, no studies have used these available genomic resources to investigate the value of SLMs for sex identification and population genetics. In this study, we introduce a tool that can analyse the existing genomic data, such as RADseq, Diversity Arrays Technologies (DArTseq), and genotyping by sequencing (GBS) to look for differential signals between heterogametic and homogametic individuals and identify SLMs on the sex chromosomes.

Specifically, we have designed a function, ‘sexy_markers’, as part of the radiator package (Gosselin et al. 2020) in the R environment (R Core Team 2020) which tests three different scenarios to find markers on the sex chromosomes under the assumption that one sex is heterogametic (see Supplementary Material Sect. 1): (i) markers are only present in females or males, (ii) markers are homozygous in one sex while exhibiting an intermediate range of heterozygosity in the other sex, (iii) markers have double the read depth in females or males. Here, the first scenario identifies markers on the sex chromosome unique to the heterogametic sex and the latter two detect markers on the sex chromosomes shared by the heterogametic and homogametic sexes. This function also allows the re-assignment of genetic sex when markers on the heterogametic sex chromosome are identified. The function is based on a visual identification of SLMs from genotyped RAD-type data after minimal data-quality filtering. Radiator is developed to import bi-allelic SNP data from vcf and csv files from several genotype callers: DArTsoft14, Stacks, GATK, platypus, samtools and ipyrad.

We demonstrated the workflow and accuracy of this function using a White Shark example (Carcharodon carcharias, listed as Vulnerable to extinction; Rigby et al. 2019), with a DArTseq dataset of 558 individuals and 23,393 SNPs (Bruce et al. 2018; Hillary et al. 2018). We further compare our function to alternative approaches that use genotyped SNP data as input (i.e. fixed allele differences): ‘gl.report.sexlinked’ from dartR package (Gruber et al. 2018; Mijangos et al. 2022), OutFLANK (Whitlock and Lotterhos 2015), and PCadapt (Luu et al. 2017). These outlier methods identify markers with differences in allele frequencies between the sexes, i.e. at homologous regions between the sex chromosomes (Robledo-Ruiz et al. 2023). The SLMs were validated using polymerase chain reactions (PCR) with primers designed from SLMs that were mapped to the reference genomes from Marra et al. (2019) and the Vertebrate Genome Project (VGP; https://vgp.github.io/genomeark/; NCBI RefSeq accession GCF_017639515.1). Autosomal primers in the beta-actin gene were also designed from the reference genome to act as a positive control between sexes. Primer sequences and PCR conditions are described in the Supplementary Material (Sect. 4).

Overall, we found nine Y-linked and 406 X-linked markers using the ‘sexy_markers’ function in less than 5 min computation time. The nine heterogametic SLMs allowed us to assign sex to 43 individuals with unknown visual sex and showed a 6.7% phenotypic – genotypic sex discrepancy across the 402 sexed sharks that passed quality filtering. The latter is most likely explained by human error, although hermaphroditic elasmobranchs have been described sporadically (reviewed in Adolfi et al. 2019). Further, the outlier methods identified 131 and 2720 SLMs for OutFLANK and PCadapt respectively (10 markers in common), but only PCadapt had 16 markers in common with the ‘sexy_markers’ approach (see Supplementary Material Sect. 2). We were able to confidently blast 179 SLMs (seven Y-linked and 172 X-linked markers) to 49 scaffolds from the Marra et al. (2019) genome, of which 47 SLMs mapped to five scaffolds (i.e. putative sex scaffolds). Eight Y-linked and 215 X-linked markers had confident BLAST hits (see Supplementary Material Sect. 3) to eight scaffolds from the VGP genome, with the majority (199 SLMs) mapping to three scaffolds. Overall, we conclude that 48% of the 415 identified SLMs were located in close proximity on putative sex chromosome scaffolds. These markers were considered as a reference to test the accuracy of the ‘sexy_markers’ function with suboptimal data (Fig. 1). By randomly sampling 6, 12, 24, 48, 72, 96, 120, 144, 252 and 348 individuals for 200, 1000, 2000, 5000, 7500, 10000, 12500, 15000, 17500 and 20000 markers, we showed that too few individuals (< 100) and too few markers (< 10,000 or 50% of the total data) will identify false positive SLMs (i.e. not in common with the SLMs from the full data; Fig. 1A-B). This result was more pronounced when the female:male sex ratio was skewed (2:1 or 1:2; Fig. 1C-F). The Y- and X-linked markers were validated through multiplex PCR (Fig. 2; Supplementary Material Sect. 4). PCR results showed that males amplified for the Y-chromosome fragments, while X and beta actin fragments were present in both sexes. The same individuals that had a phenotypic – genotypic sex mismatch based on the heterogametic SLMs (70 base pairs) also showed this discrepancy for the Y-chromosome fragment (655 base pairs; Fig. 2).

Fig. 1
figure 1

Robustness test for the ‘sexy_markers’ R function on White Shark, Carcharodon carcharias, data. The heatmaps show data with varying numbers of markers and individuals, and a female:male (F-M) sex ratio of 1:1 (A,B), 1:2 (C,D) or 2:1 (E,F). Each square is coloured according to correctly identified sex-linked markers (A,C,E) or false positive detections (B,D,F).

In general, these results confirm that the White Shark has partially non-recombining sex chromosomes (X and Y), males being the heterogametic sex. This is the first study to validate male heterogamety in the White Shark with sufficiently high sample size (n = 558); an observation also obtained using karyotyping (Maddock and Schwartz 1996), where the authors suggested the White Shark and several other elasmobranchs possess X and Y sex chromosomes, albeit with low samples sizes (n = 1). Further, we showed the utility and robustness of the ‘sexy_markers’ function for species with distinct sex chromosomes, where species with larger non-recombining regions have a higher chance of finding Y/W-linked markers. Importantly, the function takes genotyped SNP data as input (whereas other software require demultiplexed FASTQ files), which allows a more versatile use of previously published datasets for comparative studies. Finally, we developed a quick (~ 2-hour) PCR assay to identify the sex of sampled White Sharks. This tool will prove useful for sex identification in species that do not display obvious morphological differences between sexes. For instance, most juvenile sharks without developed external sex organs or samples obtained from processed carcasses (e.g. fisheries or fin trade) could be sexed using our method. Future studies include applying the R function on other species, as well as utilising the sex-linked markers for population genetic studies.

Fig. 2
figure 2

PCR results of Y- and X-chromosome amplicons between female (F) and male (M) White Sharks, Carcharodon carcharias, showing 655 and 164 base pairs (bp) fragments, respectively. Conserved beta-actin amplicons (287 bp) serve as a positive control. PCR products were visualised on a 2% agarose gel, stained with SYBR safe (Invitrogen, USA) and a TriDye 1 kb DNA ladder (New England BioLabs, UK) for size reference. The first row shows males that were identified as males by both visual assignment and PCR assay. The second row shows females that were identified as females by both visual assignment and PCR assay. The last row shows the individuals that had a discrepancy between phenotypic and genotypic sex