Development of 79 SNP markers to individually genotype and sex-type endangered mountain gorillas (Gorilla beringei beringei)

The mountain gorilla (Gorilla beringei beringei) is one of two endangered subspecies of eastern gorilla. The principle approach to monitoring the two extant mountain gorilla populations has been to use fecal surveys to obtain DNA profiles for individuals that are then used for capture-recapture-based estimates of abundance. To date, 11 to 14 microsatellites have been used for this purpose. To adapt to ongoing changes in genotyping technologies and to facilitate the analysis of fecal DNA samples by multiple laboratories, we developed a panel of single nucleotide polymorphism (SNP) markers that can be used for future gorilla monitoring. We used published short read data sets for 3 individuals to develop a suite of 79 SNPs, including two sex markers, for a Fluidigm platform. This marker set provided high resolution to differentiate individuals and will facilitate future monitoring, leaving room for additional SNPs to be included in a 96-assay format.

The mountain gorilla (Gorilla beringei beringei) is one of two subspecies of eastern gorilla, and is currently recognized as endangered by the IUCN . Today, two isolated populations of mountain gorilla remain, one in Bwindi Impenetrable National Park, Uganda and Sarambwe Nature Reserve, Democratic Republic of Congo (DRC) and the other in the Virunga Massif of Uganda, Rwanda, and DRC. Since the early 2000s, monitoring the abundance of these populations has involved regular noninvasive fecal DNA surveys (e.g., Guschanski et al. 2009). Genetic analyses are used to identify individuals from fecal DNA samples, which enable mark-recapture analyses to estimate total numbers of individuals within the population (Roy et al. 2014;Hickey and Sollman 2018). In past surveys, microsatellite markers have been used for this purpose (Guschanski et al. 2009;Roy et al. 2014;Hickey and Sollman 2018;Granjon et al. 2020). Although these markers performed adequately in the past, the technology required to support their use is becoming obsolete and will eventually be unsupported (von Thaden et al. 2017). Additionally, microsatellites are challenging to calibrate between laboratories. Therefore, we developed a set of 79 SNPs with high resolution to differentiate even close relatives that can be used by any participating laboratory for future surveys of these and other eastern gorilla populations.
To discover SNPs, we obtained raw sequencing reads from entire genomes (80-98 Gbp each) of 3 female mountain gorillas from the Virunga Massif (none were available from Bwindi): Turimaso (ENA No. ERS525618), Maisha (ENA No. ERS525616), and Uririmo (ENA No. ERS525617) (Xue et al. 2015). We aligned trimmed reads to the western lowland gorilla (Gorilla gorilla gorilla) genome (gorGor5; Gordon et al. 2016) using bwa-mem (Li 2013), and called variants using freeBayes (v0.9.21-19-gc003c1e; Garrison and Marth 2012), which resulted in identification of 12,894,536 variable sites (i.e., SNPs). We then filtered SNPs using vcftools (Danecek et al. 2011), retaining biallelic SNPs with 3 of each allele (-mac 3) and with read depths ranging 20-40x (-minDP 20, -maxDP 40); the 1 3 average read depths of genomes were ~ 25-30x, so our narrow depth range ensured removing sequence repeats and low-coverage sites. We further removed sites that were heterozygous in all 3 individuals, ensuring that all remaining sites had a representative of all 3 possible genotypes, resulting in 132,958 sites. We then mapped contigs to the human genome to obtain orthologous positions, and removed all sites that were on gorilla contiguous sequences < 3 Mbp long (Gordon et al. 2016), leaving 115,106 sites with locations on chromosomes that were known with high confidence. Finally, we selected 129 candidate SNPs that were distributed approximately evenly across the chromosomes (i.e., 37 Mbp [range: 16.5-58.5 Mbp] apart on average). To facilitate primer design, we also limited selection to SNPs for which the regions flanking them for 200 bp in 3' and 5' directions included no other known SNPs or indels (i.e., based on the unfiltered vcf file).
To design 2 sex markers (i.e., bringing the total to 131 candidate SNPs), we used sequences from human X chromosome and Y chromosome introns of the amelogenin and zinc finger genes (Kim et al. 2010). We used basic local alignment search tool (BLAST) in Genbank to obtain gorilla Y chromosome sequences that were orthologous to the Y chromosome introns of the amelogenin gene (Genbank accession Nos. FJ532255.1) and the zinc finger genes (AH014841.2 ZFY). However, this procedure produced no X chromosome orthologs for gorillas. Therefore, we used the human sequences for these X chromosome loci as references to extract reads from a female gorilla whole genome sequence (Turimaso) using bwa-mem. We manually aligned reads and BLAST product sequences, respectively, for gorilla X and Y chromosome paralogs of these genes to verify that the same 3 SNPs differing between human X and Y chromosome introns also differed between the gorilla X and Y chromosomes. However, we also discovered 3 additional variants in gorillas corresponding to sites in the published human SNP primers, which required us to redesign primers specifically for gorillas.
We used Fluidigm's D3 Assay Design Tool (https:// d3. fluid igm. com) in conjunction with the flanking sequences for the 131 SNPs (Supporting Information, Supporting text 1) to design primers, which we ordered from Fluidigm (Fluidigm, San Francisco USA; Supporting information, Table S1). We used a set of 561 DNA samples extracted from mountain gorilla feces collected from nests in Bwindi during 2018 that had been individually identified with microsatellites and which included ≥ 1 sample from each of 450 putative individuals (i.e., all but one of 451 identified in the survey; Hickey et al. 2019;Hu 2020). This sample set included arbitrarily selected duplicates of the same DNA extracts and extracts from multiple fecal samples assigned with microsatellites to the same individual. After screening all candidate SNPs against 94 of these samples, we retained 96 of the SNPs for further evaluation against the remaining 467 samples. From the 96 loci screened against the entire samples set, we selected 79 SNP loci based on consistent cluster separation and high call rates.
We used the 96.96 Fluidigm Dynamic Arrays with integrated fluidic circuits (IFCs) run on the Biomark HD system, initially following the manufacturer's guidelines, except that we used 18 specific target amplification (STA) PCR cycles (instead of 14) during a pre-amplification step to increase the concentration of marker-specific DNA to be used as template for the allele specific (ASP) reactions (von Thaden et al. 2017), and we diluted the STA PCR product 1:10 rather than 1:100 before adding to the ASP reactions; finally, we used 45 cycles instead of 38 cycles in the ASP PCR reaction. We called genotypes using Fluidigm SNP Genotyping Analysis Software v4.3.2. We estimated heterozygosity, unbiased probabilities of identity (P ID ) and identity of siblings (P IDsibs ) in Gimlet (v. 1.3.3;Valière 2002).
Based on the 79 final loci, we obtained > 95% call rates on 488 of the 561 (87%) fecal DNA samples. For 475 genotypes where both sex markers (GorAmelo2-CG, GorZF1-CT) yielded genotypes, 471 pairs (99.16%) agreed, of which 468 of these samples (99.14%) also agreed with sexes typed from the microsatellite study (Hickey et al. 2019), suggesting < 1% sex-typing error rate. Based on 21 pairs of replicate genotypes (same fecal extract), the overall agreement was 98.6% (SD = 0.007%), indicating a genotyping error rate < 2%. Similarly, 41 pairs of fecal samples previously determined based on microsatellites to be from the same individual matched at 99.3% (SD = 0.027) of their SNP alleles on average. Conversely, pairwise comparisons among genotypes from 424 putative distinct individuals (as identified through microsatellites) matched at 73.9% (SD = 0.038) of alleles on average. These metrics, along with the average expected and observed heterozygosity (0.34 and 0.33, respectively) and the combined P ID and P IDsibs (1.6 × 10 -23 and 2.2 × 10 -12 , respectively; Supplementary information, Table S2), indicate high resolution to differentiate individuals from the Bwindi population. Because the markers were designed using genomes of mountain gorillas from the Virunga Massif and because we did not screen markers on the basis of polymorphism in the Bwindi population, they should perform as well or better on gorillas from the Virunga population, for example exhibiting polymorphism in the 15 SNPs that were monomorphic in the Bwindi population. Despite ascertainment bias favoring the Virunga population in particular and mountain gorillas in general, these markers also would likely perform well for eastern lowland (G. b. graueri), and possibly western (G. gorilla) gorillas as well, due to their higher effective population sizes (Xue et al. 2015), although empirical confirmation would be necessary.