De novo discovery of SNPs for genotyping endangered sun parakeets (Aratinga solstitialis) in Guyana

Parrots (Psittaciformes) are among the most endangered groups of birds today and remain threatened by habitat loss and exploitation for the live bird trade. Under such conditions, reliable and non-invasive monitoring techniques are crucial for successful conservation measures. In this study, we developed a panel of 86 high quality SNPs for genotyping endangered sun parakeets (Aratinga solstitialis) in Guyana, which form one of the last known breeding populations of this South American species in the wild. Genotyping was tested on different types of samples (blood, feathers, feces, beak and cloacal swabs). While blood performed best, feathers and feces also yielded reliable results and could thus be used as non-invasive sources of DNA for future population monitoring. Discriminant Analysis of Principal Components (DAPC) on genotypes revealed that Guyanese sun parakeets clustered separately from other psittacine species as well as conspecifics from a captive population. A priori known first-order kinships were also adequately detected by the SNP panel. Using a series of experimental contaminations, we found that contamination from other psittacine species and slight contamination (~ 10%) from conspecifics did not prevent successful genotyping and recognition of individuals. We show that instances of higher conspecific contamination (~ 50%) can be detected through an increased level of heterozygosity that falls outside the distribution of uncontaminated samples.


Introduction
Effective conservation and management of wild animal species often relies on trustworthy estimates of population size and trends (Newson et al. 2008;Marques et al. 2013), which also form vital elements of the IUCN Red List assessment protocol (IUCN 2019). Particularly in the case of rare and threatened species, there is a need for rapid, feasible, and economical methods to obtain reliable metrics of abundance (Keeping 2014) that are also low-impact in terms of disturbance and allow for repeated application. Parrots (Psittaciformes) are among the most endangered groups of birds due to habitat alterations and their exploitation for the live bird trade (Snyder et al. 2000). This situation is particularly acute in the Neotropics, where poaching of parrot nestlings has significantly contributed to the decline of numerous species (Wright et al. 2001).
The sun parakeet (Aratinga solstitialis), also known as sun conure, is a brightly colored, medium-sized (ca. 30 cm), gregarious parrot native to northeastern South America (Fig. 1). Its current distribution appears to be restricted to 1 3  (Collar et al. 2019). Two closely related species A. jandaya and A. auricapillus occur further south in eastern Brazil (Joseph 1992;Ribas and Miyaki 2004) and a new species, A. pintoi, was identified by Silveira et al. (2005) and later renamed as A. maculata (Nemésio and Rasmussen 2009). Since A. maculata had formerly been misidentified as juvenile sun parakeets or hybrids with A. jandaya (Silveira et al. 2005), this split of the A. solstitialis complex drew renewed attention to the status of sun parakeets in the wild (Bergman 2009).
It was found that the once common species had been extirpated from much of its former range since the 1970s due to heavy trapping pressure (Robbins et al. 2004;BirdLife International 2016). As a consequence, the sun parakeet was uplisted to 'endangered' by IUCN in 2008 and is currently believed to be declining (BirdLife International 2016). The total population size in the wild remains uncertain but is estimated to number fewer than 2500 mature individuals (Bird-Life International 2016). In Guyana, one breeding population has been reported from the area surrounding Karasabai, an indigenous community near the Ireng river where flocks are frequently observed (BirdLife International 2016) and the local population size has been estimated to be approximately 200 birds (Bergman 2009). At present, efforts are undertaken by community members to assure the survival of what may be the last remaining large flock in the wild. Local conservation measures include a ban on hunting, trapping or otherwise harming sun parakeets. A monitoring scheme encompassing point counts and nest observations is underway in collaboration with local authorities and international NGOs and increasing community support for conservation is garnered through shared benefits from a budding Ecotourism centered around birdwatching (A. Albert personal communication, 28 November 2016). However, prolonged systematic and direct observations are difficult due to large distances, challenging terrain, the frequent movements of flocks and limited local resources. Census efforts are further hampered by the fact that sun parakeets are sexually monomorphic, which makes discrimination between sexes and recognition of individuals based on visual cues alone impossible and strongly increases the risk of double counts. Molecular methods using genomic markers to genotype individuals can help to circumvent many of these problems. Rapid advances in sequencing power coupled with declining costs have facilitated the study of numerous wild species (Ekblom and Galindo 2011). The use of genomic data offers possibilities beyond simply identifying individuals. For example, it allows for determining effective population size. This is particularly important for small and isolated populations that may be at risk of inbreeding depression and genetic drift (Keller and Waller 2002) such as the current wild sun parakeet population or prospective founder populations in reintroduction and species recovery projects. Detecting paternity allows for the construction of pedigrees which provide insight into mating strategies and reproductive success (Spong et al. 2008). Moreover, genetic analyses can reveal hybridization events and aid in identifying introgression zones (Barrowclough et al. 2005) and dispersal patterns (Broquet and Petit 2009).
Due to its non-invasive nature, the use of DNA from samples collected in the environment, such as feces, is becoming a key component in the toolbox of ecologists and environmental managers (Taberlet et al. 2018), particularly for the study and conservation of vulnerable species. However, noninvasively collected samples are often degraded because of their exposure to the elements and can be difficult to work with. Single-nucleotide polymorphisms (SNPs) require only short intact sequences (typically 50-70 bp) for successful amplification which makes them especially suitable genetic markers for working with degraded DNA (Morin and McCarthy 2007). They have been shown to perform better in genotyping from highly fragmented DNA than microsatellites (von Thaden et al. 2017). Because more loci can be genotyped at comparable cost, SNPs are also an economical choice that provides high resolution genotypes for confident assignment of individuals (Blåhed et al. 2018). Moreover, the bi-allelic nature of SNPs provides simplified genotyping that is less prone to errors (Vignal et al. 2002) and technical artifacts such as null alleles (Norman et al. 2013).
In this study, we used blood samples obtained from the wild sun parakeet population in Guyana to develop de novo a set of 86 high quality SNPs (83 autosomal an 3 mitochondrial) for genotyping individuals and tested the performance of the panel on an array of sample types (feathers, feces, beak and cloacal swabs) from a captive population. We prioritized separation of individuals and inference of relatedness by targeting SNPs with high minor allele frequencies (MAF) and verified relatedness results within known pedigrees from the captive population. Using experimental contaminations, we investigated the effects of contamination from conspecifics and other species that are likely to affect non-invasive samples and describe a method to detect such samples. We discuss the suitability and limitations of our SNP panel for future field studies and conclude with a number of suggested applications that will aid in the conservation of the endangered sun parakeets in Guyana and beyond.

Sample collection
A total of 10 (4 and 6) sun parakeets were caught by experienced community members at two roosting sites in the vicinity of Karasabai (4° 2′ 0′′ N, 59° 32′ 0′′ W), a Macushi Amerindian community in the North Rupununi region of southwestern Guyana (Fig. 1). The area contains the only known wild population of sun parakeets in Guyana and is characterized by a largely still pristine mosaic of savannas, marshes, and rainforest covered hills that form part of the Pakaraima mountain range. Approximately 100 μl of blood were collected from the brachial vein of each bird (Owen 2011) and stored on Whatman FTA cards (GE Healthcare, USA) according to the manufacturer's instructions. The capture and blood sampling were carried-out during November 2016 in accordance with the research permit issued by the Guyana Environmental Protection Agency (No. 112316BR020). After the blood extraction, birds were observed for 48 h to assure no negative effects before being released back into their flocks near the capture sites.
Additional samples of blood, feathers, feces, beak and cloacal swabs from 8 sun parakeets and blood samples from 4 other neotropical psittacines (Myiopsitta monachus, Pionites melanocephalus, Pionites leucogaster, and Bolborhynchus lineola) were obtained from a private breeder in Saxony, Germany. The additional blood samples were also stored on Whatman FTA cards. Beak and cloacal swabs were taken with FLOQSwabs (Copan Flock Technologies, Italy) and fecal samples collected on sterile pieces of gauze. Feather samples were plucked with sterile tweezers from the breast plumage. All nonblood samples were then placed into sterile, airtight scintillation tubes (20 ml) filled with silica gel desiccant (1-3 mm, Merck KGaA, Germany) and stored at room temperature until further processing (DeMay et al. 2013;Taberlet et al. 2018).

DNA extraction
DNA extraction from blood samples, beak swabs, and cloacal swabs was automized on a QIAsymphony SP platform using the DNA minikit (Quiagen, Hilden, Germany) for blood samples and the DNA investigator kit (Quiagen, Hilden, Germany) for beak and cloacal swabs according to the manufacturer's protocols. DNA from feathers was extracted with the Quiagen DNeasy Blood & Tissue Kit following the protocol for purification of total DNA from feathers (available at: https ://www.qiage n.com/de/ resou rces/). Extraction of DNA from avian feces can be challenging due to high concentrations of uric acid (Vo and Jedlicka 2014). We used the QIAamp cador Pathogen kit (Qiagen, Hilden, Germany) according to the protocol suggested by Eriksson et al. (2017). DNA yield and purity were assessed with spectrophotometry (NanoDrop, Thermo Fisher Scientific, USA) and extracts stored at − 20 °C.

Sex identification
As is the case for many birds, the sex of sun parakeets cannot be determined visually. We therefore sexed the Guyanese samples using DNA amplification with primer pair 2550F/2718R which has been shown to work for numerous bird species, including psittacines (Sulandart and Zein 2012;Çakmak et al. 2017). In this approach, the CHD region of the Z and W chromosome are amplified, resulting in PCR products of different length. Males produce one band at CHD-Z and females two bands, corresponding to CHD-W and CHD-Z (Sulandart and Zein 2012). We PCR amplified each of the 10 samples in a total reaction volume of 5 μl consisting of 0.5 μl DNA extract (160-270 ng/μl), 0.15 μl of 10 μM each of the forward and reverse primer, 1.65 μl of nuclease free water, 2.5 μl of 2*QIAGEN Multiplex PCR Master Mix, and 0.1 μl of 5*Q-solution. PCR consisted of 1 cycle of 95 °C for 15 min; 35 cycles of 95 °C for 30 s, 50 °C for 90 s, 72 °C for 60 s; and a final elongation at 72 °C for 5 min. Sex was then assigned through visual inspection of PCR products after electrophoresis on a 1.5% agarose gel. For the samples from captive birds, the sex was provided by the breeder.

DNA sequencing
For sequencing we followed a reduced representation library approach. DNA samples were digested with the EcoRI restriction enzyme at 37 °C for 3 h and at 8 °C overnight (ca. 15 h) before heat inactivation at 80 °C for 5 min. Average post-digestion fragment size was estimated to be 1500 bp based on gel electrophoresis and we chose a size range of 400-700 bp for sequencing. Paired-end library preparation (2 × 125 bp) and RAD-sequencing on an Illumina HiSeq2500 platform (HiSeq Control Software 2.2.58/ RTA 1.18.64) were carried out by the National Genomics Infrastructure (NGI), SciLifeLab, Stockholm.

SNP detection
The RAD sequences were quality filtered using the pro-cess_radtags script from Stacks (Version 1.48; Catchen et al. (2011)) using the clean, quality and rescue options. The resulting reads were then aligned, and SNP variants were identified and catalogued for each individual using ustacks, and cstacks from Stacks. A custom R script (R Core Team 2017) was used to filter for high quality SNPs with a high minor allele frequency, useful for individual identification and assessing the pairwise relatedness between individuals (custom script available at https ://githu b.com/anita jnorm an/snp_filte r). Assays were developed for the selected SNP panel and then tested on samples using the Fluidigm Biomark HD (San Francisco, 1 3 USA). We used a 1:100 dilution of DNA templates and SNP genotyping was carried-out according to the manufacturer's protocol (PN 100-3912 D1; available at www. fluid igm.com) for the loading (IFC Controller HX) and PCR (Biomark HD, program SNPtype 96.96 v1) workflows. Previous testing in our lab identified environmental samples of low DNA concentration to be more successfully genotyped if the number of pre-amp cycles is increased. Thus, each non-blood sample type was run twice with two different numbers of cycles in the preamplification process (25 and 40), whereas blood samples were run with the standard 14 pre-amplification cycles. Genotypic clusters were automatically demarcated and then manually screened and adjusted for accuracy using the Biomark Genotyping Analysis software (version 4.5.1.).

Sample types, consensus genotype and SNP categories
To test the efficacy of various invasive and non-invasive sampling methods, each individual sun parakeet from Germany was sampled multiple times to obtain blood, beak swab, cloacal swab, feather and fecal samples. Consensus genotypes were identified from blood samples. The remaining genotypes from all sampling methods belonging to one individual were compared to the consensus genotypes. Any deviations for each SNP were tallied under the following categories (Table 1): Samples with more than ten 'No Calls' were removed from the analysis as they were deemed unreliable.

Contamination
Since future monitoring scenarios using non-invasive sampling techniques pose a risk of contamination from multiple individuals, we included three types of experimental contaminations in the SNP genotyping to be able to identify them in the future. The experimental contaminations included: 1. Drop contamination unequal DNA concentrations (~ 90/10) of two known individual sun parakeets; 2. Equal equal DNA concentrations (~ 50/50) of two known individual sun parakeets; 3. Other species equal concentrations of one sun parakeet and one Pionites melanocephalus.
Consensus genotypes from two known samples were compared to the contaminated sample and were tallied under the following categories (Table 2):

Genetic differentiation
We used the R package adegenet (v. 2.1.1) to assess genetic differentiation using Discriminant Analysis of Principal Components (DAPC) on the consensus genotypes (Jombart 2008).

Relatedness
The R package related (Pew et al. 2015), which is based on the software Coancestry (Wang 2011), was used to estimate pairwise relatedness for all pairs of samples. The individuals   Table 1 for details) from Germany form a family group within a captive breeding program with one breeding pair and the remaining individuals being offspring of this breeding pair. Pairwise relatedness estimates were used to genetically confirm the relationships. Prior to selecting the coefficient of relatedness estimator, we ran a statistical power analysis three times to identify the coefficient of relatedness most suitable for this dataset using 999 iterations.

SNP detection
Sequences were quality filtered and aligned using Stacks software (Catchen et al. 2011). A total of 24,565 SNPs were identified and were pared down to 96 (93 autosomal and 3 mitochondrial) using the following filters: one SNP per read; at least 40 base pairs in the flanking sequence (i.e., on each side of the SNP); present in all 10 individuals; and with high minor allele frequencies (MAF > 0.3).

SNP testing
Of the 96 SNPs, 10 were removed due to poor clustering, leaving 86 SNPs (83 autosomal and 3 mitochondrial; Online Resource 1). The three mitochondrial SNPs were retained, but not included in the analyses that follow. Table 3 shows the characteristics of the SNPs based on the sample groups.

Sample type, consensus genotype and SNP categories
We included one replicate for one blood sample and detected no discrepancies in all 86 loci. All 86 loci were successfully typed in each replicate. Thus, we expect our error rate to be < 0.001. For all blood samples excluding the replicate (n = 17; 1462 loci), we detected zero 'No Calls' and eight 'Invalid' loci for a call rate equal to 0.9945.
A total of 26 non-blood samples from the captive (German) sun parakeets were genotyped to identify their usefulness in future sampling scenarios. Sample types included beak swabs (n = 7), cloacal swabs (n = 7), feathers (n = 8), and fecal (n = 4). Each sample type was run in duplicates with different amplification cycles (25 and 40), resulting in a total of 52 replicates.
Of these, 17 replicates were removed from analysis due to low amplification success which was likely due to poor quality and low quantity of DNA. The removed replicates largely corresponded to beak and cloacal swabs (Fig. 2a).
There were higher numbers of failed samples that were run with only 25 cycles (63%). Of the remaining 35 replicates, 340 loci (11.3%) were incongruous with the consensus genotype and fell into the categories listed in Table 1 (Fig. 2b). Feathers had a relatively high rate of incongruous genotypes, most of which were identified as contamination (homozygous consensus genotype and a heterozygote deviant genotype).

Contamination
With contaminated samples, we would expect higher levels of heterozygosity (Jun et al. 2012) relative to uncontaminated samples due to pairing of different genotypes. Neither the '90/10' nor the 'Other Species' experimental contaminations showed elevated levels of heterozygosity (Fig. 3a). However, there were more invalidated SNPs (SNPs that did not fall within a cluster; n = 17) and SNPs that did not amplify (n = 12) relative to the uncontaminated individuals. When there was a discrepancy between the two SNP genotypes, in most cases, the resulting genotype was represented by the focal individual (the one with the higher DNA concentration in the '90/10' scenario and the sun parakeet in the 'Other Species' scenario).
Contrastingly, the '50/50' experimental contamination showed very elevated levels of heterozygosity well outside the levels for uncontaminated samples (Fig. 3a). Additionally, whereas in the other experimental contaminations the focal individual was most often represented, in the '50/50' experimental contamination, it seemed quite random which genotype was represented between the two individuals ( Fig. 3b; n = 24 in A; n = 20 in B).

Genetic differentiation (DAPC)
To assess the genetic differentiation that exists in the various samples, we ran a Discriminant Analysis of Principal Components where we retained four principal components (PC) and two discriminant functions. Not unexpectedly, the clustering separated samples based on their origin or species delimitation (Fig. 4). The German and Guyana sun parakeet samples cluster separately, but remain close, whereas the other species are well-distinguished from the sun parakeets based on the second PC.

Relatedness
The power analysis revealed the Lynch-Ritland coefficient of relatedness as being most suitable based on our data with a mean correlation of 0.906 (SD = 0.002). The breeding pair had an r-value of 0.158 (Table 4), suggesting slight inbreeding (e.g., third-order relatives such as first cousins). Since the expectation was that all individuals except the breeding Fig. 2 a Of the samples that were retained, the blue bars show the percentage of loci that were not in accordance with the consensus genotype. The orange bars indicate the percentage of samples that were removed from analysis due a high rate of amplification failure. b Total number of discrepancies or error per SNP category ( pair were first order relatives (e.g., either parent-offspring or full sibling), the remaining values should be close to 0.5. This was found to be mostly the case (r mean = 0.495; SD = 0.142) although notably lower (r = 0.209) and higher (r = 0.672) values were also detected.

Discussion
The final panel of 83 autosomal SNPs allowed for reliable discrimination of individual sun parakeets and population assignment. The comparatively high MAF ( − x = 0.467) of the SNPs in the samples from the wild Guyanese target population and the levels of heterozygosity also suggests good discriminatory power of the SNP panel for additional individuals sampled during future monitoring exercises. The very low values for MAF and heterozygosity among the other psittacine species (Table 3) support the specificity of the identified SNPs to sun parakeets, i.e., indicate emergence of the SNPs after the phylogenetic separation between sun parakeets and other psittacines. Although not conclusive, these measures provide useful guidance for future field studies: Highly homozygous genotypes obtained from putative sun parakeet non-invasive samples probably originated from other psittacines and should therefore be discarded. Additionally, DAPC also showed distinct clustering of genotypes Fig. 4 Genetic differentiation based on a discriminant analysis (DA) of principal components (PC) (Jombart 2008). The axes represent the first two DAPCs. Guyana A (blue) and B (grey) correspond to two capture sites of wild sun parakeets in Guyana, German (yellow) refers to a sun parakeet breeding pair and offspring in captivity. The other psittacine species are marked in red based on species and origin (Fig. 4) and could be used to further verify species identification of future field samples. Potentially, the SNP panel presented here could be supplemented by species-specific SNPs which would require field surveys in Guyana to identify sympatric bird species and comparisons of their mitochondrial genomes to the sun parakeet. Such species-specific mitochondrial SNPs could be useful in cases of highly degraded non-invasive samples that do not yield sufficiently complete genotypes based on the autosomal markers to infer species. Similarly, while the PCR-based sexing used in this study is relatively fast and inexpensive it would be desirable to identify sex-specific SNPs by sequencing the W chromosome of the Guyanese sun parakeets. The suitability of the SNP panel for determining relatedness and reconstructing pedigrees could only be assessed using samples from captive birds from Germany. While the Lynch-Ritland relatedness values generally agreed well with the known relationships, we also found deviations (e.g., r = 0.209 for full-sibling pair BG; Table 4). Because the SNP panel was developed using Guyanese samples there may have been some ascertainment bias associated with the SNPs in the German samples which could have affected the r-values. In addition, relatedness coefficients have a higher degree of error (RSME) the lower the number of samples. In our case there were only eight samples suggesting that the r-values reported here can be off by 0.2 to 0.3 (Wang 2017) which corresponds well with the instances of observed deviance. However, mean relatedness between clearly unrelated captive (German) and wild (Guyana) sun parakeets was extremely low (r mean = − 0.18) as could be expected. Larger sample sizes of individuals with known pedigrees (e.g., from zoos) are needed for further testing but our results suggest that pedigree reconstruction should be possible from genotypes obtained by our SNP panel. For long-term monitoring of wild populations such estimates can be much improved if the sampling scheme includes annual nest checks which allow for recording the age and relatedness of individuals (i.e., the chicks in a nest) along with the genotype.
For the non-blood sample types, we found that 40 preamplification cycles yielded slightly better results than 25 cycles. Beak and cloacal swabs performed worst with 50 and 38 percent of samples failing to sufficiently amplify (Fig. 2a). This could be explained by the very low amounts of DNA obtained from swabs. Feathers and feces (the sample types that would be targeted during field collections of non-invasive samples) performed much better and all samples could be retained. Compared to the blood-based consensus genotype, the rate of incongruous genotypes in feathers was twice as high as in feces (14% of loci versus 7% respectively). Most of these instances indicated intraspecific contamination, i.e., a homozygous consensus genotype and a heterozygote deviant genotype. A possible explanation for this could be the high degree of sociability and gregariousness of sun parakeets which includes frequent mutual grooming and perching in close physical contact. DNA from one individual could thus easily be transferred to the feathers of another. Moreover, under the confined conditions of captivity the possibility of DNA transfers between individuals is likely to be higher than in the wild. Fecal samples were collected directly after observed defecation and were therefore less exposed to possible sources of contamination.
As is not unusual in environmental samples such as feathers and feces, dropout was a common source of discrepancy with the consensus genotype (Fig. 2b). Invalids or cluster discrepancies were also observed. They are likely to arise when samples are contaminated as in the case with some of the feathers when one of the two samples is a heterozygote and the other is a homozygote.
Our results suggest that feathers and feces might constitute viable non-invasive sources for monitoring sun parakeets, but researchers should be aware of the contamination risks.
The experimental contamination experiments showed that the effects of contamination from conspecifics or related species on the final sun parakeet genotypes depended on the magnitude and type of contamination. In both the 'Drop (90/10)' and 'Other species' scenarios the resulting genotype for most loci still corresponded to the focal individual and the level of heterozygosity did not differ from uncontaminated samples. At higher levels of conspecific contamination such as in the 'Equal (50/50)' scenario, the resulting genotypes corresponded randomly to either individual and heterozygosity was much higher than in uncontaminated samples (Fig. 3a). For field applications this suggests that non-invasive samples affected by slight conspecific contamination or contamination from other species will likely still result in viable genotypes. Higher levels of conspecific contamination can be identified by comparing heterozygosity levels and samples that fall outside the typical distribution be removed. Further research is needed to determine what levels of contamination will result in inaccurate genotypes.
Our findings suggest that the SNP panel presented here can be applied in the conservation, monitoring and management of the last remaining sun parakeet population in Guyana. Additional genotypes could be obtained from noninvasive samples (e.g., discarded feathers and feces) through citizen science collections involving local community members. This would allow for monitoring multiple aspects that are crucial for successful conservation programs such as effective and census population size, survival, inbreeding, mating strategies, breeding success, and relatedness. Moreover, the SNP panel also allows for easy genetic verification and source population assignment in cases of suspected illegal capture of sun parakeets and can also be applied in breeding programs for conservation and future reintroductions.