ORCID: 0000-0001-9082-3163.

Running title: GDA-NT 2021 computer program.

Word count: 1291.

Declarations.

Availability of software, data and material.

The program, a zip-file with different input data files for demonstration, different videos that explain the use of the program and the User’s manual are available on our website:

https://www.thuenen.de/en/institutes/forest-genetics/software/gda-nt-2021.

The conservation of plant and animal genetic resources relies on data on genetic diversity and genetic differentiation (Rodriguez-Quilon et al. 2016; Eusebi et al. 2020; Boccacci et al. 2021). Decisions on the selection of conservation units are often based on genetic inventories with genetic markers (Dudgeon et al. 2012; Boccacci et al. 2021). Further, the level of genetic relatedness among individuals is an important criterion for the evaluation of genetic resources (Wellmann and Bennewitz 2019). Relatedness can be estimated with the kinship coefficient from genetic marker data (Han et al. 2020). In addition, genetic assignment is often used for conservation purposes, especially in law enforcement, to protect species by checking the geographic origin or taxonomy of traded biological material such as seeds, timber, ivory or bush meat (Wasser et al. 2007; Degen et al. 2013).

GDA-NT stands for “Genetic data analysis and numerical tests”. The software computes various measures of population fixation and differentiation from genetic data, some of which are also available in other programs, such as Wright’s FIS and FST indices computed by Arlequin 3.5 (Excoffier and Lischer 2010). It also provides different genetic assignment criteria to assign or exclude reference populations, as implemented in GeneClass (Piry et al. 2004), and it computes allele and genotype frequencies at different aggregation levels, as in the R package pegas (Paradis 2010). However, GDA-NT 2021 offers more options to compute exclusion probabilities in assignment tests and enables self-assignment tests for groups of individuals with variable group sizes. In addition, it allows the calculation of alternative measures of population differentiation, such as the standardized FST (Hedrick 2005; Meirmans and Hedrick 2011) or Dj (Gregorius and Roberds 1986; Gregorius et al. 2007), which can also integrate geographic location information to select sub-populations. These alternative measures of genetic differentiation are particularly useful for addressing questions in conservation genetics (Prunier et al. 2017; Attu et al. 2022; Nguyen et al. 2022). Figure 1 shows the application of Dj as an indicator to identify pedunculate oak populations in Germany that are likely of foreign origin and thus should be excluded from a conservation program.
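To make the idea behind Dj more concrete, the following Python sketch computes, for a single locus, the genetic distance between each population and its complement (all remaining populations pooled), following the general definition of Gregorius and Roberds (1986). It is an illustration only and not the GDA-NT implementation: the function names, the dictionary-based input and the weighting of the complement by sample size are assumptions made for this example.

```python
from typing import Dict, List

Freqs = Dict[str, float]  # allele -> relative frequency at one locus


def genetic_distance(p: Freqs, q: Freqs) -> float:
    """Absolute genetic distance d0 = 0.5 * sum(|p_i - q_i|) over all alleles."""
    alleles = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in alleles)


def complement_freqs(freqs: List[Freqs], sizes: List[int], j: int) -> Freqs:
    """Pool all populations except j, weighted by sample size (an assumption)."""
    total = sum(n for k, n in enumerate(sizes) if k != j)
    pooled: Freqs = {}
    for k, (f, n) in enumerate(zip(freqs, sizes)):
        if k == j:
            continue
        for allele, x in f.items():
            pooled[allele] = pooled.get(allele, 0.0) + x * n / total
    return pooled


def d_j(freqs: List[Freqs], sizes: List[int], j: int) -> float:
    """Differentiation of population j from its complement."""
    return genetic_distance(freqs[j], complement_freqs(freqs, sizes, j))


def delta(freqs: List[Freqs], sizes: List[int]) -> float:
    """Size-weighted mean of all Dj values (overall differentiation)."""
    total = sum(sizes)
    return sum(sizes[j] / total * d_j(freqs, sizes, j) for j in range(len(freqs)))


# Hypothetical toy data: three populations, one biallelic locus.
toy_freqs = [{"A": 1.0}, {"A": 0.5, "B": 0.5}, {"B": 1.0}]
toy_sizes = [10, 10, 10]
toy_dj = [d_j(toy_freqs, toy_sizes, j) for j in range(3)]
```

In this toy data set the two fixed populations are equally distant from their complements, whereas the intermediate population is not differentiated from the pooled rest. For multilocus data, one simple option (again an assumption) is to average the single-locus values across loci.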

Fig. 1

Map visualising the genetic differentiation Dj(5) of 94 pedunculate oak locations in Germany screened at 356 nuclear SNPs. The map was created with ArcMap 10.8 (ESRI). Each circle represents a location (stand) at which ten oak trees were randomly sampled from the local population. These data are part of a range-wide study of pedunculate oaks in Europe (Degen et al. 2021) and were analysed with GDA-NT 2021. Dj(5) is one of the spatially explicit measures. It computes the genetic distance of a given population to its five closest neighbour populations. The number of five neighbours was selected according to the spatial distribution of the sampled locations and represents in most cases a group of oak stands within a radius of less than 300 km. The distribution of the Dj(5) values of the 94 locations is shown as a boxplot. The stands marked in red are the 10% most differentiated ones. These extreme Dj(5) values served as one indicator among others to identify suspicious, potentially non-local seed sources. Usually, oak populations that are not autochthonous are excluded from conservation programs.
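The spatially explicit variant described in this caption can be sketched in Python as follows. The sketch selects the k geographically closest populations, pools their allele frequencies and computes the genetic distance of the focal population to that pool. It is a hedged illustration rather than the GDA-NT code: the great-circle distance, the unweighted pooling (reasonable only for equal sample sizes, such as ten trees per stand) and all function names are assumptions.

```python
import math
from typing import Dict, List, Tuple

Freqs = Dict[str, float]      # allele -> relative frequency at one locus
Coord = Tuple[float, float]   # (latitude, longitude) in decimal degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in km between two coordinates."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def genetic_distance(p: Freqs, q: Freqs) -> float:
    """Absolute genetic distance d0 = 0.5 * sum(|p_i - q_i|) over all alleles."""
    alleles = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in alleles)


def dj_k(freqs: List[Freqs], coords: List[Coord], j: int, k: int = 5):
    """Distance of population j to the pool of its k nearest neighbours.

    Returns the distance and the indices of the neighbours used.
    Neighbours are pooled with equal weights (assumes equal sample sizes).
    """
    others = [i for i in range(len(freqs)) if i != j]
    neighbours = sorted(others, key=lambda i: haversine_km(coords[j], coords[i]))[:k]
    pooled: Freqs = {}
    for i in neighbours:
        for allele, x in freqs[i].items():
            pooled[allele] = pooled.get(allele, 0.0) + x / len(neighbours)
    return genetic_distance(freqs[j], pooled), neighbours


# Hypothetical toy data: four stands along one latitude, one biallelic locus.
toy_coords = [(50.0, 8.0), (50.0, 9.0), (50.0, 10.0), (50.0, 20.0)]
toy_freqs = [{"A": 1.0}, {"A": 0.5, "B": 0.5}, {"A": 0.5, "B": 0.5}, {"B": 1.0}]
dist0, nbrs0 = dj_k(toy_freqs, toy_coords, j=0, k=2)
dist3, nbrs3 = dj_k(toy_freqs, toy_coords, j=3, k=2)
```

Ranking the resulting values over all stands and flagging the upper tail would correspond to the red-marked locations on the map; the exact threshold procedure used for the figure is not reproduced here.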

ASCII text files with diploid genetic markers (e.g., nSNPs, nSSRs) or haploid genetic markers (e.g., cpSNPs, cpSSRs) are used as input files for GDA-NT. Alternatively, CSV files generated by other programs such as Excel and R can be imported, transformed and saved as input files. The results are automatically stored as text files and optionally as CSV files for further data visualisation and downstream analyses, such as a detailed analysis of spatial genetic structures using the software SGS (Degen et al. 2001), or principal component analysis (PCA) and cluster analysis based on allele frequencies with the program PAST (Hammer et al. 2001). An overview of the program features is given in Table 1. The program can handle data of up to a few hundred populations and a few hundred genetic markers (see, for example, Degen et al. 2021). GDA-NT 2021 is well suited for conservation genetics studies, where typical datasets involve the screening of many populations with a specifically selected subset of informative genetic markers. It was not developed for applications with large SNP arrays comprising thousands or millions of SNPs, but it can be used for a genetic quality check of pruned SNP sets drawn from such arrays.

Table 1 Features, methods and measures implemented in the program GDA-NT 2021

GDA-NT 2021 was programmed in Visual Basic and compiled for the operating system Microsoft Windows (Windows 10 and earlier versions). The program, a zip file with different input data files for demonstration, several videos that explain the use of the program, and the user’s manual are available on our website:

https://www.thuenen.de/en/institutes/forest-genetics/software/gda-nt-2021.