GDA-NT 2021 – a computer program for population genetic data analysis and assignment

Data on genetic diversity and differentiation, as well as kinship between individuals, are important for the conservation of animal and plant genetic resources. Genetic assignment is often part of law enforcement for protected endangered species. The software GDA-NT 2021 is a new, freely available, user-friendly Windows program that can be used to compute various measures of genetic diversity and population genetic differentiation. It further allows genetic assignment of individuals to populations and enables the calculation of kinship coefficients and genetic distances among pairs of individuals within populations. GDA-NT 2021 specifically computes the alternative measures of population differentiation Dj and the standardized FST of Hedrick. It has more options to compute exclusion probabilities in assignment tests, enables self-assignment tests for variable groups of individuals, and allows information on geographic positions to be accounted for while using permutation tests to assess statistical significance.

[...] allele and genotype frequencies on different aggregation levels as integrated in the R-package pegas (Paradis 2010). However, GDA-NT 2021 has more options to compute exclusion probabilities in assignment tests and enables self-assignment tests for groups of individuals with variable group sizes. In addition, it allows the calculation of alternative measures of population differentiation, such as the standardized FST (Hedrick 2005; Meirmans and Hedrick 2011) or Dj (Gregorius and Roberds 1986; Gregorius et al. 2007), which can also integrate geographic location information to select sub-populations. The application of these alternative measures of genetic differentiation is particularly useful to address questions of conservation genetics (Prunier et al. 2017; Attu et al. 2022; Nguyen et al. 2022). Figure 1 shows the application of Dj as an indicator to identify pedunculate oak populations in Germany that are likely of foreign origin and thus should be excluded from a conservation program.

ASCII text files with diploid genetic markers (e.g., nSNPs, nSSRs) or haploid genetic markers (e.g., cpSNPs, cpSSRs) are used as input files for GDA-NT. Alternatively, CSV files generated by other programs such as Excel and R can be imported, transformed and saved as input files. The results are automatically stored as text files and optionally as csv-files for further data visualisation and downstream data analyses such as a detailed analysis of spatial genetic structures, e.g., using the software SGS (Degen et al. 2001), or principal component analysis (PCA) and cluster analysis based on allele frequencies with the program PAST (Hammer et al. 2001). An overview of the program features is given in Table 1. The program can handle data of up to a few hundred populations and a few hundred genetic markers (see, for example, Degen et al. (2021)). GDA-NT 2021 is well suited for conservation genetics studies, where typical datasets involve the screening of many populations with a specifically selected subset of informative genetic markers. It was not developed for applications with large SNP arrays comprising thousands or millions of SNPs, but it can be used for a genetic quality check of pruned SNP sets drawn from such large SNP arrays.

GDA-NT 2021 has been programmed in Visual Basic and compiled for the operating system Microsoft Windows (Windows 10 and earlier versions). The program, a zip-file with different input data files for demonstration, different videos that explain the use of the program, and the user's manual are available on our website: https://www.thuenen.de/en/institutes/forest-genetics/software/gda-nt-2021.

Fig. 1 Each circle represents a location (stand) at which ten oak trees have been randomly sampled from the local population. These data are part of a range-wide study of pedunculate oaks in Europe (Degen et al. 2021) and have been analysed with GDA-NT 2021. Dj(5) is one of the spatially explicit measures. It computes the genetic distance of a given population to the five closest neighbour populations. The number of five neighbours has been selected according to the spatial distribution of sampled locations and represents in most cases a group of oak stands within a radius of less than 300 km. The distribution of the Dj(5)-values of the 94 locations is shown as a boxplot. The stands marked in red are the 10% most differentiated ones. These extreme Dj(5)-values served as one indicator among others to identify suspicious, potentially non-local seed sources. Usually, oak populations that are not autochthonous are excluded from conservation programs.

Funding
The author did not receive support from any organization for the submitted work.

Conflicts of interest/Competing interests
The author has no relevant financial or non-financial interests to disclose.

Open Access funding enabled and organized by Projekt DEAL.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Table 1 Overview of the program features

Measure                                     Reference
[...]                                       Gregorius (1987)
Evenness of allele frequencies (E)          Gregorius (1990)
Observed heterozygosity (Ho)
Expected heterozygosity (He)
Fixation index (FIS)                        Wright (1978)
Degree of heterozygosity
Genetic differentiation
Genetic distance (DN)                       Nei (1972)
Genetic distance (GD)                       Gregorius (1974)
Differentiation of populations (Dj)         Gregorius and Roberds (1986)
Population fixation (FST)                   Wright (1965)
Standardized population fixation (FST_H)    [...]
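
The spatially explicit measure Dj(k) described above, i.e. the genetic distance of a given population to its k closest neighbour populations, can be illustrated with a minimal Python sketch. This is not the GDA-NT 2021 implementation: it assumes a single locus, planar coordinates with Euclidean distances, equal weighting of the k neighbours when pooling their allele frequencies, and the genetic distance of Gregorius (1974) computed as half the sum of absolute allele frequency differences. The function names `gregorius_distance` and `dj_nearest` and the toy data are hypothetical.

```python
import math


def gregorius_distance(p, q):
    """Half the sum of absolute allele-frequency differences (ranges from 0 to 1)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def dj_nearest(freqs, coords, k):
    """Distance of each population to the pooled frequencies of its k nearest neighbours.

    freqs  : list of allele-frequency vectors, one per population (single locus)
    coords : list of (x, y) positions in a planar coordinate system (assumption)
    k      : number of neighbouring populations to pool, 1 <= k < number of populations
    """
    n = len(freqs)
    if not 1 <= k < n:
        raise ValueError("k must be between 1 and the number of populations minus 1")
    values = []
    for j in range(n):
        # Rank all other populations by Euclidean distance to population j.
        neighbours = sorted(
            (i for i in range(n) if i != j),
            key=lambda i: math.dist(coords[j], coords[i]),
        )[:k]
        # Pool the neighbours with equal weights (assumption of this sketch).
        pooled = [
            sum(freqs[i][a] for i in neighbours) / k
            for a in range(len(freqs[j]))
        ]
        values.append(gregorius_distance(freqs[j], pooled))
    return values


# Toy example: three similar neighbouring stands and one distant, divergent stand.
freqs = [[0.5, 0.5], [0.6, 0.4], [0.4, 0.6], [0.1, 0.9]]
coords = [(0, 0), (1, 0), (0, 1), (10, 10)]
dj = dj_nearest(freqs, coords, k=2)
most_differentiated = max(range(len(dj)), key=lambda j: dj[j])
```

In this toy example the fourth stand receives the largest value and would be flagged as the most differentiated one. For real data, geographic coordinates would call for a great-circle distance, and multilocus values would have to be combined across loci.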