Although it is important to maintain high levels of heterozygosity and allelic diversity in broodstock (Lande and Shannon 1996; Hansen et al. 2000), genetic variation within hatchery-dependent populations may be decreasing due to progressive elimination of allelic diversity (Koljonen et al. 1999, 2002; Verspoor et al. 2005). Less genetic variation is transferred to the next generation when genetically similar individuals are paired because their genetic profiles are unknown. To create breeding pairs that are as genetically different as possible, genetic profiles can be prepared using highly polymorphic fragments of DNA such as microsatellites. Unfortunately, the calculations involved are time-consuming, especially for animals with a tetrasomic genome, and no software has been available for this task. Genassemblage was developed to address this need. It can be used for animals with diploid or tetraploid genomes, and for animals with a diploid genome containing tetraploid fragments, e.g. cyprinids, salmonids and sturgeons.

Genassemblage estimates the expected genetic variation of the offspring of different combinations of individuals. It calculates expected values for the heterozygosity of the potential offspring of each pairing, the number of different alleles the offspring would inherit, and, at tetrasomic loci, the percentage of “weak heterozygotes” [individuals with three identical alleles and a fourth allele that differs within a tetrasomic locus, e.g. AAAB (Kaczmarczyk and Fopp-Bayat 2012)]. To identify the best combinations, the values can be weighted based on their importance, then used to calculate a “v index” indicating the best combinations.

Optimal breeding pairs are selected based on the following criteria and assumptions:

  A. The sex of the individuals is known; each individual is fertile and ready for reproduction.

  B. All individuals are marked and identifiable.

  C. The probability of two identical genotypes in unrelated individuals is very small.

  D. Individuals with a higher frequency of heterozygous genotypes at the analysed genetic markers are more likely to have heterozygous genotypes in loci that are important for viability and adaptation (Shikano and Taniguchi 2002; Olech 2003).

  E. If individuals have different alleles within the analysed loci, they are more likely to differ in loci that determine viability and adaptation than individuals with identical alleles.

Genassemblage 1.0 is a Windows-based program; it can be downloaded together with a detailed user manual from http://pracownicy.uwm.edu.pl/d.kaczmarczyk/main_page.htm. The program can convert *.xls or *.dat files to *.arp files, so data can be transferred from Genassemblage to Arlequin 3.5 (Excoffier and Lischer 2010) and to MSA (Dieringer and Schlötterer 2003).

Figure 1 shows the user interface. It allows the user to load data, choose the values that Genassemblage will calculate, and convert files to .arp format. The input data are the genetic profiles of the investigated individuals. Each profile includes the individual’s population group and population name, its tag number and sex, and a list of the alleles of investigated markers detected in its genome. These markers should have the following characteristics:

Fig. 1 The Genassemblage user interface

  A. a high level of polymorphism,

  B. inheritance on autosomal chromosomes following Mendelian laws,

  C. neutrality for evolution (not subject to natural selection).

The expected heterozygosity of the offspring is calculated from disomic and tetrasomic markers for which there are complete genotyping results for both potential parents. A locus for which data are missing from one or both parental individuals is excluded from the calculations for that breeding pair. The following equation is used:

$$H = 1 - \frac{ph_{1} + ph_{2} + ph_{3} + \cdots + ph_{n}}{nl}$$
(Algorithm 1)

Algorithm 1. Calculation of the expected heterozygosity of offspring, in which $ph_{n}$ is the expected share of homozygous genotypes in the offspring of a specific breeding couple at the nth locus, and $nl$ is the total number of analysed loci.
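As an illustration of Algorithm 1, the following Python sketch computes H for one candidate pair. It is not Genassemblage's own code. The function names and the tuple-based genotype format are hypothetical, and two assumptions are made: each parent passes on half of its alleles at a locus, with all combinations equally likely, and an offspring counts as homozygous only when all of its alleles at that locus are identical.

```python
from itertools import combinations, product


def gametes(genotype):
    """Equally likely gametes at one locus (assumption: a parent passes on
    half of its alleles, 1 of 2 for disomic and 2 of 4 for tetrasomic loci)."""
    k = len(genotype) // 2
    return list(combinations(genotype, k))


def homozygote_share(parent1, parent2):
    """ph for one locus: expected share of offspring with identical alleles only."""
    crosses = list(product(gametes(parent1), gametes(parent2)))
    homozygous = sum(1 for g1, g2 in crosses if len(set(g1 + g2)) == 1)
    return homozygous / len(crosses)


def expected_heterozygosity(loci):
    """H = 1 - (ph_1 + ... + ph_n) / nl, using only loci typed in both parents."""
    complete = [(p1, p2) for p1, p2 in loci if p1 and p2]
    if not complete:
        raise ValueError("no locus has genotypes for both parents")
    return 1 - sum(homozygote_share(p1, p2) for p1, p2 in complete) / len(complete)


# Hypothetical pair: two disomic loci, one tetrasomic locus, and one locus
# with missing data for the second parent (excluded from the calculation).
loci = [
    (("A", "B"), ("A", "B")),                      # ph = 0.5
    (("A", "A"), ("B", "B")),                      # ph = 0.0
    (("A", "A", "A", "B"), ("A", "A", "A", "A")),  # ph = 0.5
    (("A", "B"), None),                            # skipped
]
print(expected_heterozygosity(loci))  # about 0.667
```

Here three loci are complete, the homozygote shares sum to 1.0, and H = 1 - 1.0/3.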

To calculate the number of different alleles that potential offspring would inherit, all the alleles found in the markers ($na_{n}$) are summed (Algorithm 2). The calculations include all the alleles found within the di- and tetrasomic markers in a specific breeding couple.

$$ar = \sum_{n} na_{n}$$
(Algorithm 2)

Algorithm 2. Calculation of potential allelic diversity in the offspring of a specific couple.
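Algorithm 2 can be sketched in a few lines of Python. The function names and input format are hypothetical. The sketch assumes that $na_{n}$ is the number of distinct alleles carried by the two parents together at locus n, and that a parent with no data at a locus contributes no alleles there; the handling of missing data in Genassemblage itself may differ.

```python
def distinct_alleles(parent1, parent2):
    """na for one locus: number of different alleles found in the couple.
    Assumption: a missing genotype (None) contributes no alleles."""
    return len(set(parent1 or ()) | set(parent2 or ()))


def allelic_diversity(loci):
    """ar = sum of na_n over all analysed loci."""
    return sum(distinct_alleles(p1, p2) for p1, p2 in loci)


# Hypothetical pair with one disomic and one tetrasomic locus.
loci = [
    (("A", "B"), ("A", "C")),                      # alleles A, B, C -> 3
    (("A", "A", "A", "B"), ("C", "C", "D", "D")),  # alleles A, B, C, D -> 4
]
print(allelic_diversity(loci))  # 7
```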

The expected proportion of weak heterozygote loci in offspring is calculated only for tetrasomic markers using Algorithm 3, where wh is the expected proportion and $pwh_{n}$ is the probability that locus n is a “weak heterozygote” in the offspring. The probabilities (pwh) are calculated assuming that the alleles of the tetrasomic fragment are located on four independently inherited homologous chromosomes, that gametes combine at random, and that the frequency of individual genotypes in the offspring is not changed by natural selection.

$$wh = \frac{pwh_{1} + pwh_{2} + pwh_{3} + \cdots + pwh_{n}}{nl}$$
(Algorithm 3)

Algorithm 3. Calculation of the expected proportion of “weak heterozygotes”.
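The assumptions behind Algorithm 3 can be made concrete by enumerating gametes. In the Python sketch below (hypothetical names, not the program's own code), each tetrasomic parent produces the six equally likely two-allele gametes implied by four independently inherited homologous chromosomes, gametes combine at random, and an offspring genotype is counted as a weak heterozygote when it has the form AAAB. The sketch also assumes that nl is the number of tetrasomic loci passed in.

```python
from collections import Counter
from itertools import combinations, product


def weak_heterozygote_probability(parent1, parent2):
    """pwh for one tetrasomic locus: probability of an AAAB-type offspring."""
    gametes1 = list(combinations(parent1, 2))  # 6 equally likely gametes
    gametes2 = list(combinations(parent2, 2))
    weak = 0
    for g1, g2 in product(gametes1, gametes2):
        # Allele counts of [1, 3] mean three identical alleles plus one other.
        if sorted(Counter(g1 + g2).values()) == [1, 3]:
            weak += 1
    return weak / (len(gametes1) * len(gametes2))


def weak_heterozygote_proportion(tetrasomic_loci):
    """wh = (pwh_1 + ... + pwh_n) / nl over the tetrasomic loci supplied."""
    pwh = [weak_heterozygote_probability(p1, p2) for p1, p2 in tetrasomic_loci]
    return sum(pwh) / len(pwh)


# Hypothetical pair typed at two tetrasomic loci.
loci = [
    (("A", "A", "A", "B"), ("A", "A", "A", "A")),  # pwh = 1/2
    (("A", "A", "B", "B"), ("A", "A", "B", "B")),  # pwh = 4/9
]
print(weak_heterozygote_proportion(loci))
```

For the second locus, each AABB parent produces gametes AA, AB and BB with probabilities 1/6, 4/6 and 1/6, so AAAB and ABBB offspring together have probability 16/36 = 4/9.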

Figure 2 shows how the results of the above calculations are displayed. For estimated heterozygosity and allelic diversity, higher values are better. For “weak heterozygotes”, a lower proportion is better, because it reduces the number of homozygous individuals in the following generations. These values can be weighted to calculate a “v index” (see the Genassemblage manual) that indicates the best parental combinations.
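The v index formula itself is given in the Genassemblage manual and is not reproduced here. Purely to illustrate how weighting the three values can rank candidate pairs, the Python sketch below uses a hypothetical score that rewards heterozygosity and allelic diversity and penalises weak heterozygotes. The score, the weights, the scaling of allele counts and the example values are all assumptions, not the program's method or real data.

```python
def rank_pairs(pairs, w_h=1.0, w_ar=1.0, w_wh=1.0):
    """Rank candidate pairs by a hypothetical weighted score.

    NOTE: this is NOT the v index from the Genassemblage manual; it only
    shows the idea of weighting H, ar and wh by their importance.
    """
    max_ar = max(p["ar"] for p in pairs) or 1  # scale allele counts to 0..1

    def score(p):
        return w_h * p["H"] + w_ar * p["ar"] / max_ar - w_wh * p["wh"]

    return sorted(pairs, key=score, reverse=True)


# Made-up values for three hypothetical spawning pairs.
pairs = [
    {"pair": "F1 x M1", "H": 0.60, "ar": 10, "wh": 0.30},
    {"pair": "F1 x M2", "H": 0.80, "ar": 12, "wh": 0.10},
    {"pair": "F2 x M1", "H": 0.70, "ar": 9, "wh": 0.20},
]
for p in rank_pairs(pairs):
    print(p["pair"])
```

With equal weights, the pair with the highest heterozygosity, most alleles and fewest weak heterozygotes ("F1 x M2") is listed first; changing the weights shifts the emphasis between the three criteria.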

Fig. 2 The expected heterozygosity, allelic diversity and percentage of “weak heterozygotes” in the progeny of potential spawning pairs. Yellow (here, light grey) indicates incomplete data for one parent. Orange (here, dark grey) indicates incomplete data for both parents

Genassemblage can also be used for other tasks related to managing genetic variation in breeding stocks or human-dependent populations.