Introduction

Multi-environment trials of potato breeding clones along with released cultivars grown by farmers aim to get accurate results for target productivity and quality traits, which calls for maximum control of unexplained variability within a data sample, e.g. due to soil heterogeneity (Terman et al. 1967) or weather. Variability often decreases when adding replications, testing sites or years, using an appropriate experimental design or considering spatial data analysis for adjusting plot results.

Uniformity trials were used for determining plot shape and size for field experiments in potato (Justesen 1932). Plot shape effect seems to be minor in trials testing potato hybrids (Stockem et al. 2022). Fisher (1970) argued that the degree of precision of a trial for estimating any mean depends on the replication number. In this regard, Kalamkar (1932) noted that increasing plot size decreased trial efficiency; i.e., more replications of smaller plots are better than a small number of larger plots, especially if larger plot sizes do not significantly reduce the trial variability. Furthermore, Caligari et al. (1985) indicated that the most efficient design for yield trials in potato may include a single drill or plant with as many replicates as can be managed by a breeder. However, as noticed by Bos (1983), increasing the replication number decreases the number of accessions tested in a trial, thereby counterbalancing the improvement of selection response expected from more intensive germplasm testing. Furthermore, Aikman and Langton (1983) indicated that replications had a marginal effect under high selection intensity for low heritability traits.

Although the experiment accuracy depends on the number of both testing accessions and replications, it seems that the optimum plot size for assessing total tuber weight ranges from eight to 12 hills or plants, for breeding clones (Bisogninda et al. 2006), and hybrids (Stockem et al. 2022), respectively. Guard rows are often included when the tuber yield of one plot affects that of the adjacent plot (e.g. in fertilizer trials) and may also be used for cultivar testing (Mountier 1964), but at increasing cost. Nevertheless, Knight (1924) demonstrated that replicated single rows provide reliable results in potato field experiments. Blocking improves the efficiency of potato cultivar trials (Mountier 1985). Lattice or incomplete block designs are preferred when the number of breeding clones or cultivars included for testing is large (> 20).

Enhancing accuracy in germplasm testing leads to a higher heritability, which in turn increases the expected response to selection. We may use heritability estimates along with field plot techniques (Vallejo and Mendoza 1992) to improve multi-environment testing of potato breeding clones along with released cultivars, particularly when the genotype-by-environment interaction affects productivity and quality traits in this crop (Yildirim and Çalişkan 1985). For example, heritability estimates confirmed the efficiency of unilateral sexual polyploidization for multi-trait selection and progeny testing in potato breeding (Ortiz et al. 1991). Hence, the objective of this research was to determine the minimum number of replications, testing sites and years for potato multi-environment trials based on the use of broad-sense heritability estimates. In this way, we will be able to optimize potato breeding trial efficiency in the cultivar pipeline.

Materials and methods

Data from multi-site trials over years of the Svenska potatisförädling run by the Swedish University of Agricultural Sciences (SLU, Alnarp, Sweden) were used for this research. The trials included up to 256 breeding clones and released cultivars grown by EU farmers (https://hdl.handle.net/11529/10548617) that underwent testing at Skåne (Helgegården and Mosslunda) and Norrland (Umeå) regions of Sweden in 2020 and 2021 (Table 1). The trials used simple lattice designs with two replications of 10-plant plots. Helgegården and Mosslunda are potato producing sites near Kristianstad (56° 01′ 46″ N  14° 09′ 24″ E) in southern Sweden, while Umeå (63° 49′ 30″ N  20° 15′ 50″ E) is in northern Sweden. In each site crop husbandry practices were the same as those used in potato farming. Fungicide sprays against the oomycete Phytophthora infestans were made only in Helgegården to avoid late blight throughout the growing season. This treatment was used to achieve tuber yield potential at this testing site.

Table 1 Number of advanced potato breeding clones and cultivars planted at three sites in Sweden over 2 years

The characteristics evaluated were total tuber yield (kg) and tuber weight (kg) by size (< 40 mm, 40–50 mm, 50–60 mm, > 60 mm) in the 10-plant plot, while percentage of starch in the tuber flesh was calculated after determining specific gravity at harvest (Schippers 1976). Potato glucose strip tests were used for measuring reducing sugars in the tuber flesh (Mann et al. 1991). Host plant resistance to P. infestans was evaluated over 2 years solely in Mosslunda, where the pathogen is ubiquitous and causes high late blight severity, using the area under disease progress curve (AUDPC, Fry 1978).

Analyses of the trials in each and across environments were done with META-R (Alvarado et al. 2020), which also estimated the best linear unbiased predictors (BLUPs) for the eight evaluated traits considering the testing germplasm, sites, and years as random samples of their respective populations.

Biometrical modeling

Single-site year model

The response of the ith cultivar on the rth replicate within the bth incomplete block nested within a replicate is represented as \(y_{irb}\) in the following Eq. (1):

$$y_{irb} = \mu + C_{i} + R_{r} + IB\left( R \right)_{b\left( r \right)} + e_{irb}$$
(1)

where \(\mu\) is the overall mean, \(C_{i}\) is the random effect of the ith cultivar assumed to have an independent and identical distribution (iid) that is normal with mean zero and variance \(\sigma_{C}^{2} ,\) that is, \(C_{i} \mathop \sim \limits^{iid} N\left( {0, \sigma_{C}^{2} } \right) \left( {i = 1,2, \ldots ,I} \right),\;{\text{and}}\;R_{r}\) is the random effect of replicates with iid normal distribution and variance \(\sigma_{R}^{2} ,R_{r} \mathop \sim \limits^{iid} N\left( {0, \sigma_{R}^{2} } \right) (r = 1,2, \ldots ,R)\). The incomplete blocks nested within replicate are considered a random effect iid with normal distribution with mean zero and variance \(\sigma_{IB\left( R \right)}^{2} \;{\text{such}}\;{\text{that}}\;IB\left( R \right)_{b\left( r \right)} \mathop \sim \limits^{iid} N\left( {0, \sigma_{IB\left( R \right)}^{2} } \right) \left( {b = 1,2, \ldots , B} \right)\). The random residual error is \(e_{irb} \mathop \sim \limits^{iid} N\left( {0, \sigma_{e}^{2} } \right)\) with variance \(\sigma_{e}^{2}\). The variance component estimates of this model are given in Table 2.

Table 2 Variance components (genetic [σ2G] and residual [σ2e]) and broad sense heritability (H2) for potato tuber weight (kg 10-plant plot), percentage of starch in the tuber flesh, reducing sugars and host plant resistance to late blight (measured by the area under disease progress curve, AUDPC) for 2-year (1: 2020, 2: 2021) multi-environment testing as determined for breeding clones and released cultivars in three distinct sites in Sweden
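How variance components of the kind reported in Table 2 can be recovered from plot data may be illustrated with a minimal sketch. It assumes a simplified version of Eq. (1) that omits the incomplete-block term, a balanced cultivar × replicate layout, and method-of-moments estimation from the expected mean squares; it is not the META-R procedure used for the actual analyses. All function names and the simulated variance values are hypothetical.

```python
import random


def simulate_trial(n_cultivars, n_reps, var_c, var_r, var_e, mu=10.0, seed=1):
    """Simulate y_ir = mu + C_i + R_r + e_ir (Eq. 1 without incomplete blocks)."""
    rng = random.Random(seed)
    cultivar = [rng.gauss(0.0, var_c ** 0.5) for _ in range(n_cultivars)]
    replicate = [rng.gauss(0.0, var_r ** 0.5) for _ in range(n_reps)]
    return [
        [mu + cultivar[i] + replicate[k] + rng.gauss(0.0, var_e ** 0.5)
         for k in range(n_reps)]
        for i in range(n_cultivars)
    ]


def anova_variance_components(y):
    """Return (sigma2_C, sigma2_e) from a balanced cultivar x replicate table.

    Uses E[MS_cultivar] = sigma2_e + R * sigma2_C and E[MS_error] = sigma2_e.
    """
    n_c, n_r = len(y), len(y[0])
    grand = sum(map(sum, y)) / (n_c * n_r)
    row_means = [sum(row) / n_r for row in y]
    col_means = [sum(y[i][k] for i in range(n_c)) / n_c for k in range(n_r)]
    ms_c = n_r * sum((m - grand) ** 2 for m in row_means) / (n_c - 1)
    ss_e = sum(
        (y[i][k] - row_means[i] - col_means[k] + grand) ** 2
        for i in range(n_c) for k in range(n_r)
    )
    ms_e = ss_e / ((n_c - 1) * (n_r - 1))
    # Negative moment estimates are truncated at zero.
    sigma2_c = max((ms_c - ms_e) / n_r, 0.0)
    return sigma2_c, ms_e


# Hypothetical true values: sigma2_C = 4, sigma2_R = 0.5, sigma2_e = 1.
y = simulate_trial(n_cultivars=2000, n_reps=2, var_c=4.0, var_r=0.5, var_e=1.0)
sigma2_c, sigma2_e = anova_variance_components(y)
h2 = sigma2_c / (sigma2_c + sigma2_e / 2)  # plot-mean heritability with R = 2
print(round(sigma2_c, 2), round(sigma2_e, 2), round(h2, 2))
```

The large number of simulated cultivars is chosen only so that the estimates land close to the simulated values; with trial-sized data sets the moment estimates would be noisier.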

Multi-environment model

The response of the ith cultivar on the rth replicate within the jth environment and on the bth incomplete block nested within replicate and the jth environment is represented as \(y_{ijrb}\) in Eq. (2):

$$y_{ijrb} = \mu + C_{i} + E_{j} + R\left( E \right)_{r\left( j \right)} + IB\left( {R,E} \right)_{{b\left( {rj} \right)}} + \left( {CE} \right)_{ij} + e_{ijrb}$$
(2)

where the random effect of cultivar is \(C_{i} \mathop \sim \limits^{iid} N\left( {0, \sigma_{C}^{2} } \right)(i = 1,2, \ldots ,I)\) with cultivar variance component \(\sigma_{C}^{2}\), and the random effect of environment (location-year combination) is \(E_{j} \mathop \sim \limits^{iid} N\left( {0, \sigma_{E}^{2} } \right)\left( {j = 1,2, \ldots ,J} \right)\) with environment variance component \(\sigma_{E}^{2}\). The random effects of replicates nested within environments are described as \(R\left( E \right)_{r\left( j \right)} \mathop \sim \limits^{iid} N\left( {0, \sigma_{R\left( E \right)}^{2} } \right) \left( {r = 1,2, \ldots ,R} \right)\) with variance component \(\sigma_{R\left( E \right)}^{2}\), while the random effect of incomplete block nested within replicate and environment is described as \(IB\left( {R,E} \right)_{{b\left( {rj} \right)}} \mathop \sim \limits^{iid} N\left( {0, \sigma_{{IB\left( {R,E} \right)}}^{2} } \right) \left( {b = 1,2, \ldots , B} \right)\) with variance component \(\sigma_{{IB\left( {R,E} \right)}}^{2}\). The interaction effect of the cultivar × environment is described as \(\left( {CE} \right)_{ij} \mathop \sim \limits^{iid} N\left( {0, \sigma_{CE}^{2} } \right)\) with interaction variance component \(\sigma_{CE}^{2}\), and the random residual is defined as \(e_{ijrb} \mathop \sim \limits^{iid} N\left( {0, \sigma_{e}^{2} } \right)\) with variance component \(\sigma_{e}^{2}\). Variance components of this model are presented in Table 3.

Table 3 Variance components (genetic [σ2G], genetic × year [σ2GY], genetic × environment [σ2GE]Z, and residual [σ2e]), and broad-sense heritability (H2) for potato tuber weight (kg 10-plant plot), percentage of starch in the tuber flesh, reducing sugars and host plant resistance to late blight (measured by the area under disease progress curve [AUDPC] only in stress-prone site) estimated using 2-year multi-environmental testing at late blight-prone site, across two sites (yield potential and stressful) over 2 years in Skåne (Sweden), and across three sites (yield potential, late-blight prone, and very long days) in southern and northern Sweden

Multi-site over years model

The response of the ith cultivar on the jth site, the mth year, the rth replicate within site and year, and the bth incomplete block nested within replicate, site and year is represented as \(y_{ijmrb}\) in Eq. (3) below:

$$\begin{aligned} y_{ijmrb} & = \mu + S_{j} + M_{m} + \left( {SM} \right)_{jm} + R\left( {SM} \right)_{{r\left( {jm} \right)}} + IB\left( {RSM} \right)_{{b\left( {jmr} \right)}} \\ & \quad + C_{i} + \left( {CS} \right)_{ij} + \left( {CM} \right)_{im} + \left( {CSM} \right)_{ijm} + e_{ijmrb} \\ \end{aligned}$$
(3)

where the random effect of the site is represented as \(S_{j} \mathop \sim \limits^{iid} N\left( {0, \sigma_{S}^{2} } \right)(j = 1,2, \ldots ,J)\) with variance component \(\sigma_{S}^{2}\), the random effect of the year is \(M_{m} \mathop \sim \limits^{iid} N\left( {0, \sigma_{M}^{2} } \right)(m = 1,2, \ldots ,M)\) with year variance component \(\sigma_{M}^{2}\), and the random interaction effect of site × year is \(\left( {SM} \right)_{jm} \mathop \sim \limits^{iid} N\left( {0, \sigma_{SM}^{2} } \right)\) with interaction variance component \(\sigma_{SM}^{2}\). The random effect of replicates nested within site and year is assumed as \(R\left( {SM} \right)_{{r\left( {jm} \right)}} \mathop \sim \limits^{iid} N\left( {0, \sigma_{{R\left( {SM} \right)}}^{2} } \right)\) (r = 1,2,…,R) with variance component \(\sigma_{{R\left( {SM} \right)}}^{2}\), while the random effect of the incomplete blocks nested within replicate, site and year is defined as \(IB\left( {RSM} \right)_{{b\left( {jmr} \right)}} \mathop \sim \limits^{iid} N\left( {0, \sigma_{{IB\left( {RSM} \right)}}^{2} } \right) \left( {b = 1,2, \ldots ,B} \right)\) with variance component \(\sigma_{{IB\left( {RSM} \right)}}^{2}\). The random effect of cultivar is denoted as \(C_{i} \mathop \sim \limits^{iid} N\left( {0, \sigma_{C}^{2} } \right)\) (i = 1,2,…,I) with variance component \(\sigma_{C}^{2}\), and the random effect of the interaction of cultivar × site is described by \(\left( {CS} \right)_{ij} \mathop \sim \limits^{iid} N\left( {0, \sigma_{CS}^{2} } \right)\) with variance component \(\sigma_{CS}^{2}\); the random effect of the interaction of cultivar × year is assumed \(\left( {CM} \right)_{im} \mathop \sim \limits^{iid} N\left( {0, \sigma_{CM}^{2} } \right)\) with variance component \(\sigma_{CM}^{2}\). The random effect of the three-way interaction of cultivar × site × year is assumed \(\left( {CSM} \right)_{ijm} \mathop \sim \limits^{iid} N\left( {0, \sigma_{CSM}^{2} } \right)\) with variance component \(\sigma_{CSM}^{2} ,\) and the random residual is described as \(e_{ijmrb} \mathop \sim \limits^{iid} N\left( {0,\sigma_{e}^{2} } \right)\) with variance component \(\sigma_{e}^{2}\). The variance components of this model are in Table 4.

Table 4 Variance components (genetic [σ2G], site [σ2L], year [σ2Y], site × year [σ2LY], genetic × site [σ2GL], genetic × year [σ2GY], genetic × site × year [σ2GLY], and residual [σ2e]), and broad-sense heritability (H2) for potato tuber weight (kg 10-plant plot), percentage of starch in the tuber flesh, reducing sugars and host plant resistance to late blight (measured by the area under disease progress curve [AUDPC] only in stress-prone site) estimated using 2-year multi-environmental testing across three sites in Sweden

Heritability estimates

BLUPs show a high predictive accuracy even when not including pedigree information (Piepho et al. 2008), and their efficiency has already been noted for selecting among segregating offspring for tuber yield and specific gravity (Ticona-Benavente and da Silva Filho 2015). The combined analyses of variance (ANOVA) over the environments were possible due to the homogeneity of variance across each of the testing environments. The variance components for the testing germplasm and environments can be estimated using the expected mean squares of the ANOVA. Broad-sense heritability (H2), based on the plot means for each of the six testing environments (site–year), was estimated as:

$$H^{2} = \frac{\sigma_{C}^{2}}{\sigma_{C}^{2} + \sigma_{e}^{2}/R}$$
(4)

in which \(\sigma_{C}^{2}\), \(\sigma_{e}^{2}\) and \(R\) were the genetic variance, the residual variance, and the number of replications (= 2), respectively. H2 based on the plot means across testing environments was estimated for seven tuber traits as:

$$H^{2} = \frac{\sigma_{C}^{2}}{\sigma_{C}^{2} + \sigma_{CE}^{2}/E + \sigma_{e}^{2}/(ER)}$$
(5)

in which \(\sigma_{CE }^{2}\) is the genotype × environment variance, and \(R\) and \(E\) are the number of replications and environments, respectively. Variance components and their interactions were further estimated independently for sites and years to estimate H2 for productivity and quality traits as follows:

$$H^{2} = \frac{\sigma_{C}^{2}}{\sigma_{C}^{2} + \sigma_{CS}^{2}/L + \sigma_{CM}^{2}/Y + \sigma_{CSM}^{2}/(LY) + \sigma_{e}^{2}/(LYR)}$$
(6)

in which \(\sigma_{CS}^{2}\), \(\sigma_{CM}^{2}\) and \(\sigma_{CSM}^{2}\) are the genotype × site, the genotype × year, and the genotype × site × year interaction variances, respectively, while L is the number of testing sites (= 3), Y is the number of years (= 2) and R is the number of replications.
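The three heritability expressions in Eqs. (4) to (6) differ only in how the interaction and residual variances are divided by the number of environments, sites, years and replications. A minimal Python sketch of the three formulas follows; the function and parameter names are illustrative, and the numeric inputs are hypothetical values chosen for easy arithmetic, not estimates from Tables 2 to 4.

```python
def h2_single_env(var_c, var_e, n_reps):
    """Eq. (4): plot-mean heritability within one site-year."""
    return var_c / (var_c + var_e / n_reps)


def h2_across_env(var_c, var_ce, var_e, n_envs, n_reps):
    """Eq. (5): heritability across environments (site-year combinations)."""
    return var_c / (var_c + var_ce / n_envs + var_e / (n_envs * n_reps))


def h2_sites_years(var_c, var_cs, var_cm, var_csm, var_e,
                   n_sites, n_years, n_reps):
    """Eq. (6): heritability with environments split into sites and years."""
    denominator = (
        var_c
        + var_cs / n_sites
        + var_cm / n_years
        + var_csm / (n_sites * n_years)
        + var_e / (n_sites * n_years * n_reps)
    )
    return var_c / denominator


# Hypothetical variance components, for illustration only.
print(h2_single_env(var_c=1.0, var_e=0.5, n_reps=2))
print(h2_across_env(var_c=1.0, var_ce=0.6, var_e=0.5, n_envs=6, n_reps=2))
print(h2_sites_years(var_c=1.0, var_cs=0.3, var_cm=0.1, var_csm=0.3,
                     var_e=0.5, n_sites=3, n_years=2, n_reps=2))
```

Setting the interaction variances to zero and the numbers of sites and years to one reduces the Eq. (6) function to the Eq. (4) function, which is a convenient sanity check on any implementation.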

The minimum number of replications, sites and years can be determined using the estimated variance components from the data. These variance components for tuber weight, percentage of starch in the tuber flesh and reducing sugars in the tuber flesh were used to estimate H2 assuming they were stable while the denominator coefficients L, Y and R could vary. Schutz and Bernard (1967) and Ortiz et al. (2008) used a very similar approach with the phenotypic variance (instead of H2) estimates to examine the influence of experimental design on results in future experiments testing soybean and maize germplasm. The minimum option is given by the least number of L, Y and R that will not affect H2 estimates. Furthermore, a curve resulting from plotting the number of environments (sites or years) or replications in the horizontal axis and H2 estimates in the vertical axis was used to allow visualizing the critical point in which this curve starts to plateau (Duma et al. 2020); i.e., beyond this point an increase in the number of testing environments provides only a negligible gain in precision.
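The simulation described above can be sketched as a grid evaluation of Eq. (6) with the variance components held fixed while the numbers of sites, years and replications vary. The variance components below are hypothetical placeholders, not the Table 4 estimates. The `plateau_point` helper and its `min_gain` threshold are assumptions added for illustration; the paper identifies the plateau by inspecting the curve, not with a numeric cutoff.

```python
from itertools import product

# Hypothetical variance components (NOT the Table 4 estimates).
VC = {"c": 1.0, "cs": 0.3, "cm": 0.1, "csm": 0.3, "e": 0.5}


def h2(vc, n_sites, n_years, n_reps):
    """Eq. (6) with fixed variance components and varying L, Y and R."""
    denominator = (
        vc["c"]
        + vc["cs"] / n_sites
        + vc["cm"] / n_years
        + vc["csm"] / (n_sites * n_years)
        + vc["e"] / (n_sites * n_years * n_reps)
    )
    return vc["c"] / denominator


def h2_grid(vc, max_sites=4, max_years=3, max_reps=3):
    """Tabulate H2 for every (L, Y, R) combination up to the given maxima."""
    return {
        (l, y, r): h2(vc, l, y, r)
        for l, y, r in product(
            range(1, max_sites + 1),
            range(1, max_years + 1),
            range(1, max_reps + 1),
        )
    }


def plateau_point(values, min_gain=0.02):
    """Smallest count n such that going from n to n + 1 gains less than min_gain.

    `values[k]` is H2 with k + 1 units; returns len(values) if no such n exists.
    """
    for n in range(1, len(values)):
        if values[n] - values[n - 1] < min_gain:
            return n
    return len(values)


grid = h2_grid(VC)
# H2 as a function of the number of sites, holding Y = 2 and R = 2.
by_sites = [grid[(l, 2, 2)] for l in range(1, 5)]
print([round(v, 3) for v in by_sites])
print(plateau_point(by_sites, min_gain=0.05))
```

Because every term in the denominator shrinks as L, Y or R grows, H2 can only increase along each axis of the grid; the practical question is where the increments become too small to justify the extra plots.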

Results

There were highly significant differences (P < 0.001) among BLUPs for all productivity and quality characteristics in each and across testing environments. Helgegården had, on average, the largest tuber harvests in 10-plant plots (14.2 kg in 2021, 10.83 kg in 2022), while the lowest were, on average, in the late blight prone Mosslunda (7.915 in 2020, 6.512 in 2021), and in Umeå (7.192 in 2020, 7.567 in 2021). The AUDPC BLUPs for host plant resistance to late blight were highly significant (P < 0.001) among genotypes but not (P > 0.05) across both years at Mosslunda (234.1 in 2020, 240.9 in 2021). The percentage of starch in the tuber flesh was highest in Helgegården (above 14%), thereafter in Mosslunda (between 12 and 13.5%) and the lowest in Umeå (below 10.5%). Reducing sugars varied significantly across sites and years (ranging on average from 1 in Umeå 2021 to 3.4 in Mosslunda 2020). In Umeå, the percentage of tuber weight according to size was very similar across years: 21% for below 40 mm, 41% for 40–50 mm, 26% for 50–60 mm, and 12% for above 60 mm. Tubers were larger in the testing sites at Skåne than in Norrland; i.e., on average above 2/3 of the total weight for tubers above 50 mm in Helgegården and about 3/5 in Mosslunda. The genotype by environment interaction (GEI) was also highly significant (P < 0.001) for all characteristics.

The highest H2 estimates for each site (Table 2) were mostly for percentage of starch in the tuber flesh (0.85–0.98). The AUDPC due to late blight had high H2 estimates (0.87–0.93) in both years at Mosslunda. Total tuber weight per plot also had a high H2 in each testing environment (ranging from 0.72 in the highest yielding Helgegården 2021 to 0.92 in the low yielding Umeå 2021). H2 estimates for tubers below 40 mm or above 60 mm were, on average, greater than those for the other two tuber sizes. Reducing sugars in the tuber flesh had the lowest H2 estimates on average.

Heritability estimates were larger for weight of tubers below 40 mm and reducing sugars in the tuber flesh when including more testing sites (Table 3). H2 decreased slightly for total tuber yield and percentage of starch in the tuber flesh when adding two more testing environments (Fig. 1). There were no H2 trends according to the number of testing environments for weight of tubers with 40–50 mm and 50–60 mm sizes (Table 3), while the H2 for AUDPC due to late blight over years (0.86) was smaller than those estimated in each year at Mosslunda.

Fig. 1 Broad-sense heritability estimates according to the number of testing environments for percentage of starch in the tuber flesh in red, tuber weight (10-plant plot) in blue, and reducing sugars in grey using trials data after testing in 3 Nordic sites (northern and southern Sweden) over 2 years

Disaggregating the variance component of environment into testing sites and years led to smaller H2 estimates (Table 4) than those obtained when lumping them together as environments. The highest H2 estimate, after disaggregating into testing sites and years, was again for percentage of starch in the tuber flesh (0.90), while the lowest were for weight of tubers with 40–50 mm (0.50) and 50–60 mm (0.38) sizes. Medium–high H2 were estimated for total tuber weight (0.76), weight of tubers below 40 mm (0.72) or above 60 mm (0.74), and reducing sugars in the tuber flesh (0.72). The magnitude of the variance component for the genotype × location (σ2GL) interaction was larger than that of the genotype × year (σ2GY) interaction for most tuber traits except reducing sugars in the tuber flesh as measured by the sugar strip test. The variance component for the genotype × location × year (σ2GLY) interaction was larger than the σ2GL and σ2GY for percentage of starch in the tuber flesh and weight of tubers below 40 mm and with 40–50 mm size, but smaller than the σ2GL for total tuber weight and weight of tubers with 50–60 mm size and above 60 mm, and smaller than the σ2GY for reducing sugars in the tuber flesh.

Table 5 provides the results of simulating H2 when keeping unchanged the variance components for tuber weight, percentage of starch in the tuber flesh and reducing sugars in the tuber flesh (Table 4) but varying the number of testing sites, years, and replications. H2 estimates clearly become larger when increasing any of them, but running the multi-environment testing also becomes more costly. Hence, the tabulated data allow detection of the plateau beyond which an increase in the number of testing sites, years and replications will result in only a negligible gain in the H2 estimate. Accordingly, it seems that multi-environment trials using incomplete block designs with two replications across two sites over 2 years will suffice to estimate H2 reliably. Table 5 further assists in understanding why selection in early generations (non-replicated single hill in first clonal generation [T1] or larger plots in second clonal generation [T2]) does not seem to be efficient for total tuber weight because of low heritability estimates in such trials. Using trials with at least two replications, or even better multi-environment trials (e.g. from T4 onwards as done by Svenska potatisförädling) at the target population of environments, provides means for identifying promising breeding clones more precisely during potato cultivar development.

Table 5 Simulated broad-sense heritability with varying number of testing sites (L), years (Y) and replications (R) for tuber weight, percentage of starch in the tuber flesh and reducing sugars. Bold numbers indicate testing clonal selections (T2, T3, T4) at one site (L = 1) in one (Y = 1), or over two (Y = 2) and 3 years (Y = 3), respectively, using non-replicated plots (R = 1) in the field

Discussion

Broad-sense heritability is the percentage of the phenotypic variance accounted for by genetic differences due to significant variability amongst genotypes (Schmidt et al. 2019b). H2 is also associated with the coefficient of determination (R2) of a linear regression (P = μ + bG) of the unobservable genotypic value (Gi) on the observed phenotype (Pi), or with the squared correlation between predicted phenotypic value and genotypic value. It is of further interest to plant breeding because H2 may be used in the genetic gain (ΔG) equation to predict response to selection (ΔG = H2 × S[= mean phenotypic value of the selected genotypes as a deviation from μ]), or as a descriptive measurement to determine how useful and precise the results from cultivar trials are (Schmidt et al. 2019a). The H2 estimates in this research were mostly high, which is not surprising because, as noted by D’hoop et al. (2011), this often occurs for variable traits in asexual crops such as potato.
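The genetic gain relation ΔG = H2 × S can be made concrete with a small worked sketch. The plot weights below are hypothetical, the selected fraction of 20% is an arbitrary assumption, and the function names are illustrative; the heritability value of 0.76 is the one reported in this research for total tuber weight after disaggregating sites and years.

```python
def selection_differential(phenotypes, selected_fraction):
    """S: mean of the selected top fraction minus the overall mean."""
    ranked = sorted(phenotypes, reverse=True)
    n_selected = max(1, round(len(ranked) * selected_fraction))
    mean_all = sum(ranked) / len(ranked)
    mean_selected = sum(ranked[:n_selected]) / n_selected
    return mean_selected - mean_all


def expected_response(h2, s):
    """Predicted response to selection, Delta G = H2 x S."""
    return h2 * s


# Hypothetical total tuber weights (kg per 10-plant plot) for ten clones.
weights = [6.0, 7.5, 8.0, 9.5, 10.0, 11.0, 12.5, 13.0, 14.0, 15.5]
s = selection_differential(weights, selected_fraction=0.2)
delta_g = expected_response(0.76, s)
print(round(s, 3), round(delta_g, 3))
```

With the same selection differential, a lower heritability (as in non-replicated single-environment trials) scales the predicted gain down proportionally.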

Defining the target populations of environments where a cultivar will be released is key in plant breeding. A large variation between sites could lead to either developing cultivars for each site or showing adaptability across sites over years. Hence, it will be necessary to know the relative magnitude of the interactions of genotypes with both sites and years to develop an efficient selection program, particularly when significant GEI occurs, as often noticed in potato (Yildirim and Çalişkan 1985). Indeed, GEI leads to increasing minimum detectable differences that further reduce selection precision (Sengwayo et al. 2018). As indicated by the results, GEI was highly significant for productivity and quality traits in the multi-environment trials in Scandinavia, with the genotype × location interaction being larger than the genotype × year interaction for almost all tuber traits except the reducing sugars in the tuber flesh. It may be very suitable to select for tuber weights (total and according to sizes) and percentage of starch in the tuber flesh because genotype × location interactions are predictable, while the genotype × year interactions are mostly unpredictable (Allard and Bradshaw 1964), which explains the significant variability noticed for reducing sugars in the tuber flesh across sites over years. Seeking stable potato cultivars for reducing sugars in the tuber flesh that perform consistently in multi-environment trials at representative sites may reduce the magnitude of the genotype × year interaction for this trait. On the other hand, multi-environment testing across sites is more important than testing over years to identify high yielding breeding clones with desired starch content in the tuber flesh for the target population of environments.

This research addresses an important topic in potato breeding; i.e., the use of an experimental design seeking to minimize the phenotypic variation (both GEI and error or residual) at a given cost (or number of plots for trials). The more environments used for trials, the lower the H2 estimates because of a large GEI, while high germplasm diversity may inflate H2 (D’hoop et al. 2011). The minimum number of testing environments and replications may be, however, debatable because both depend on various factors, including the availability of planting materials. Early generation (T1) testing uses nonreplicated 1-plant plots in the first-year field trial. There are sufficient tubers for having replicated trials in the T3 or T4 generation, when total tuber weight, specific gravity (as a proxy for dry matter or starch in the tuber pulp) and crisping suitability should be properly evaluated. Furthermore, the minimum number of testing sites should consider the target population of environments where the breeding clones along with cultivar checks will be included in multi-environment trials. Curves ensuing from plotting H2 at different numbers of testing environments (Fig. 1) suggest that the ideal will be four for tuber weight and percentage of starch in the tuber flesh because thereafter the H2 gain is minimal. Furthermore, a minimum of two sites over 2 years will suffice for determining these traits accurately when using simple lattice designs with two replications (Table 5) in trials of bred germplasm from T3 (if enough planting materials are available) or T4 onwards during potato cultivar development. The resources saved by reducing the number of testing sites, replications and years may be used for planting more on-farm trials with advanced breeding clones (T6 onwards), which may also provide more information about associated crop husbandry practices.

Strong selection for quantitative traits, even if they are highly heritable, based on nonreplicated small plots (1–4 plants) in the T1 or T2 appears to be unreliable in potato because of a significant GEI and a high error variance. Brown (1987) demonstrated that the error variance for total tuber weight of 1-plant plots was significantly greater than that of 5-plant plots. Moreover, H2 estimates when considering potato breeding trials using non-replicated plots in one testing environment were always the lowest (Table 5). Caligari et al. (1986) indicated that the inefficiency of selection in the T1 could also be attributed to the inaccuracy of tuber yield assessment. Hence, selection for productivity using nonreplicated breeding trials seems to be ineffective, even when considering the best breeding clones from the previous year assessment; i.e., in T2. Trial heterogeneity in early-stage potato breeding trials calls for the use of augmented (Federer 1956) or p-rep designs (Paget et al. 2017) and spatial data analysis (Kempton et al. 1994) when using non-replicated plots, and pedigree-based BLUPs for selection of promising bred germplasm in T1 and T2. As indicated by Slater et al. (2014), BLUPs that use pedigree result in increased ΔG when having low H2 in potato.

Ticona Benavente and Pereira Pinto (2012) indicated that family selection for tuber yield and specific gravity may be also effective in early potato breeding generations because heritability at the family level was always larger than at the breeding clone level. Inter-family variation is also more efficient than within-family variation because the former has a lower environmental effect (thus larger H2) than among breeding clones of the same family (Simmonds 1996). Furthermore, as noted by Bradshaw et al. (1998), combining family selection in T1 with within family selection in T2 may lead to promising T3 bred germplasm. This combined selection approach appears to be very appropriate when having low within family variation (Silva Melo et al. 2011); i.e., low H2 for the desired trait among siblings.

Potato breeding trials normally involve testing of promising advanced clones along with released cultivars in several environments across testing sites over years. This article provides a methodology to optimize their numbers in multi-environment trials (METs) of potato breeding materials, as well as tabulated information for choosing the appropriate number of trials in the same target population of environments in the cultivar development pipeline.