Introduction

Performance of cultivars across environments can be captured in the form of cultivar main effects. Good estimates of these effects are of great value for compiling national lists of crop species to produce reliable recommendations. Also for potato, such estimates are valuable. However, the production of such estimates requires caution. Potato cultivars are maintained as vegetatively propagated clones. The origin and physiological status of the planting material (seed tubers) may already have a profound impact on the plant (Asiedu et al. 2003). Therefore, it is required to plant trial fields with seed tubers which have been propagated, stored and sprouted under identical conditions for at least one propagation cycle. Potato cultivars may serve different purposes and market niches. Since the 1970s and 1980s, potato breeding companies have specialised and consequently their gene pools have increasingly diversified. Currently, cultivars are developed for specific markets such as the processing industry (pre-baked frozen fries, crisps and flakes), fresh consumption (domestic and export markets) or the starch industry, each with their own specific requirements (Kirkman 2007).

Studies on the inheritance of complex traits are nowadays no longer limited by the number of genetic data points. High multiplex SNP assays and other DNA marker techniques have enabled the construction of marker-dense linkage maps (van Os et al. 2006), as well as marker-dense genome-wide association scans (Zhu et al. 2008). Accurate phenotypic information of biological samples has become the limiting step in both linkage and association mapping studies. In plants, large F2, backcross or recombinant inbred mapping populations can be generated easily, which in animal genetics is not so straightforward. Phenotypic information always has intrinsic inaccuracies due to the methods of data collection. Replication of observations through installing extra field trial years or locations will have a diminishing return in the end, while including the number of genotypes will overcome limitations in phenotyping accuracy. In human genetics, controlled crosses and replications are imaginary only, and therefore, association genetics with thousands of patients is habitually used by necessity. In plants, association studies have gained popularity, mostly because association mapping allows historical phenotypic data to be used for quantitative trait locus (QTL) mapping purposes (Flint-Garcia et al. 2003).

It is generally accepted that association studies (studies aimed at identifying QTL by statistically associating trait and marker data using collections of variously related germplasm) using well-known cultivars (both contemporary and historical) have two major attractive properties compared to QTL studies. First, in a QTL mapping population, the number of segregating alleles is limited to the specific alleles of the crossing parents, which is far from a comprehensive reflection of the genetic and phenotypic trait diversity of the entire gene pool. Second, phenotypic data may already be available for these cultivars. In potato, for example, the ECPGR Potato Working Group (a continuation of EU project RESGEN-CT95-34/45) aims to collect phenotypic data (www.europotato.org). Unfortunately, it appeared that these data emphasise disease resistance, are highly incomplete with respect to quality traits and are highly incomplete with respect to cultivars released in the recent years.

In the framework of a public–private research effort (www.cbsg.nl), two association panels (sets of variously related germplasm assembled for association studies) have been composed. One panel comprises 205 contemporary and historical cultivars (the academic panel), and the other panel (the industrial panel) contains 299 cultivars including recent breeds from five Dutch breeding companies next to a series of standard cultivars. The academic panel, in distinction to the industrial panel, was chosen to represent the commercial potato gene pool (excluding native potato land races from Latin America) of the previous century with emphasis on the oldest “founding fathers”, the most widely used in cultivation or the most widely used as crossing parents (major contributing ancestors, see Love 1999), as well as progenitor clones, representing widely used donors of disease resistance genes. The industrial panel covers mostly advanced clones just before granting Plant Breeders’ Rights. Obviously, breeders are hesitant to release this material for centralised phenotypic analysis, but in this project breeders gladly shared all existing multi-year–multi-location data obtained during the clonal selection process.

In this paper, we describe, analyse and compare two phenotypic data sets (“the academic panel” and “the industrial panel”). The industrial panel is based on existing multi-year–multi-location observations of recent breeds and standard cultivars at five breeding companies. The academic panel is based on a single-year field experiment with contemporary and historical potato cultivars. Both data sets are intended for use in a genome-wide association study, and this paper will describe the approach used to generate adjusted mean genotypic values. Furthermore, the strengths and weaknesses of both panels will be analysed. Finally, broader aspects related to phenotypic data sets and their analyses are discussed.

Materials and Methods

Plant Material

Tubers of 205 potato cultivars, listed in Online Resource 1, have been collected from five Dutch breeding companies (Agrico Research, Averis seeds, C. Meijer, HZPC Holland BV and Van Rijn-KWS) or were kindly provided by gene banks [Agriculture and Agri-Food (Canada), Arche Noah (Austria), IPK Gatersleben (Germany), INRA (France), SASA (Scotland), Teagasc (Ireland) and USDA (USA)]. Uniform planting material was generated by propagation of seed tubers in 2005 to perform the 2006 field trials, with a few exceptions for in-vitro-grown cultivars donated by overseas gene banks.

Field Trial Design Academic Panel

For the academic panel, it appeared problematic to use existing data for many reasons. Existing data were highly incomplete and were obtained in periods with outmoded agronomic practices (e.g. obsolete cultivars), and traits were evaluated with different methods or recorded on scales that do not relate to recent practices in varietal assessment. Therefore, it initially appeared necessary to collect new phenotypic values on all contemporary quality traits for the academic panel.

In 2006, the 205 contemporary and historical potato cultivars were grown on trial fields at two locations near Wageningen (Netherlands), one with sandy soil and one with river clay. Both locations harboured two replicates. The cultivars were arranged into maturity classes according to their maturity type (early maturing–main crop–late maturing) to avoid competition as much as possible. Plots were four-plant units. Plants were spaced at 40 cm within rows and at 75 cm between rows. Plots were randomised within maturity classes, while maturity classes were randomised within replicates. Tubers were planted on April 26 and harvested from October 9 to 11 at the river clay location and planted on May 3 and harvested from September 28 to October 2 at the sand location. Three weeks before harvest, vines were killed. No irrigation was applied. The fertility regime followed standard commercial potato production field practices.

Phenotypic Observations

Agro-morphological traits were evaluated in a similar way for both panels. For the academic panel, these traits were recorded during the growing season in the field and using the harvested material of these fields. Phenotypic values of quality traits for both the academic and industrial panels were assessed according to standard protocols of the breeding companies. In this study, we focused on the non-enzymatic discolouration due to oxidation of an iron–chlorogenic acid complex (McKenzie et al. 2005), which is observed after baking of French fries (after baking darkening, ABD) or after boiling potatoes (after cooking darkening, ACD). We also payed attention to the Maillard discolouration of French fries due to reducing sugars (frying colour, FryingCol). The traits ABD and FryingCol were assessed after varying months of storage of the harvested material. Another form of discolouration, observed on peeled or grated tubers, is caused by the enzymatic oxidation of phenolic compounds by polyphenol-oxidase (enzymatic browning, EnzBrow), see Lærke et al. (2002). Tuber flesh colour of potato is usually recorded as an ordinal trait, where values below 5.5 are regarded as white fleshed, while yellow-fleshed potato is ranked six and higher depending on the intensity of the yellow colour. The white–yellow contrast, controlled by the well-known Y-locus on chromosome 3 encoding the gene beta-carotene hydroxylase (Brown et al. 2006), was not regarded as a relevant quality trait. In this study, only the intensity of the yellow pigmentation was recorded, which means that a truncated scale was used starting from light yellow six to orange nine, in an attempt to study the genetic control of tuber flesh yellowness, both in raw tuber flesh (FleshColY) and in cooked tuber tissue (CookColY). Furthermore, we recorded cooking type, maturity, tuber shape, tuber size and starch concentration (underwater weight). An overview with descriptions and scales of all the traits measured is presented in Table 1.

Table 1 Abbreviated names of phenotypic traits along with a short description of the protocol, the data type and the scale

Data Analysis

To get an indication of the feasibility of genome-wide association mapping with our phenotypic data sets (academic and industrial), we studied the variance components for the genotypic main effect and genotype-by-environment interaction effects, as well as the heritabilities. Furthermore, we looked at the genetic correlations between traits within both the academic and industrial data sets. Finally, we looked at genetic correlations between both data sets for the genotypes that were common to both association panels. The genetic correlations were calculated from the estimates for the random genotypic main effects produced with appropriate mixed models for the individual traits in both data sets. For the analysis, we used the mixed model facilities of GenStat, 11th edition (VSN International Ltd., Oxford, UK).

Academic Panel

The academic panel contained phenotypic values obtained during a 1-year–two-locations field trial with replicates. The genotypes in the academic panel were representative for the commercial potato germplasm because cultivars were selected based on (1) the acreage for seed potato production in the Netherlands, (2) the country of origin, (3) the year of commercial introduction, (4) the market niche, (5) phenotypic diversity for quality traits and (6) availability of the cultivars, an important criterion for older genotypes which formed about one third of the academic panel.

For the analysis of the academic panel, the following model was used:

$$ {\hbox{Response}} = {\hbox{location}} + {\hbox{maturity class}} + \left( {{\hbox{location - by - maturity class interaction}}} \right) + \left( {\hbox{replicate within location}} \right) + \left( {\hbox{SecGr within replicate}} \right) + \left( {\hbox{green within replicate}} \right) + \underline {\hbox{genotype}} + (\underline {{\hbox{genotype - by - location}}\,{\hbox{interaction}}} ) + \underline {\hbox{error}} $$

Most terms in the model are self-explanatory. Random terms are underlined. Maturity class was a factor with three levels (early, main crop and late). SecGr and Green were two covariates representing secondary growth and tuber greening, respectively. Both disturbances were caused by delayed and excessive rain in the 2006 growing season.

Heritabilities for cultivar means across locations and replicates were calculated according to: \( {h^{{2}}} = \sigma_{\rm{g}}^2/\left( {\sigma_{\rm{g}}^2 + \sigma_{\rm{gl}}^2/{n_{\rm{l}}} + { }\sigma_{\rm{e}}^2/{n_{\rm{e}}}{n_{\rm{r}}}} \right) \), with \( \sigma_{\rm{g}}^2 \) the variance component for cultivar main effects, \( \sigma_{\rm{gl}}^2 \) the variance component for genotype-by-location interaction and \( \sigma_{\rm{e}}^2 \) the error variance. The number of environments, n e, and the number of locations, n l, were two. The number of replicates within locations, n r, was two as well. The variance components \( \sigma_{\rm{g}}^2 \) and \( \sigma_{\rm{gl}}^2 \) were estimated for each maturity class separately. The error variance, \( \sigma_{\rm{e}}^2 \), was estimated across maturity classes. Coefficients of variation (%CV) were calculated according to: \( \% {\hbox{CV}} = {1}00*\surd \left( {\sigma_{\rm{e}}^2} \right)/{\hbox{mean}} \).

Furthermore, heritabilities across the three maturity classes were calculated by a weighted average with weights proportional to the number of genotypes in each class. Likewise, overall coefficients of variation were calculated.

Although installing potato cultivars according to their maturity class in the field is a common practice for potato breeding trial fields, variance component and heritability estimation will likely be influenced due to the subsequent inclusion of a maturity class term and its interaction with the environment in the phenotypic analysis model. To appraise the importance of this effect, a simplified model, in which all maturity class-related terms were omitted, was tested. The principal findings related to removal of maturity class from the model are presented in the “Discussion” section.

Most traits were measured on an ordinal scale. Nevertheless, we analysed these traits without transformation as we found that the diagnostic plots did not show serious violations of the standard assumptions for the application of mixed models. This was true for both the academic and industrial panels.

Industrial Panel

The industrial data set was composed of phenotypic values of 190 advanced breeders’ clones (38 per breeding company), supplemented with additional observations on a variable series of standard potato cultivars, leading to a total of 299 genotypes. The 109 standard cultivars are well-known and widely grown cultivars used by each of the participating breeding companies. A majority of the standards (101 out of 109) was also included in the academic panel. Obviously, the companies use different sets of standard cultivars, in view of their differences in market niches. Data of the industrial data set were assembled throughout a variable number of successive years of clonal selection and commercial testing of the 190 recent breeds. Standard commercial potato production field practices were applied to all of the included trials in the industrial panel. Table 2 presents an overview of the structure of the industrial data set. The number of tested genotypes is highly variable, depending on the trait, simply because cultivars for the starch industry are habitually not evaluated for cooking and processing traits. Conversely, observations on processing and starch properties (not part of this study) are poorly known for cultivars used for fresh consumption. Table 2 also shows the number of locations, the number of years and the number of trials in which a particular trait was measured. As illustrated in Table 2, the phenotypic observations on agro-morphological traits were abundant, whereas data on processing traits were scarce. Plot sizes were in general 16 plants per genotype, in contrast to the academic panel, where plot sizes were 4 plants.

Table 2 Illustration of the structure of the industrial panel

The model for the trait values in the industrial data set was:

$$ \eqalign{ {\hbox{Response}} = \underline{\hbox{trial}} + \left( \underline{\hbox{repeat within trial}} \right) + \hfill \cr \underline{\hbox{genotype}} + \left( \underline{{\hbox{genotype - by - year interaction}}} \right) + \left( \underline{{\hbox{genotype - by - trial - within - year interaction}}} \right) + {\hbox{error}} \hfill \cr }<!endgathered> $$

Trial represents the factorial combination of location and year. “Repeat within trial” is a kind of replicate within trial effect. We use the term “repeat” instead of “replicate”. “Replicate”, as used for the academic panel, means fully duplicated for all traits. “Repeat” in case of the industrial data set indicates that a trait has been assessed in a variable number of replicates, depending on the trait and the breeding company. Again, random terms are underlined, from which it becomes evident that for these data we used a fully random model. The large number of trials made it statistically attractive to use a purely random model.

Heritabilities were calculated as \( {h^{{2}}} = \sigma_{\rm{g}}^2/{ }(\sigma_{\rm{g}}^2 + \sigma_{\rm{gy}}^2 + \sigma_{\rm{gty}}^2 + \sigma_{\rm{e}}^2) \), with \( \sigma_{\rm{g}}^2 \) the variance component for the genotypic main effect, \( \sigma_{\rm{gy}}^2 \) the variance component for the genotype-by-year interaction, \( \sigma_{\rm{gty}}^2 \) the variance component for the genotype-by-trial-within-year interaction and \( \sigma_{\rm{e}}^2 \) the error variance. As the numbers of repeats varied strongly among genotypes, we first calculated heritabilities on a per plot basis instead of on a per genotype-mean basis.

To compare heritabilities between the academic and industrial data sets, we calculated adjusted heritabilities according to: \( {h^{{2}}} = \sigma_{\rm{g}}^2/(\sigma_{\rm{g}}^2 + \sigma_{\rm{gty}}^2/{2 } + \sigma_{\rm{e}}^2/{4}) \), where the variance component for genotype-by-year interaction was left out of the formula as this component could not be estimated in the academic data set. Analogous to the heritability calculation for the academic data set, we divided the genotype-by-trial-within-year variance component by two and the error component by four. Also for this data set, we calculated coefficients of variation according to: \( \% {\hbox{CV}} = {1}00*\surd \left( {{\sigma^{{2}}}_{\rm{e}}} \right)/{\hbox{mean}} \).

Comparative Analysis of Both Association Panels

The genotypic main effects obtained with the above mixed models were used to inspect correlation structures within the academic and industrial data sets. In addition, for individual traits, we inspected the correlation (r) between the academic and industrial data sets. As we calculated the correlations based on the estimates for the random genotypic main effects, we were effectively investigating the structure of the genetic correlations within and between the two data sets. To visualize the within-set genetic correlations, we produced biplots based on a principal component analysis of standardized genetic effects. For the biplots, we used the software SC-Biplot (Cosinus Computing BV, Waalwijk, Netherlands).

Results

More than 15,000 data points were collected during the academic field trial. Nineteen phenotypic traits were evaluated for 205 potato genotypes on two locations with two replicates (205 * 2 * 2 * 19 = 15,580). The industrial data set was even larger, with 73,968 observations. This number, however, represented only about 4% of the possible data points that would have been obtained from the evaluation of all 299 genotypes in all 342 trials for all 19 traits (Table 2).

Estimates for variance components and their standard errors for the academic panel are shown in Table 3. An analogous overview is presented for the industrial panel in Table 4. Academic heritabilities are plotted versus industrial heritabilities in Fig. 1.

Table 3 Results of the phenotypic analysis of the academic field trial
Table 4 Overview of the estimates of variance components and their respective standard errors for the industrial data set
Fig. 1
figure 1

Comparison between heritability estimates of the academic and industrial panels for 19 phenotypic traits. For the industrial panel, the adjusted heritabilities on a comparable scale were used

In general, high heritabilities were obtained. For the academic field trial, the heritabilities ranged from 0.53 for ABD_Apr4c to 0.95 for CookColY. For the industrial data set, the heritabilities ranged from 0.33 for ABD_Feb8c to 0.87 for underwater weight. However, when we compared the academic heritabilities with the comparably scaled industrial heritabilities, there was for the majority of traits a clear trend for the industrial heritabilities to be higher (see Fig. 1).

The coefficients of variation for the academic and industrial panels are plotted in Fig. 2, from which it can be deduced that there was consistency between both data sets. Traits that had a higher coefficient of variation in one data set also showed a higher coefficient of variation in the other data set, likewise for the traits with a lower coefficient of variation.

Fig. 2
figure 2

Comparison of the coefficients of variation of 19 agro-morphological and quality traits between the academic and industrial panels

Figure 3 presents a biplot showing the traits following a principal components analysis on standardized genotypic effects (Gabriel 1971, 1978). Figure 3 thus gives an impression of the genetic correlations within both data sets, with the academic field trial in Fig. 3a (46% of the original variation) and the industrial data set in Fig. 3b (40% of the original variation). Acute angles between trait directions represent positively correlated traits, obtuse angles represent negative correlations and orthogonal directions represent independent traits. Distance from the origin indicates how well individual traits are represented in the biplot projection plane. For traits close to the origin, one should not try to reconstruct the original correlation from the biplot.

Fig. 3
figure 3

Biplot showing genetic correlations. Clustered traits (squares), sharing approximately the same direction (a sharp angle), represent highly correlated traits. Squares at opposite side of the origin (obtuse angles) represent negatively correlated traits. Squares with orthogonal directions represent independent traits. a Trait correlations resulting from the academic panel. b Trait correlations deduced from the industrial panel

In Fig. 3b, which we consider first because genetic correlations are visually better observed, FryingCol_Apr8c, FryingCol_Nov8c, FryingCol_Feb8c and FryingCol_Apr4c appeared highly correlated. The same observation holds for ABD_Nov8c, ABD_Feb8c, ABD_Apr8c and ABD_Apr4c. These formed expected clusters of related traits because they only differed in their time point of evaluation or in their storage temperature before evaluation. The protocols to assess the traits were exactly the same. The relationships between cooking colour and flesh colour and between cooking type and underwater weight were equally predictable. The first two measured principally the same phenotype, whereas the latter two were correlated because cooking type is influenced by dry matter concentration: higher dry matter concentration usually results in a looser cooking type (van Dijk et al. 2002). The negative correlation between maturity and underwater weight (and indirectly cooking type) was due to the development of a higher dry matter concentration by later maturing genotypes.

In Fig. 3a, the same cluster of correlated frying colour traits existed, although the correlations were less pronounced, i.e. the angles were less sharp. The cluster of after baking darkening traits became less obvious as well, only ABD_Apr8c and ABD_Feb8c appeared highly correlated, whereas ABD_Nov8c and ABD_Apr4c shifted away from the cluster. The relationship between ACD_1h and ACD_24h became more apparent in this single-year data set whereas the relationships between cooking colour and flesh colour and between cooking type and underwater weight became less obvious. The negative correlation between maturity and underwater weight also existed in this data set.

Figure 4 shows scatter plots of the estimated genetic effects in both panels, illustrates for individual traits the relationships between the two data sets and also contains information about the magnitude of variances for specific traits present in both data sets. When the points are concentrated along the diagonal, this is an indication of a similar behaviour of genotypes in both data sets and high correlation. By comparing the range of the genotypic main effects, it can be deduced whether both data sets hold the same amount of information, i.e. variance, for a particular trait. For example, tuber shape, tuber size, underwater weight, maturity, cooking type, ACD_1h, flesh colour, cooking colour, enzymatic browning and the frying colour traits all have their points more or less along the diagonal, an indication that these traits correlate well. Tuber size, underwater weight and the frying colour traits all have comparable ranges for genotypic main effects in both data sets, indicating that equal variance is present in both data sets for these traits. Tuber shape, maturity, cooking type, ACD_1h, flesh colour, cooking colour and EnzBrow_30min have a slightly higher range for their genotypic main effects in the industrial data set, which suggests that the industrial panel has a slightly higher information content for these traits. These abovementioned observations illustrate what could be interpreted from the correlations presented in Table 5 as well, namely that, at least for the traits with a high correlation, both data sets are equally valuable.

Fig. 4
figure 4

Comparison of genotypic effects for 19 traits between the academic and industrial panels. The scatter plots display the variability of the traits across genotypes as observed in the academic and industrial data sets, as well as the pair-wise correlation of the traits between both panels

Table 5 Correlations between genotypic effects of the academic and industrial panels

Discussion

Data Management is Necessary

A first complication while assembling data from different companies, as we did for the industrial data set, was to match traits with each other. Trait names can be different while the same phenotype is measured, or the trait names can be the same while a different phenotype is measured. The scales that are used can be different in order, in increments or in range. In addition, the protocols to measure the same phenotype can differ. These problems were solved through consulting the trait evaluators to obtain detailed descriptions allowing rescaling of the trait values. Inspection of unprocessed trait data also resulted in the detection of systematic differences between companies for trait values of reference cultivars. Upon discussions and adaptation of scales to score traits, we could harmonize the data obtained from five different breeding companies. Table 1 offers a clear trait ontology, analogous to the plant structure ontology for anatomy and morphology of flowering plants (Ilic et al. 2007), with accurate definitions of the trait, the evaluation protocol and the scaling of the trait values. A continued effort towards a comprehensive trait ontology could be very helpful in the case of potato. Examples of trait ontologies exist in cereal genomics (Jaiswal et al. 2002) and in tomato (Brewer et al. 2006). Also, to increase robustness and power to detect QTL, it is advisable to develop or apply objective measurement tools. Such objective measurements could replace more subjective trait evaluations based on the breeder’s eye.

Patterns of variation and correlations in two data sets, one resulting from a designed field trial (the academic panel) and one containing data from multi-year–multi-location clonal selection trials (the industrial panel), have been compared. Heritabilities, coefficients of variation, trait relationships within and between data sets and phenotypic variance attributable to genotypes have been estimated. In Tables 3 and 4, an overview of the analyses of both data sets is presented. Especially the columns with the genotypic variance component and heritability are interesting because they give an impression of the feasibility of genetic improvement and possibilities for QTL detection. Figure 1 shows that the industrial heritabilities are generally higher than the ones from the academic panel. Figure 2 demonstrates that traits in both panels behaved consistently when their coefficients of variation are considered, indicating that both data sets contain data of comparable quality. Trait relationships within each panel are depicted through biplots in Fig. 3, while correlations between traits in both data sets are listed in Table 5. Basically, the same trait relationships existed within the two data sets, but the strength of the relationships varied between them. Agro-morphological traits tend to have higher pair-wise correlations between the two data sets than quality traits, except for frying colour. This trend is visually expressed in Fig. 4, where the individual pair-wise trait correlations are depicted in scatter plots.

The Importance of Maturity

As mentioned in the “Materials and Methods” section, we also tested a simplified model without maturity class-related terms for the analysis of the academic panel, to quantify the influence of maturity class on the resulting estimates. Based on the results of this simplified model (see Online Resource 2 to 8), it can be deduced that the presence or absence of maturity terms in the analysis model did not alter the observed trends for the majority of traits when comparing the academic and the industrial panels. For example, removal of maturity terms did not influence the coefficients of variation (Online Resource 5), nor the within data set genetic correlations (Online Resource 6). However, for maturity and underwater weight, there were some changes apparent, as anticipated. Online Resource 3 and 4 show the effect of removal of maturity terms on heritabilities. The effect seemed rather unpredictable for most traits, except for maturity and underwater weight where higher heritabilities were expected after removal of maturity terms because in that case the between-maturity-class trait variation was added to the within-maturity-class trait variation in the estimation of the genetic variance component. The range of the genetic effects for the trait “maturity” obtained with the simplified model did become perfectly similar with the range in the industrial panel (Online Resource 7). This shift was also visible in the improved correlation for maturity between genotypic main effects for the industrial panel on the one hand and the academic panel on the other hand: from r = 0.44 to r = 0.87 where the latter was obtained with the simplified model (Table 5 and Online Resource 8).

High Heritabilities Were Obtained

Heritabilities obtained were high, which is not abnormal for highly variable traits of clonally propagated crops. The high heritabilities (both for the academic and industrial panels) for tuber shape and maturity confirm that these traits are very stable across locations and years. The high heritability for underwater weight can be interpreted as the absence of strong genotype-by-environment (G×E) interaction. The same interpretation holds for frying colour because the G×E variance component for this set of traits was low in comparison with the genetic variance component, as illustrated in Tables 3 and 4 and Online Resource 2. Notwithstanding the truncated scales of flesh colour and cooking colour, now reflecting yellow colour intensity only, the heritabilities of these traits were still high. Tuber size in the industrial data set had a low genetic variance component, which was reflected in the lower heritability for this trait and was probably the result of variation introduced by the size and physiological age of the seed tuber, as these factors are known to have a profound effect on size grading variability within a genotype. The genetic variance components for discolouration phenotypes (ABD and ACD) and cooking type were lower when compared to their respective error variance components for both panels, which resulted in lower heritabilities for discolouration traits and cooking type. This could be because the trait values describe composite effects, i.e. it is difficult to observe pure phenotypes. ABD observations, for instance, are influenced by the dark frying colours, caused by the Maillard discolouration of French fries due to reducing sugars, and cooking type scores are largely dependent on starch concentration.

The higher heritabilities for the academic compared to the industrial data set (here we consider the true heritability, not the comparably scaled “adjusted” heritability) can be explained in two ways. Firstly, the industrial data set included more environments. The more environments for which trait data are available, the lower heritability estimates tend to be, due to a larger contribution of G×E. Secondly, the choice for diversity among the cultivars in the academic panel will have inflated the heritabilities. About one third of the cultivars was first registered on a national list in Europe before 1930, and in those days for most traits, more deviating values were accepted that are not observed anymore in contemporary cultivars.

Phenotypic values of traits with high heritabilities were expected to correlate well between the academic and industrial data sets. This was confirmed in Table 5 where the correlations between genotypic main effects range from 0.20 for ABD_Nov8c to 0.86 for tuber shape. Indeed, tuber shape, underwater weight, flesh colour and cooking colour did not only have higher heritability estimates, but were also more highly correlated between the two data sets than quality traits like after baking darkening and after cooking darkening. Although the data structure of both panels was quite different, most of the traits correlated reasonably well. As an important consequence, existing data (industrial panel) may serve equally well for association studies as a designed trial (academic panel) for particular traits.

Expected Genetic Correlations Appeared

Figure 3 and Online Resource 6 show genetic correlations between traits within data sets. The frying colour and after baking darkening trait clusters detected in both panels were expected as these traits measure exactly the same phenotypes. They only differed in the temperature at which the tubers have been stored and the duration of storage prior to evaluation. In theory, there are no genes involved in the discolouration process itself: frying colour is the result of the Maillard reaction, whereas after baking darkening is the result of oxidation of an iron–chlorogenic acid complex (McKenzie et al. 2005). However, genetic background does influence the amount of reaction products available for the discolouration processes. Olsson et al. (2004) provided evidence that storage time and temperature do not influence the ranking of genotypes with respect to frying colour considerably, which could explain the strong relationships between the frying colour traits (Fig. 3 and Online Resource 6). Likewise, Wang-Pruski and Nowak (2004) state that for after baking darkening, storage time and temperature only influence the severity of the discolouration; nothing is mentioned about changes in genotype ranking. The positive correlation between cooking type and underwater weight in both data sets is equally predictable. Underwater weight is a measure for dry matter concentration of potato tubers and indirectly also for starch concentration; the higher the underwater weight, the higher the starch concentration of a cultivar. Starch granules create swelling pressure while being cooked and therefore can contribute to disintegration of the cooked tuber structure, which creates a looser cooking type (van Marle et al. 1997). The relationship between flesh colour and cooking colour is even more predictable: the mere difference between both phenotypes is that flesh colour is evaluated on raw tubers and cooking colour on cooked tubers. The negative correlation between maturity and underwater weight can easily be explained because later maturing cultivars reside longer in the field and thus have the opportunity to assemble more sucrose as their photosynthesis remains active for a longer time period and, so, can stack larger amounts of starch in their tubers which increases their underwater weight. The unexpected relationship between ACD_24h, cooking colour and flesh colour (visible in Fig. 3b) is possibly an artefact due to the difficulty for breeders to score after cooking darkening objectively after a 24-h period in naturally darker coloured potato genotypes.

Captured Genetic Variation Comparable with Literature

To get an impression of the genetic variation in our data sets, we compared obtained heritabilities with previously reported heritabilities, where the remark has to be made that in our study, a ratio of variance components was calculated that strictly speaking is not really a measure of heritability, but rather repeatability. Where possible, we clearly indicated how heritabilities were estimated in previous studies. However, in some cases, this was not well described. Maris (1969) reports heritabilities pooled over 12 progenies of potato. For maturity and underwater weight, he obtained values of 0.85 and 0.76 respectively, similar to our estimates (see also Tables 3 and 4 and Online Resource 2). Tai and Young (1984) reported a somewhat lower heritability of 0.54 for maturity and 0.65 for specific gravity, a trait directly related to underwater weight. Yildirim and Çalişkan (1985) obtained 0.77 as a heritability estimate on a clone mean basis for tuber shape evaluated in a field trial performed over 2 years and three locations, likewise comparable to what we computed. Love et al. (1997) reported heritabilities on a clone mean basis of 27 traits obtained from a study involving three clonal generations with 200 clones. For shape, they calculated a heritability of 0.79; we obtained 0.74 for the industrial data set and 0.90 for the academic data set (Tables 3 and 4). Thompson et al. (1983) estimated tuber size heritability to be 0.79 for a Solanum andigena population propagated by true potato seed. We obtained a lower value: 0.52 within the industrial data set, but the same value was obtained with the academic panel. Neele et al. (1991) estimated heritabilities for maturity, shape and underwater weight to be 0.77, 0.61 and 0.68, respectively. Bradshaw et al. (2008) reported heritabilities on a clone mean basis for maturity of 0.92, dry matter concentration of 0.81, tuber size of 0.87, tuber shape of 0.85, after cooking darkening of 0.60, frying colour at 4 °C 0.79 and frying colour at 10 °C of 0.78. These estimates are slightly higher than or comparable to the values we obtained (see Tables 3 and 4 and Online Resource 2).

Field Trial Design

Multi-year–multi-location (MYML) data sets will always be preferred over single-year–few-location data sets because they allow investigation of G×E. Often dilemmas regarding field trial design have to be solved before the start of new projects: one trial year with several locations and replicates versus more years with several locations and less replicates. The results of our phenotypic analyses show that the heritabilities obtained with the two types of data sets (Tables 3 and 4) were different. However, when trait correlations within or between data sets are concerned, both data sets showed similar patterns. Both MYML and single-year–few-location approaches have their advantages and disadvantages. When choosing just 1 year of field trials, heritabilities are likely overestimated due to confounding of the genotypic main effect variance and the genotype-by-year interaction variance (and genotype-by-year-by-location interaction variance). MYML trials guarantee robust phenotypic observations, heritabilities will be more conservative and probably more realistic and more variable (environment-sensitive) traits will become better assessable. However, with MYML data, some aspects can change in the course of time, for instance reference cultivars can change as new standards are set by breeding companies or trait scoring methodology can alter due to new techniques or insights. Phenotypic analyses will become increasingly complex as more trials need to be included in the analysis and unbalancedness of data sets will increase.

Plot size is another important factor in field trial design. Some researchers claim that small plot size, e.g. four plants per genotype, can suffice to evaluate most of the traits (personal communication), whereas potato breeding companies generally use a plot size of a minimum of 16 plants per genotype to get a reliable impression of the performance of a genotype. Considering the phenotypic analysis of the academic field trial—given acceptable environmental conditions and good experimental design—for most simple traits such as maturity, tuber shape or flesh colour, small plot sizes indeed suffice to obtain good quality phenotypic data. However, when more variable traits like discolouration have to be assessed, a larger plot size can significantly improve the outcome of a field experiment by providing more sample material and offering a buffer to extreme within-plot environmental conditions.

Cultivar Performance Is Better Illustrated Using Adjusted Genotypic Means

Marketing brochures and national lists usually provide information regarding a cultivar’s performance to allow growers to make well-informed decisions. However, in order to be really informative about a cultivar’s performance for specific traits, it is necessary to present reliable genotypic means. Therefore, inclusion of adjusted trait mean values, corrected for genotype-by-environment interaction is advisable, in contrast to straightforwardly averaged trait values. In our case, the genotypic effects which are directly related to these adjusted means by means of the phenotypic analysis model offered robust data that could be reliably compared throughout data sets resulting from different field trials (Fig. 4).

Conclusion

With two differently structured association panels, high heritabilities have been obtained, which also appeared comparable to previous studies. Both panels were consistent in their data quality. Anticipated trait correlations were found within both panels and the majority of the traits correlated well between both panels. The industrial panel had a somewhat higher information content for some of the traits. Maturity class separation in potato field trials was shown to have consequences for the estimation of the genotypic main effects of maturity and underwater weight, which will likely carry over to association results since association studies make direct use of the estimated genotypic main effects. Based on the results presented in this manuscript, we can conclude that two equally valuable association panels have been assembled.

The phenotypic analyses in this article represent the first step of a genome-wide two-step association mapping study. We intend to integrate a high amount of genotypic marker information to detect associations for the traits mentioned. The outcome of the current analyses produced adjusted genotypic means that subsequently will serve as entry values for further association analysis, a similar approach as described in D’hoop et al. (2008) and Li et al. (2008).