Analytical and numerical comparisons of two methods of estimation of additive × additive × additive interaction of QTL effects

This paper presents the analytical and numerical comparison of two methods of estimation of additive × additive × additive (aaa) interaction of QTL effects. The first method takes into account only the plant phenotype, while in the second we also included genotypic information from molecular marker observation. Analysis was made on 150 doubled haploid (DH) lines of barley derived from cross Steptoe × Morex and 145 DH lines from Harrington × TR306 cross. In total, 153 sets of observation was analyzed. In most cases, aaa interactions were found with an exert effect on QTL. Results also show that with molecular marker observations, obtained estimators had smaller absolute values than phenotypic estimators.


Introduction
The analysis of inheritance of quantitative traits, due to their polygenic nature, requires the use of appropriate statistical and genetic methods. Among these methods, the most interesting are those that enable the determination of the mode of action of genes in the studied population.
The concept of genetic interactions is known for more than a hundred years (Bateson and Mendel 1902). Considering that a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects (Chen et al. 2011). Most studies are focused on single locus analysis, which directly tests the association between individual genes and phenotypic variants. Pairwise interactions are often used in modern genetics (Brem et al. 2005;Jarvis and Cheverud 2011;Gaertner et al. 2012), but higher-order interactions are often neglected. This kind of more complex interaction requires complete, precise data to be successfully included, but this type of data was rarely available since more recent times (Carlborg et al. 2006;Cordell 2009). There is no denying that we do not fully understand all of the mechanics of heritability and the higher-order interactions may be the missing element of explaining the relationship between genotype and phenotype (Hartman et al. 2001;Manolio et al. 2009).
Quantitative traits are not only one of the most important in the viewpoint of breeding programs but also can be influenced by a multiplicity of polymorphic genes, environmental conditions, and genetic interactions, making them extremely difficult to fully understand (Members of the Complex Trait Consortium 2003; Mackay 2014).
The purpose of the research reported in this article is to compare two methods of estimation of the parameter connected with the additive × additive × additive (aaa) interaction gene effect: the phenotypic method and the genotypic method. The comparison was made by analytical methods and with analyses of data sets of barley doubled haploid lines. To our knowledge, this is the first report about aaa interaction.
The i-th element (i = 1, 2, …, n) of vector m l is equal − 1 or 1, depending on the parent's genotype exhibited by the i-th line.

Estimation based on the phenotype
Estimation of the additive × additive × additive interaction of homozygous loci (three-way epistasis) effect aaa on the basis of phenotypic observations y requires identification of groups of extreme lines, i.e., lines with the minimal and maximal expression of the observed trait (Choo and Reinbergs 1982). The group of minimal lines consists of the lines which contain, theoretically, only alleles decreasing the value of the trait. Analogously, the group of maximal lines contains the lines which have only alleles increasing the trait value. In this paper, we identify the groups of extreme lines as minimal and maximal, respectively, lines of the empirical distribution of means. The total three-way epistasis interaction effect aaa can be estimated by the following formula: where L min and L max denote the means for the groups of minimal and maximal lines, respectively, L denotes the mean for all lines. The number of genes (number of effective factors) obtained on the basis of phenotypic observations only was calculated using the formula presented by Kaczmarek et al. (1988).

Estimation based on the genotypic observations
Estimation of aaa is based on the assumption that the genes responsible for the trait are closely linked to the observed molecular marker. By choosing from all observed markers p, we can explain the variability of the trait, and model observations for the lines as follows: where 1 denotes the n-dimensional vector of ones, μ denotes the general mean, X denotes (n × p)-dimensional matrix of the form X = m l 1 m l 2 ⋯ m l p , l 1 , l 2 , ..., l p ∈ {1, 2, ..., q}, β denotes the p-dimensional vector of unknown parameters of the form ′ = a l 1 a l 2 ⋯ a l p , Z denotes matrix which columns are products of some columns of matrix X, γ denotes the vector of unknown parameters of the form ′ = aa l 1 l 2 aa l 1 l 3 ⋯ aa l p−1 l p , W denotes matrix which columns are three-way products of some columns of matrix X, δ denotes the vector of unknown parameters of the form ′ = aaa l 1 l 2 l 3 aaa l 1 l 2 l 4 ⋯ aaa l p−2 l p−1 l p , and e denotes the n-dimensional vector of random variables such that E(e i ) = 0, Cov(e i , e j ) = 0 for i ≠ j, i, j = 1, 2, …, n. The parameters a l 1 , a l 2 , ..., a l p are the additive effects of the genes controlling the (2) y = 1 + X + Z + W + e, trait, parameters aa l 1 l 2 , aa l 1 l 3 , ..., aa l p−1 l p are the additive × additive interaction effects and parameters aaa l 1 l 2 l 3 , aaa l 1 l 2 l 4 , ..., aaa l p−2 l p−1 l p are the additive × additive × additive interaction effects. We assume that the epistatic and three-way epistatic interaction effects show only loci with significant additive gene action effects. This assumption significantly decreases the number of potential significant effects and causes the regression model to be more useful.
Denoting by ′ = [ ′ ′ ′ ] and G = [ 1 X Z W ] we obtain the model If G is of full rank, the estimate of is given by (Searle 1982) The total three-way epistasis aaa effect of genes influencing the trait can be found as follows: For the marker selection of model (2), we used a stepwise feature selection by Akaike information criteria (Akaike 1998). The procedure consisted of two steps: first, we divided markers into groups based on chromosomes they were located on and performed stepwise feature selection by AIC; after that, we combined the remaining markers into one group and we repeated selection as above. All of the remaining markers were combined into the final group and the last feature selection was performed on a model with additive × additive × additive interaction effect included. To counteract the multiple comparisons problem, we used the Bonferroni correction.

Examples
To compare the estimates of aaa obtained by different methods, the following data sets were used.

Example 2
The second data set also comes from the NABGM project and consist of 145 doubled haploid (DH) lines of barley (cross of two-rowed varieties Harrington × TR306) analyzed for seven phenotypic traits (weight of grain harvested per unit area, WG; number of days from planting until emergence of 50% of heads on main tillers, NH; number of days from planting until physiological maturity, NM; plant height, H; lodging transformed by arcsin √ x∕100 , L; 1000 kernel weight, KW; test weight, TW) and tested in five environments (in four environments, observations were made over two years: Brandon, Manitoba, 1992 andAilsa Craig, Ontario, 1992 andElora, Ontario, 1992 andOutlook, Saskatchewan, 1992 and; Ste-Anne-de-BeUevue, Quebec, 1993) (Tinker et al. 1996, http:// wheat. pw. usda. gov/ ggpag es/ HxT). We used the map composed of 127 molecular markers (mostly RFLP) with the mean distance between markers equal to 10.62 cm.
Considering that each trait and environment was classified as an independent variable in both cases, in total of 153 sets of observations were deemed. Trait data was transformed to achieve normal distribution of the observed features. In all cases, transformation was successful and normal distribution was obtained.

Analytical comparison
The estimators, (1) and (5), of the three-way epistasis effect aaa can be analyzed and compared under simplified assumptions: (i) that the markers are unlinked and (ii) that the segregation of each marker is compatible with the genetic model appropriate for the analyzed population, which in our case means that the probability of observing "1" is the same as observing " − 1". This is true if we consider that model (2) treats the marker observations as fixed. In fact, the vectors m l , l = 1, 2, ..., q, constitute observations of some random variables. If the marker data satisfied exactly assumptions (i) and (ii) we would have where y (l k l k � l k �� ,−) and y (l k l k � l k �� ,+) denote the means for lines with observations of k-th, k'-th, and k''-th markers equal − 1 and 1, respectively. In practice, the marker data do not accurately meet the following conditions for model (6). Taking into consideration that markers chosen for model (2) are far apart from each other on the linkage map, assumption (i) is true. To test the assumption (ii) 2 , the test is used before any analysis is performed.

Numerical comparison
Obtained results for estimates of total additive × additive × additive interaction effect was presented in Tables 1, 2, 3, and 4. Tables 1 and 2 contain phenotypic and genotypic analysis, respectively, for the 150 doubled haploid lines of barley from the Steptoe × Morex cross; Tables 3 and 4 for the 145 doubled haploid lines of barley from the Harrington × TR306 cross. Figures 1 and 2 show the relative comparison of phenotypic and genotypic estimates of the total additive × additive × additive interaction effect in the form of a box-and-whisker diagram of the values âaa g ∕âaa p • 100 , classified by the observed phenotypic traits.
Results show that in 90 cases (70%) we found statistically significant additive × additive × additive interaction effects (Table 1). The same amount of interactions was found for marker observation, but only in 72 cases, where we confirmed results statistically ( Table 2). Comparisons of genotypic and phenotypic estimates of the total additive × additive × additive interaction effect show that in the majority of cases (79%), the effect was smaller than the total aaa interaction effect from phenotypic observations alone (Fig. 1). However, the scope of calculated estimates is quite large ranging from − 1590.91% for HD to 1800.00% for H in the same environment (WA92). In a total of five cases, we observed estimate values higher than |1000|%. The smallest range of estimates was observed for the trait DP. Number of genes (effective factors) ranged from 3-10 with average of 3.4 (Table 1). Minimal number of included markers equals 12, where maximum number was 32, with an average of 19.5 markers per model. The number of three-way interactions ranged from 0-35 with an average of 8.3 (Table 2).
For the Harrington × TR306, cross results show that in 63 cases (100%), we found statistically significant additive × additive × additive interaction effects (Table 3). The same amount of interactions was found for marker observation, but only in 35 cases, where we confirmed results statistically (Table 4).
Comparisons of genotypic and phenotypic estimates of the total additive × additive × additive interaction effect show that in majority of cases (79%), the effect was smaller than the total aaa interaction effect from phenotypic observations alone (Fig. 2). Same as above, the scope of calculated estimates is quite large ranging from − 2194.31% for WG in environment QC93 to 2866.67% for KW in ON93a. In a total of four cases, we observed estimate values higher than |1000|%. The smallest range of estimates was observed for the trait NM. The number of genes (effective factors) ranged from 0-13 with an average of 5.6 (Table 3). A minimal number of included markers equals 7, where the maximum number was 21, with an average of 13.9 markers per model. The number of three-way interactions ranged from 0-36 with an average of 4.8 (Table 2). In total, we analyzed 153 sets of observations, independently for each trait and each environment. Both examples were considered separately.

Discussion
Breeding programs aim to enhance the most desirable traits. Actions based solely on phenotypic observations and gene effects are likely to miss the potentially huge impact of interaction and higher-order interaction effects (Taylor and Ehrenreich 2015). Analytical and numerical comparisons of methods of estimation of the total additive × additive × additive interaction effects are presented in this paper. The numerical comparison was conducted on 153 sets of observations from two examples of barley doubled haploid lines.
The analytic comparison shows that, under the assumption of correct segregation and no linkage between markers, the formulae for the phenotypic and genotypic estimators are comparable and that the additive × additive × additive interaction effect of each QTLs triad is smaller than the phenotypic effect. The numerical comparison of estimates of additive × additive × additive interaction effect shows that in most cases (79% for both examples), genotypic estimate of aaa interaction is smaller than the phenotypic. This sentence is true due to the reason that phenotypic estimate consists of total additive × additive × additive interaction effects of all genes, unlike the genotypic estimate which includes only selected genes. For the rest of the cases that show lower values of phenotypic than genotypic estimates, it may be the result of a high genetic diversity with a lesser phenotypic diversity of the DH lines. High ranges of differences for the calculated estimates are most likely the result of a lot of different experimental variants such as different traits, environments, and experimental situations (Bocianowski and Krajewski 2009). The number of genes (effective factors) in phenotypic estimation does not directly influence the number of markers, as well as the number of aaa interaction included in genotypic models. Both the number of effective factors and number of markers are pretty consistent with few outliers, which makes sense considering that our method tries to include the maximum amount of best-fitted factors. On the contrary, the number of aaa interactions ranged quite widely which  Aberdeen, ID, 1991;ID92, Tetonia, ID, 1992;MA92, Brandon, Manitoba, 1992;MN92, Crookston, MN, 1992;MTd91, Bozeman, MT, dry, 1991;MTd92, Bonzeman, MT, dry, 1992;MTi91, Bozeman, MT, irrigated, 1991;MTi92, Bozeman, MT, irrigated, 1992;NY92, Ithaca, NY, 1992;ON92, Guelph, Ontario, 1992;OR91, Klamath Falls, OR, 1991;Kg92, Goodlae, Saskatchewan, 1992;SKk92, Kcfr, Saskatchewan, 1992;SKo92, Outlook, Saskatchewan, 1992;WA91, Pullman, WA, 1991;WA92, Pullman, WA, 1992. $ AA, alpha amylase; DP, diastatic power; GP, grain protein; GY, grain yield; H, height; HD, heading date; L, lodging; ME, malt extract. *NS, non significant; **(x | y): x, number of included markers, y, number of significant aaa interactions; " − ", aaa interaction not found may be the result of omitting markers that by themselves do not improve the model but can create the best threes. In this paper, stepwise feature selection by Akaike information criteria was used. We received comparable results to the previous paper using the same datasets (Bocianowski 2012) with backward stepwise regression as well as to the method of inclusive interval mapping (ICIM) (described by Li et al. 2008). The presented results show that the inclusion of higher-order (aaa) interactions in multiple regression models can have an exert influence on QTL effect.

Conclusions
Higher-order interactions are usually neglected due to extensive data requirements, although this does not mean they are irrelevant, on the contrary --higher-order interactions occur often and can have a huge impact on phenotype.
The presented methods were useful statistical tools for QTL characteristics and allow estimating aaa interactions.
On the basis of available literature, this is the first report concerning the presence of analytical and numerical comparisons of two methods of estimation of additive × additive × additive interaction of QTL effects.
Further studies of higher-order interactions and methods of their estimation are necessary.
Author contribution Conceptualization, JB; methodology, AC and JB; software, AC; validation, AC and JB; formal analysis, AC; investigation, AC and JB; resources, AC and JB; data curation, AC and JB; writing-original draft preparation, AC; writing-review and editing, AC and JB; visualization, AC; supervision, JB; all authors have read and agreed to the published version of the manuscript.

Availability of data and material
The data presented in this study are available on request from the corresponding authors.

Declarations
Ethics approval This article does not contain any studies with human participants or animals performed by any of the author.

Conflicts of interest Authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.