The geometry of partial fitness orders and an efficient method for detecting genetic interactions
Abstract
We present an efficient computational approach for detecting genetic interactions from fitness comparison data together with a geometric interpretation using polyhedral cones associated to partial orderings. Genetic interactions are defined by linear forms with integer coefficients in the fitness variables assigned to genotypes. These forms generalize several popular approaches to study interactions, including Fourier–Walsh coefficients, interaction coordinates, and circuits. We assume that fitness measurements come with high uncertainty or are even unavailable, as is the case for many empirical studies, and derive interactions only from comparisons of genotypes with respect to their fitness, i.e. from partial fitness orders. We present a characterization of the class of partial fitness orders that imply interactions, using a graph-theoretic approach. Our characterization then yields an efficient algorithm for testing the condition when certain genetic interactions, such as sign epistasis, are implied. This provides an exponential improvement of the best previously known method. We also present a geometric interpretation of our characterization, which provides the basis for statistical analysis of partial fitness orders and genetic interactions.
Mathematics Subject Classification
92B05
1 Introduction
Genetic interactions, that is, the dependence of the fitness effect produced by a set of mutations on the genetic background, play an important role in determining evolutionary trajectories of populations. For instance, a set of individually beneficial mutations may exhibit diminishing returns: while the combined effect of all mutations is beneficial, it is not as beneficial as one would expect given the individual mutations, for example the sum of all single mutation effects. Diminishing returns have been shown to slow down the pace of adaptation (Chou et al. 2011). Similarly, the combined effect of a set of mutations can be stronger than the effect expected from the individual mutations. This so-called synergistic epistasis between deleterious mutations has been observed for instance in human populations, where it affects the distribution of deleterious alleles in the genome (Sohail et al. 2017). Genetic interactions can affect not just the magnitude of the effect of a mutation, but also the sign of the effect. That is, a particular mutation may have a beneficial effect in one genetic background and a deleterious effect in another. This type of interaction, often termed sign epistasis, can act to constrain evolutionary trajectories by requiring that some mutations occur before others (Gong et al. 2013; Kvitek and Sherlock 2011; Weinreich et al. 2005).
Here, we make a conservative assumption that the only available information about fitness is the (partial) ranking of genotypes, that is, we assume that no precise fitness measurements of all genotypes are available, a frequent situation in practice, for example, due to measurement noise. Specifically, we assume either that some of the information in a total order of all fitness values must be discarded, or that the only signal available in the data is of the form “genotype \(g_1\) has higher fitness than genotype \(g_2\)”, for various pairs of genotypes. We assume that these pairwise fitness comparisons are consistent with each other, i.e., the relation they define is transitive. We call this type of data a partial fitness order, which is a partial order of genotypes with respect to their fitness. Data of this type arises for example by considering certain directed acyclic graphs where the vertices are given by genotypes and where the edges between genotypes are directed towards the genotype with higher fitness. Importantly, our assumptions make our methods widely applicable as the same techniques can be applied to other types of rankings and not just genotypes ranked with respect to their fitness. For example, a partial fitness order of genotypes can be obtained in studies that involve measuring various kinds of breeding values (see for example Habier et al. 2007) or other applications of generalised linear models.
Even though we assume that accurate fitness measurements cannot be accessed for all genotypes in a given genotype space, our methods still enable us to detect interactions implied by the partial fitness order. In particular, we are able to detect sign epistasis. Moreover, even when a complete fitness ranking of genotypes is available, a method to detect interactions from a partial order alone might provide additional insight. Indeed, there are cases where the rank order method fails to reveal interactions even though interactions in the system can be detected from the actual fitness measurements (Crona et al. 2017). In this case, partial fitness order methods can be used to conclude diminishing returns or synergistic epistasis and refute sign epistasis. In addition, if one is only interested in certain fitness comparisons (for instance, comparisons between mutational neighbors), our methods yield a criterion to determine whether these comparisons are sufficient to imply interaction. These observations will have implications for the evolutionary trajectories of the system and provide insight as to the mechanism of the interaction (Weinreich et al. 2005). A detailed analysis of the class of linear orders that imply interactions, including relevant literature and data, can be found in (Crona et al. 2017).
In this paper, we build on the results obtained by Crona et al. (2017) and settle the most general case of ranking data, when only a partial fitness order is available. Specifically, we present a graph-theoretic characterization of the class of partial fitness orders that imply genetic interactions (Sect. 3), thus establishing the conjecture presented by Gavryushkin at the Interactions between Algebra and the Sciences conference held at the Max Planck Institute for Mathematics in the Sciences, Leipzig, on 27 May 2017. Second, we use our characterization to derive a cubic-time algorithm for testing the condition when a given partial order implies such an interaction (Sect. 3). This provides an exponential improvement of the best previously known method that involves iterating through the list of all possible linear extensions of the given partial order (Crona et al. 2017). Third, we count all partial orders for up to 8 labeled elements which imply an interaction (Sect. 4), and fourth, we provide a geometric interpretation of our characterization, useful for statistical analysis of partial fitness orders that imply genetic interactions (Sect. 5). Finally, we use public data sets to demonstrate our methods (Sect. 6) and conclude by describing possible future directions for this line of research (Sect. 7).
2 Technical introduction
In this section we introduce all necessary terminology and notations and describe a basic statistical approach to probabilistic inference of (higher-order) interactions from partial fitness orders, which further motivates our results on partial fitness orders.
Consider a system \(\mathcal G\) consisting of k genotypes, where k is a positive integer. We denote the fitness of a genotype \(g \in \mathcal G\) by \(w_g\), which is a real number. Typically, we take fitness to be a positive real number, though this does not affect our arguments. All fitness measurements \((w_g)_{g \in \mathcal G}\) together then determine what is called the fitness landscape associated to \(\mathcal G\). We refer to the tuple of all the fitness measurements defining a fitness landscape by \(W = (w_g)_{g \in \mathcal G}\).
Estimating the probability of f-interaction in an empirical setting amounts to quantifying the uncertainty of \(f(W) > 0\) given some comparative fitness data. Although in this setting the fitness values \(w_g\) are unavailable, the comparative fitness data typically allow one to deduce a partial fitness order. For example, this type of data could arise from replicate measurements of any quantitative trait monotonic with respect to fitness. Another example of such data is fitness comparison data produced in competition or survival experiments, which are a popular method for fitness estimation.
Sometimes a partial fitness order is enough to deduce f-interactions and hence estimate the probability of interaction. We use the following definition to address such situations. We say that a partial fitness order \(\mathcal P= (\mathcal G, \prec )\) implies positive f-interaction if \(f(W) > 0\) whenever W satisfies the partial order \(\mathcal P\), that is, \(w_g < w_h\) for all \(g \prec h\).
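To make the definition concrete, the following worked example uses the standard two-locus system with genotypes 00, 10, 01, 11 and the form \(f(W) = w_{00} + w_{11} - w_{10} - w_{01}\). The choice of system and form is an illustrative assumption and is not taken from the data analysed in this paper. Grouping the terms of f into differences shows that the two comparisons \(10 \prec 11\) and \(01 \prec 00\) already imply positive f-interaction:

```latex
\begin{align*}
f(W) &= w_{00} + w_{11} - w_{10} - w_{01} \\
     &= (w_{11} - w_{10}) + (w_{00} - w_{01}) \\
     &> 0 \qquad \text{whenever } w_{10} < w_{11} \text{ and } w_{01} < w_{00}.
\end{align*}
```

By contrast, the single comparison \(10 \prec 11\) does not imply positive f-interaction: the values \(w_{00} = 0\), \(w_{01} = 5\), \(w_{10} = 0\), \(w_{11} = 1\) satisfy it and give \(f(W) = -4\), while \(w_{00} = 5\), \(w_{01} = 0\), \(w_{10} = 0\), \(w_{11} = 1\) also satisfy it and give \(f(W) = 6\).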
Then the probability of f-interaction given comparative fitness data can be estimated by considering the probability support of all partial orders that imply f-interaction. However, such an estimation would require the condition of whether or not a partial order \(\mathcal P\) implies f-interaction to be checked routinely for different partial orders \(\mathcal P\). Hence the complexity of this condition might become a computational bottleneck in practice. In this paper we resolve this problem by designing an efficient polynomial-time algorithm for checking whether or not a partial order implies f-interaction.
We note that the fact that a partial order does not imply positive or negative f-interaction does not necessarily mean that the system does not have f-interaction. This situation therefore should be interpreted as the partial order being non-informative with respect to the f-interaction; the issue is discussed in more detail in (Crona et al. 2017).
We now review the results from (Crona et al. 2017) about how to detect whether a rank order implies f-interaction. In the terminology used in our paper, this situation corresponds to the case when the partial fitness order is a linear order (also called a total order or a rank order). We then study arbitrary partial orders, and present an efficient algorithm for determining when a partial order implies f-interaction.
The following result from (Crona et al. 2017) depends upon the characterization of linear orders which imply interaction in terms of Dyck words.
Definition 1
Let \(\Sigma \) be an alphabet consisting of two letters P and N. A Dyck word is a word \(\omega \) consisting of an equal number of P’s and N’s such that every prefix of \(\omega \) contains at least as many P’s as N’s.
For example, PPNNPN and PNPNPN are Dyck words, but PPNNNP is not, since the prefix PPNNN contains more N’s than P’s. Clearly the definition of a Dyck word does not depend on the choice of alphabet. Dyck words arise in a number of contexts in discrete mathematics. For instance, Dyck words of length 2n are in bijection with full binary trees with \(n+1\) leaves. If the symbols P and N are replaced with “(” and “)”, a string is a Dyck word if and only if all parentheses are correctly matched. For more on Dyck words, see (Stanley 2001).
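Definition 1 translates directly into a single left-to-right scan that tracks the excess of P’s over N’s. The following Python sketch is only an illustration of the definition; the function name `is_dyck` is hypothetical.

```python
def is_dyck(word):
    """Return True if `word` over the alphabet {P, N} is a Dyck word."""
    balance = 0  # number of P's minus number of N's in the prefix read so far
    for letter in word:
        if letter == "P":
            balance += 1
        elif letter == "N":
            balance -= 1
        else:
            raise ValueError(f"unexpected letter {letter!r}")
        if balance < 0:
            # Some prefix contains more N's than P's.
            return False
    # The word must contain equally many P's and N's.
    return balance == 0
```

Applied to the three examples above, `is_dyck("PPNNPN")` and `is_dyck("PNPNPN")` hold, while `is_dyck("PPNNNP")` does not.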
Theorem 1
(Crona et al. 2017) A linear order \(\mathcal P= (\mathcal G, \prec )\) on the set of genotypes implies positive f-interaction if and only if \(\phi ^f(\mathcal P)\) is a Dyck word that starts with P.
The statement of this theorem can equivalently be formulated by saying that a linear order \(\mathcal P\) implies positive f-interaction if and only if there exists a partition of the set of all genotypes \(\mathcal G\) into pairs \((p_i, n_j)\) such that \(p_i \succ n_j\) for all i, j, where each \(p_i\) appears in \(c_i\) pairs and each \(n_j\) in \(d_j\) pairs (see Crona et al. 2017). As we will see in the next section, this formulation can be generalized to arbitrary partial orders. Note that here and also below we slightly abuse notation because for certain choices of \(c_i\)’s and \(d_j\)’s, we do not have a partition in the strict sense, but this technical difficulty can be avoided by, for example, distinguishing between the copies of \(p_i\) and \(n_j\).
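The equivalence between the Dyck word condition and the pairing condition can be made explicit with a stack, in the same way that parentheses are matched. The Python sketch below rests on two assumptions that are not spelled out in this section: the rank order is listed from highest to lowest fitness, and every coefficient of f is \(+1\) or \(-1\) (given by the hypothetical dictionary `sign`). The function name `pair_up` is also hypothetical.

```python
def pair_up(order, sign):
    """Pair each negative genotype with a positive genotype ranked above it.

    order: genotypes listed from highest to lowest fitness (assumption).
    sign:  maps each genotype to its coefficient in f, +1 or -1 (assumption).
    Returns a list of pairs (p, n) with p ranked above n, or None when the
    sign word of the order is not a Dyck word.
    """
    unmatched = []  # positive genotypes seen so far that have no partner yet
    pairs = []
    for genotype in order:
        if sign[genotype] > 0:
            unmatched.append(genotype)
        elif not unmatched:
            # A prefix with more N's than P's: no valid pairing exists.
            return None
        else:
            pairs.append((unmatched.pop(), genotype))
    return pairs if not unmatched else None
```

For the two-locus form with `sign = {"00": 1, "11": 1, "10": -1, "01": -1}`, the rank order `["11", "00", "10", "01"]` yields the pairs `("00", "10")` and `("11", "01")`, whereas `["10", "11", "00", "01"]` yields `None`.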
3 Efficient method to infer interactions from partial orders
In this section we generalize Theorem 1 and characterize partial orders which imply f-interactions. We then apply our characterization to design an efficient algorithm for testing the condition of whether or not a partial order implies f-interaction.
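Theorem 2
A partial order \(\mathcal P= (\mathcal G, \prec )\) implies positive f-interaction if and only if there exists a partition of the set of all genotypes \(\mathcal G\) into pairs \((p_i, n_j)\) such that \(p_i \succ n_j\) for all i, j, where each \(p_i\) appears in \(c_i\) pairs and each \(n_j\) in \(d_j\) pairs.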
Proof
In order to prove the converse statement, we prove its contrapositive. That is, we prove that if there is no such partition of the set of all genotypes, f-interaction is not implied.
The linear order \(\mathcal L\) does not map to a Dyck word (starting with P) under \(\phi ^f\), because the image of the smallest prefix of \(\mathcal L\) which contains all elements of S contains more N’s than P’s, as \(|S| > |N_G(S)|\). Therefore, by Theorem 3 from (Crona et al. 2017), this partial order does not imply positive f-interaction. \(\square \)
A variant of Theorem 2 was independently obtained in (Crona and Luo 2017).
We now proceed by designing an efficient algorithm for detecting whether or not a partial fitness order \(\mathcal P\) implies f-interaction. Recall that \(f(W) = \sum _{1\le i\le t} c_i w_{p_i} - \sum _{1\le j\le s} d_j{w_{n_j}}\) and \(\sum _{1\le i\le t} c_i = \sum _{1\le j\le s} d_j\). In the next result, denote \(m=\sum _{1\le i\le t} c_i\).
Theorem 3
There exists an \(\mathcal O(m^3)\) algorithm to determine whether a partial order \(\mathcal P= (\mathcal G, \prec )\) implies positive f-interaction.
Proof
We use the notation introduced in the proof of Theorem 2. That theorem implies that a partial order implies that \(f(W) > 0\) if and only if there exists a partition of the set \(P' \cup N'\) into pairs \((p_i, n_j)\) such that \(p_i \succ n_j\). As noted in the proof of Theorem 2, this condition is equivalent to the existence of a perfect matching in the bipartite graph \(G=(\mathbf P, \mathbf N, E)\) such that \((n_j, p_i)\in E\) whenever \(p_i \succ n_j\). Thus, we can detect whether the poset implies \(f(W) > 0\) by checking whether there exists a perfect matching in G.
Using the Hopcroft–Karp algorithm (Hopcroft and Karp 1971), we can find a perfect matching in a bipartite graph with m vertices in time \(\mathcal O(m^{5/2})\). Assume we are presented the partially ordered set \(\mathcal P\) as a directed acyclic graph. Then we can construct the bipartite graph G in time \(\mathcal O(m^3)\) using the following algorithm: for each \(p\in \mathbf P\), use breadth-first search to find all elements of \(\mathbf N\) which are reachable along a directed path from p, and add these to p’s list of neighbors. The worst-case complexity of this step is \(\mathcal O(m^2)\). Repeat this process for each of the m vertices to obtain a representation of G as an adjacency list, that is, as a list containing a list of neighboring vertices for each vertex. Thus, since we can construct the graph in \(\mathcal O(m^3)\) time and determine whether a perfect matching exists in \(\mathcal O(m^{5/2})\) time, and \(m^3 > m^{5/2}\), the construction step dominates and we can determine whether the partial order determines the sign of a linear form in \(\mathcal O(m^3)\) time. \(\square \)
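The two steps of the proof can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the computations in this paper: the partial order is assumed to be given as a dictionary `dag` that maps each genotype to the genotypes directly below it in fitness, all function and parameter names are hypothetical, and the matching step uses a simple augmenting-path search (Kuhn's algorithm) in place of Hopcroft–Karp. The substitute is slower in the worst case but stays within the cubic bound.

```python
from collections import deque

def reachable_below(dag, start):
    """Breadth-first search for all genotypes of lower fitness than `start`.

    dag maps a genotype to the genotypes directly below it (assumed format).
    """
    seen = set()
    queue = deque([start])
    while queue:
        g = queue.popleft()
        for h in dag.get(g, ()):
            if h not in seen:
                seen.add(h)
                queue.append(h)
    return seen

def implies_positive_interaction(dag, pos, neg):
    """pos and neg map genotypes to the positive integers c_i and d_j."""
    if sum(pos.values()) != sum(neg.values()):
        raise ValueError("the coefficients of f must sum to zero")
    # Expand every genotype into as many copies as its coefficient.
    p_copies = [p for p, c in pos.items() for _ in range(c)]
    n_copies = [n for n, d in neg.items() for _ in range(d)]
    # Step 1: build the bipartite graph, with an edge whenever p is above n.
    below = {p: reachable_below(dag, p) for p in pos}
    adj = [[j for j, n in enumerate(n_copies) if n in below[p]]
           for p in p_copies]
    # Step 2: look for a perfect matching along augmenting paths.
    match_of_n = [None] * len(n_copies)

    def augment(i, visited):
        for j in adj[i]:
            if j in visited:
                continue
            visited.add(j)
            if match_of_n[j] is None or augment(match_of_n[j], visited):
                match_of_n[j] = i
                return True
        return False

    # A perfect matching exists exactly when every copy can be matched.
    return all(augment(i, set()) for i in range(len(p_copies)))
```

For the two-locus form \(f(W) = w_{00} + w_{11} - w_{10} - w_{01}\), the comparisons \(11 \succ 10\) and \(00 \succ 01\) correspond to the call `implies_positive_interaction({"11": ["10"], "00": ["01"]}, {"11": 1, "00": 1}, {"10": 1, "01": 1})`, which evaluates to `True`. Replacing the partial order by `{"11": ["10", "01"]}` gives `False`, since 00 then has no partner.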
The complexity of our algorithm for computing whether a partial order on a set of genotypes implies f-interaction depends not just on the number of genotypes, but also on the coefficients of f. Specifically, the worst-case complexity is a function not just of the number of genotypes, but also of the number of positive summands of f (with multiplicity) \(\sum _{1\le i\le t} c_i\). In most practical cases (circuits, interaction coordinates, etc.), however, this last number is small relative to the number of genotypes.

The algorithm thus consists of two steps: constructing the bipartite graph and detecting whether a perfect matching exists.
Linear programming provides an alternative method to check whether a partial order implies interaction: checking whether a system is consistent with positive interaction amounts to checking that the feasible region of the linear program with constraints coming from the partial order and the constraint \(f >0\) is nonempty. In this case, the complexity of the computation depends only on the number of genotypes, not on the coefficients. Thus, with large coefficients, linear programming is a more practical approach than the one presented here. However, by providing a purely combinatorial description of when a partial order implies interaction, our method may be more useful for studying classes of partial orders.
4 Counting partial orders that imply interaction
Table 1 For \(k\in \{2, 4, 6, 8\}\), the number of partial orders implying positive f-interaction, the number of partial orders not implying f-interaction, and the proportion of partial orders implying either positive or negative f-interaction, rounded to two decimal places. Note that the number of partial orders implying negative f-interaction is, by symmetry, equal to the number of partial orders implying positive f-interaction. Hence, the proportion is obtained by dividing twice the number of partial orders implying positive interaction by the total number of partial orders.
k  Positive interactions  No implication  Proportion implying interaction
2  1                      1               0.67
4  31                     157             0.28
6  10,876                 108,271         0.17
8  22,217,743             387,287,893     0.10
We carried out our computations in the open-source mathematics software system Sage (http://www.sagemath.org) using the source code available at https://github.com/gavruskin/fitlands/tree/posets/. To make the computations possible, we employed the automorphism groups (Stein et al. 2008) of our partial orders in the following way.
Definition 2
An automorphism of a partial order \(\mathcal P=(V, \prec )\) is a bijection \(\sigma : V\rightarrow V\) such that if \(u \prec v\), then \(\sigma (u) \prec \sigma (v)\). The set of automorphisms of a finite partial order forms a group, denoted by \(\mathrm {Aut}(\mathcal P)\), under the operation of composition.
In our enumeration, we observe that the number of different ways to label a poset with k elements is \(\frac{k!}{|\mathrm {Aut}(\mathcal P)|}\). To see this, take a labeling of the elements of the poset. A permutation of the labels is an automorphism if and only if it does not change the partial order. Thus, each relabeling of the poset elements falls into an equivalence class of \(|\mathrm {Aut}(\mathcal P)|\) relabelings that do not change the partial order. Thus, since there are k! ways to relabel the elements, there are \(\frac{k!}{|\mathrm {Aut}(\mathcal P)|}\) non-equivalent ways to label the poset.
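The counting argument can be checked by brute force on small posets. The following Python sketch is an illustration only and is independent of the Sage code referenced above; a poset is assumed to be given as a set of pairs `(u, v)` with \(u \prec v\) on the labels \(0, \ldots , k-1\), and the function names are hypothetical.

```python
from itertools import permutations

def relabel(relations, perm):
    """Apply the label permutation `perm` to every pair of the relation."""
    return frozenset((perm[u], perm[v]) for u, v in relations)

def labelings_and_automorphisms(k, relations):
    """Return (number of distinct labelings, size of the automorphism group)."""
    relations = frozenset(relations)
    images = set()
    automorphisms = 0
    for perm in permutations(range(k)):
        image = relabel(relations, perm)
        images.add(image)
        if image == relations:
            # The permutation leaves the partial order unchanged.
            automorphisms += 1
    return len(images), automorphisms
```

For the poset on three elements in which 0 lies below both 1 and 2, the call `labelings_and_automorphisms(3, {(0, 1), (0, 2)})` returns `(3, 2)`, in agreement with \(3!/2 = 3\).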
The implementation of our algorithm proceeds as follows: Using a built-in function in Sage, we produced a list of partial orders on k elements up to isomorphism. Since whether a partial order implies f-interaction depends on the element labels, not just the structure of the partial order, we have to distinguish between different labeled partial orders within each isomorphism class. To achieve this, for each partial order (up to isomorphism), we check for f-interaction with each partition of the labels used by Sage into two classes of equal size. We let one class correspond to elements with coefficient 1 and one class correspond to elements with coefficient \(-1\). Then each choice of partition corresponds to \(((k/2)!)^2\) possible reassignments of the labels, since we can permute the elements within the class with coefficient 1 and the class with coefficient \(-1\) without changing the partition. We compute the cardinality of the automorphism group for each partial order, and add \(\frac{1}{|\mathrm {Aut}(\mathcal P)|}\) to our count of partial orders that imply positive, negative, or no f-interaction respectively.
We note that as we increase k, the proportion of partial orders implying f-interaction decreases. The sums of the values we obtained for the number of partial orders implying positive f-interaction, the number of partial orders implying negative f-interaction, and the number of partial orders not implying any f-interaction appear in OEIS (The On-Line Encyclopedia of Integer Sequences 2017) as the number of partial orders on k labeled elements, as expected. However, none of the columns of our table appear in OEIS. Further, each number is either prime or has a prime factorization including fairly large primes. Thus, the number of partial orders, the number of partial orders implying f-interaction, and the number of partial orders not implying f-interaction do not appear to be given by any simple formula.
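For very small k, the counts can be cross-checked without automorphism groups by enumerating every relation on the labels directly. The Python sketch below is such a brute-force check and is not the Sage procedure described above. It assumes that f has coefficient \(+1\) on the first k/2 labels and \(-1\) on the rest, and all function names are hypothetical. Since it inspects all \(2^{k(k-1)}\) relations, it is only practical for \(k \le 4\).

```python
from itertools import permutations

def is_strict_partial_order(rel):
    """rel is a set of pairs (u, v), u != v, read as u below v."""
    antisymmetric = all((v, u) not in rel for u, v in rel)
    transitive = all((u, y) in rel
                     for u, v in rel for x, y in rel if v == x)
    return antisymmetric and transitive

def has_pairing(rel, tops, bottoms):
    """True if tops and bottoms can be paired so that each top is above its partner."""
    return any(all((b, t) in rel for t, b in zip(tops, perm))
               for perm in permutations(bottoms))

def count_labeled_posets(k):
    """Return (positive, negative, no implication) counts for coefficients +1/-1."""
    labels = range(k)
    pairs = [(u, v) for u in labels for v in labels if u != v]
    positives = tuple(range(k // 2))      # labels with coefficient +1 (assumption)
    negatives = tuple(range(k // 2, k))   # labels with coefficient -1 (assumption)
    pos = neg = total = 0
    for mask in range(1 << len(pairs)):
        rel = {pair for i, pair in enumerate(pairs) if mask >> i & 1}
        if not is_strict_partial_order(rel):
            continue
        total += 1
        pos += has_pairing(rel, positives, negatives)
        neg += has_pairing(rel, negatives, positives)
    return pos, neg, total - pos - neg
```

Calling `count_labeled_posets(4)` returns `(31, 31, 157)`, which sums to the 219 partial orders on four labeled elements.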
The final computation, for \(k=8\), took approximately two hours to complete on the SageMath Cloud server. The main factor affecting the performance of these computations is that the number of partial orders on k elements is superexponential in the number of elements (The On-Line Encyclopedia of Integer Sequences 2017). Therefore, we did not count the number of partial orders on k elements which imply f-interaction for \(k\ge 10\). Since we can still quickly check whether individual partial orders on \(k \ge 10\) elements imply f-interaction, sampling methods can be used to estimate how the proportion of partial orders which imply interaction changes as we increase k.
Our computational observation that the fraction of partial orders which imply f-interaction approaches 0 as k increases is true in general:
Theorem 4
As k approaches infinity, the fraction of partial orders which imply f-interaction approaches zero.
Proof
Partition the set of partial orders on k elements into its isomorphism classes. We show that, within each isomorphism class, the proportion of partial orders implying positive f-interaction is bounded by \(\frac{4}{k}\) whenever f is a linear form with coefficients \(\pm 1\). Consider an arbitrary isomorphism class, and take an arbitrary labeled poset \(\mathcal P\) in this class. There are \(\frac{k!}{|\mathrm {Aut}(\mathcal P)|}\) distinct labeled posets in this isomorphism class. There are \(\frac{k!\cdot 2}{k/2+1}\le 4(k-1)!\) linear orders on k elements which imply positive f-interaction (Crona et al. 2017, Proposition 1). Now, take an arbitrary linear extension of \(\mathcal P\). Each permutation of labels which takes \(\mathcal P\) to a labeled poset which implies positive f-interaction must take this linear extension to one that implies interaction, thus there are at most \(4(k-1)!\) permutations of labels which take our labeled poset to one that implies f-interaction. Further, all \(|\mathrm {Aut}(\mathcal P)|\) permutations of labels which give the same labeled poset must take our linear order to one that implies interaction. Thus, there are at most \(\frac{4(k-1)!}{|\mathrm {Aut}(\mathcal P)|}\) labeled posets in this isomorphism class that imply interaction. Thus, the proportion of posets in this isomorphism class that imply interaction is at most \(\frac{4}{k}\). \(\square \)
5 Geometric interpretation
In this section, we present a geometric interpretation of Theorem 2, and derive some geometric, combinatorial, and statistical results. We denote the space \(\mathbb R^{|\mathcal G|}\), where \(|\mathcal G|\) is the number of elements in \(\mathcal G\), simply by \(\mathbb R^\mathcal G\) and index the coordinate axes by the unknown fitness values \(w_g\), where \(g \in \mathcal G\). When we assume fitness values to be nonnegative, we work in the nonnegative orthant of \(\mathbb R^\mathcal G\), which we denote by \( \mathbb R^\mathcal G_{\ge 0}\). In this case, every point of \(\mathbb R^\mathcal G_{\ge 0}\) corresponds to a possible measurement of a fitness value for each genotype. When we do not assume fitness values to be nonnegative, every point in \(\mathbb R^\mathcal G\) corresponds to a fitness value for each genotype. The geometric interpretation of our results about partial orders uses the language of polyhedral cones.
Definition 3
A convex polyhedral cone in \(\mathbb R^k\) is a subset of \(\mathbb R^k\) cut out by a finite number of linear inequalities.
Like all cones, convex polyhedral cones are closed under the operation of taking nonnegative linear combinations of their elements. For the remainder of this paper, we will refer to convex polyhedral cones simply as cones. Let \(\mathcal P= (\mathcal G, \prec )\) be a partial order (which may be a total order). Then the set of points \(x \in \mathbb R^\mathcal G\) whose coordinate order satisfies \(\mathcal P\) defines a cone. To see this, note that this set of points is cut out by a finite number of linear inequalities defined by the partial order. Hence this set is a convex polyhedral cone by definition. We call a cone associated to a partial order in this way an order cone and denote it by \(C_\mathcal P\). Further, a linear form f in the variables \(w_g\) defines a hyperplane through the origin, where f takes the value 0. The linear form is positive in the half space on one side of this hyperplane and negative on the other. Let \(H_{f, +}\) be the half space on which f is positive, \(H_{f}\) be the hyperplane on which f is zero, and \(H_{f, -}\) be the half space on which f is negative. Then Theorem 2 has the following geometric interpretation. Recall that \(f(W) = \sum _{1\le i\le t} c_i w_{p_i} - \sum _{1\le j\le s} d_j{w_{n_j}}\) is a linear form with integer coefficients that sum to zero, that is \(\sum _{1\le i\le t} c_i = \sum _{1\le j\le s} d_j\).
Theorem 5
The cone \(C_{\mathcal P}\subset H_{f, +}\) if and only if there exists a partition of the set of all genotypes \(\mathcal G\) into pairs \((p_i, n_j)\) such that \(p_i \succ n_j\) for all i, j, where each \(p_i\) appears in \(c_i\) pairs and each \(n_j\) in \(d_j\) pairs.
Proof
Note that \(\mathcal P\) implies positive f-interaction if and only if \(C_\mathcal P\subset H_{f, +}\). Then the claim follows from Theorem 2. \(\square \)
As a corollary of Theorem 5, we obtain the following result, which will be applied in our analysis of a malaria data set in Sect. 6.
Definition 4
Let \(\mathcal P_1\) and \(\mathcal P_2\) be partial orders. Their intersection is the partial order \(\mathcal P_1 \cap \mathcal P_2\) such that \(x \prec _{\mathcal P_1\cap \mathcal P_2} y\) if and only if \(x \prec _{\mathcal P_1} y\) and \(x \prec _{\mathcal P_2} y\).
To illustrate the above definition consider the partial orders \(\mathcal P_1 = 00 \succ 10 \succ 01\) and \(\mathcal P_2 = 10 \succ 01 \succ 00\). Then \(\mathcal P_1 \cap \mathcal P_2\) is the partial order given by \(10 \succ 01\).
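Definition 4 amounts to a set intersection once each order is written out as its full set of comparisons. The Python sketch below illustrates this for linear orders listed from highest to lowest fitness; the function names are hypothetical.

```python
from itertools import combinations

def relations_of_chain(chain):
    """Return all pairs (higher, lower) of a linear order.

    chain lists the genotypes from highest to lowest fitness.
    """
    return set(combinations(chain, 2))

def intersect_orders(*relation_sets):
    """Keep only the comparisons shared by all orders.

    The intersection of transitive relations is again transitive,
    so the result is a partial order.
    """
    return set.intersection(*relation_sets)
```

With `p1 = relations_of_chain(["00", "10", "01"])` and `p2 = relations_of_chain(["10", "01", "00"])`, the call `intersect_orders(p1, p2)` returns `{("10", "01")}`, that is, the single comparison \(10 \succ 01\) from the example above.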
Corollary 1
Let \(\mathcal A\) and \(\mathcal B\) be linear orders such that \(C_{\mathcal A},C_{\mathcal B} \subset H_{f,+}\). Then the order cone \(C_{\mathcal A \cap \mathcal B}\) of the intersection of \(\mathcal A\) and \(\mathcal B\) is contained in \(H_{f,+}\) if and only if \(\mathcal A \cap \mathcal B= (\mathcal G, \prec )\) satisfies the condition in Theorem 5.
Proof
Immediately follows from Theorem 5. \(\square \)
Now, we prove the following theorem which is important for statistical analysis of interactions from fitness comparison data, for example, in the analysis of a sample from the probability distribution over the space of partial fitness orders with the aim to quantify the uncertainty of interactions. Specifically, we show the following. If two uncertain fitness measurements give distinct linear orders \(\mathcal A\) and \(\mathcal B\) both of which imply positive f-interaction, then there exist fitness measurements corresponding to a sequence of linear orders which are intermediate between \(\mathcal A\) and \(\mathcal B\) in the sense that every linear order in this sequence is different from its predecessor by just one transposition of a pair of adjacent elements, and no linear order in the sequence implies negative f-interaction.
Theorem 6
Let \(U\subset \mathbb R^k\) be a path connected, open set which has a nonempty intersection with the cones \(C_{\mathcal A}\) and \(C_{\mathcal B}\), where \(\mathcal A\) and \(\mathcal B\) are linear orders on k elements. Then there exist linear orders \(\mathcal L_1 = \mathcal A, \ldots , \mathcal L_n = \mathcal B\) such that \(\mathcal L_i\) and \(\mathcal L_{i+1}\) differ by one adjacent transposition and \(U \cap C_{\mathcal L_i} \ne \varnothing \) for each \(1 \le i < n\).
Proof
We denote cones \(C_{\mathcal L_i}\) by simply \(C_i\), for all i, throughout the proof. Note that the cones \(C_i\) and \(C_j\) have a \((k-1)\)-dimensional face as their intersection if and only if \(\mathcal L_i\) and \(\mathcal L_j\) differ by a single adjacent transposition. Further, note that any path which passes from \(C_i\) to \(C_j\) without passing through the interior of any other cone must pass through the intersection of the boundaries of \(C_i\) and \(C_j\). Either this boundary is a \((k-1)\)-dimensional face, and \(\mathcal L_i\) and \(\mathcal L_j\) differ by an adjacent transposition, or we pass through a lower dimensional face. In the latter case, this means a neighborhood of the point where we pass through the boundary contains all cones that intersect at this point. Thus, a neighborhood of this point contains a sequence of order cones of linear orders which differ by one adjacent transposition each.
Now, consider a path from a point in \(U\cap C_{\mathcal A}\) to a point in \(U\cap C_{\mathcal B}\). Since U is open and path connected, it contains some path of this form, as well as a neighborhood around the path. Consider the sequence of cones this path passes through—by the observation above, we know that if the path passes from \(C_i\) to \(C_j\), either \(\mathcal L_i\) and \(\mathcal L_j\) differ by an adjacent transposition, or a neighborhood of the point where we pass from \(C_i\) to \(C_j\) intersects the cones of a sequence of linear orders which differ by one adjacent transposition each. Thus, in either case, a path from \(C_\mathcal A\) to \(C_\mathcal B\) has a neighborhood that intersects the order cones of a sequence of linear orders \(\mathcal L_1, \ldots , \mathcal L_n\) such that \(\mathcal L_1 = \mathcal A\), \(\mathcal L_n = \mathcal B\), and \(\mathcal L_i\) and \(\mathcal L_{i+1}\) differ by one adjacent transposition for each \(1 \le i < n\). These linear orders satisfy the claim of the theorem. \(\square \)
As a corollary to Theorem 6, we obtain the following result.
Corollary 2
Suppose \(\mathcal A\) and \(\mathcal B\) are linear orders which imply positive f-interaction. Then there exists a sequence of linear orders \(\mathcal L_1 = \mathcal A, \ldots , \mathcal L_n = \mathcal B\) such that \(\mathcal L_i\) and \(\mathcal L_{i+1}\) for \(1 \le i < n\) all differ by one adjacent transposition and no \(\mathcal L_i\) implies negative f-interaction.
Proof
The half space \(H_{f, +}\) is a path connected open set which has a nonempty intersection with \(C_\mathcal A\) and \(C_\mathcal B\). Thus, this result follows from Theorem 6. \(\square \)
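For small systems, a sequence as in Corollary 2 can be found explicitly by a breadth-first search over linear orders, moving by adjacent transpositions and skipping every order that implies negative f-interaction. The Python sketch below is an illustration under the assumptions that the coefficients of f are \(\pm 1\) (given by the hypothetical dictionary `sign`) and that orders are listed from highest to lowest fitness. It is not a procedure taken from this paper.

```python
from collections import deque

def implies_negative(order, sign):
    """True if the sign word of `order`, with P and N swapped, is a Dyck word."""
    balance = 0
    for genotype in order:
        balance -= sign[genotype]  # N counts +1, P counts -1
        if balance < 0:
            return False
    return balance == 0

def adjacent_transpositions(order):
    """Yield every linear order obtained by swapping two neighbours."""
    for i in range(len(order) - 1):
        swapped = list(order)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        yield tuple(swapped)

def transposition_path(start, goal, sign):
    """Shortest sequence from start to goal avoiding negative interaction."""
    start, goal = tuple(start), tuple(goal)
    parent = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in adjacent_transpositions(current):
            if nxt not in parent and not implies_negative(nxt, sign):
                parent[nxt] = current
                queue.append(nxt)
    return None
```

For the two-locus form with `sign = {"00": 1, "11": 1, "10": -1, "01": -1}`, the orders `("11", "00", "10", "01")` and `("00", "01", "11", "10")` both imply positive interaction, and the search returns a sequence of four linear orders connecting them.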
Spaces with complicated geometries and combinatorics are known to cause significant difficulties for statistical analysis (Billera et al. 2001). For example, the space of trees, which is a particular instance of the space of partial orders, required deep mathematical advances to understand basic statistics, such as confidence regions and convex hulls, over the space (Billera et al. 2001; Gavryushkin and Drummond 2016). Advances in the geometry and combinatorics of such spaces are the stepping stone for efficient statistical methods such as Markov Chain Monte Carlo (Gavryushkin et al. 2017; Dinh et al. 2017). In a similar vein, we expect that the approach of this section allows one to study probability distributions over the space of partial fitness orders efficiently. Hence our results provide a theoretical foundation for statistical analysis of partial fitness orders.
6 Applications
In this section we illustrate how our results can be used in fitnessbased genetic interactions studies.
This means that even if the only comparisons we are confident about are those that agree across the three drug concentrations, we can still conclude that this system exhibits negative total 3way interaction. In particular, this approach shows that we can ignore the inconsistency of the three rank orders associated to the three different drug concentrations under inspection. Hence any tuple of fitness values W which satisfies every fitness comparison found in each linear order will imply interaction. Further, in this case our results from Sect. 5 imply that any point in the convex hull of the points we have measured will imply negative 3way interaction. This region allows to incorporate the uncertainties of the measurements and provides a region within which those measurements can vary without breaking the conclusion of interaction.
TEM \(\beta\)-lactamase. As a second application, we consider the data produced in the antibiotic resistance study of the TEM family of \(\beta\)-lactamase (Mira et al. 2015). The table below displays the average growth rates of 16 genotypes grown in the antibiotic AMP in 12 replicates. The 16 genotypes include the wild type and all combinations of amino acid substitutions in TEM-50.
Table 2 Average growth rates of the 16 genotypes grown in the antibiotic AMP
Genotype  0000  1000  0100  0010  0001  1100  1010  1001 
Average growth rate  1.851  1.570  2.024  1.948  2.082  2.186  0.051  2.165 
Genotype  0110  0101  0011  1110  1101  1011  0111  1111 
Average growth rate  2.033  2.198  2.434  0.088  2.322  0.083  0.034  2.821 
Table 3 Ranking of the 16 genotypes grown in the antibiotic AMP according to their average growth rates listed in Table 2
Genotype  0000  1000  0100  0010  0001  1100  1010  1001 
Rank  11  12  9  10  7  5  15  6 
Genotype  0110  0101  0011  1110  1101  1011  0111  1111 
Rank  8  4  2  13  3  14  16  1 
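The ranking can be reproduced directly from the average growth rates. The following is a minimal sketch, assuming rank 1 denotes the highest growth rate and that there are no ties; the function names `rank_genotypes` and `smallest_gap` are hypothetical helpers introduced for illustration.

```python
# Average growth rates copied from Table 2 (genotype -> rate).
growth = {
    "0000": 1.851, "1000": 1.570, "0100": 2.024, "0010": 1.948,
    "0001": 2.082, "1100": 2.186, "1010": 0.051, "1001": 2.165,
    "0110": 2.033, "0101": 2.198, "0011": 2.434, "1110": 0.088,
    "1101": 2.322, "1011": 0.083, "0111": 0.034, "1111": 2.821,
}


def rank_genotypes(rates):
    """Map each genotype to its rank; rank 1 is the highest rate (no ties assumed)."""
    ordered = sorted(rates, key=rates.get, reverse=True)
    return {genotype: position + 1 for position, genotype in enumerate(ordered)}


def smallest_gap(rates):
    """Smallest difference between two consecutive rates in the sorted order."""
    ordered = sorted(rates.values(), reverse=True)
    return min(high - low for high, low in zip(ordered, ordered[1:]))


ranks = rank_genotypes(growth)
gap = smallest_gap(growth)
```

The value of `gap` gives a rough sense of how close two neighbouring genotypes in the linear order can be, which is what makes a full linear order sensitive to measurement noise.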
These small differences in the growth rates highlight the difficulty of finding accurate and robust linear orders among the genotypes.
Table 4 Perfect matching of genotypes obtained from the partial fitness order according to the average growth rate
\(i = j\)  \(p_i\)  \(n_j\)  \(w_{p_i} - w_{n_j}\)

1  1111  1101  0.499 
2  0011  0001  0.352 
3  0101  0100  0.174 
4  1100  0010  0.238 
5  1001  1000  0.595 
6  0110  1110  1.945 
7  0000  1011  1.768 
8  1010  0111  0.017 
The rows in Table 4 indicate a perfect matching between the two sets of genotypes. Thus, for example, the first row indicates that the average growth rates associated to 1111 and 1101 are such that \(w_{1111}>w_{1101}\), and similarly for the other rows. From these 8 comparisons of the type \(w_{p_i}>w_{n_j}\) alone, one deduces that the system has positive total 4-way interaction. Moreover, comparing with the actual average growth rates from Table 2, one can observe that the 8 differences \(w_{p_i} - w_{n_j}\) (see the last column in Table 4) are bigger than the differences between the average growth rates of the critical genotypes mentioned above. Since these differences are more significant, they provide a more reliable conclusion in the empirical setting. Finally, notice that any other perfect matching between the 16 genotypes satisfying \(w_{p_i}>w_{n_j}\) would yield the same conclusion equally well. In summary, an advantage of the partial order approach is that it relies on fewer pairwise inequalities, which makes the approach more practical.
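The matching argument can be checked numerically. The sketch below assumes that the total 4-way interaction is the signed sum of fitness values with coefficient +1 on genotypes carrying an even number of mutations and −1 otherwise, a convention consistent with the \(p_i\) and \(n_j\) columns of Table 4; the helper names `sign` and `check_matching` are hypothetical.

```python
# Average growth rates copied from Table 2 (genotype -> rate).
growth = {
    "0000": 1.851, "1000": 1.570, "0100": 2.024, "0010": 1.948,
    "0001": 2.082, "1100": 2.186, "1010": 0.051, "1001": 2.165,
    "0110": 2.033, "0101": 2.198, "0011": 2.434, "1110": 0.088,
    "1101": 2.322, "1011": 0.083, "0111": 0.034, "1111": 2.821,
}

# Pairs (p_i, n_i) copied from Table 4.
matching = [
    ("1111", "1101"), ("0011", "0001"), ("0101", "0100"), ("1100", "0010"),
    ("1001", "1000"), ("0110", "1110"), ("0000", "1011"), ("1010", "0111"),
]


def sign(genotype):
    """Assumed coefficient: +1 for an even number of 1s, -1 for an odd number."""
    return 1 if genotype.count("1") % 2 == 0 else -1


def check_matching(pairs, rates):
    """Check that pairs form a perfect matching with w_p > w_n in every pair."""
    used = [g for pair in pairs for g in pair]
    covers_all = sorted(used) == sorted(rates)  # every genotype used exactly once
    signs_ok = all(sign(p) == 1 and sign(n) == -1 for p, n in pairs)
    diffs = [rates[p] - rates[n] for p, n in pairs]
    return covers_all and signs_ok and all(d > 0 for d in diffs), diffs


ok, diffs = check_matching(matching, growth)

# The signed sum equals the sum of the eight pairwise differences,
# so eight positive differences force a positive total.
u_total = sum(sign(g) * w for g, w in growth.items())
```

Because each genotype appears in exactly one pair, the signed sum splits into the eight differences of Table 4, which is why the eight comparisons alone determine the sign.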
7 Discussion and future directions
Understanding genetic interactions from fitness measurements associated to genotypes represents a major challenge in evolutionary biology. In this work, we have focused on the case where only fitness comparisons between certain genotypes are available. This is a common assumption in practice, as there might be more uncertainty or cost involved in deducing some fitness measurements than others. Furthermore, this approach allows one to exclude the measurements that have high uncertainty.
In this setting, we present a new algorithm to detect genetic interactions from partial fitness orders in an efficient way. Moreover, we derive a geometric characterization of the class of partial orders that imply interactions. This description, involving the geometry of convex polyhedral cones, provides a solid framework to develop statistical analysis of genetic interactions from partial fitness data.
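To make the matching idea concrete, here is a minimal sketch restricted to the simplest case: a linear form whose coefficients are +1 on one set of genotypes and −1 on an equally large set. It only illustrates the sufficient direction, namely that a perfect matching with \(w_p > w_n\) in every pair forces the form to be positive. It uses a plain augmenting-path search rather than the Hopcroft–Karp algorithm cited in the references, and the function names and input format are assumptions made for illustration, not the implementation used in this work.

```python
def transitive_closure(relations):
    """relations: pairs (a, b) meaning w_a > w_b. Returns a -> set of all b below a."""
    below = {}
    for a, b in relations:
        below.setdefault(a, set()).add(b)
        below.setdefault(b, set())
    closed = {}
    for start in below:
        seen, stack = set(), list(below[start])
        while stack:  # depth-first search over everything below `start`
            g = stack.pop()
            if g not in seen:
                seen.add(g)
                stack.extend(below[g])
        closed[start] = seen
    return closed


def find_matching(positives, negatives, relations):
    """Match every positive genotype p to a distinct negative n with w_p > w_n.

    Returns a dict p -> n, or None if no such perfect matching exists.
    """
    if len(positives) != len(negatives):
        raise ValueError("this sketch assumes equally many +1 and -1 genotypes")
    closed = transitive_closure(relations)
    match_of = {}  # negative genotype -> positive genotype matched to it

    def augment(p, visited):
        # Try to place p, re-routing earlier choices if necessary.
        for n in negatives:
            if n in closed.get(p, ()) and n not in visited:
                visited.add(n)
                if n not in match_of or augment(match_of[n], visited):
                    match_of[n] = p
                    return True
        return False

    for p in positives:
        if not augment(p, set()):
            return None
    return {p: n for n, p in match_of.items()}


# Two-locus illustration: the form w_00 - w_10 - w_01 + w_11.
example = find_matching(["00", "11"], ["10", "01"], [("11", "10"), ("00", "01")])
```

In the two-locus illustration, the comparisons \(w_{11} > w_{10}\) and \(w_{00} > w_{01}\) already yield a perfect matching, so the form is positive without any further measurements.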
Our work inspires a number of questions which remain open. First, while we are able to characterize the set of partial orders which imply interaction, our characterization is in terms of a matching in a separate graph which we can construct from our partial order. Thus, it is not immediately clear how to relate properties of a partial order to the probability that this partial order implies interaction. For instance, how do the number and type of incomparable pairs in a partial order affect whether an interaction is implied or not?
Second, it is important to point out that fitness measurements may produce relations which do not necessarily satisfy the transitivity assumption; see the work of Kerr et al. (2002), where nontransitive pairwise comparisons were studied in certain microbial communities. This limitation and its implications for the theory of fitness landscapes are an important issue that will be explored in further research.
Acknowledgements
We are grateful to Bernd Sturmfels for hosting us at the Max Planck Institute for Mathematics in the Sciences in Leipzig, where this work started. AG was partially supported by the Royal Society of New Zealand through a Rutherford Discovery Fellowship, contract RDFUOO1702.
References
Beerenwinkel N, Pachter L, Sturmfels B (2007) Epistasis and shapes of fitness landscapes. Stat Sin 17:1317–1342
Billera LJ, Holmes SP, Vogtmann K (2001) Geometry of the space of phylogenetic trees. Adv Appl Math 27(4):733–767. https://doi.org/10.1006/aama.2001.0759
Chou HH, Chiu HC, Delaney NF, Segrè D, Marx CJ (2011) Diminishing returns epistasis among beneficial mutations decelerates adaptation. Science 332(6034):1190–1192
Crona K, Luo M (2017) Higher order epistasis and fitness peaks. arXiv:1708.02063
Crona K, Gavryushkin A, Greene D, Beerenwinkel N (2017) Inferring genetic interactions from comparative fitness data. eLife 6:e28629. https://doi.org/10.7554/eLife.28629
Diestel R (2016) Graph theory. Springer, Berlin
Dinh V, Bilge A, Zhang C, Matsen FA IV (2017) Probabilistic path Hamiltonian Monte Carlo. arXiv:1702.07814
Gavryushkin A, Drummond AJ (2016) The space of ultrametric phylogenetic trees. J Theor Biol 403:197–208. https://doi.org/10.1016/j.jtbi.2016.05.001
Gavryushkin A, Whidden C, Matsen FA IV (2017) The combinatorics of discrete time-trees: theory and open problems. J Math Biol. https://doi.org/10.1007/s00285-017-1167-9
Gong LI, Suchard MA, Bloom JD (2013) Stability-mediated epistasis constrains the evolution of an influenza protein. eLife 2:e00631
Habier D, Fernando RL, Dekkers JCM (2007) The impact of genetic relationship information on genome-assisted breeding values. Genetics 177(4):2389–2397. https://doi.org/10.1534/genetics.107.081190
Hopcroft JE, Karp RM (1971) An \(n^{5/2}\) algorithm for maximum matchings in bipartite graphs. SIAM J Comput 2(4):225–231. https://doi.org/10.1137/0202019
Kerr B, Riley MA, Feldman MW, Bohannan BJM (2002) Local dispersal promotes biodiversity in a real-life game of rock-paper-scissors. Nature 418:171. https://doi.org/10.1038/nature00823
Kvitek DJ, Sherlock G (2011) Reciprocal sign epistasis between frequently experimentally evolved adaptive mutations causes a rugged fitness landscape. PLoS Genet 7(4):e1002056
Mira PM, Crona K, Greene D, Meza JC, Sturmfels B, Barlow M (2015) Rational design of antibiotic treatment plans: a treatment strategy for managing evolution and reversing resistance. PLoS One 10(5):e0122283. https://doi.org/10.1371/journal.pone.0122283
Ogbunugafor CB, Hartl D (2016) A pivot mutation impedes reverse evolution across an adaptive landscape for drug resistance in Plasmodium vivax. Malar J 15:40. https://doi.org/10.1186/s12936-016-1090-3
Sohail M, Vakhrusheva OA, Sul JH, Pulit SL, Francioli LC, van den Berg LH, Veldink JH, de Bakker PI, Bazykin GA, Kondrashov AS et al (2017) Negative selection in humans and fruit flies involves synergistic epistasis. Science 356(6337):539–542
Stanley RP (2001) Enumerative combinatorics, vol 2. Cambridge University Press, Cambridge
Stein W et al (2008) Sage: open source mathematical software. https://www.sagemath.org. Accessed 7 December 2009
The On-Line Encyclopedia of Integer Sequences (2017) https://oeis.org/. Accessed 4 Aug 2017
Weinreich DM, Watson RA, Chao L (2005) Perspective: sign epistasis and genetic constraint on evolutionary trajectories. Evolution 59(6):1165–1174
Weinreich DM, Lan Y, Wylie CS, Heckendorn RB (2013) Should evolutionary geneticists worry about higher-order epistasis? Curr Opin Genet Dev 23(6):700–707. https://doi.org/10.1016/j.gde.2013.10.007
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.