Abstract
Phylogenetic trees illustrate the evolutionary history of genes and species. Although genes evolve along with the species they belong to, a species tree and a gene tree are often not identical. The reason is evolutionary events at the gene level, such as duplication or transfer. These differences are handled by phylogenetic reconciliation, which is formally a mapping between the nodes of a gene tree and the nodes and branches of a species tree. We investigate models of reconciliation with gene transfers that replace existing genes, a biologically important event that has never been included in reconciliation models. The problem is close to the dated version of the classical subtree prune and regraft (SPR) distance problem, where a pruned subtree has to be regrafted only on a branch closer to the root. We prove that the reconciliation problem including transfer with replacement is NP-hard, and that, if speciations and transfers with replacement are the only allowed evolutionary events, it is fixed-parameter tractable with respect to the reconciliation’s weight. We prove that the results extend to the dated SPR problem.
References
Abby SS, Tannier E, Gouy M, Daubin V (2012) Lateral gene transfer as a support for the tree of life. Proc Natl Acad Sci USA 109(13):4962–4967. https://doi.org/10.1073/pnas.1116871109
Allen BL, Steel M (2001) Subtree transfer operations and their induced metrics on evolutionary trees. Ann Comb 5(1):1–15. https://doi.org/10.1007/s00026-001-8006-8
Bansal MS, Alm EJ, Kellis M (2012) Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss. Bioinformatics 28(12):283–291. https://doi.org/10.1093/bioinformatics/bts225
Bansal MS, Alm EJ, Kellis M (2013) Reconciliation revisited: handling multiple optima when reconciling with duplication, transfer, and loss. J Comput Biol 20(10):738–754. https://doi.org/10.1089/cmb.2013.0073
Beiko RG, Hamilton N (2006) Phylogenetic identification of lateral genetic transfer events. BMC Evol Biol 6(1):15. https://doi.org/10.1186/1471-2148-6-15
Bonet ML, John KS (2009) Efficiently calculating evolutionary tree measures using SAT. vol 5584. LNCS. Springer, Berlin. pp 4–17. https://doi.org/10.1007/978-3-642-02777-2_3
Bordewich M, Semple C (2005) On the computational complexity of the rooted subtree prune and regraft distance. Ann Comb 8(4):409–423. https://doi.org/10.1007/s00026-004-0229-z
Chan Y, Ranwez V, Scornavacca C (2015) Exploring the space of gene/species reconciliations with transfers. J Math Biol 71(5):1179–1209. https://doi.org/10.1007/s00285-014-0851-2
Chauve C, El-Mabrouk N (2009) New perspectives on gene family evolution: losses in reconciliation and a link with supertrees. Lecture Notes in Computer Science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics) 5541 LNBI, pp 46–58. https://doi.org/10.1007/978-3-642-02008-7_4
Chen ZZ, Fan Y, Wang L (2015) Faster exact computation of rSPR distance. J Comb Optim 29(3):605–635. https://doi.org/10.1007/s10878-013-9695-8
Chen J, Shi F, Wang J (2016) Approximating maximum agreement forest on multiple binary trees. Algorithmica 76(4):867–889. https://doi.org/10.1007/s00453-015-0087-6
Choi SC, Rasmussen MD, Hubisz MJ, Gronau I, Stanhope MJ, Siepel A (2012) Replacing and additive horizontal gene transfer in streptococcus. Mol Biol Evol 29(11):3309–3320. https://doi.org/10.1093/molbev/mss138
Dasgupta B, Ferrarini S, Gopalakrishnan U, Paryani NR (2006) Inapproximability results for the lateral gene transfer problem. J Comb Optim 11(4):387–405. https://doi.org/10.1007/s10878-006-8212-8
Doyon JP, Scornavacca C, Ranwez V, Berry V (2010) An efficient algorithm for gene/species trees parsimonious reconciliation with losses, duplications, and transfers. In: Comparative genomics: international workshop, RECOMB-CG 2010, Ottawa, Canada, October 9–11, 2010 Proceedings (October), pp 93–108. https://doi.org/10.1007/978-3-642-16181-0_9
Doyon JP, Ranwez V, Daubin V, Berry V (2011) Models, algorithms and programs for phylogeny reconciliation. Briefings Bioinf 12(5):392–400. https://doi.org/10.1093/bib/bbr045
Even S, Itai A, Shamir A (1976) On the complexity of timetable and multicommodity flow problems. SIAM J Comput 5(4):691–703. https://doi.org/10.1137/0205048
Garey MR, Johnson DS (1979) Computers and Intractability: a guide to the theory of NP-completeness. W. H. Freeman & Co., New York
Garey M, Johnson D, Stockmeyer L (1976) Some simplified NP-complete graph problems. Theor Comput Sci 1(3):237–267. https://doi.org/10.1016/0304-3975(76)90059-1
Goodman M, Czelusniak J, Moore GW, Romero-Herrera AE, Matsuda G (1979) Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences. Syst Biol 28(2):132–163. https://doi.org/10.1093/sysbio/28.2.132
Hallett MT, Lagergren J (2001) Efficient algorithms for lateral gene transfer problems. In: Proceedings of the fifth annual international conference on computational biology. RECOMB ’01. ACM, New York, pp 149–156. https://doi.org/10.1145/369133.369188
Hasić D, Tannier E (2019) Gene tree species tree reconciliation with gene conversion. J Math Biol. https://doi.org/10.1007/s00285-019-01331-w
Hein J, Jiang T, Wang L, Zhang K (1996) On the complexity of comparing evolutionary trees. Discrete Appl Math 71(1–3):153–169. https://doi.org/10.1016/S0166-218X(96)00062-5
Hickey G, Dehne F, Rau-Chaplin A, Blouin C (2008) SPR distance computation for unrooted trees. Evol Bioinform 4:17–27. https://doi.org/10.4137/EBO.S419
Keeling PJ, Palmer JD (2008) Horizontal gene transfer in eukaryotic evolution. Nat Rev Genet 9:605–618. https://doi.org/10.1038/nrg2386
Linz S, Semple C (2011) A cluster reduction for computing the subtree distance between phylogenies. Ann Comb 15(3):465–484. https://doi.org/10.1007/s00026-011-0108-3
Merkle D, Middendorf M, Wieseke N (2010) A parameter-adaptive dynamic programming approach for inferring cophylogenies. BMC Bioinf 11(1):S60. https://doi.org/10.1186/1471-2105-11-S1-S60
Nakhleh L (2012) Computational approaches to species phylogeny inference and gene tree reconciliation. Biophys Chem 34(1):13–23. https://doi.org/10.1016/j.immuni.2010.12.017
Raman V, Ravikumar B, Rao S (1998) A simplified NP-complete MAXSAT problem. Inf Process Lett 65(1):1–6. https://doi.org/10.1016/S0020-0190(97)00223-8
Rice DW, Palmer JD (2006) An exceptional horizontal gene transfer in plastids: gene replacement by a distant bacterial paralog and evidence that haptophyte and cryptophyte plastids are sisters. BMC Biol 4(1):31. https://doi.org/10.1186/1741-7007-4-31
Scornavacca C, Paprotny W, Berry V, Ranwez V (2013) Representing a set of reconciliations in a compact way. J Bioinform Comput Biol 11(02):1250025. https://doi.org/10.1142/S0219720012500254
Shi F, Feng Q, Chen J, Wang L, Wang J (2013) Distances between phylogenetic trees: a survey. Tsinghua Sci Technol 18(5):490–499. https://doi.org/10.1109/TST.2013.6616522
Shi F, Feng Q, You J, Wang J (2016) Improved approximation algorithm for maximum agreement forest of two rooted binary phylogenetic trees. J Comb Optim 32(1):111–143. https://doi.org/10.1007/s10878-015-9921-7
Song YS (2006) Properties of subtree-prune-and-regraft operations on totally-ordered phylogenetic trees. Ann Comb 10(1):147–163. https://doi.org/10.1007/s00026-006-0279-5
Suchard MA (2005) Stochastic models for horizontal gene transfer: taking a random walk through tree space. Genetics 170(1):419–431. https://doi.org/10.1534/genetics.103.025692
Szöllősi GJ, Tannier E, Lartillot N, Daubin V (2013) Lateral gene transfer from the dead. Syst Biol 62(3):386–397. https://doi.org/10.1093/sysbio/syt003
Szöllősi GJ, Tannier E, Daubin V, Boussau B (2015) The inference of gene trees with species trees. Syst Biol 64(1):42–62. https://doi.org/10.1093/sysbio/syu048
Tofigh A, Hallett M, Lagergren J (2011) Simultaneous identification of duplications and lateral gene transfers. IEEE ACM Trans Comput Biol Bioinform 8(2):517–535. https://doi.org/10.1109/TCBB.2010.14
Whidden C, Matsen F (2018) Calculating the unrooted subtree Prune-and-Regraft distance. IEEE ACM Trans Comput Biol Bioinform: 1–1. https://doi.org/10.1109/TCBB.2018.2802911
Whidden C, Beiko RG, Zeh N (2010) Fast FPT algorithms for computing rooted agreement forests: theory and experiments. Springer, Berlin, pp 141–153. https://doi.org/10.1007/978-3-642-13193-6_13
Whidden C, Beiko RG, Zeh N (2016) Fixed-parameter and approximation algorithms for maximum agreement forests of multifurcating trees. Algorithmica 74(3):1019–1054. https://doi.org/10.1007/s00453-015-9983-z
Wu Y (2009) A practical method for exact computation of subtree prune and regraft distance. Bioinformatics 25(2):190–196. https://doi.org/10.1093/bioinformatics/btn606
Acknowledgements
ET was supported by the French Agence Nationale de la Recherche (ANR) through Grant No. ANR-10-BINF-01–01 ‘Ancestrome’.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Proofs
Here we give proofs that are omitted in the main text.
Proof of Lemma 1
We use a reduction from Max 2-Sat restricted to instances in which every variable appears in at most three clauses. This problem is NP-hard (Raman et al. 1998).
Let F be an instance of Max 2-Sat in which every variable appears in at most three clauses. If a variable x appears exactly once, in a clause C, then we can assign it the value that makes C true. Similarly, if a variable y occurs only as a positive literal, or only as a negative literal, then we can assign it the value that makes all the corresponding clauses true.
In this way we eliminate all the variables that appear exactly once, or have only positive or only negative literals. Therefore, we can assume that F has variables that appear two or three times, and have both positive and negative literals.
Let \(x_0\) be a variable that appears in exactly two clauses. After inserting \((x_0\vee x^0_1)\wedge (x^0_1\vee x^0_2)\wedge (\lnot x^0_1\vee x^0_3)\wedge (\lnot x^0_2\vee x^0_3)\wedge (x^0_2\vee \lnot x^0_3)\) into F, we obtain a logical formula with \(x_0\) in exactly three clauses. The new variables \(x^0_1,x^0_2,x^0_3\) also appear in exactly three clauses each, with both positive and negative literals present.
In this way we obtain a logical formula \(F'\) in which every variable appears in exactly three clauses, with both positive and negative literals present. If the added variables are true, then all the new clauses are true, whatever the value of \(x_0\). Therefore the number of true clauses in F is maximized if and only if the number of true clauses in \(F'\) is maximized.
The reduction is polynomial in n and m, where n is the number of variables and m is the number of clauses in F. \(\square \)
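The two properties of the padding clauses used above can be checked mechanically. The following is a minimal sketch, not part of the original proof; the names `x0`, `a1`, `a2`, `a3` are hypothetical stand-ins for \(x_0, x^0_1, x^0_2, x^0_3\), and the clause encoding is an assumption made only for this illustration.

```python
# Literals are (variable_name, is_positive); a clause is a pair of literals.
# "x0" is the variable that occurs twice in F; "a1", "a2", "a3" are
# hypothetical names for the fresh variables x^0_1, x^0_2, x^0_3.
PADDING = [
    (("x0", True), ("a1", True)),
    (("a1", True), ("a2", True)),
    (("a1", False), ("a3", True)),
    (("a2", False), ("a3", True)),
    (("a2", True), ("a3", False)),
]


def satisfied(clause, assignment):
    """A clause is true if at least one of its literals is true."""
    return any(assignment[var] == positive for var, positive in clause)


def occurrences(clauses, var):
    """Return (number of positive, number of negative) occurrences of var."""
    pos = sum(1 for c in clauses for v, p in c if v == var and p)
    neg = sum(1 for c in clauses for v, p in c if v == var and not p)
    return pos, neg


def padding_is_always_satisfiable():
    """Setting the fresh variables to True satisfies all five padding
    clauses, independently of the value chosen for x0."""
    for x0 in (False, True):
        assignment = {"x0": x0, "a1": True, "a2": True, "a3": True}
        if not all(satisfied(c, assignment) for c in PADDING):
            return False
    return True
```

Each fresh variable occurs three times with both polarities, and `x0` gains exactly one extra occurrence, which matches the counting argument in the proof.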
Proof of Lemma 4
To prove (a) and (b), we identify two cases, according to the positions of \(r_{j_1}\) and \(r_{j_2}\).
Case 1. Node \(r_{j_1}\) or \(r_{j_2}\) is above the border line. In order to obtain \(\omega (F_j)<4\), we need at least one of the nodes \(r^0_{j_1}, r^0_{j_2}, r^1_{j_1}, r^1_{j_2}\) to be neither a duplication nor incident with a transfer. This is possible only if one of them is placed in \(s^0_{C_j}=root(S_{C_j})\).
Let us take \(\rho (r^0_{j_1}) = s^0_{C_j}\). Then \(\rho (r_{j_1}) > s^0_{C_j}\), or \(\rho (r_{j_1})\) and \(s^0_{C_j}\) are incomparable. If \(\rho (r^1_{j_1}) < s^0_{C_j}\), then \((r_{j_1},r^1_{j_1})\) is a transfer, and the weight of \(F_j\) is not decreased. If \(\rho (r^1_{j_1}) = s^0_{C_j}\), then \(r_{j_1}\) is a duplication, or one of the edges \((r_{j_1}, r^0_{j_1})\) and \((r_{j_1}, r^1_{j_1})\) contains a transfer. In this way we eliminate two transfers (that were incident with \(r^0_{j_1}\) and \(r^1_{j_1}\)), and obtain one transfer or duplication. But we generate at least one non-free loss in \(S_{C_j}\). Similar considerations apply to the other nodes of \(F_j\). Hence we cannot obtain \(\omega (F_j)<4\).
Case 2. Both nodes \(r_{j_1}\) and \(r_{j_2}\) are under the border line. Then none of the nodes \(r^0_{j_1}, r^1_{j_1}, r^0_{j_2}, r^1_{j_2}\) is placed in \(s^0_{C_j}\), therefore every one of them is incident with at least one transfer. If we wish to eliminate transfers starting at \(r_{j_1}\) or \(r_{j_2}\), then we need to place them both in \(lca(B_5,B_6,B_7,B_8)\), i.e. in the minimal node in \(S_{C_j}\) that is an ancestor of \(B_5\), \(B_6\), \(B_7\), and \(B_8\) (Fig. 19). In this case we increase the number of non-free losses. Whichever placement we choose, we have \(\omega (F_j)\ge 5\).
(c) The proof is similar in spirit to the proof of (a). See Figs. 4 and 5. The idea is to see what happens if one of the 17 transfers that are present in a proper reconciliation restricted to \(G_{x_i}\) is not present in some other reconciliation.
First, note that if some of the nodes \(d^j_i\) are not placed in \(D^i_j\) (\(j=1,\ldots ,P(n)\)), then we would have transfers that are not present in a proper reconciliation. Also, if none of the nodes \(d^j_i\) is placed in \(D^i_j\) (\(j=1,\ldots ,P(n)\)), then we would have a reconciliation more expensive than any proper reconciliation. Hence we can assume that, for the anchoring nodes \(d^1_i\), we have \(\rho (d^1_i)=D^i_j\) (\(j=1,\ldots ,P(n)\)).
In a proper reconciliation, there are 14 transfers incident with \(b^s_i\) \((s=1,\ldots , 14)\). In an arbitrary reconciliation, \(b^s_i\) is neither a duplication nor incident with a transfer only if \(\rho (b^s_i)=s^0_{x_i}\). Then the parent of \(b^s_i\) (i.e. \(c^{s-1}_i\)), as well as \(c^0_i\), is a duplication or is incident with a transfer, and two or more non-free losses are created. Therefore taking \(\rho (b^s_i)=s^0_{x_i}\) for some values of s does not give \(\omega (G_{x_i})<17\).
Assume that the nodes \(b^s_i\) (\(s=1,\ldots ,14\)) are placed as in the proper reconciliation. Observe nodes \(x^1_i, x^2_i\), and assume that they are not incident with a transfer, and the edge \((x^2_i, x^1_i)\) does not contain a transfer. Then we have at least two transfers at the edges \((c^4_i, x^2_i)\) and \((x^1_i, c^3_i)\), or at some other edges leading to some of the \(b^s_i\). Similar considerations apply for \(x^3_i\). Therefore, in this case too we cannot decrease the number of transfers.
Can we have fewer than 17 transfers if we take \(\rho (b^s_i)=s^0_{x_i}\), for some values of s, and eliminate transfers incident with \(x^1_i, x^2_i\)? Let us take \(\rho (b^7_i)=s^0_{x_i}\). Then the nodes \(c^6_i\) and \(c^0_i\) are not placed as in the proper reconciliation. Hence we have at least two transfers or duplications, and non-free losses, that are not present in the proper reconciliation. Also, if the nodes \(x^1_i,x^2_i\) are not placed as in the proper reconciliation, we have a transfer, different from the previous two, that is not present in the proper reconciliation. Therefore, we have at least three evolutionary events not present in the proper reconciliation, and we cannot obtain fewer than 17 events.
(d) Suppose that \(x^1_i\) and \(x^3_i\) are under the border line. Then at least three of the nodes \(c^1_i,\ldots , c^{12}_i\) are not in their gadget positions. These include \(c^1_i,c^2_i,c^3_i\), because they are descendants of \(x^1_i\) in G. The paths \((c^1_i,b^2_i, A^3_i)\), \((c^2_i,b^3_i, A^5_i)\), \((c^3_i,b^4_i, A^7_i)\) generate three extra transfers. An extra transfer is created on the edge \((x^2_i,x^1_i)\), or on some other edge that is an ancestor of \(x^2_i\). Even if we eliminate the two transfers incident with \(x^1_i\) and \(x^2_i\), we gain four more. Hence \(\omega (G_{x_i})\ge 17-2+4=19\). \(\square \)
Proof of Theorem 1
Let \({\mathfrak {R}}\) be a minimum \(DTLCT_R\) reconciliation. We use \({\mathfrak {R}}\) to construct \({\mathfrak {R}}'\) that is both minimum and proper.
The construction of a proper reconciliation is described earlier. The only thing that we need to specify in \({\mathfrak {R}}'\) is the positions of \(x^1_i,x^2_i\), and \(x^3_i\) with respect to the border line, as well as the positions of \(r_{j_1}\) and \(r_{j_2}\).
If \(x^1_i\) and \(x^2_i\) are not on the same side of the border line as \(x^3_i\) (in \({\mathfrak {R}}\)), then they are on the same side in \({\mathfrak {R}}'\) as in \({\mathfrak {R}}\). If \(x^1_i\) or \(x^2_i\) is on the same side as \(x^3_i\) (in \({\mathfrak {R}}\)), then \(x^1_i\) and \(x^2_i\) are above, and \(x^3_i\) is under the border line (in \({\mathfrak {R}}'\)).
Next, the vertices of \(F_j\) are placed in \(S_{C_j}\) as in the description of the proper reconciliation (Definition 16), so that the nodes \(r_{j_1}\) and \(r_{j_2}\) are placed on the same side of the border line as \(x'_{j_1}\) and \(x'_{j_2}\) (in \({\mathfrak {R}}'\)), respectively. We denote the reconciliation obtained in this way by \({\mathfrak {R}}'\). By construction, it is a proper reconciliation. Let us prove that it is a minimum reconciliation.
We have \(\omega _{{\mathfrak {R}}}(G_{x_i})\ge 17 = \omega _{{\mathfrak {R}}'}(G_{x_i})\), \(\omega _{{\mathfrak {R}}}(F_j)\ge 4\), and \(\omega _{{\mathfrak {R}}'}(F_j)\in \{4, 5\}\) (Lemma 4).
Let \(i\in \{1,\ldots ,n\}\), and let \(x^1_i,x^2_i,x^3_i\) be connected with \(r_{a_1}\in V(F_a)\), \(r_{b_1}\in V(F_b)\), \(r_{c_1}\in V(F_c)\) via transfers. We introduce the notation \(\varOmega _{{\mathfrak {R}}}(i)=\omega _{{\mathfrak {R}}}(G_{x_i})+\omega _{{\mathfrak {R}}}(F_a)+\omega _{{\mathfrak {R}}}(F_b)+\omega _{{\mathfrak {R}}}(F_c)\).
Case 1. Assume that \(\omega _{{\mathfrak {R}}}(F_a)\ge \omega _{{\mathfrak {R}}'}(F_a)\), \(\omega _{{\mathfrak {R}}}(F_b)\ge \omega _{{\mathfrak {R}}'}(F_b)\), \(\omega _{{\mathfrak {R}}}(F_c)\ge \omega _{{\mathfrak {R}}'}(F_c)\). Then \(\omega _{{\mathfrak {R}}}(G_{x_i})+\omega _{{\mathfrak {R}}}(F_a)+\omega _{{\mathfrak {R}}}(F_b)+\omega _{{\mathfrak {R}}}(F_c) \ge \omega _{{\mathfrak {R}}'}(G_{x_i})+\omega _{{\mathfrak {R}}'}(F_a)+\omega _{{\mathfrak {R}}'}(F_b)+\omega _{{\mathfrak {R}}'}(F_c)\), i.e. \(\varOmega _{{\mathfrak {R}}}(i) \ge \varOmega _{{\mathfrak {R}}'}(i)\).
Case 2. Assume that \(\omega _{{\mathfrak {R}}}(F_a)=4\), \(\omega _{{\mathfrak {R}}'}(F_a)=5\), \(\omega _{{\mathfrak {R}}}(F_b)\ge \omega _{{\mathfrak {R}}'}(F_b)\), \(\omega _{{\mathfrak {R}}}(F_c)\ge \omega _{{\mathfrak {R}}'}(F_c)\). Since \(\omega _{{\mathfrak {R}}'}(F_a)=5\), we have that \(x^1_i\) is under the border line (in \({\mathfrak {R}}'\)). Because of the transformation rules, at the beginning of the proof, we have that \(x^1_i\), \(x^2_i\) are under the border line (in \({\mathfrak {R}}\) and \({\mathfrak {R}}'\)), while \(x^3_i\) is above the line (in \({\mathfrak {R}}\) and \({\mathfrak {R}}'\)).
Let \(y_1\) be a literal of variable \(x_s\) (i.e. \(y_1\in \{x^1_s,x^2_s,x^3_s\}\)) connected with \(r_{a_2}\in V(F_a)\) via a transfer. Since \(\omega _{{\mathfrak {R}}}(F_a)=4\), \(\omega _{{\mathfrak {R}}'}(F_a)=5\), we have that \(y_1\) is above the border line in \({\mathfrak {R}}\), and under the line in \({\mathfrak {R}}'\), hence \(y_1=x^3_s\).
Assume that \(F_{a'},F_{b'}\) are connected with \(x^1_s,x^2_s\) via transfers. Then \(\omega _{{\mathfrak {R}}'}(F_{a'})=\omega _{{\mathfrak {R}}'}(F_{b'})=4\), \(\omega _{{\mathfrak {R}}}(G_{x_s}) \ge 19\). We have \(\omega _{{\mathfrak {R}}}(F_{a'})\ge 4=\omega _{{\mathfrak {R}}'}(F_{a'})\) and \(\omega _{{\mathfrak {R}}}(F_{b'})\ge 4=\omega _{{\mathfrak {R}}'}(F_{b'})\).
From the previous arguments, \(\omega _{{\mathfrak {R}}}(G_{x_s})+\omega _{{\mathfrak {R}}}(F_a)+\omega _{{\mathfrak {R}}}(F_a) \ge 19+4+4 = 17+5+5 = \omega _{{\mathfrak {R}}'}(G_{x_s})+\omega _{{\mathfrak {R}}'}(F_a)+\omega _{{\mathfrak {R}}'}(F_a)\), where \(F_a\) is counted twice because it contributes to both \(\varOmega (i)\) and \(\varOmega (s)\).
Finally, \(\big ( \omega _{{\mathfrak {R}}}(G_{x_i})+\omega _{{\mathfrak {R}}}(F_a)+\omega _{{\mathfrak {R}}}(F_b)+\omega _{{\mathfrak {R}}}(F_c) \big ) + \big ( \omega _{{\mathfrak {R}}}(G_{x_s})+\omega _{{\mathfrak {R}}}(F_{a'})+\omega _{{\mathfrak {R}}}(F_{b'})+\omega _{{\mathfrak {R}}}(F_{a}) \big ) \ge \big ( \omega _{{\mathfrak {R}}'}(G_{x_i})+ \omega _{{\mathfrak {R}}'}(F_a) +\omega _{{\mathfrak {R}}'}(F_b)+\omega _{{\mathfrak {R}}'}(F_c) \big ) + \big ( \omega _{{\mathfrak {R}}'}(G_{x_s}) + \omega _{{\mathfrak {R}}'}(F_{a'}) +\omega _{{\mathfrak {R}}'}(F_{b'})+\omega _{{\mathfrak {R}}'}(F_{a}) \big )\), i.e. \(\varOmega _{{\mathfrak {R}}}(i)+\varOmega _{{\mathfrak {R}}}(s) \ge \varOmega _{{\mathfrak {R}}'}(i)+\varOmega _{{\mathfrak {R}}'}(s)\).
The next cases use the approach of Case 2.
Case 3. Assume that \(\omega _{{\mathfrak {R}}}(F_b)=4\), \(\omega _{{\mathfrak {R}}'}(F_b)=5\), \(\omega _{{\mathfrak {R}}}(F_a)\ge \omega _{{\mathfrak {R}}'}(F_a)\), \(\omega _{{\mathfrak {R}}}(F_c)\ge \omega _{{\mathfrak {R}}'}(F_c)\). This case is analogous to Case 2.
Case 4. Assume that \(\omega _{{\mathfrak {R}}}(F_c)=4\), \(\omega _{{\mathfrak {R}}'}(F_c)=5\), \(\omega _{{\mathfrak {R}}}(F_a)\ge \omega _{{\mathfrak {R}}'}(F_a)\), \(\omega _{{\mathfrak {R}}}(F_b)\ge \omega _{{\mathfrak {R}}'}(F_b)\). Then \(x^3_i\) is under, and \(x^1_i,x^2_i\) are above the border line in \({\mathfrak {R}}'\). We have two subcases.
Case 4.1. Assume that \(x^1_i\) or \(x^2_i\) was on the same side of the line as \(x^3_i\) (in \({\mathfrak {R}}\)). Then \(\omega _{{\mathfrak {R}}}(G_{x_i})\ge 19\). Hence \(\omega _{{\mathfrak {R}}}(G_{x_i})+\omega _{{\mathfrak {R}}}(F_a)+\omega _{{\mathfrak {R}}}(F_b)+\omega _{{\mathfrak {R}}}(F_c) \ge 19 + \omega _{{\mathfrak {R}}'}(F_a)+\omega _{{\mathfrak {R}}'}(F_b)+ 4 > 17 + \omega _{{\mathfrak {R}}'}(F_a)+\omega _{{\mathfrak {R}}'}(F_b)+ 5 = \omega _{{\mathfrak {R}}'}(G_{x_i})+\omega _{{\mathfrak {R}}'}(F_a)+\omega _{{\mathfrak {R}}'}(F_b)+\omega _{{\mathfrak {R}}'}(F_c)\), i.e. \(\varOmega _{{\mathfrak {R}}}(i) > \varOmega _{{\mathfrak {R}}'}(i)\).
Case 4.2. Assume that \(x^1_i\) and \(x^2_i\) were not on the same side of the line as \(x^3_i\) (in \({\mathfrak {R}}\)). Then \(x^3_i\) is under the line (in \({\mathfrak {R}}\) and \({\mathfrak {R}}'\)). Now we proceed similarly to Case 2.
Let \(y_3\in \{x^1_l,x^2_l,x^3_l\}\) and it is connected with \(r_{c_2}\in V(F_c)\) via transfer. From \(\omega _{{\mathfrak {R}}}(F_c)=4\), \(\omega _{{\mathfrak {R}}'}(F_c)=5\), we have that \(y_3\) in \({\mathfrak {R}}\) was above the line, and in \({\mathfrak {R}}'\) is under the line, hence \(y_3=x^3_l\), \(\omega _{{\mathfrak {R}}}(G_{x_l}) \ge 19\), \(\omega _{{\mathfrak {R}}'}(F_{a''}) = \omega _{{\mathfrak {R}}'}(F_{b''}) = 4\), where \(F_{a''}\) and \(F_{b''}\) are connected with \(x^1_l\) and \(x^2_l\) via transfers.
It follows that \(\omega _{{\mathfrak {R}}}(G_{x_l}) + \omega _{{\mathfrak {R}}}(F_c) + \omega _{{\mathfrak {R}}}(F_c) \ge 19+4+4 = 17+5+5 = \omega _{{\mathfrak {R}}'}(G_{x_l}) + \omega _{{\mathfrak {R}}'}(F_c) + \omega _{{\mathfrak {R}}'}(F_c)\).
Next, \(\big ( \omega _{{\mathfrak {R}}}(G_{x_i})+\omega _{{\mathfrak {R}}}(F_a)+\omega _{{\mathfrak {R}}}(F_b)+\omega _{{\mathfrak {R}}}(F_c) \big ) + \big ( \omega _{{\mathfrak {R}}}(G_{x_l})+\omega _{{\mathfrak {R}}}(F_{a''})+\omega _{{\mathfrak {R}}}(F_{b''})+\omega _{{\mathfrak {R}}}(F_{c}) \big ) \ge \big ( \omega _{{\mathfrak {R}}'}(G_{x_i})+\omega _{{\mathfrak {R}}'}(F_a)+\omega _{{\mathfrak {R}}'}(F_b)+\omega _{{\mathfrak {R}}'}(F_c) \big ) + \big ( \omega _{{\mathfrak {R}}'}(G_{x_l})+\omega _{{\mathfrak {R}}'}(F_{a''})+\omega _{{\mathfrak {R}}'}(F_{b''})+\omega _{{\mathfrak {R}}'}(F_{c}) \big )\), i.e. \(\varOmega _{{\mathfrak {R}}}(i)+\varOmega _{{\mathfrak {R}}}(l) \ge \varOmega _{{\mathfrak {R}}'}(i)+\varOmega _{{\mathfrak {R}}'}(l)\).
Case 5. Assume that \(\omega _{{\mathfrak {R}}}(F_a)=\omega _{{\mathfrak {R}}}(F_b)=4\), \(\omega _{{\mathfrak {R}}'}(F_a)=\omega _{{\mathfrak {R}}'}(F_b)=5\), and \(\omega _{{\mathfrak {R}}}(F_c)\ge \omega _{{\mathfrak {R}}'}(F_c)\). By a similar argument as in the previous cases, we have that \(x^1_i,x^2_i\) are under the line (in \({\mathfrak {R}}\) and \({\mathfrak {R}}'\)), while \(x^3_i\) is above the line (in \({\mathfrak {R}}\) and \({\mathfrak {R}}'\)). Let \(y_1\in \{x^1_r,x^2_r,x^3_r\}\) be connected with \(r_{a_2}\in V(F_a)\), and \(y_2\in \{x^1_t,x^2_t,x^3_t\}\) be connected with \(r_{b_2}\in V(F_b)\). As in the previous cases, we have \(y_1=x^3_r\), \(y_2=x^3_t\), and they were above the line in \({\mathfrak {R}}\), and under the line in \({\mathfrak {R}}'\). Hence \(\omega _{{\mathfrak {R}}}(G_{x_r}) \ge 19\) and \(\omega _{{\mathfrak {R}}}(G_{x_t}) \ge 19\). Let \(x^1_r,x^2_r,x^1_t,x^2_t\) be connected with \(F_{a_r},F_{b_r},F_{a_t},F_{b_t}\). Then \(\omega _{{\mathfrak {R}}'}(F_{a_r})=\omega _{{\mathfrak {R}}'}(F_{b_r})=\omega _{{\mathfrak {R}}'}(F_{a_t})=\omega _{{\mathfrak {R}}'}(F_{b_t})=4\).
Therefore \(\omega _{{\mathfrak {R}}}(G_{x_r}) + \omega _{{\mathfrak {R}}}(G_{x_t}) + \omega _{{\mathfrak {R}}}(F_a) + \omega _{{\mathfrak {R}}}(F_a) + \omega _{{\mathfrak {R}}}(F_b) + \omega _{{\mathfrak {R}}}(F_b) \ge 19+19+4+4+4+4 = 17+17+5+5+5+5 = \omega _{{\mathfrak {R}}'}(G_{x_r}) + \omega _{{\mathfrak {R}}'}(G_{x_t}) + \omega _{{\mathfrak {R}}'}(F_a) + \omega _{{\mathfrak {R}}'}(F_a) + \omega _{{\mathfrak {R}}'}(F_b) + \omega _{{\mathfrak {R}}'}(F_b)\).
Hence \(\big ( \omega _{{\mathfrak {R}}}(G_{x_i})+\omega _{{\mathfrak {R}}}(F_a)+\omega _{{\mathfrak {R}}}(F_b)+\omega _{{\mathfrak {R}}}(F_c) \big ) + \big ( \omega _{{\mathfrak {R}}}(G_{x_r})+\omega _{{\mathfrak {R}}}(F_{a_r})+\omega _{{\mathfrak {R}}}(F_{b_r})+\omega _{{\mathfrak {R}}}(F_a) \big ) + \big ( \omega _{{\mathfrak {R}}}(G_{x_t})+\omega _{{\mathfrak {R}}}(F_{a_t})+\omega _{{\mathfrak {R}}}(F_{b_t})+\omega _{{\mathfrak {R}}}(F_b) \big ) \ge \big ( \omega _{{\mathfrak {R}}'}(G_{x_i})+\omega _{{\mathfrak {R}}'}(F_a)+\omega _{{\mathfrak {R}}'}(F_b)+\omega _{{\mathfrak {R}}'}(F_c) \big ) + \big ( \omega _{{\mathfrak {R}}'}(G_{x_r})+\omega _{{\mathfrak {R}}'}(F_{a_r})+\omega _{{\mathfrak {R}}'}(F_{b_r})+\omega _{{\mathfrak {R}}'}(F_a) \big ) + \big ( \omega _{{\mathfrak {R}}'}(G_{x_t})+\omega _{{\mathfrak {R}}'}(F_{a_t})+\omega _{{\mathfrak {R}}'}(F_{b_t})+\omega _{{\mathfrak {R}}'}(F_b) \big )\), i.e. \(\varOmega _{{\mathfrak {R}}}(i) + \varOmega _{{\mathfrak {R}}}(r) + \varOmega _{{\mathfrak {R}}}(t) \ge \varOmega _{{\mathfrak {R}}'}(i) + \varOmega _{{\mathfrak {R}}'}(r) + \varOmega _{{\mathfrak {R}}'}(t)\).
The next three cases are not possible, because \(x^3_i\) cannot be on the same side of the line as \(x^1_i\) or \(x^2_i\) in \({\mathfrak {R}}'\).
Case 6. Assume that \(\omega _{{\mathfrak {R}}}(F_a)=\omega _{{\mathfrak {R}}}(F_c)=4\), \(\omega _{{\mathfrak {R}}'}(F_a)=\omega _{{\mathfrak {R}}'}(F_c)=5\), \(\omega _{{\mathfrak {R}}}(F_b) \ge \omega _{{\mathfrak {R}}'}(F_b)\).
Case 7. Assume that \(\omega _{{\mathfrak {R}}}(F_b)=\omega _{{\mathfrak {R}}}(F_c)=4\), \(\omega _{{\mathfrak {R}}'}(F_b)=\omega _{{\mathfrak {R}}'}(F_c)=5\), \(\omega _{{\mathfrak {R}}}(F_a) \ge \omega _{{\mathfrak {R}}'}(F_a)\).
Case 8. Assume that \(\omega _{{\mathfrak {R}}}(F_a)=\omega _{{\mathfrak {R}}}(F_b)=\omega _{{\mathfrak {R}}}(F_c)=4\), \(\omega _{{\mathfrak {R}}'}(F_a)=\omega _{{\mathfrak {R}}'}(F_b)=\omega _{{\mathfrak {R}}'}(F_c)=5\).
Every \(i\in \{1,\ldots , n\}\) belongs to exactly one case. Variables s (from Cases 2 and 3), l (Case 4.2), t and r (Case 5) are equal to some \(i\in \{1,\ldots , n\}\), but are pairwise different, i.e. no value is repeated among the variables s, l, r, t. Let \(A_1\) be the set of all values of i from Case 1 that are different from all s, l, r, t. In a similar manner we introduce the sets \(A_{2,3}\), \(A_{4.1}\), \(A_{4.2}\), \(A_{5}\).
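That the eight cases are exhaustive and mutually exclusive follows from \(\omega _{{\mathfrak {R}}}(F_j)\ge 4\) and \(\omega _{{\mathfrak {R}}'}(F_j)\in \{4,5\}\): for each of \(F_a, F_b, F_c\), either \(\omega _{{\mathfrak {R}}}\ge \omega _{{\mathfrak {R}}'}\), or \(\omega _{{\mathfrak {R}}}=4\) and \(\omega _{{\mathfrak {R}}'}=5\). The sketch below illustrates this classification; the function names and the dictionary encoding of weights are assumptions made for the illustration only.

```python
from itertools import product

# Each case is identified by the set of clause gadgets (among F_a, F_b, F_c)
# whose weight is strictly smaller in R than in R'.
CASE_LABELS = {
    frozenset(): "Case 1",
    frozenset("a"): "Case 2",
    frozenset("b"): "Case 3",
    frozenset("c"): "Case 4",
    frozenset("ab"): "Case 5",
    frozenset("ac"): "Case 6",
    frozenset("bc"): "Case 7",
    frozenset("abc"): "Case 8",
}


def classify(weights_r, weights_r_prime):
    """Map the weights of F_a, F_b, F_c in R and in R' to a case label.

    Both arguments are dicts with keys 'a', 'b', 'c' (hypothetical encoding).
    """
    cheaper_in_r = frozenset(
        k for k in "abc" if weights_r[k] < weights_r_prime[k]
    )
    return CASE_LABELS[cheaper_in_r]


def all_cases():
    """Enumerate sample weights with omega_R >= 4 and omega_R' in {4, 5}
    and collect every case label that occurs."""
    seen = set()
    for r in product((4, 5, 6), repeat=3):
        for rp in product((4, 5), repeat=3):
            seen.add(classify(dict(zip("abc", r)), dict(zip("abc", rp))))
    return seen
```

Every weight combination receives exactly one label because the label is a function of a single subset of \(\{a,b,c\}\). The proof then rules out Cases 6 to 8 by a separate argument about the positions of \(x^1_i, x^2_i, x^3_i\) in \({\mathfrak {R}}'\), which this sketch does not model.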
We will use the previous cases to prove \(\omega ({\mathfrak {R}}) \ge \omega ({\mathfrak {R}}')\). We have \(2\cdot \omega ({\mathfrak {R}}) = \sum _{i}\omega _{{\mathfrak {R}}}(G_{x_i}) + \sum _{i}\varOmega _{{\mathfrak {R}}}(i) = \sum _{i}\omega _{{\mathfrak {R}}}(G_{x_i}) + \sum _{A_1} \varOmega _{{\mathfrak {R}}}(i) + \sum _{A_{2,3}}\big ( \varOmega _{{\mathfrak {R}}}(i) + \varOmega _{{\mathfrak {R}}}(s) \big ) + \sum _{A_{4.1}} \varOmega _{{\mathfrak {R}}}(i) + \sum _{A_{4.2}}\big ( \varOmega _{{\mathfrak {R}}}(i) + \varOmega _{{\mathfrak {R}}}(l)\big ) + \sum _{A_5}\big ( \varOmega _{{\mathfrak {R}}}(i) + \varOmega _{{\mathfrak {R}}}(r) + \varOmega _{{\mathfrak {R}}}(t) \big ) \ge \sum _{i}\omega _{{\mathfrak {R}}'}(G_{x_i}) + \sum _{A_1} \varOmega _{{\mathfrak {R}}'}(i) + \sum _{A_{2,3}}\big ( \varOmega _{{\mathfrak {R}}'}(i) + \varOmega _{{\mathfrak {R}}'}(s) \big ) + \sum _{A_{4.1}} \varOmega _{{\mathfrak {R}}'}(i) + \sum _{A_{4.2}}\big ( \varOmega _{{\mathfrak {R}}'}(i) + \varOmega _{{\mathfrak {R}}'}(l)\big ) + \sum _{A_5}\big ( \varOmega _{{\mathfrak {R}}'}(i) + \varOmega _{{\mathfrak {R}}'}(r) + \varOmega _{{\mathfrak {R}}'}(t) \big ) = 2\cdot \omega ({\mathfrak {R}}')\).
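The factor 2 in this identity can be read as a double count. The following is a sketch under two assumptions that the identity itself suggests but that are not restated here: the weight of a reconciliation is the sum of the weights of the variable gadgets \(G_{x_i}\) and the clause gadgets \(F_j\), and every clause gadget is attached to exactly two literal nodes (each clause of a 2-Sat formula has two literals). Then each \(\omega (F_j)\) occurs in exactly two of the sums \(\varOmega (i)\), so
\[
\sum _{i}\varOmega _{{\mathfrak {R}}}(i) = \sum _{i}\omega _{{\mathfrak {R}}}(G_{x_i}) + 2\sum _{j}\omega _{{\mathfrak {R}}}(F_j),
\qquad
\sum _{i}\omega _{{\mathfrak {R}}}(G_{x_i}) + \sum _{i}\varOmega _{{\mathfrak {R}}}(i) = 2\Big ( \sum _{i}\omega _{{\mathfrak {R}}}(G_{x_i}) + \sum _{j}\omega _{{\mathfrak {R}}}(F_j) \Big ) = 2\cdot \omega ({\mathfrak {R}}).
\]
The same computation applies to \({\mathfrak {R}}'\).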
Finally, \(\omega ({\mathfrak {R}}) \ge \omega ({\mathfrak {R}}')\). Therefore \({\mathfrak {R}}'\) is a minimum reconciliation. \(\square \)
Proof of Theorem 4
We will transform \({\mathfrak {R}}\) into \({\mathfrak {R}}'\) in two steps. First, we adjust all transfers that need to be adjusted, and then we alternately raise and translocate nodes.
Note that the definitions of the operations give the sufficient conditions for executing them. We assume that we perform these operations only if the conditions are satisfied. Additionally, we will translocate nodes only if \(y'\) (\(y'\) is from Definition 21) is a diagonal transfer child. This results in a raisable node after translocation.
Step 1. We adjust all transfers whose transfer parent is not from G, in an arbitrary order. This procedure ends after a polynomial number of steps.
Indeed, we have two situations. In the first situation (Fig. 13a) we obtain another transfer that needs adjusting (transfer \((x'',x'_2)\)), but the transfer parent \(x''\) is positioned above x in \(G'\). In the second situation (Fig. 13b), the total number of transfers waiting to be adjusted decreases by 1. Hence the effect of adjusting a transfer is either to position the transfer parent higher in \(G'\), or to reduce the number of unadjusted transfers. Since we are bounded by the size of \(G'\) and the number of transfers, the number of adjustments is finite and polynomial in the size of G and S.
Therefore all transfers will be adjusted.
Step 2. Take an arbitrary transfer parent or a transfer child \(x\in V(G')\) that we can raise, and raise it. Repeat the previous procedure, as long as there is a node that we can raise.
After there are no more transfer parents or children that can be raised, translocate some node, if there is such a node. Then, again, raise all nodes that can be raised. Note that we translocate a node only if it results in a raisable node (see the second paragraph of this proof).
Repeat the previous procedure of raising and translocating nodes as long as possible.
This procedure ends after a polynomial number of steps. Indeed, by raising a node x, \(\tau (x)\) increases. Since \(\tau (x)<\tau (root(G))\), Step 2 must end after a polynomial number of steps, i.e. we obtain a reconciliation in which no transfer parent or child can be raised.
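The termination argument is a potential-function bound, which the toy sketch below illustrates. It is not the procedure of the paper: dates are assumed to be integers, each raise is assumed to increase \(\tau (x)\) by exactly 1, translocation is not modelled, and `can_raise` is a hypothetical predicate standing in for the conditions of Definition 20.

```python
def potential(tau, tau_root):
    """Upper bound on the number of raises still possible: each node can
    climb at most up to date tau_root - 1."""
    return sum(max(0, tau_root - 1 - t) for t in tau.values())


def raise_until_stuck(tau, tau_root, can_raise):
    """Raise nodes as long as some node can be raised; return the number
    of raise operations performed.

    tau       -- dict mapping node name to its integer date (modified in place)
    tau_root  -- date of root(G); raised nodes stay strictly below it
    can_raise -- hypothetical predicate (node, tau) -> bool
    """
    steps = 0
    progress = True
    while progress:
        progress = False
        for node in list(tau):
            if tau[node] + 1 < tau_root and can_raise(node, tau):
                tau[node] += 1  # raising strictly increases the date
                steps += 1
                progress = True
    return steps
```

Because every raise decreases `potential` by one and the potential is never negative, the loop performs at most `potential(tau, tau_root)` raises, whatever `can_raise` returns.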
By applying Steps 1 and 2 we obtain a reconciliation \({\mathfrak {R}}'\). Since the number of transfers is not changed (Lemmas 9, 10, and 11), we have \(\omega ({\mathfrak {R}}')=\omega ({\mathfrak {R}})\), i.e. \({\mathfrak {R}}'\) is a minimum reconciliation. We need to prove that \({\mathfrak {R}}'\) is a normalized reconciliation.
Let \((x,x')\in E(G')\) be a transfer, \(y\in V(G')\) be the maximal element such that \(x \le y\), \(\rho (x) \le \rho (y)\), and \(\tau (y)\le \tau (x)+1\).
Let us prove that y is not a transfer parent. Assume the opposite. Then \(y\in V(G)\). Let \(x''\) be the element as described in Definition 20 obtained by raising y. Then \(\tau (y)=\tau (x'')\) and \(x''\) is a transfer parent or child; or \(\tau (y)=\tau (x'')-1\) and \(x''\) is a speciation or root(G). In both cases we obtain a contradiction with the maximality of y.
Let y be a transfer child. We need to prove that \(\tau (x)=\tau (y)\). Assume the opposite, i.e. \(\tau (x)<\tau (y)\). Let us take the maximal \(x_1\) such that \(x\le x_1\le y\) and \(\tau (x_1)=\tau (x)\). Since all the transfers are adjusted, we have \(x_1\in V(G)\) and \(x_1\) is a transfer parent. Since \(\tau (x_1)<\tau (y)\), node \(x_1\) can be raised, which is a contradiction with Step 2, where we raise and translocate nodes as long as possible. Therefore \(\tau (x)=\tau (y)\).
Let y be a diagonal transfer child. We need to prove that \((x,x')\) is a diagonal transfer. Assume the opposite, i.e. \(\tau (x)=\tau (x')\). Let \(\rho (y)=E_1\in E(S')\) and \(\rho (x')=E_2\in E(S')\). We have \(\tau (E_1)=\tau (E_2)=\tau (x)\). Since there are no speciations from S with the same date (see a comment after Definition 1), one of the edges \(E_1\) or \(E_2\) is not incident with a speciation from S. If \(E_1\) is not incident with a speciation, then we can raise y. If \(E_2\) is not incident with a speciation, then we can translocate x to \(E_2\) and raise x. In both situations we have a contradiction with Step 2. Therefore, \((x,x')\) is a diagonal transfer.
Let y be a speciation, or \(y=root(G)\). Then \(\tau (x)<\tau (y)\). Since \(\tau (y)\le \tau (x)+1\), we have \(\tau (y) = \tau (x)+1\).
Let \((x,x')\) be a diagonal transfer, l a loss assigned to \(x'\), \(T_l\) a lost subtree with a leaf l. From Step 2 we have that we cannot raise \(x'\). This is possible only if \(\tau (T_l)=\tau (l)+1\), and therefore \(T_l\) has only one edge.
We proved that the properties of a normalized reconciliation are satisfied. Hence \({\mathfrak {R}}'\) is a normalized reconciliation.\(\square \)
Proof of Theorem 5
Since \({\mathfrak {R}}\) is a normalized reconciliation, it is also, by definition, minimum. If I is a time slice, then \({\mathfrak {R}}_I\) denotes the partial reconciliation induced by I, i.e. the part of \({\mathfrak {R}}\) that is inside I and all time slices before I, where “before” refers to those lower in the tree. We will prove that the algorithm constructs \({\mathfrak {R}}_I\) during the execution. We will use mathematical induction on I.
Let \(I_0\) be the first time slice (i.e. the lowest time slice), and \(s_0\in V(S)\) be a speciation such that \(\tau (s_0)\in I_0\) (Fig. 17), \(E_1, E_2\in E(S)\) are incident with \(s_0\). Next, \(e_1,e_2\) are the minimal edges of \(G'\) contained in \(E_1\) and \(E_2\). More precisely, \(e_1=(x_1,x_2)\), \(e_2=(y_1,y_2)\) are the minimal edges of \(G'\) such that \(\rho (x_2)\le E_1 \le \rho (x_1)\) and \(\rho (y_2)\le E_2 \le \rho (y_1)\). Edges \(e_1\) and \(e_2\) are unique (Lemma 6).
Let us prove that we can obtain \({\mathfrak {R}}_{I_0}\) during the execution of the algorithm. We have several cases.
Case 1. Edges \(e_1\) and \(e_2\) are incident. Let us prove that \(e_1\) and \(e_2\) coalesce at \(s_0\). Assume the opposite, \(\rho (x)\ne s_0\), where \(x\in V(G)\) is incident with both \(e_1\) and \(e_2\). Then \(e_1\) or \(e_2\) is a transfer, hence we can construct a reconciliation with smaller weight by placing x in \(s_0\), which contradicts the minimality of \({\mathfrak {R}}\). Figure 20 depicts a more detailed argumentation.
Case 2. Edges \(e_1\) and \(e_2\) are not incident. We will investigate subcases. Some subcases are not obtainable by the algorithm. For them, we will prove they cannot occur in \({\mathfrak {R}}\). Let x be the minimal element from V(G) that is an ancestor of \(e_1\) and \(e_2\).
Case 2.1. Let \(\rho (x)=s_0\), \(\rho (x'_{i_1})=E_1\), \(\rho (x''_{i_2})=E_2\) (\(i_1=1,\ldots ,k_1\); \(i_2=1,\ldots ,k_2\)), where \(x'_{i_1}\) and \(x''_{i_2}\) are explained in Sect. 4.2. This case refers to Case 3b of Sect. 4.2.
We will prove that there is a random choice such that the random expansion of \(x'_1\) produces placement of the nodes identical to the one in \({\mathfrak {R}}\).
Assume the opposite, i.e. there is no such random choice. This means that we cannot obtain the situation depicted in Fig. 18b. Then there are descendants of \(x'_1\), denoted by \(y'_j\) (\(j=1,\ldots ,k\)) (Fig. 21) such that \(y'_1,\ldots ,y'_{k-1}\in V(G)\), \(y'_j={{\textsc {p}}}_{G'}(y'_{j+1})\) (\(j=1,\ldots ,k-1\)), \(y'_k\) is a transfer parent, \(y'_k \in V(G')\backslash V(G)\), and \(\rho (y'_1)=\ldots =\rho (y'_k)=E_3\). Let \(E_4\in E(S')\) be the edge that contains \(y'_{k+1}\), which is a child of \(y'_k\). By translocating \(y'_1,\ldots ,y'_k\) to \(E_4\) we obtain a reconciliation with one transfer fewer, which contradicts the minimality of \({\mathfrak {R}}\). Another reason why this case is not possible is that transfer \((y'_k,y'_{k+1})\) is not adjusted, which contradicts the fact that \({\mathfrak {R}}\) is a normalized reconciliation.
Therefore, we can obtain the expansion of \(x'_1\). The same reasoning applies for \(x'_2,\ldots ,x'_{k_1}\) and \(x''_1,\ldots ,x''_{k_2}\) and their children.
Case 2.2. Assume that \(E_i\in E(S')\) receives a diagonal transfer and \(e_{3-i}\in E(G)\) is propagated to the next time slice (\(i=1\) or \(i=2\)). This case is also obtainable by the algorithm (Cases 3\(a_1\) and 3\(a_2\)).
Case 2.3. Both \(e_1\) and \(e_2\) are propagated to the next time slice. Then \(s_0\) contains two unaligned edges from G, which is impossible for a \(T_R\) reconciliation (Lemma 6). Therefore this case cannot occur.
Case 2.4. We have \(\rho (x)=s_0\), and there is \(y_1\in \{x'_1,\ldots ,x'_{k_1}\}\), or \(y_2\in \{x''_1,\ldots ,x''_{k_2}\}\) such that \(\rho (y_1)\ne E_1\), or \(\rho (y_2)\ne E_2\). Since \(s_0\) is the only speciation in S in the current time slice, all \(x'_1,\ldots ,x'_{k_1}\) and \(x''_1,\ldots ,x''_{k_2}\) are transfers.
Let \({\mathfrak {R}}'\) be a reconciliation such that \(\rho _{{\mathfrak {R}}'}(x'_1)=\ldots =\rho _{{\mathfrak {R}}'}(x'_{k_1})=E_1\), \(\rho _{{\mathfrak {R}}'}(x''_1)=\ldots =\rho _{{\mathfrak {R}}'}(x''_{k_2})=E_2\), and \(\rho _{{\mathfrak {R}}'}(y)=\rho _{{\mathfrak {R}}}(y)\) for all the remaining \(y\in V(G)\). Then \({\mathfrak {R}}'\) is a reconciliation with smaller weight than \({\mathfrak {R}}\), which contradicts the optimality of \({\mathfrak {R}}\).
Case 2.5. Assume that \(\tau (x) > \tau (s_0)\), \(\tau (y_1)\le \tau (s_0)\), and \(\tau (y_2)\le \tau (s_0)\) for some \(y_1\in \{x'_1,\ldots ,x'_{k_1}\}\), \(y_2\in \{x''_1,\ldots ,x''_{k_2}\}\). Then \({\mathfrak {R}}\) is not a normalized reconciliation. Hence this case is not possible.
Case 2.6. If x is in \(I_0\) and \(\rho (x) \ne s_0\), then by taking \(\rho (x)=s_0\) we get a reconciliation with fewer transfers (similarly to Case 1 and Fig. 20), contrary to the minimality of \({\mathfrak {R}}\).
For the inductive step, assume that the statement is true for time slices \(I_0, I_1,\ldots , I_{k-1}\). Let us prove that it is true for \(I_k\). The proof for \(I_k\) is the same as for \(I_0\), so we do not repeat it.
Hence \({\mathfrak {R}}_I\) is obtainable by the procedure. Since \({\mathfrak {R}}_I={\mathfrak {R}}\) for the final time slice I, \({\mathfrak {R}}\) is also obtainable by the algorithm. Since it is a minimal reconciliation, \({\mathfrak {R}}\) is a possible output of the algorithm. \(\square \)
We will use the next lemma in the proof of Theorem 6. Basically, it states that it is not important which random choice we select in Case 3b of the algorithm (see Sect. 4.2).
Lemma 13
The random choice in Case 3b of the algorithm does not affect the weight of an output reconciliation.
Proof
Let \(I_k\) be the observed time interval, and \(I_0,\ldots ,I_{k-1}\) be the time intervals before \(I_{k}\) (Fig. 22).
Let \(x'_1\) be a node that we randomly expand, and assume we have more than one choice for a random active edge with maximal \(\tau \)-value that is a descendant of \(x'_1\). Let \(e_{31}\) and \(e_{32}\) be two of those edges.
We will use notations from Sect. 4.2. When constructing \({\mathfrak {R}}_{I_k}\) from \({\mathfrak {R}}_{I_{k-1}}\), we are adding some new nodes from G. Only x is a speciation, and all other nodes are transfer parents. Since every transfer has a parent from V(G), we obtain that the number of newly added transfers is equal to the number of newly added nodes from V(G) minus one. Therefore \(\omega ({\mathfrak {R}}_{I_k})\) is not affected by a choice of \(e_3\).
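The counting argument can be written as a single equation. Here m is a symbol introduced only for illustration: it denotes the number of nodes from V(G) that are added when passing from \({\mathfrak {R}}_{I_{k-1}}\) to \({\mathfrak {R}}_{I_k}\). The equation also assumes, as elsewhere in this section, that \(\omega \) counts transfers.
\[ \omega ({\mathfrak {R}}_{I_k}) = \omega ({\mathfrak {R}}_{I_{k-1}}) + (m-1) \]
The term \(m-1\) accounts for the transfer parents among the new nodes (all of them except the speciation x), and \(e_3\) does not appear in it.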
Note that the active edges in \(I_{k+1}\) are not affected by a choice of \(e_3\).\(\square \)
Proof of Theorem 6
It is obvious that \(\omega ({\mathfrak {R}})\le k\), because the algorithm cuts an edge of the branch and bound tree if \(t>k\), where t is the number of transfers in a partially constructed reconciliation.
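The cut-off at \(t>k\) can be illustrated with a generic bounded search. The sketch below is not the algorithm of Sect. 4.2: the function `bounded_search`, the callbacks `is_complete` and `branches`, and the toy state graph are all hypothetical and only show how pruning on the transfer count guarantees that any returned weight is at most k.

```python
def bounded_search(state, transfers, k, is_complete, branches, best=None):
    """Generic branch and bound over partial solutions (illustrative only).

    state       -- an opaque partial solution
    transfers   -- number of transfers t used so far
    k           -- the parameter; branches with t > k are cut
    is_complete -- hypothetical callback: is this state a full solution?
    branches    -- hypothetical callback: yields (next_state, added_transfers)
    best        -- smallest weight found so far, or None
    """
    if transfers > k:
        return best  # cut: this branch can never yield weight <= k
    if best is not None and transfers >= best:
        return best  # bound: cannot improve on the best weight found
    if is_complete(state):
        return transfers
    for next_state, added in branches(state):
        best = bounded_search(next_state, transfers + added, k,
                              is_complete, branches, best)
    return best


# Toy state graph (an assumption for the demo): two routes from 0 to 3,
# one using a single transfer and one using two.
TOY = {0: [(1, 1), (2, 0)], 1: [(3, 0)], 2: [(3, 2)], 3: []}
result = bounded_search(0, 0, 2, lambda s: s == 3, lambda s: TOY[s])
```

Because every branch with more than k transfers is abandoned before it can complete, a returned value is never larger than k, and `None` signals that no solution within the bound exists.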
Now we will prove that the conditions of Definition 22 are satisfied. Let \((x, x')\in E(G')\) be a transfer in \({\mathfrak {R}}\), and \(y\in V(G')\) be the maximal element such that \(x\le y\), \(\rho (x)\le \rho (y)\), \(\tau (y)\le \tau (x)+1\).
In the algorithm, transfers are created when nodes are randomly expanded. Since only nodes in V(G) are randomly expanded, every transfer starts in a node from V(G). Hence \(x\in V(G)\).
Transfers are constructed in Cases 3b and 4 (see Sect. 4.2). Therefore, y can be a speciation from G, transfer child, or root(G). If y is a speciation, then \(x\in \{x'_1,\ldots , x'_{k_1}, x''_1,\ldots , x''_{k_2}\}\), where \(x'_1,\ldots , x'_{k_1}, x''_1,\ldots , x''_{k_2}\) are explained in Sect. 4.2, and \(\tau (y)=\tau (x)+1\). If y is a transfer child, then \(\tau (y)=\tau (x)\). If y is root(G), then \(\tau (y)=\tau (x)+1\).
Let y be a diagonal transfer child. Diagonal transfers are made by using edges from G that were on hold. From Case 3a we have that a loss l, assigned to y, belongs to a lost subtree \(T_l\) with one edge and \(\tau (root(T_l))=\tau (y)+1\). Also, \((x,x')\) cannot be a horizontal transfer, because when we put an edge on hold all other edges are propagated to the next time slice, leaving no room for accepting a transfer.
Now we will prove that \({\mathfrak {R}}\) is a minimal reconciliation. The algorithm given in Sect. 4.2 is branch and bound, and it exhaustively examines every case possible for a normalized reconciliation, as stated in the proof of Theorem 5. Also, the random option it takes in Case 3b does not affect the optimality of an output (Lemma 13). The algorithm always chooses a reconciliation of smaller weight, if it finds one. Therefore, if it returns a reconciliation as an output, then it is a minimal reconciliation, i.e. \({\mathfrak {R}}\) is a minimal reconciliation. \(\square \)
Proof of Lemma 12
Note that if \((a_2,a_1)\in E(G)\), then there is a path \((a_2, b_1, \ldots , b_s, a_1)\) in \(G'\). The length of this path is at least 1, i.e. \(s\ge 0\). Hence every edge from G is a path in \(G'\). Also, \((a_2, a_1)\) can contain a transfer. In this proof we assume that all transfers are adjusted (as described by Definition 19 and Fig. 13), i.e. all transfers start in V(G).
We introduce a coloring of edges and nodes that were involved in some SPR operation. Let \(spr((a_2, a_1), (b_2, b_1)) = a'_2\) be the i-th SPR operation \(T_i \rightarrow T_{i+1}\). Then we color the edge \((a'_2, a_1)\) and node \(a'_2\) with color \(C_i\). If the edge \((b_2,b_1)\) was colored, then edges \((b_2,a'_2)\) and \((a'_2,b_1)\) are colored with the same color. Let \(c_1\) be the child of \(a_2\) (in \(T_i\)) different from \(a_1\), and \(c_2\) be the parent of \(a_2\) (in \(T_i\)). Then \(c_2\) is the parent of \(c_1\) (in \(T_{i+1}\)). If edge \((c_2,a_2)\) was colored with a color, then the edge \((c_2,c_1)\) is colored with the same color.
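The SPR operation and the coloring rules above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `spr` and its dictionary-based tree representation are hypothetical, the date constraint of a dated SPR is not checked, \(a_2\) is assumed not to be the root, and a color on the edge \((a_2,c_1)\) is simply dropped because the text does not say what happens to it.

```python
def spr(parent, edge_color, node_color, prune_edge, graft_edge, new_node, color):
    """Apply spr((a2, a1), (b2, b1)) = new_node in place (illustrative sketch).

    parent     -- dict mapping each non-root node to its parent
    edge_color -- dict mapping colored edges (parent, child) to a color
    node_color -- dict mapping colored nodes to a color
    color      -- the color C_i of this operation
    """
    (a2, a1), (b2, b1) = prune_edge, graft_edge
    if parent.get(a1) != a2 or parent.get(b1) != b2:
        raise ValueError("both arguments must be (parent, child) edges")
    if a2 not in parent or a2 in graft_edge:
        raise ValueError("sketch assumes a2 is not the root and not on the graft edge")
    # The graft edge must lie outside the pruned subtree rooted at a1.
    node = b1
    while True:
        if node == a1:
            raise ValueError("graft edge lies inside the pruned subtree")
        if node not in parent:
            break
        node = parent[node]

    # c1: the other child of a2; c2: the parent of a2.
    c1 = next(c for c, p in parent.items() if p == a2 and c != a1)
    c2 = parent[a2]

    # Suppress a2: c2 becomes the parent of c1.
    parent[c1] = c2
    del parent[a2]
    upper = edge_color.pop((c2, a2), None)
    edge_color.pop((a2, c1), None)  # assumption: not specified in the text
    edge_color.pop((a2, a1), None)
    node_color.pop(a2, None)
    if upper is not None:
        edge_color[(c2, c1)] = upper  # (c2, c1) inherits the color of (c2, a2)

    # Subdivide (b2, b1) with new_node and attach a1 below it.
    parent[new_node] = b2
    parent[b1] = new_node
    parent[a1] = new_node
    old = edge_color.pop((b2, b1), None)
    if old is not None:
        edge_color[(b2, new_node)] = old  # both halves keep the old color
        edge_color[(new_node, b1)] = old
    edge_color[(new_node, a1)] = color
    node_color[new_node] = color
    return new_node


# Toy tree (an assumption for the demo): r -> p, d; p -> u, c; u -> a, b.
tree = {"p": "r", "d": "r", "u": "p", "c": "p", "a": "u", "b": "u"}
ecol, ncol = {}, {}
spr(tree, ecol, ncol, ("u", "a"), ("p", "c"), "n", 0)
spr(tree, ecol, ncol, ("p", "b"), ("n", "a"), "m", 1)
```

In the second call the regraft edge \((n,a)\) is already colored, so both halves of the subdivided edge keep color 0 while the newly attached edge receives color 1.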
To a minimum SPR scenario we will assign a minimum \(T_R\) reconciliation. Colored edges will represent transfers, colored nodes will be transfer parents, non-colored edges will coincide with the edges of the species tree, and non-colored nodes will be speciations.
Let us first demonstrate the reduction from k-Minimum \(T_R\) Reconciliation to k-Minimum Dated SPR Scenario. Let S and G be a species and gene tree, and let \(S=T_0\rightarrow T_1\rightarrow \cdots \rightarrow T_k=G\) be a minimum SPR scenario transforming S into G. Using this minimum SPR scenario, we will construct a minimum \(T_R\) reconciliation.
Note that in \(T_k\) we have at most k nodes that are colored. Also, colored edges form (colored) subtrees of \(T_k\) with colored roots and inner nodes, while the leaves of these trees are not colored.
If \(a\in V(T_k)\) is a non-colored node, then it can be regarded both as a node from S and as a node from G. Take \(\rho (a)=a\in V(S)\), for all non-colored nodes \(a\in V(T_k)=V(G)\). Non-colored paths connect non-colored nodes. Place all non-colored edges from \(T_k=G\) inside S so that they contain no transfer. Note that the leaves of \(T_k\) are non-colored.
Now, inside S we will place colored nodes and colored edges. Let \(T_c\) be an arbitrary colored tree, and \(c_0\) be its root. Then \(c_0\) is on a non-colored path of G, and we will leave it there in S. Next, we place the inner nodes of \(T_c\) inside S. Let \(L(T_c)=\{l_1,\ldots , l_s\}\), and \(\tau (l_1)\ge \cdots \ge \tau (l_s)\). Assume that \(c^1_1, c^1_2, \ldots , c^1_{i_1}\) are inner nodes of \(T_c\) in the path from \(l_1\) to \(c_0\) whose placement inside S is not defined. Then place these nodes in the edge of \(S'\) just above \(l_1\), i.e. \(\rho (c^1_1) = \ldots =\rho (c^1_{i_1})={{\textsc {p}}}^{{\textsc {e}}}_{S'}(\rho (l_1))\). Repeat the previous process for leaves \(l_2, \ldots , l_s\). In this way we obtain a reconciliation with transfers, and every edge of S at any moment contains at most one lineage from \(G'\), hence if we extend losses we obtain a \(T_R\) reconciliation. Since a transfer can start only at a colored node, we have at most k transfers, i.e. \(\omega ({\mathfrak {R}}) \le k\).
After the next reduction, we will prove that \({\mathfrak {R}}\) is a minimum reconciliation.
In the second part, we demonstrate a reduction from k-Minimum Dated SPR Scenario to k-Minimum \(T_R\) Reconciliation. Let T be a dated and \(T'\) an undated binary rooted tree. We need a minimum dated SPR scenario \(T=T_0\rightarrow T_1\rightarrow \cdots \rightarrow T_k=T'\).
Take \(S=T\) and \(G=T'\). Let \({\mathfrak {R}}\) be a minimum \(T_R\) reconciliation, and \(\omega ({\mathfrak {R}})=k\). We will prove that the length of minimum dated SPR scenario is k, and reconstruct it using \({\mathfrak {R}}\).
First, let us construct a scenario of the length k. Adjust all transfers in \({\mathfrak {R}}\), so they start at the nodes from V(G), just like in the first step of the proof of Theorem 4 (Definition 19, Fig. 13).
Take \(T_k=T'\), \(G_k=G\), \(G'_k=G'\), and \({\mathfrak {R}}_k={\mathfrak {R}}\). Let \((x_2, x_1)\) be an arbitrary transfer, \(x'_1\) be the child of \(x_1\) in \(G'\), l be the loss assigned to \(x_1\), and \(l_0=root(T_l)\), where \(T_l\) is a lost subtree such that \(l\in L(T_l)\). Let \(p_k = (l_0, l_1, \ldots , l_{s-1}, l_s=l)\) be a path in \(G'\) (i.e. in \(T_l\)), and therefore a lost path. Remove \((x_2, x_1)\) from \(G'_k\), suppress \(x_2\), include the path \(p_k\) into \(G_k\) (\(p_k\) is not a lost path anymore), suppress \(x_1\). Thus we eliminate one transfer, and obtain \(G_{k-1}, G'_{k-1}, {\mathfrak {R}}_{k-1}\), where \(\omega ({\mathfrak {R}}_{k-1})=\omega ({\mathfrak {R}}_k)-1\). By repeating this procedure, we obtain an SPR scenario \(T'=T_k\rightarrow T_{k-1}\rightarrow \cdots \rightarrow T_0=T\), i.e. \(T=T_0\rightarrow T_1\rightarrow \cdots \rightarrow T_k=T'\).
Since the transfers can be horizontal or diagonal, the corresponding SPR operations are dated. We proved that an optimal dated SPR scenario transforming T into \(T'\) has length at most k.
Let us prove that the previous reductions construct a minimum reconciliation (the first reduction) and a minimum SPR scenario (the second reduction). Let \(T_1 \rightarrow \cdots \rightarrow T_k\) be a minimum SPR scenario. Take \(S=T_1\), \(G=T_k\), and let \({\mathfrak {R}}\) be the reconciliation obtained in the first reduction. We have \(k'=\omega ({\mathfrak {R}})\le k\). Now, let \(T_1=T'_1\rightarrow T'_2\rightarrow \cdots \rightarrow T'_{k''}=T_k\) be an SPR scenario obtained from G and S in the second reduction. Then \(k''\le k' \le k\). Since there is no SPR scenario transforming \(T_1\) into \(T_k\) with length less than k, we have \(k''=k'=k\). \(\square \)
Hasić, D., Tannier, E. Gene tree reconciliation including transfers with replacement is NP-hard and FPT. J Comb Optim 38, 502–544 (2019). https://doi.org/10.1007/s10878-019-00396-z
DOI: https://doi.org/10.1007/s10878-019-00396-z