The mathematics of xenology: di-cographs, symbolic ultrametrics, 2-structures and tree-representable systems of binary relations

Hellmuth, Marc; Stadler, Peter F.; Wieseke, Nicolas

doi:10.1007/s00285-016-1084-3

Published: 30 November 2016

Journal of Mathematical Biology, Volume 75, pages 199–237 (2017)

Marc Hellmuth^1,2,
Peter F. Stadler^3,4,5,6,7 &
Nicolas Wieseke^8,9

Abstract

The concepts of orthology, paralogy, and xenology play a key role in molecular evolution. Orthology and paralogy distinguish whether a pair of genes originated by speciation or duplication. The corresponding binary relations on a set of genes form complementary cographs. Allowing more than two types of ancestral events leads to symmetric symbolic ultrametrics. Horizontal gene transfer, which leads to xenologous gene pairs, however, is inherently asymmetric since one offspring copy “jumps” into another genome, while the other continues to be inherited vertically. We therefore explore here the mathematical structure of the non-symmetric generalization of symbolic ultrametrics. Our main results tie non-symmetric ultrametrics together with di-cographs (the directed generalization of cographs), so-called uniformly non-prime (unp) 2-structures, and hierarchical structures on the set of strong modules. This yields a characterization of relation structures that can be explained in terms of trees and types of ancestral events. This framework accommodates a horizontal-transfer relation in terms of an ancestral event and thus is slightly different from the most commonly used definition of xenology. As a first step towards a practical use, we present a simple polynomial-time recognition algorithm for unp 2-structures and investigate the computational complexity of several types of editing problems for unp 2-structures. Finally, we show that these NP-complete problems can be solved exactly as Integer Linear Programs.

References

Altenhoff AM, Boeckmann B, Capella-Gutierrez S, Dalquen DA, DeLuca T, Forslund K, Huerta-Cepas J, Linard B, Pereira C, Pryszcz LP, Schreiber F, da Silva AS, Szklarczyk D, Train CM, Bork P, Lecompte O, von Mering C, Xenarios I, Sjölander K, Jensen LJ, Martin MJ, Muffato M, Gabaldón T, Lewis SE, Thomas PD, Sonnhammer E, Dessimoz C (2016) Standardized benchmarking in the quest for orthologs. Nat Methods 13(5):425–430
Böcker S, Dress AWM (1998) Recovering symbolically dated, rooted trees from symbolic ultrametrics. Adv Math 138:105–125
Brandstädt A, Le VB, Spinrad JP (1999) Graph classes: a survey. Society for Industrial and Applied Mathematics, Philadelphia
Corneil DG, Lerchs H, Burlingham Steward L (1981) Complement reducible graphs. Discr. Appl. Math. 3:163–174
Crespelle C, Paul C (2006) Fully dynamic recognition algorithm and certificate for directed cographs. Discr. Appl. Math. 154:1722–1741
Dondi R, El-Mabrouk N, Lafond M (2016) Correction of weighted orthology and paralogy relations-complexity and algorithmic results. In: International workshop on algorithms in bioinformatics. Springer, pp 121–136
Ehrenfeucht A, Gabow HN, McConnell RM, Sullivan SJ (1994) An \(O(n^2)\) divide-and-conquer algorithm for the prime tree decomposition of two-structures and modular decomposition of graphs. J Algorithms 16(2):283–294
Ehrenfeucht A, Harju T, Rozenberg G (1995) Theory of 2-structures. In: Fülöp Z, Gécseg F (eds) Automata, languages and programming: proceedings of the 22nd international colloquium, ICALP 95 Szeged, Hungary, July 10–14, 1995. Springer, Berlin, pp 1–14
Ehrenfeucht A, Harju T, Rozenberg G (1999) The theory of 2-structures: a framework for decomposition and transformation of graphs. World Scientific, Singapore
Ehrenfeucht A, Rozenberg G (1990) Primitivity is hereditary for 2-structures. Theor Comput Sci 70(3):343–358
Ehrenfeucht A, Rozenberg G (1990) Theory of 2-structures, part I: clans, basic subclasses, and morphisms. Theor Comput Sci 70:277–303
Ehrenfeucht A, Rozenberg G (1990) Theory of 2-structures, part II: representation through labeled tree families. Theor Comput Sci 70:305–342
Engelfriet J, Harju T, Proskurowski A, Rozenberg G (1996) Characterization and complexity of uniformly nonprimitive labeled 2-structures. Theor Comput Sci 154:247–282
Fitch WM (1970) Distinguishing homologous from analogous proteins. Syst Zool 19:99–113
Fitch WM (2000) Homology: a personal view on some of the problems. Trends Genet 16:227–231
Gray GS, Fitch WM (1983) Evolution of antibiotic resistance genes: the DNA sequence of a kanamycin resistance gene from Staphylococcus aureus. Mol Biol Evol 1:57–66
Hellmuth M, Hernandez-Rosales M, Huber KT, Moulton V, Stadler PF, Wieseke N (2013) Orthology relations, symbolic ultrametrics, and cographs. J Math Biol 66:399–420
Hellmuth M, Wieseke N (2015) On symbolic ultrametrics, cotree representations, and cograph edge decompositions and partitions. In: Xu D (ed) Computing and combinatorics, lecture notes in computer science, vol 9198. Springer International Publishing, Cham, pp 609–623
Hellmuth M, Wieseke N (2016) From sequence data including orthologs, paralogs, and xenologs to gene and species trees. Springer International Publishing, Cham
Hellmuth M, Wieseke N (2016) On tree representations of relations and graphs: symbolic ultrametrics and cograph edge decompositions. arXiv:1509.05069 (preprint)
Hellmuth M, Wieseke N, Lechner M, Lenhof HP, Middendorf M, Stadler PF (2015) Phylogenomics with paralogs. Proc Natl Acad Sci USA 112:2058–2063
Hernandez-Rosales M, Hellmuth M, Wieseke N, Huber KT, Moulton V, Stadler PF (2012) From event-labeled gene trees to species trees. BMC Bioinf 13(Suppl. 19):S6
Jensen RA (2001) Orthologs and paralogs—we need to get it right. Genome Biol 2:8
Keeling PJ, Palmer JD (2008) Horizontal gene transfer in eukaryotic evolution. Nat Rev Genet 9:605–618
Koonin E (2005) Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet 39:309–338
Koonin EV, Makarova KS, Aravind L (2001) Horizontal gene transfer in prokaryotes: quantification and classification. Annu Rev Microbiol 55:709–742
Lafond M, Dondi R, El-Mabrouk N (2016) The link between orthology relations and gene trees: a correction perspective. Algorithms Mol Biol 11(1):1
Lafond M, El-Mabrouk N (2015) Orthology relation and gene tree correction: complexity results. In: International workshop on algorithms in bioinformatics. Springer, pp 66–79
Lechner M, Findeiß S, Steiner L, Marz M, Stadler PF, Prohaska SJ (2011) Proteinortho: detection of (co-)orthologs in large-scale analysis. BMC Bioinf 12:124
Lechner M, Hernandez-Rosales M, Doerr D, Wieseke N, Thevenin A, Stoye J, Hartmann RK, Prohaska SJ, Stadler PF (2014) Orthology detection combining clustering and synteny for very large datasets. PLoS One 9(8):e105015
McConnell RM (1995) An \(O(n^2)\) incremental algorithm for modular decomposition of graphs and 2-structures. Algorithmica 14(3):229–248
McConnell RM, de Montgolfier F (2005) Linear-time modular decomposition of directed graphs. Discr Appl Math 145(2):198–209
Möhring RH (1985) Algorithmic aspects of the substitution decomposition in optimization over relations, set systems and boolean functions. Ann Oper Res 4(1):195–225
Möhring RH, Radermacher FJ (1984) Substitution decomposition for discrete structures and connections with combinatorial optimization. Ann Discr Math 19:257–356
Schmerl JH, Trotter WT (1993) Critically indecomposable partially ordered sets, graphs, tournaments and other binary relational structures. Discr Math 113(1):191–205
Semple C, Steel M (2003) Phylogenetics, Oxford lecture series in mathematics and its applications, vol 24. Oxford University Press, Oxford
Sennblad B, Lagergren J (2009) Probabilistic orthology analysis. Syst Biol 58:411–424
Soucy SM, Huang J, Gogarten JP (2015) Horizontal gene transfer: building the web of life. Nat Rev Genet 16:472–482
Valdes J, Tarjan RE, Lawler EL (1982) The recognition of series parallel digraphs. SIAM J Comput 11:298–313

Acknowledgements

We thank Maribel Hernández-Rosales for discussions. This work was funded by the German Research Foundation (DFG) (Proj. Nos. MI439/14-1 to P.F.S. and N.W.).

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Straße 47, 17487, Greifswald, Germany
Marc Hellmuth
Center for Bioinformatics, Saarland University, Building E 2.1, P.O. Box 151150, 66041, Saarbrücken, Germany
Marc Hellmuth
Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
Peter F. Stadler
Interdisciplinary Center of Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
Peter F. Stadler
Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany
Peter F. Stadler
Institute of Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090, Wien, Austria
Peter F. Stadler
Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA
Peter F. Stadler
Parallel Computing and Complex Systems Group, Department of Computer Science, University of Leipzig, Johannisgasse 26, 04103, Leipzig, Germany
Nicolas Wieseke
Interdisciplinary Center of Bioinformatics, University of Leipzig, Johannisgasse 26, 04103, Leipzig, Germany
Nicolas Wieseke

Authors

Marc Hellmuth
Peter F. Stadler
Nicolas Wieseke

Corresponding author

Correspondence to Marc Hellmuth.

Appendix

1.1 Proofs of Proposition 1 and Lemma 6

Let \(g=(V,{\varUpsilon }, \varphi )\) be a 2-structure. In the proof of Proposition 1 we write \({\varDelta }(xyz)\) as a shorthand for “the Condition (U2) must be fulfilled for the set \(D_{xyz}\)”, where \(x,y,z\in V\). Moreover, for any forbidden subgraph K that might occur in the graph \(G_j(g)\) of some 2-structure g, we use the symbols \(K^j(abc)\) and \(K^j(abcd)\), resp., to designate the fact that \(G_j(g)\) contains the forbidden subgraph K induced by the vertices a, b, c, resp., a, b, c, d in \(G_j(g)\).

1.1.1 Proof of Proposition 1

In order to prove Proposition 1, we have to show that reversible uniformly non-prime (unp) 2-structures are characterized by the conditions

(U1) \(G_{i}(g)\) is a di-cograph for all \(i\in {\varUpsilon }\), and
(U2) for all vertices \(x,y,z \in V\) it holds that \(|\left\{ D_{xy}, D_{xz}, D_{yz} \right\} |\le 2\).
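
Condition (U2) is easy to test by brute force on a small instance. The following minimal sketch assumes that a 2-structure is stored as a hypothetical dictionary `phi` mapping each ordered pair of distinct vertices to its label, and that \(D_{xy}\) denotes the set of labels on the two arcs between x and y; neither the data layout nor the function names are taken from the paper.

```python
from itertools import combinations

def label_set(phi, x, y):
    # D_xy, read here as the set of labels on the two arcs between x and y (assumption).
    return frozenset((phi[(x, y)], phi[(y, x)]))

def satisfies_u2(vertices, phi):
    # (U2): among D_xy, D_xz, D_yz at most two distinct sets may occur.
    return all(
        len({label_set(phi, x, y), label_set(phi, x, z), label_set(phi, y, z)}) <= 2
        for x, y, z in combinations(vertices, 3)
    )

def label_graph(phi, i):
    # Arc set of G_i(g): every arc e with phi(e) = i.
    return {arc for arc, label in phi.items() if label == i}

# Hypothetical labeling on three vertices: arcs between a and b carry label 1,
# all arcs to and from c carry label 2.
V = ["a", "b", "c"]
phi = {("a", "b"): 1, ("b", "a"): 1,
       ("a", "c"): 2, ("c", "a"): 2,
       ("b", "c"): 2, ("c", "b"): 2}
u2_holds = satisfies_u2(V, phi)
arcs_label_1 = label_graph(phi, 1)
```

The check enumerates all triples and is therefore cubic in |V|; it is meant as an illustration of the condition, not as an efficient test.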

We will frequently apply the following argument without explicitly stating it every time: By definition, if g is reversible then \(\varphi (e)=\varphi (f)\) iff \(\varphi (e^{-1})=\varphi (f^{-1})\). Hence, for reversible g, \(D_{ab}\ne D_{xy}\) implies that \(\varphi (ab)\ne \varphi (xy), \varphi (yx)\) and \(\varphi (ba)\ne \varphi (xy), \varphi (yx)\).
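
Reversibility amounts to saying that the label of an arc determines the label of its inverse. A minimal sketch of this test, again assuming the hypothetical dictionary layout `phi[(x, y)] = label` (an illustration choice, not the paper's notation):

```python
def is_reversible(phi):
    # g is reversible iff phi(e) = phi(f) <=> phi(e^-1) = phi(f^-1).
    # Equivalently, "label -> label of the inverse arc" is a well-defined map;
    # because the loop also visits every inverse arc, both directions are covered.
    inverse_label = {}
    for (x, y), label in phi.items():
        back = phi[(y, x)]
        if inverse_label.setdefault(label, back) != back:
            return False
    return True

# Hypothetical examples on vertices a, b, c.
phi_reversible = {("a", "b"): 1, ("b", "a"): 2,
                  ("a", "c"): 1, ("c", "a"): 2,
                  ("b", "c"): 3, ("c", "b"): 3}
# Label 1 is paired with inverse label 2 on {a, b} but with 3 on {a, c}.
phi_not_reversible = {("a", "b"): 1, ("b", "a"): 2,
                      ("a", "c"): 1, ("c", "a"): 3,
                      ("b", "c"): 4, ("c", "b"): 4}
```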

\(\Rightarrow \): Let \(g=(V,{\varUpsilon }, \varphi )\) be a reversible uniformly non-prime 2-structure. If \(|V|<3\) then (U1) and (U2) are trivially satisfied. Thus we assume w.l.o.g. that \(|V|\ge 3\). Furthermore, suppose there is a label \(i\in {\varUpsilon }\) such that \(G_i(g)\) is not a di-cograph, i.e., \(G_i(g)\) contains one of the forbidden subgraphs. Since g is reversible, the forbidden subgraphs \(A,B,\overline{D_3}\), and \(\overline{N}\) cannot occur.

Now let h be a substructure of g with \(|V_h|=3\) containing \(D_3\) or \(C_3\), or \(|V_h|=4\) containing \(P_4\) or N, respectively. It is not hard to check that for each of these four graphs and any two distinct vertices \(a,b \in V_h\) there is always a vertex \(v\in V_h{\setminus }\{a,b\}\) so that \(\varphi (av)\ne \varphi {(bv)}\). Therefore, \(\{a,b\}\) cannot form a module in h. For \(P_4\) and N one checks that for any three distinct vertices \(a,b,c \in V_h\) and \(v\in V_h{\setminus }\{a,b,c\}\) we always have \(\varphi (av)\ne \varphi {(bv)}\), or \(\varphi (av)\ne \varphi {(cv)}\), or \(\varphi (bv)\ne \varphi {(cv)}\), so that \(\{a,b,c\}\) cannot form a module in h. Thus, h contains only trivial modules and, hence, is prime. This contradiction implies that (U1) must be fulfilled.
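
The "not hard to check" step can be reproduced mechanically. The sketch below is a brute-force test for modules and primality, assuming the hypothetical dictionary layout `phi[(x, y)] = label`. It checks both arc directions, which is the general module condition; for reversible structures one direction suffices, as used in the text. The example encodes a \(P_4\) with label 1 on its edges and label 0 elsewhere; these label values are chosen only for illustration.

```python
from itertools import combinations

def is_module(vertices, phi, M):
    # M is a module if no vertex outside M distinguishes two members of M.
    members = list(M)
    first = members[0]
    outside = [v for v in vertices if v not in M]
    return all(
        phi[(x, v)] == phi[(first, v)] and phi[(v, x)] == phi[(v, first)]
        for x in members[1:]
        for v in outside
    )

def is_prime(vertices, phi):
    # Prime: no module of size 2, ..., |V| - 1 (intended for |V| >= 3).
    return not any(
        is_module(vertices, phi, set(M))
        for k in range(2, len(vertices))
        for M in combinations(vertices, k)
    )

# Symmetric P4 a - b - c - d: label 1 on path edges, label 0 on all other pairs.
V = ["a", "b", "c", "d"]
path = {("a", "b"), ("b", "c"), ("c", "d")}
phi_p4 = {(x, y): 1 if (x, y) in path or (y, x) in path else 0
          for x in V for y in V if x != y}
p4_is_prime = is_prime(V, phi_p4)
```

The enumeration is exponential in |V| and only meant for the three- and four-vertex substructures considered in the proof.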

Since \(g=(V,{\varUpsilon },\varphi )\) has a tree-representation without prime nodes, and since three distinct leaves can have at most two distinct least common ancestors, Condition (U2) must hold as well.
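
The counting argument about least common ancestors can be illustrated on a toy tree. The sketch assumes a rooted tree stored as a hypothetical child-to-parent map and that, as the argument above presupposes, \(D_{xy}\) depends only on the least common ancestor of x and y; the tree itself is invented for illustration.

```python
from itertools import combinations

def ancestors(parent, v):
    # Path from v up to the root, v included.
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path

def lca(parent, x, y):
    # Least common ancestor: first vertex on the path from y that also lies above x.
    above_x = set(ancestors(parent, x))
    return next(v for v in ancestors(parent, y) if v in above_x)

# Hypothetical tree: root r with children u and w; u has leaves a, b; w has leaves c, d.
parent = {"u": "r", "w": "r", "a": "u", "b": "u", "c": "w", "d": "w"}
leaves = ["a", "b", "c", "d"]

# For every triple of leaves, collect the distinct pairwise least common ancestors.
lca_sets = {
    triple: {lca(parent, x, y) for x, y in combinations(triple, 2)}
    for triple in combinations(leaves, 3)
}
```

In this example every triple yields at most two distinct least common ancestors, which is exactly what forces \(|\{D_{xy}, D_{xz}, D_{yz}\}|\le 2\).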

\(\Leftarrow \): Now assume that \(\varphi \) is a symbolic ultrametric, i.e., Conditions (U1) and (U2) are fulfilled for a reversible 2-structure g. In order to show that g is uniformly non-prime, we have to demonstrate that all substructures h of g with \(|V_h|=3\) and \(|V_h|=4\) are non-prime (cf. Theorem 3).

Claim 1

If h is a substructure of g with \(V_h=\{a,b,c\}\), then h is non-prime.

Proof of Claim 1

Since \({\varDelta }(abc)\) we may assume that \(D_{ab}=D_{ac}\), otherwise we simply relabel the vertices. If \(|D_{ab}|=1\), then \(\{b,c\}\) forms a module in h. Assume that \(|D_{ab}|=2\). There are two cases: either \(\varphi (ab)=\varphi (ac)\), in which case \(\{b,c\}\) is a module in h, or \(\varphi (ba)=\varphi (ac)=i\). In the latter case, \(\varphi (bc)=i\) since otherwise either \(D^i_3(abc)\) or \(C_3^i(abc)\) would occur. Therefore, \(\{a,c\}\) forms a module in h.

Hence, in all cases, a substructure h of g with \(V_h=\{a,b,c\}\) forms a non-prime structure. \(\square \)

Claim 2

If h is a substructure of g with \(V_h=\{a,b,c,d\}\), then h is non-prime.

Proof of Claim 2

There are two cases, either \(|D_{ab}|=1\) or \(|D_{ab}|=2\). For both cases, we will examine numerous sub-cases that might occur, and show that for each of these cases h contains non-trivial modules and thus is non-prime.

Case \(|D_{ab}|=1\):

Since \({\varDelta }(abc)\) we can assume that \(D_{ab}=D_{ac}\), otherwise relabel the vertices. Thus, \(\varphi (ab)=\varphi (ba)=\varphi (ac)=\varphi (ca)=i\) for some \(i\in {\varUpsilon }\). Since \({\varDelta }(acd)\) we have the three distinct cases

(i) \(\varphi (ad)=\varphi (cd)=i\),
(ii) either (A) \(\varphi (ad)=i\) or (B) \(\varphi (cd)=i\),
(iii) neither \(\varphi (ad)=i\) nor \(\varphi (cd)=i\).

In Case (i) and (iiA), \(\{b,c,d\}\) is a module in h. In Case (iiB), the arc (bc) or (bd) must be labeled with i as otherwise there is \(P_4^i(abcd)\). If \(\varphi (bc)=i\), then \(\{a,b,d\}\) is a module in h. If \(\varphi (bd)=i\), then \(\{a,d\}\) is a module in h.

Consider now Case (iii). Since \({\varDelta }(acd)\), it follows that \(D_{ad}=D_{cd}\) and in particular, \(i\notin D_{ad}=D_{cd}\), since g is reversible. First, let \(|D_{ad}|=|\{j\}|=1\). Since \({\varDelta }(abd)\), we have that either \(\varphi (bd)=j\), in which case \(\{a,b,c\}\) is a module in h, or \(\varphi (bd)=i\), which implies that \(\varphi (bc)=i\), since otherwise there is \(P_4^i(abcd)\). In the latter case, \(\{a,c,d\}\) forms a module in h. If \(|D_{ad}|=2\), we have only the case that \(\varphi (ad)=\varphi (cd)=j\) for some \(j\in {\varUpsilon }\). In the two other cases \(\varphi (ad)=\varphi (dc)=j\) or \(\varphi (da)=\varphi (cd)=j\) we would obtain \(D_3^j(adc)\). Since \({\varDelta }(abd)\), we obtain that either (I) \(\varphi (bd)=i\), (II) \(\varphi (bd)=j\) or (III) \(\varphi (db)=j\). Case (I) implies that \(\varphi (bc)=i\) as otherwise there is \(P_4^i(abcd)\). Hence, \(\{a,c,d\}\) forms a module in h. In Case (II) \(\{a,b,c\}\) is a module in h and Case (III) cannot occur, as otherwise there is \(D_3^j(abd)\).

Case \(|D_{ab}|=2\): Since \({\varDelta }(abc)\), we can assume wlog. that \(D_{ab}=D_{ac}\), otherwise we relabel the vertices. Hence, we have either (I) \(\varphi (ab)=\varphi (ac)=i\) or (II) \(\varphi (ba)=\varphi (ac)=i\). Note that in Case (I), \(\varphi (ba)=\varphi (ca)=i'\ne i\) and in Case (II) \(\varphi (ab)=\varphi (ca)=i'\ne i\).

Consider Case (I). Since \({\varDelta }(acd)\), we have one of the four distinct cases

(i) \(D_{ac}=D_{ad}=D_{cd}\),
(ii) \(D_{ac}=D_{ad}\ne D_{cd}\),
(iii) \(D_{ac}=D_{cd}\ne D_{ad}\),
(iv) \(D_{ac}\ne D_{cd}\) and \(D_{ac}\ne D_{ad}\).

In Case (Ii) it is not possible to have \(\varphi (cd)=\varphi (da)=i\) as otherwise there is \(C^i_{3}(acd)\). If \(\varphi (ad)=i\), then \(\{b,c,d\}\) is a module in h. If \(\varphi (da)=i\), then \(\varphi (db)=\varphi (dc)=i\), since otherwise there is \(D_3^i(abd), D_3^i(acd)\), \(C_3^i(abd)\) or \(C_3^i(acd)\). In that case, \(\{a,b,c\}\) is a module in h.

In Case (Iii) it is not possible to have \(\varphi (da)=i\), since otherwise there is \(D_3^i(acd)\). Thus, \(\varphi (ad)=i\) and therefore, \(\{b,c,d\}\) forms a module in h.

In Case (Iiii) it is not possible to have \(\varphi (cd)=i\), since otherwise there is \(D_3^i(acd)\). Hence, \(\varphi (dc)=i\). But then, at least one of the remaining arcs (bc), (cb), (bd), (db) must have label i, since otherwise there is \(N^i(abcd)\). If \(\varphi (bc)=i\), then \(\{a,b,d\}\) is a module in h. If \(\varphi (cb)=i\), then \(\varphi (db)=i\) as otherwise there is \(D_3^i(bcd)\) or \(C_3^i(bcd)\). Hence, \(\{a,c,d\}\) is a module in h. The case \(\varphi (bd)=i\) is not possible, since then there is \(D_3^i(abd)\). If \(\varphi (db)=i\), then \(\{a,d\}\) is a module in h.

In Case (Iiv) and since \({\varDelta }(acd)\), we have \(D_{ad}=D_{cd}\). If \(D_{ad}=\{j\}\) and thus, \(|D_{ad}|=1\), then \({\varDelta }(abd)\) implies that either \(\varphi (bd)=\varphi (db)=j\ne i\), or \(\varphi (bd)=i\), or \(\varphi (db)=i\). If \(\varphi (bd)=j\ne i\), then \(\{a,b,c\}\) is a module in h. The case \(\varphi (bd)=i\) cannot happen, since otherwise there is \(D^i_3(abd)\). If \(\varphi (db)=i\), then either \(\varphi (bc)=i\) or \(\varphi (cb)=i\), otherwise there is \(N^i(abcd)\). The case \(\varphi (bc)=i\) is not possible, otherwise there is \(D_3^i(bcd)\). If \(\varphi (cb)=i\), then \(\{a,c,d\}\) is a module in h.

Assume now that in Case (Iiv) we have \(|D_{ad}|=2\). Again, since \({\varDelta }(acd)\), we have \(D_{ad}=D_{cd}\). Assume that \(j\in D_{ad}\). There are two cases, either \(\varphi (ad)=\varphi (cd)=j\ne i\) or \(\varphi (ad)=\varphi (dc)=j\ne i\). However, the latter case is not possible, otherwise there is \(D_3^j(acd)\). Hence, let \(\varphi (ad)=\varphi (cd)=j\ne i\). Since \({\varDelta }(abd)\) we can conclude that either \(\varphi (bd)=i\), or \(\varphi (db)=i\), or \(\varphi (bd)=j\), or \(\varphi (db)=j\). The cases \(\varphi (bd)=i\) and \(\varphi (db)=j\) are not possible, otherwise there is \(D^i_3(abd)\) and \(D^j_3(abd)\), respectively. If \(\varphi (db)=i\), then \(\varphi (bc)=i\) or \(\varphi (cb)=i\), otherwise there is \(N^i(abcd)\). This case can be treated as in the previous step and we obtain the module \(\{a,c,d\}\) in h. If \(\varphi (bd)=j\), then \(\{a,b,c\}\) is a module in h.

Consider now Case (II) \(\varphi (ba)=\varphi (ac)=i\), and \(\varphi (ab)=\varphi (ca)=i'\ne i\). Hence, \(\varphi (bc)=i\), otherwise there is \(D_3^i(abc)\) or \(C_3^i(abc)\). Again, since \({\varDelta }(acd)\), we have one of the four distinct cases (i), (ii), (iii) or (iv), as in Case (I).

Consider the Case (IIi). If \(\varphi (dc)=i\), then \(\{a,b,d\}\) is a module in h. Thus, assume \(\varphi (cd)=i\). The case \(\varphi (da)=i\) is not possible, since then there is \(C_3^i(acd)\). If \(\varphi (ad)=i\), then \(\varphi (bd)=i\), otherwise there is \(D_3^i(abd)\) or \(C_3^i(abd)\). Now, \(\{a,c,d\}\) is a module in h.

Now, Case (IIii). The case \(\varphi (da)=i\) is not possible, otherwise there is \(D_3^i(acd)\) and thus, \(\varphi (ad)=i\). Then \(\varphi (bd)=i\), otherwise there is \(D^i_3(abd)\) or \(C^i_3(abd)\). Therefore, \(\{a,c,d\}\) is a module in h.

Consider the Case (IIiii). The case \(\varphi (cd)=i\) is not possible, otherwise there is \(D_3^i(acd)\). Thus, \(\varphi (dc)=i\) and therefore, \(\{a,b,d\}\) is a module in h.

In Case (IIiv) and since \({\varDelta }(acd)\), we have \(D_{ad}=D_{cd}\). If \(D_{ad}=\{j\}\) and thus, \(|D_{ad}|=1\), then \({\varDelta }(abd)\) implies that either \(\varphi (bd)=j\ne i\), or \(\varphi (bd)=i\), or \(\varphi (db)=i\). If \(\varphi (bd)=j\ne i\), then \(\{a,b,c\}\) is a module in h. If \(\varphi (bd)=i\), then \(\{a,c,d\}\) is a module in h. The case \(\varphi (db)=i\) cannot happen, otherwise there is \(D_3^i(bcd)\).

If \(|D_{ad}|=2\) and \(j\in D_{ad}\), then there are two cases: either \(\varphi (ad)=\varphi (cd)=j\ne i\) or \(\varphi (ad)=\varphi (dc)=j\ne i\). However, the latter case is not possible, otherwise there is \(D_3^j(acd)\). Hence, let \(\varphi (ad)=\varphi (cd)=j\ne i\). Since \({\varDelta }(abd)\) we can conclude that either \(\varphi (bd)=i\), or \(\varphi (db)=i\), or \(\varphi (bd)=j\), or \(\varphi (db)=j\). The cases \(\varphi (db)=i\) and \(\varphi (db)=j\) are not possible, otherwise there is \(D^i_3(abd)\) and \(D^j_3(abd)\), respectively. If \(\varphi (bd)=i\) or \(\varphi (bd)=j\), then \(\{a,c,d\}\), resp., \(\{a,b,c\}\) is a module in h. \(\square \)

In summary, in each of the cases a substructure h of g with 3 or 4 vertices is non-prime whenever (U1) and (U2) hold. Thus g is uniformly non-prime. \(\square \)

1.1.2 Proof of Lemma 6

\(\Rightarrow \): Let \(G_i(g)\) be a di-cograph for all \(i\in {\varUpsilon }\). Moreover, assume for contradiction that there is a label \(j\in {\varUpsilon }_{\mathrm {rev}(g)}\) such that \(G_j(\mathrm {rev}(g))\) is not a di-cograph. Then \(G_j(\mathrm {rev}(g))\) contains a forbidden subgraph. Since \(\mathrm {rev}(g)\) is reversible, only the subgraphs \(D_3, C_3, N\), and \(P_4\) are possible. Moreover, by construction of \(\mathrm {rev}(g)\) and because \(\varphi _{\mathrm {rev}(g)}(e)=\varphi _{\mathrm {rev}(g)}(f)\) implies \(\varphi (e)=\varphi (f)\), we have \(G_j(\mathrm {rev}(g)) \subseteq G_k(g)\) for some \(k\in {\varUpsilon }\).
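
The two properties of \(\mathrm {rev}(g)\) used here can be made concrete. The definition of \(\mathrm {rev}(g)\) is not restated in this appendix, so the sketch below assumes one natural realization: each arc e is relabeled by the pair \((\varphi (e),\varphi (e^{-1}))\). With that assumption, equal new labels imply equal old labels, and every arc set \(G_j(\mathrm {rev}(g))\) lies inside some \(G_k(g)\). The dictionary layout and the example labels are hypothetical.

```python
def reversible_refinement(phi):
    # Assumed construction of rev(g): arc e gets the label (phi(e), phi(e^-1)).
    return {(x, y): (label, phi[(y, x)]) for (x, y), label in phi.items()}

def label_graph(phi, i):
    # Arc set of the graph formed by all arcs with label i.
    return {arc for arc, label in phi.items() if label == i}

# Hypothetical non-reversible labeling: label 1 has inverse label 2 on {a, b}
# but inverse label 3 on {a, c}.
phi = {("a", "b"): 1, ("b", "a"): 2,
       ("a", "c"): 1, ("c", "a"): 3,
       ("b", "c"): 4, ("c", "b"): 4}
phi_rev = reversible_refinement(phi)

# Arcs with new label (1, 2) form a subset of the arcs with old label 1.
refined_arcs = label_graph(phi_rev, (1, 2))
original_arcs = label_graph(phi, 1)
```

Under this construction the new label of the inverse arc is simply the swapped pair, so \(\mathrm {rev}(g)\) is reversible by design.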

In the following we will show that the existence of one of the forbidden subgraphs \(D_3, C_3, N\), and \(P_4\) in any \(G_j(\mathrm {rev}(g))\) leads to a contradiction. We proceed case by case.

Case: \(G_j(\mathrm {rev}(g))\) contains \(D_3\) for some \(j\in {\varUpsilon }_{\mathrm {rev}(g)}\).

If \(G_j(\mathrm {rev}(g))\) contains \(D_3\) induced by the vertices x, y, z, we can wlog. assume that the vertices are labeled so that \(\varphi _{\mathrm {rev}(g)}(xy)=\varphi _{\mathrm {rev}(g)}(yz)=j\ne \varphi _{\mathrm {rev}(g)}(xz)\), \(\varphi _{\mathrm {rev}(g)}(zy)=\varphi _{\mathrm {rev}(g)}(yx)=k\ne \varphi _{\mathrm {rev}(g)}(zx)\) and \(j\ne k\). By construction of \(\mathrm {rev}(g)\) we obtain \(\varphi (xy)=\varphi (yz)=j', \varphi (zy)=\varphi (yx)=k'\) for some distinct \(j',k'\in {\varUpsilon }\). However, since g does not contain forbidden subgraphs in \(G_{j'}(g)\), there must be an arc connecting x and z with label \(j'\). The possibilities \(\varphi (zx)=j'\ne \varphi (xz)\) and \(\varphi (zx)=\varphi (xz)=j'\) cannot occur, since then \(G_{j'}(g)\) would contain a \(C_3\) or \(\overline{D}_3\) as forbidden subgraph. Hence, it must hold that \(\varphi (xz)=j'\). Analogously, one shows that \(\varphi (zx)=k'\). By construction of \(\mathrm {rev}(g)\), we obtain \(\varphi _{\mathrm {rev}(g)}(xz)=j\), and \(\varphi _{\mathrm {rev}(g)}(zx)=k\); a contradiction.

Case: \(G_j(\mathrm {rev}(g))\) contains \(C_3\) for some \(j\in {\varUpsilon }_{\mathrm {rev}(g)}\).

If \(G_j(\mathrm {rev}(g))\) contains a \(C_3\) induced by the vertices x, y, z, we can wlog. assume that the vertices are labeled so that \(\varphi _{\mathrm {rev}(g)}(xy)=\varphi _{\mathrm {rev}(g)}(yz)=\varphi _{\mathrm {rev}(g)}(zx)\ne \varphi _{\mathrm {rev}(g)}(yx)=\varphi _{\mathrm {rev}(g)}(xz)=\varphi _{\mathrm {rev}(g)}(zy)\). Thus, \(\varphi (xy)=\varphi (yz)=\varphi (zx)=j'\) and \(\varphi (yx)=\varphi (xz)=\varphi (zy)=k'\). We have \(j'\ne k'\) as otherwise \(\varphi _{\mathrm {rev}(g)}(xy)=\varphi _{\mathrm {rev}(g)}(yx)\). Therefore, \(G_{j'}(g)\) contains the forbidden subgraph \(C_3\); a contradiction.

Case: \(G_j(\mathrm {rev}(g))\) contains \(P_4\) for some \(j\in {\varUpsilon }_{\mathrm {rev}(g)}\).

If \(G_j(\mathrm {rev}(g))\) contains a \(P_4\) induced by the vertices a, b, c, d, we can wlog. assume that the vertices are labeled so that \(\varphi _{\mathrm {rev}(g)}(e)=\varphi _{\mathrm {rev}(g)}(f)=j\) for all \(e,f\in E'=\{(a,b),(b,a),(b,c),(c,b),(c,d),(d,c)\}\). For all these arcs \(e,f\in E'\) it additionally holds that \(\varphi (e)=\varphi (f)=j'\). Moreover, for all other arcs \(e\in \{a,b,c,d\}^{\times }_{\mathrm {irr}}{\setminus } E'\) it is not possible that \(\varphi (e)=\varphi (e^{-1})=j'\), as otherwise, \(\varphi _{\mathrm {rev}(g)}(e)=\varphi _{\mathrm {rev}(g)}(e^{-1})=j\) and the \(P_4\) would not be an induced subgraph of \(G_j(\mathrm {rev}(g))\). By the latter argument and since \(G_{j'}(g)\) does not contain an induced \(P_4\) there must be at least one arc \(e\in \{a,b,c,d\}^{\times }_{\mathrm {irr}}{\setminus } E'\) with \(\varphi (e)=j'\), but \(\varphi (e^{-1})\ne j'\). Now full enumeration of all possibilities (which we leave to the reader) to set one, two, or three of these arcs to the label \(j'\) yields one of the forbidden subgraphs \(\overline{D_3}, A, B\) or \(\overline{N}\) in \(G_{j'}(g)\); a contradiction.

Case: \(G_j(\mathrm {rev}(g))\) contains N for some \(j\in {\varUpsilon }_{\mathrm {rev}(g)}\).

If \(G_j(\mathrm {rev}(g))\) contains an N induced by the vertices a, b, c, d, we can wlog. assume that the vertices are labeled so that \(\varphi _{\mathrm {rev}(g)}(ba)=\varphi _{\mathrm {rev}(g)}(bc)=\varphi _{\mathrm {rev}(g)}(dc)= j\ne \varphi _{\mathrm {rev}(g)}(ab)=\varphi _{\mathrm {rev}(g)}(cb)=\varphi _{\mathrm {rev}(g)}(cd)=k\). Thus, \(\varphi (ba)=\varphi (bc)=\varphi (dc)=j'\ne \varphi (ab)=\varphi (cb)=\varphi (cd)=k'\). Since \(G_{j'}(g)\) is a di-cograph, there must be an arc \(e\in E'=\{(a,c), (c,a), (a,d), (d,a), (b,d), (d,b)\}\) with \(\varphi (e)=j'\). Moreover, for this arc e it must hold that \(\varphi (e^{-1})\ne k'\) as otherwise, \(\varphi _{\mathrm {rev}(g)}(e)=j\). The graph \(G_k(\mathrm {rev}(g))\) also contains an N induced by the vertices a, b, c, d. Hence, by analogous arguments there is an \(f \in E', e\ne f\) with \(\varphi (f)=k'\) and \(\varphi (f^{-1})\ne j'\). Assume first that e is (a, c) or (c, a) and thus, \(D_{ac}=\{j',j''\}\) where \(j''=j'\) is allowed. If f is (a, d) or (d, a), then \(D_{ad}=\{k',k''\}\) where \(k''=k'\) is allowed. But then \(D_{acd} = \{\{k',j'\},\{j',j''\},\{k',k''\}\}\) with \(j'\ne k'\) and thus \(| D_{acd} | =3\) violating Condition (U2) in g; a contradiction. If f is (b, d) or (d, b), then \(D_{bd}=\{k',k''\}\) where \(k''=k'\) is allowed. Thus, \(\{j',j''\}, \{k',j'\}\in D_{acd}\) and \(\{k',k''\}, \{k',j'\}\in D_{abd}\). The only way to satisfy \(|D_{acd}|=2\) and \(|D_{abd}|=2\) is achieved by \(D_{ad}=\{k',j'\}\). However, the case \(\varphi (e)=j'\) and \(\varphi (e^{-1})=k'\) with \(e\in E'\) is not allowed. All other cases, starting with \(e\in E'{\setminus }\{(a,c), (c,a)\}\), can be treated analogously.

\(\Leftarrow \): Let \(G_j(\mathrm {rev}(g))\) be a di-cograph for all \(j\in {\varUpsilon }_{\mathrm {rev}(g)}\). Moreover, assume for contradiction that there is a label \(i\in {\varUpsilon }\) such that \(G_i(g)\) is not a di-cograph. Hence, \(G_i(g)\) contains a forbidden subgraph.

In the following we will show that the existence of one of the forbidden subgraphs in any \(G_i(g)\) leads to a contradiction. Again, we analyze the possible forbidden subgraphs separately.

Case: \(G_i(g)\) contains \(D_3, A\) or B for some \(i\in {\varUpsilon }\).

If \(G_i(g)\) contains a forbidden subgraph \(D_3, A, B\) then there are arcs (a, b), (b, c) contained in these forbidden subgraphs with \(\varphi (ab)=\varphi (bc)=i\) but \(\varphi (ac)\ne i\) and \(\varphi (ca)\ne i\). Moreover, since \(G_j(\mathrm {rev}(g))\) does not contain these forbidden subgraphs for any \(j\in {\varUpsilon }_{\mathrm {rev}(g)}\), we also obtain that \(\varphi (ab)=\varphi (bc)=i\) but \(\varphi (ba)\ne \varphi (cb)\). But this implies that \(|D_{abc}|=3\) in g; a contradiction to (U2).

Case: \(G_i(g)\) contains \(\overline{D_3}\) or \(C_3\) for some \(i\in {\varUpsilon }\).

If \(G_i(g)\) contains a forbidden subgraph \(\overline{D_3}\) or \(C_3\), then there are arcs (a, b), (b, c) contained in these forbidden subgraphs with \(\varphi (ab)=\varphi (bc)=i\) and \(\varphi (ba)\ne i, \varphi (cb)\ne i\). If \(\varphi (ba) = \varphi (cb)\) and \(\overline{D_3}\) is contained in \(G_i(g)\), then \(G_j(\mathrm {rev}(g))\) contains \(D_3\) as a forbidden subgraph. If \(\varphi (ba) = \varphi (cb)\) and \(C_3\) is contained in \(G_i(g)\), then \(G_j(\mathrm {rev}(g))\) contains \(D_3\) or \(C_3\) as a forbidden subgraph. Hence, \(\varphi (ba) \ne \varphi (cb)\). For the case \(\overline{D_3}\), we observe that \(|D_{abc}|=3\) in g; a contradiction to (U2). For the case \(C_3\), we can conclude by analogous arguments that \(\varphi (ba) \ne \varphi (ca)\) and \(\varphi (cb) \ne \varphi (ca)\) and again, \(|D_{abc}|=3\) in g; a contradiction.

Case: \(G_i(g)\) contains N for some \(i\in {\varUpsilon }\).

Similarly, if N is contained in \(G_i(g)\) then there are arcs (b, a), (b, c), (d, c) contained in N with \(\varphi (ba)=\varphi (bc)=\varphi (dc)=i\) and \(\varphi (e) \ne i\) for all \(e \in \{(a,c), (c,a), (b,d), (d,b)\}\). Since \(G_j(\mathrm {rev}(g))\) does not contain N it holds that \(\varphi (ab) \ne \varphi (cb)\) or \(\varphi (cb) \ne \varphi (cd)\). If \(\varphi (ab) \ne \varphi (cb)\) then \(|D_{abc}|=3\), as \(\varphi (ac) \ne i\) and \(\varphi (ca) \ne i\); a contradiction to (U2). On the other hand, if \(\varphi (cb) \ne \varphi (cd)\) then \(|D_{bcd}|=3\), as \(\varphi (bd) \ne i\) and \(\varphi (db) \ne i\); again a contradiction to (U2).

Case: \(G_i(g)\) contains \(P_4\) for some \(i\in {\varUpsilon }\).

The \(P_4\) on four vertices a, b, c, d cannot be contained in any \(G_i(g)\), since for any two arcs \(e,f\in E'=\{(a,b),(b,a),(b,c),(c,b),(c,d),(d,c)\}\) of this \(P_4\) it still holds that \(\varphi _{\mathrm {rev}(g)}(e)=\varphi _{\mathrm {rev}(g)}(f) = i'\) and for any arc e not in \(E'\), \(\varphi _{\mathrm {rev}(g)}(e)\ne i'\). Hence, if \(G_i(g)\) contains a \(P_4\), then \(G_{i'}(\mathrm {rev}(g))\) contains a \(P_4\) as forbidden subgraph; a contradiction.

Case: \(G_i(g)\) contains \(\overline{N}\) for some \(i\in {\varUpsilon }\).

If \(G_i(g)\) contains the forbidden subgraph \(\overline{N}\) on four vertices a, b, c, d, then for the three arcs \(e_1,e_2,e_3\) with \(\varphi (e_j)=\varphi (e_j^{-1})=i\) it still holds that \(\varphi _{\mathrm {rev}(g)}(e_j)=\varphi _{\mathrm {rev}(g)}(e_j^{-1})=i', 1\le j\le 3\). However, for the other arcs \(f_1,f_2,f_3\) with \(\varphi (f_j)=i\ne \varphi (f_j^{-1})\), we can infer that \(\varphi _{\mathrm {rev}(g)}(f_j)\ne i'\) and \(\varphi _{\mathrm {rev}(g)}(f_j^{-1})\ne i'\). Thus, \(G_{i'}(\mathrm {rev}(g))\) contains a \(P_4\) on the three edges \(e_1, e_2, e_3\) as a forbidden subgraph; a contradiction. \(\square \)

1.2 Algorithmic considerations

We show that the characterization of uniformly non-prime 2-structures in terms of di-cographs and 1-clusters (cf. Theorem 6(4)) can be used to derive a simple algorithm for the recognition of uniformly non-prime 2-structures. In the following, the integer n always denotes |V| as a measure of the input size.

Pseudocode for the recognition procedure is given in Algorithm 1. Furthermore, we give pseudocode for all necessary subroutines (Algorithms 2 to 6). We omit the procedure for computing the modular decomposition \(\mathbb {M}_{\mathrm {str}}(G)\) of a digraph \(G=(V,E)\), as McConnell and de Montgolfier (2005) already presented an \(O(|V|+|E|)\) time algorithm for this problem.

We first prove the correctness of Algorithms 4, 5, and 6.

Lemma 10

Given a digraph G and its modular decomposition \(\mathbb {M}_{\mathrm {str}}(G)\), Algorithm 4 recognizes whether G is a di-cograph or not.

Proof

First, Algorithm 4 computes the inclusion tree T of \(\mathbb {M}_{\mathrm {str}}(G)\) and then iterates over all strong modules \(M \in \mathbb {M}_{\mathrm {str}}(G)\). For each strong module M, two arbitrary but distinct children \(M',M'' \in \mathbb {M}_{\mathrm {str}}(G)\) of M in T are selected, and it is checked whether there is an arc between two vertices \(x \in M'\) and \(y \in M''\). If G is a di-cograph and there is an arc \((x,y) \in E\) or \((y,x) \in E\), then by Remark 3, M must be either series or order. In other words, if we have found an arc \((x,y) \in E\) or \((y,x) \in E\), but M is neither series nor order, then M must be prime, which implies that G is not a di-cograph. However, it is possible that the chosen elements x and y do not form an arc \((x,y) \in E\) or \((y,x) \in E\); in this case M is either prime or parallel. If M is prime, there must be arcs \((x',y')\) or \((y',x')\) that we might not have observed in the preceding step, where \(x'\in M'\), \(y'\in M''\) for some children \(M',M''\) of M; otherwise M would be parallel. This case is covered by counting all arcs between the vertices of maximal strong submodules contained in series or order modules M. If the accumulated number e of all counted arcs is equal to the number of arcs |E| in G, then all modules \(M' \in \mathbb {M}_{\mathrm {str}}(G)\) which are neither series nor order must be parallel. Hence, no prime module exists and therefore G is a di-cograph. \(\square \)
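The arc-counting argument in this proof can be illustrated by a minimal Python sketch. It is not the paper's Algorithm 4 and makes no attempt to reach linear time. It assumes that the inclusion tree is already available as a hypothetical dictionary `children` mapping every strong module (a `frozenset`) to the list of its children, and that `arcs` is a set of ordered pairs. Because the children of a strong module are modules, one representative per child is enough to classify the module.

```python
from itertools import combinations

def is_dicograph(children, arcs):
    """Sketch: count the arcs that lie between children of series/order modules.

    children: dict mapping each strong module (frozenset) to the list of its
              child modules in the inclusion tree (assumed input format).
    arcs:     set of ordered pairs (x, y).
    """
    counted = 0
    for module, kids in children.items():
        if len(kids) < 2:
            continue  # singletons have no children to compare
        reps = [next(iter(kid)) for kid in kids]  # one vertex per child
        k = len(reps)
        pairs = list(combinations(range(k), 2))
        # series: both arcs are present between every pair of children
        series = all((reps[a], reps[b]) in arcs and (reps[b], reps[a]) in arcs
                     for a, b in pairs)
        # order: exactly one arc per pair and out-degrees 0..k-1,
        # i.e. the quotient is a transitive tournament
        tournament = all(((reps[a], reps[b]) in arcs) != ((reps[b], reps[a]) in arcs)
                         for a, b in pairs)
        out_deg = sorted(sum((reps[a], reps[b]) in arcs for b in range(k) if b != a)
                         for a in range(k))
        order = tournament and out_deg == list(range(k))
        if series or order:
            for kid_a, kid_b in combinations(kids, 2):
                counted += sum(((x, y) in arcs) + ((y, x) in arcs)
                               for x in kid_a for y in kid_b)
    # every arc is accounted for iff no prime module carries an arc
    return counted == len(arcs)

# Order module {1} before the parallel module {2, 3}: a di-cograph.
tree_ok = {frozenset({1, 2, 3}): [frozenset({1}), frozenset({2, 3})],
           frozenset({2, 3}): [frozenset({2}), frozenset({3})],
           frozenset({1}): [], frozenset({2}): [], frozenset({3}): []}
arcs_ok = {(1, 2), (1, 3)}

# Directed path 1 -> 2 -> 3: the root is a prime module.
tree_bad = {frozenset({1, 2, 3}): [frozenset({1}), frozenset({2}), frozenset({3})],
            frozenset({1}): [], frozenset({2}): [], frozenset({3}): []}
arcs_bad = {(1, 2), (2, 3)}
```

In the second example no arc is counted, so the count differs from |E| = 2 and the sketch rejects the digraph.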

Lemma 11

Given a di-cograph \(G_i\) and its modular decomposition \(\mathbb {M}_{\mathrm {str}}(G_i)\), Algorithm 5 computes the 1-clusters \(\mathscr {C}_i^1\) of \(G_i\).

Proof

First, Algorithm 5 computes the inclusion tree T of \(\mathbb {M}_{\mathrm {str}}(G_i)\). Then, for each strong module M, two arbitrary vertices x and y from distinct children \(M',M'' \in \mathbb {M}_{\mathrm {str}}(G_i)\) of M in T are selected. If there is an arc \((x,y) \in E_i\) or \((y,x) \in E_i\), then by Remark 3, M cannot be parallel; hence, M is a 1-cluster and is added to the set of 1-clusters \(\mathscr {C}_i^1\). \(\square \)

The next lemma shows that Algorithm 6 correctly recognizes whether \(\mathscr {C}^1(\mathrm {rev}(g))\) is a hierarchy or not. For the sake of efficiency and simplicity of the algorithm, we deal here with multisets, \(\mathscr {C} =\biguplus _{i \in {\varUpsilon }_{rev(g)}} \mathscr {C}_i^1\). The symbol “\(\biguplus \)” denotes the multiset-union of sets, where the multiplicity of an element M in \(\mathscr {C}\) is given by the number of sets that contain M.
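The selection step of this proof can be sketched in a few lines of Python. The sketch assumes, as in the lemma, that the input is a di-cograph, and it uses a hypothetical dictionary `children` that maps each strong module (a `frozenset`) to the list of its children in the inclusion tree; neither the name nor the data layout is taken from the paper.

```python
def one_clusters(children, arcs):
    """Sketch: return the strong modules that are not parallel.

    children: dict mapping each strong module (frozenset) to the list of its
              child modules in the inclusion tree (assumed input format).
    arcs:     set of ordered pairs (x, y) of a di-cograph.
    """
    clusters = []
    for module, kids in children.items():
        if len(kids) < 2:
            continue  # singletons are handled separately
        x = next(iter(kids[0]))  # arbitrary vertex of one child
        y = next(iter(kids[1]))  # arbitrary vertex of another child
        if (x, y) in arcs or (y, x) in arcs:
            clusters.append(module)  # series or order, hence a 1-cluster
    return clusters

tree = {frozenset({1, 2, 3}): [frozenset({1}), frozenset({2, 3})],
        frozenset({2, 3}): [frozenset({2}), frozenset({3})],
        frozenset({1}): [], frozenset({2}): [], frozenset({3}): []}
result = one_clusters(tree, {(1, 2), (1, 3)})
```

Here only the root module {1, 2, 3} is reported, since {2, 3} is parallel.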

Lemma 12

Given a multiset \(\mathscr {C} =\biguplus _{i \in {\varUpsilon }_{rev(g)}} \mathscr {C}_i^1\) of the 1-clusters of a set of di-cographs \(G_i=(V,E_i)\), Algorithm 6 recognizes whether \(\mathscr {C}^1=\bigcup _{i \in {\varUpsilon }_{rev(g)}} \mathscr {C}_i^1 \cup \{\{v\} \mid v \in V\}\) is a hierarchy or not.

Proof

Note that the multiset \(\mathscr {C}\) may contain a cluster C more than once, as C can be contained in several of the sets \(\mathscr {C}_i^1\). Furthermore, \(\mathscr {C}\) does not contain the singletons. However, it is easy to see that \(\mathscr {C}^1\) is a hierarchy if and only if the singletons are contained in \(\mathscr {C}^1\) (which is satisfied by construction), there is a 1-cluster equal to V, and for all \(C', C'' \in \mathscr {C}\) it holds that \(C' \cap C'' \in \{C',C'',\emptyset \}\). The latter is equivalent to the following statement: for all \(C', C'' \in \mathscr {C}\) with \(|C'| \le |C''|\) it holds that either \(C' \cap C'' = \emptyset \) or \(C' \subseteq C''\).

In Line 4, a list \(\mathscr {C}_\le \) is created with all \(C \in \mathscr {C}\) being sorted ascending by cardinality. Hence, \(\mathscr {C}_\le (|\mathscr {C}_\le |)\) is one of the largest clusters. In Line 6, it is checked if this largest cluster contains all elements from the ground set \(V=\{1, \ldots , n\}\). If not then \(V \notin \mathscr {C}\) and therefore \(\mathscr {C}^1\) is not a hierarchy. In Lines 9 to 14, lists \(\mathscr {L}_i\) are created, containing all clusters \(C \in \mathscr {C}\) with \(i \in C\). The relative order of clusters in \(\mathscr {L}_i\) is identical to the relative order of clusters in \(\mathscr {C}_\le \). In each iteration of Lines 16 to 28 the smallest cluster L is selected among all remaining clusters \(\bigcup _{i=1}^n \mathscr {L}_i\). For each \(i \in L\) obviously \(L \in \mathscr {L}_i\). If \(s,t \in L\) then it is checked if \(\mathscr {L}_s = \mathscr {L}_t\). This can be done, as \(\mathscr {L}_s\) and \(\mathscr {L}_t\) have the same relative order of clusters. If \(s,t \in L\) and \(\mathscr {L}_s = \mathscr {L}_t\) then it follows that \(s,t \in L'\) for all \(L' \in \mathscr {L}_s \cup \mathscr {L}_t\). As this holds for all pairwise distinct \(s,t \in L\) and \(|L| \le |L'|\) for all \(L' \in \bigcup _{i=1}^n \mathscr {L}_i\) it follows that \(L \subseteq L'\) for all \(L' \in \bigcup _{i=1}^n \mathscr {L}_i\) with \(L \cap L' \ne \emptyset \). As \(\mathscr {L}_s = \mathscr {L}_t\) it is sufficient to keep only one of the lists, e.g., \(\mathscr {L}_s\) (Line 24). Finally, L is removed from \(\mathscr {L}_s\) (Line 27) and the while-loop is repeated with the next smallest cluster. \(\square \)
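For comparison, the hierarchy condition used in this proof can also be tested directly. The following Python sketch is a naive pairwise check, not the list-based Algorithm 6, and does not achieve the O(nN) bound; the function name and input format are assumptions made for illustration.

```python
from itertools import combinations

def is_hierarchy(clusters, ground_set):
    """Naive check: V is a cluster and any two clusters are nested or disjoint.

    clusters:   iterable of sets, possibly with repetitions (a multiset);
                singletons may be omitted since they never violate the condition.
    ground_set: the vertex set V.
    """
    distinct = {frozenset(c) for c in clusters}  # multiplicities are irrelevant
    if frozenset(ground_set) not in distinct:
        return False
    for a, b in combinations(distinct, 2):
        common = a & b
        if common and common != a and common != b:
            return False  # a and b overlap properly
    return True

nested = is_hierarchy([{1, 2, 3}, {2, 3}, {2, 3}], {1, 2, 3})
overlapping = is_hierarchy([{1, 2, 3}, {1, 2}, {2, 3}], {1, 2, 3})
```

The first call accepts a multiset with a repeated cluster, while the second is rejected because {1, 2} and {2, 3} overlap without being nested.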

We now show the correctness of Algorithm 1.

Lemma 13

Given a 2-structure \(g = (V,{\varUpsilon },\varphi )\), Algorithm 1 recognizes whether g is uniformly non-prime or not.

Proof

In fact, Algorithm 1 recognizes, for the reversible refinement \(\mathrm {rev}(g)\), whether all monochromatic subgraphs \(G_i(\mathrm {rev}(g))\) are di-cographs and whether, in addition, the 1-clusters in \(\mathscr {C}^1(\mathrm {rev}(g))\) form a hierarchy. By Theorem 6, this suffices to decide whether g is uniformly non-prime or not.

It is easy to see that Algorithm 2 computes the reversible refinement of g by means of Definition 9 and Remark 1 with \(\varphi _{\mathrm {rev}(g)}(e) = (\varphi _g(e),\varphi _g(e^{-1}))\). Hence, in Line 2 the reversible refinement \(g'=\mathrm {rev}(g)\) of g is computed.

If \(\mathrm {rev}(g)\) is uniformly non-prime, then there exists a tree-representation \((T_{\mathrm {rev}(g)},t_{\mathrm {rev}(g)})\). As \(T_{\mathrm {rev}(g)}\) has at most \(n-1\) inner vertices, there can be at most \(n-1\) different labels \(t_{\mathrm {rev}(g)}(lca(x,y))=(i,j)\), each composed of at most two distinct labels \(i,j \in {\varUpsilon }_{rev(g)}\). Even if all of these labels are pairwise distinct, this yields at most \(2(n-1)\) distinct labels in total, that is, \(|{\varUpsilon }_{rev(g)}| \le 2(n-1)\). Hence, if \(|{\varUpsilon }_{rev(g)}| > 2(n-1)\) then \(\mathrm {rev}(g)\) is not uniformly non-prime. It is easy to see that, given the 2-structure \(\mathrm {rev}(g)\), Algorithm 3 (which is called in Line 6) computes the respective monochromatic subgraphs \(G_i(\mathrm {rev}(g))\). By Lemma 10, for each \(G_i\) Algorithm 4 (which is called in Line 10) checks whether \(G_i\) is a di-cograph or not, and by Lemma 11 the corresponding 1-clusters \(\mathscr {C}_i^1\) are returned in Line 11. In Line 15, the 1-clusters \(\mathscr {C}_i^1\) of all di-cographs \(G_i\) are collectively stored in the multiset \(\mathscr {C}\), without removing duplicated entries.

Since \(T_{\mathrm {rev}(g)}\) has at most \(n-1\) inner vertices and since each 1-cluster appears in at most 2 distinct cotrees whenever \(\mathrm {rev}(g)\) is uniformly non-prime (cf. Lemma 7), we can conclude that \(\mathscr {C}\) can contain at most \(2(n-1)\) elements. Hence, if \(|\mathscr {C}| > 2(n-1)\) then \(\mathrm {rev}(g)\) is not uniformly non-prime, and therefore, g is not uniformly non-prime (Line 17).

Finally, by Lemma 12 it is checked in Line 20 whether the set of 1-clusters \(\mathscr {C}^1\) is a hierarchy. Hence, TRUE is returned if g is uniformly non-prime and FALSE otherwise. \(\square \)
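The preprocessing steps discussed in this proof (reversible refinement, the label bound, and the split into monochromatic subgraphs) can be sketched in Python as follows. This is an illustration only and not the paper's Algorithms 2 and 3: it assumes that \(\varphi \) is given as a hypothetical dictionary `phi` that maps every ordered pair of distinct vertices to its label, and it returns `None` when the bound \(2(n-1)\) on the number of labels is violated.

```python
from collections import defaultdict

def reversible_refinement(phi):
    """Relabel each arc e = (x, y) with the pair (phi(e), phi(e^-1))."""
    return {(x, y): (label, phi[(y, x)]) for (x, y), label in phi.items()}

def monochromatic_subgraphs(vertices, phi_rev):
    """Group arcs by label; give up early if there are more than 2(n-1) labels."""
    n = len(vertices)
    if len(set(phi_rev.values())) > 2 * (n - 1):
        return None  # too many labels for a tree-representation
    arcs_by_label = defaultdict(set)
    for arc, label in phi_rev.items():
        arcs_by_label[label].add(arc)
    return dict(arcs_by_label)

# Small example on three vertices (labels chosen arbitrarily for illustration).
phi = {(1, 2): "a", (2, 1): "b", (1, 3): "a", (3, 1): "b",
       (2, 3): "c", (3, 2): "c"}
phi_rev = reversible_refinement(phi)
subgraphs = monochromatic_subgraphs([1, 2, 3], phi_rev)

# Six pairwise distinct labels on three vertices exceed the bound 2(n-1) = 4.
phi_many = dict(zip(phi.keys(), "abcdef"))
rejected = monochromatic_subgraphs([1, 2, 3], reversible_refinement(phi_many))
```

Each resulting arc set can then be passed to a di-cograph test, one label at a time.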

Before we show the time complexity of Algorithm 1 we first show the time complexity of the two subroutines Algorithms 4 and 6.

Lemma 14

For a given digraph \(G=(V,E)\) and its modular decomposition \(\mathbb {M}_{\mathrm {str}}\), Algorithm 4 runs in time \(O(n+m)\) with \(n=|V|\) and \(m=|E|\).

Proof

By Lemma 9, computing the inclusion tree T of \(\mathbb {M}_{\mathrm {str}}(G)\) in Line 2 takes time O(n), as there are at most O(n) strong modules. In the for-loop from Line 4 to 14, it is checked for each strong module M whether or not there is an arc between two arbitrary vertices from two distinct children of M in T. This has to be done for all O(n) strong modules \(M \in \mathbb {M}_{\mathrm {str}}(G)\). Only if there is an arc is it further checked whether M is series or order. This can be done by checking all the arcs between vertices x and y from distinct children of M in T. In both cases (M being series or order) there is at least one arc \((x,y) \in E\) or \((y,x) \in E\) between any pair of vertices x and y. Furthermore, as only vertices from distinct children of M in T are considered, every pair (x, y) is checked at most once. Hence, the number of all pairwise checks is bounded by O(m). For the same reason, counting the arcs (Line 11) can also be done in O(m) time. This amounts to a running time of \(O(n+m)\) in total. \(\square \)

Lemma 15

For a given multiset of clusters \(\mathscr {C}\) of size N on the ground set \(\{1, \ldots , n\}\), Algorithm 6 runs in time O(nN).

Proof

Computing the identifier id for each cluster in \(\mathscr {C}\) (Line 2) takes time O(N), computing the bit string representation for each cluster in \(\mathscr {C}\) (Line 3) takes time O(nN), and sorting the clusters of \(\mathscr {C}\) (Line 4) using bucket sort with n buckets takes time \(O(N+n)\). The for-loop from Line 5 to Line 15 runs in time O(nN), as there are O(N) clusters in \(\mathscr {C}\) which possibly have to be removed in Line 12 from the respective lists \(\mathscr {L}_i\). The while-loop (Lines 16 to 28) is executed at most O(N) times, as in each iteration one of the N clusters is removed from all the lists \(\mathscr {L}_i\) that contain it (Lines 24 and 27). The for-loop from Line 19 to Line 26 is executed for all of the O(n) many elements \(t \in L\). However, as in each execution of the inner loop (Lines 20 to 25) one of the n lists \(\mathscr {L}_i\) becomes empty, Lines 20 to 25 are executed at most n times in total and each execution takes O(N) time. Hence, the time that Algorithm 6 spends on Lines 20 to 25 is bounded by O(nN). This sums up to a total running time of O(nN) for Algorithm 6. \(\square \)

Finally, we show the time complexity of \(O(n^2)\) for Algorithm 1.

Lemma 16

For a given 2-structure \(g=(V,{\varUpsilon },\varphi )\) with \(n=|V|\), Algorithm 1 runs in time \(O(n^2)\).

Proof

Computing the reversible refinement of g in Line 2 takes \(O(n^2)\) time using Algorithm 2. In Line 3 it is assured that there are at most \(2(n-1)\) labels and hence \(N = |{\varUpsilon }_{rev(g)}| \le 2(n-1)\) monochromatic subgraphs \(G_i(\mathrm {rev}(g))\). Computing those O(n) subgraphs at once using Algorithm 3 in Line 6 takes \(O(n^2)\) time. The for-loop from Line 8 to Line 16 runs for each of the O(n) many digraphs \(G_i(\mathrm {rev}(g))\). As already stated, there is an \(O(n+m)\) time algorithm for computing the modular decomposition of a digraph (Line 9), given in McConnell and de Montgolfier (2005). By Lemma 14, Algorithm 4 (Line 10) also has a time complexity of \(O(n+m)\). Algorithm 5 (Line 11) has a time complexity of O(n), as by Lemma 9 constructing the inclusion tree within Line 3 of Algorithm 5 takes time O(n), since there are at most O(n) strong modules within \(G_i\). Hence, all procedures within the for-loop (Lines 8 to 16) have a time complexity of \(O(n+m)\). More precisely, the time complexity is \(O(n+m_i)\), where \(m_i=|E(G_i(\mathrm {rev}(g)))|\) is the number of arcs of \(G_i(\mathrm {rev}(g))\). The total running time of the for-loop therefore is \(O(n+m_1) + O(n+m_2) + \cdots + O(n+m_N) = O(n^2 + \sum _{i=1}^N m_i)\). As each arc (x, y) occurs in exactly one of the digraphs \(G_i(\mathrm {rev}(g))\), it follows that \(\sum _{i=1}^N m_i = n(n-1)\), which leads to a running time of \(O(n^2)\) for Lines 8 to 16. Line 17 assures that the multiset \(\mathscr {C}\) contains at most \(2(n-1)\) clusters. Hence, \(|\mathscr {C}| \in O(n)\). Therefore, by Lemma 15, Algorithm 6 runs in time \(O(n^2)\). This leads to a time complexity of \(O(n^2)\) for Algorithm 1. \(\square \)
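For readers who want the summation in this proof spelled out, the bound for the for-loop follows from the two facts stated there, namely \(N \le 2(n-1)\) and \(\sum _{i=1}^N m_i = n(n-1)\):

\[
\sum_{i=1}^{N} O(n+m_i) \;=\; O\Big(Nn + \sum_{i=1}^{N} m_i\Big) \;\subseteq\; O\big(2(n-1)\,n + n(n-1)\big) \;=\; O\big(3n(n-1)\big) \;=\; O(n^2).
\]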

About this article

Cite this article

Hellmuth, M., Stadler, P.F. & Wieseke, N. The mathematics of xenology: di-cographs, symbolic ultrametrics, 2-structures and tree-representable systems of binary relations. J. Math. Biol. 75, 199–237 (2017). https://doi.org/10.1007/s00285-016-1084-3

Received: 08 March 2016
Revised: 20 November 2016
Published: 30 November 2016
Issue Date: July 2017
DOI: https://doi.org/10.1007/s00285-016-1084-3
