
1 Introduction

The quest for the Tree of Life arose centuries ago, and one of the first illustrations of an evolutionary tree was produced by Charles Darwin in 1859, in his book “The Origin of Species”. Over a century later, evolutionary biologists still use phylogenetic trees to depict evolution. A phylogenetic tree T on X is obtained by labeling the leaves of a tree by the set of taxa \(X = \{x_1, x_2, \ldots , x_n\}\). Each taxon \(x_i\) represents a species or an organism.

The branches of a phylogenetic tree represent the evolution of species, and sometimes the lengths of its edges are scaled to represent time.

As pointed out in [4], molecular phylogeneticists were failing to find the true tree of life, not because their methods were inadequate or because they had chosen the wrong genes, but perhaps because the history of life cannot be properly represented as a tree. Indeed, the mechanisms of horizontal gene transfer, hybridization and genetic recombination necessitate the use of phylogenetic network models to illustrate them.

There are many different types of phylogenetic networks which can be separated in two main classes according to [8]: implicit phylogenetic networks that provide tools to visualize and analyze incompatible phylogenetic signals, such as split networks [7], and explicit phylogenetic networks that provide explicit scenarios of reticulate evolution, such as hybridization networks [16, 17], horizontal gene transfer networks [6] and recombination networks [5, 10].

Visualization of phylogenetic trees and networks is an important part of this area, since most of these graphs are huge. Furthermore, the usual node-link representation leads to visual clutter. Thus, alternative visualization of phylogenetic trees, such as treemaps, may be preferable.

Treemaps [14], a space filling technique for visualizing large hierarchical data sets, display trees as a set of nested rectangles. The (root of the) tree is the initial rectangle. Each subtree is assigned to a subrectangle, which is then tiled into smaller rectangles representing further subtrees. Space filling visualizations, such as treemaps, have the capacity to display thousands of items legibly in limited space via a two dimensional map. Treemaps have been used in bioinformatics to visualize phylogenetic trees [1], gene expression data [18], gene ontologies [2, 20, 21], and the Encyclopedia of Life [1]. An extension of treemaps is presented in [22], which manages to visualize not only trees, but also Directed Acyclic Graphs (DAGs). As shown in [22], it is not always possible to visualize a DAG with a DAGmap without having node duplications.

In this paper we present space filling techniques that use DAGmap drawings for the visualization of two categories of phylogenetic networks, galled trees and planar galled networks. Neither of the visualization algorithms that we present duplicates nodes. In Sect. 2 we introduce an algorithm which locates the galls of a graph and examines whether this graph is a galled tree or a galled network. In Sect. 3 we describe how to draw the DAGmaps of galled trees, and we examine whether galled trees and galled networks admit one-dimensional DAGmap drawings. Finally, in Sect. 4 we present an algorithm for producing DAGmap drawings of planar galled networks.

2 Preliminaries

Let \(G = (V,E)\) be a directed graph (digraph) with \(n = |V|\) nodes and \(m = |E|\) edges. If \(e = (u, v) \in E\) is a directed edge, we say that e is incident from u (or outgoing from u) and incident to v (or incoming to v); node u is the origin of e and node v is the destination of e. A directed acyclic graph (DAG) is a digraph that contains no cycles. A source of digraph G is a node without incoming edges. A sink of G is a node without outgoing edges. An internal node of G has both incoming and outgoing edges.

A drawing of a graph G maps each node v to a distinct point of the plane and each edge (u, v) to a simple open Jordan curve with endpoints u and v. A drawing is planar if no two edges intersect except, possibly, at common endpoints. A graph is planar if it admits a planar drawing. Two planar drawings of a graph are equivalent if they determine the same circular ordering of the edges around each node. An equivalence class of planar drawings is a (combinatorial) embedding of G. An embedded graph is a graph with a specified embedding. A planar drawing partitions the plane into topologically connected regions that are called faces.

An upward drawing of a digraph is such that all the edges are represented by directed curves increasing monotonically in the vertical direction. A digraph has an upward drawing if and only if it is acyclic. A digraph is upward planar if it admits a planar upward drawing. Note that a planar acyclic digraph does not necessarily have a planar upward drawing. A graph is layered planar if it can be drawn such that the nodes are placed in horizontal rows or layers, the edges are drawn as polygonal chains connecting their end nodes, and there are no edge crossings.

In a phylogenetic network there can be three kinds of nodes: root, tree, and reticulation nodes. A root node has no incoming edges. There is only one root node in every rooted phylogenetic network. Tree nodes have exactly one ancestor. Reticulation nodes have more than one ancestor. It is easy to see that a phylogenetic tree is a phylogenetic network without reticulation nodes.

In addition, there can be two kinds of edges: tree and reticulation edges. A tree edge leads to a node that has exactly one incoming edge. A reticulation edge leads to a node that has more than one incoming edge.
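The node and edge kinds defined above depend only on indegrees, so they can be read off an edge list directly. The following is a minimal sketch, assuming the network is given as a list of `(u, v)` pairs; the function names `classify_nodes` and `classify_edges` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def classify_nodes(edges):
    """Label every node as 'root', 'tree' or 'reticulation' by its indegree."""
    indegree = defaultdict(int)
    nodes = set()
    for u, v in edges:
        nodes.update((u, v))
        indegree[v] += 1
    kinds = {}
    for v in nodes:
        d = indegree[v]
        # indegree 0: root, indegree 1: tree node, indegree > 1: reticulation node
        kinds[v] = "root" if d == 0 else "tree" if d == 1 else "reticulation"
    return kinds


def classify_edges(edges):
    """An edge is a reticulation edge iff it leads to a reticulation node."""
    kinds = classify_nodes(edges)
    return {
        (u, v): "reticulation" if kinds[v] == "reticulation" else "tree"
        for u, v in edges
    }


# Illustrative input (an assumption, not taken from the paper's figures).
example = [("root", "a"), ("a", "b"), ("a", "c"), ("b", "r"), ("c", "r")]
node_kinds = classify_nodes(example)
edge_kinds = classify_edges(example)
```

In this example, `r` is the only reticulation node, and the two edges entering it are the only reticulation edges.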

Reticulation cycles are defined as follows. Since there is only one root node in every rooted phylogenetic network, in the corresponding undirected graph every reticulation node belongs to a cycle. This cycle, in the directed graph, is called a reticulation cycle.

Fig. 1. The structure of a gall.

A gall is a reticulation cycle in a phylogenetic network that shares no nodes with any other reticulation cycle. It consists of a beginning node \(g_0\), two chains (the left and the right one) and a reticulation node \(g_k\), as shown in Fig. 1. The beginning node \(g_0\) is on level 1 of this subgraph, the reticulation node is on level \(k+1\), and the chain nodes are on levels \(i\in \{2, \ldots , k\}\). Every level i may contain either one or two chain nodes. Every node \(g_i\), \(i\in \{0, \ldots , k\}\), of the gall may have a subtree \(t_{i+1}\) as a descendant. These subtrees have no further connections to this gall, because otherwise a reticulation cycle would be created that shares a node with the gall, which the definition of a gall forbids.

A galled tree is a phylogenetic network whose reticulation cycles are galls [5, 23]. This is called the galled tree condition. Considering the definition of a gall, it is easy to realize that the reticulation nodes of a galled tree have indegree two.

A galled network is a rooted phylogenetic network in which every reticulation cycle shares no reticulation nodes with any other reticulation cycle [9]. This is called the galled network condition.

In contrast to galled trees, galled networks allow the reticulation cycles to share nodes, as long as they are not reticulation nodes. These reticulation cycles are called loose galls. In the rest of the paper, whenever we refer to loose galls of a galled network, we will use the term galls for simplicity.

Galled trees [5, 12, 19, 23] and galled networks [8, 9, 11, 13] have received much attention in recent years. They are important types of phylogenetic networks due to their biological significance and their simple, almost treelike, structure. A galled tree or network may suffice to accurately describe an evolutionary process when the number of recombination events is limited and most of them have occurred recently [5].

Fig. 2. (a) A treemap drawing. (b) A DAGmap drawing.

2.1 The DAGmap Problem

DAGmaps are space filling visualizations of DAGs that generalize treemaps [22]. The main properties of DAGmaps are shown in Fig. 2. In treemaps the rectangle of a child node is included into the rectangle of its parent node (see Fig. 2(a)). In DAGmaps the rectangle of a node is included into the union of rectangles of its ancestors. Also the rectangle of an edge is contained in the intersection of the rectangles of its source and destination nodes (see Fig. 2(b)).

The DAGmap problem is the problem of deciding whether a graph admits a DAGmap drawing without node duplications. Deciding whether or not a DAG admits a DAGmap drawing is NP-complete [22]. Furthermore, the DAGmap problem remains NP-complete even when the graphs are restricted to be galled networks:

Theorem 1

The DAGmap problem for galled networks is NP-complete.

Proof

Omitted due to space limitations.    \(\square \)

2.2 Locating the Galls

The first task is to recognize whether a given phylogenetic network is a galled tree or a galled network. Since both contain galls, we need to locate the galls of the given phylogenetic network. This allows us to check whether the network is a galled tree, a galled network, or neither. This can be accomplished by the following algorithm:

figure a

This process will discover all the galls of the graph, since every reticulation node corresponds to exactly one gall. In addition, every chain node will be visited a constant number of times if we use a hash table to store the chain nodes. Also, the property that every gall has exactly one reticulation node guarantees that this algorithm will neither leave any gall undiscovered, nor claim to discover a gall that does not exist. Thus, it is straightforward to show that Algorithm 1 runs in \(O(n+m)\) time.
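To illustrate the idea of locating one gall per reticulation node, the following is a simplified sketch and not a reconstruction of Algorithm 1. It assumes the network is an edge list of `(u, v)` pairs, that every reticulation node has indegree two, and that both chains of a gall consist of tree nodes, so that climbing from the two parents of a reticulation node meets at the beginning node \(g_0\). The names `find_galls` and `is_galled_tree` are hypothetical. The sketch climbs each chain separately and makes no claim to the \(O(n+m)\) bound stated above.

```python
from collections import defaultdict


def find_galls(edges):
    """Return one gall per reticulation node, or None if the sketch gives up."""
    parents = defaultdict(list)
    for u, v in edges:
        parents[v].append(u)
    parents = dict(parents)

    def climb(v):
        # Follow unique parents upward; stop at the root or at a reticulation node.
        path = [v]
        while len(parents.get(v, [])) == 1:
            v = parents[v][0]
            path.append(v)
        return path

    galls = []
    for r, ps in parents.items():
        if len(ps) < 2:
            continue  # root or tree node
        if len(ps) > 2:
            return None  # outside the assumptions of this sketch
        left, right = climb(ps[0]), climb(ps[1])
        position = {v: i for i, v in enumerate(left)}
        meet = next((j for j, v in enumerate(right) if v in position), None)
        if meet is None:
            return None  # the two chains never meet through tree nodes
        g0 = right[meet]
        nodes = set(left[:position[g0]]) | set(right[:meet]) | {g0, r}
        galls.append({"begin": g0, "reticulation": r, "nodes": nodes})
    return galls


def is_galled_tree(edges):
    """Galled tree condition: the located cycles are pairwise node-disjoint."""
    galls = find_galls(edges)
    if galls is None:
        return False
    seen = set()
    for gall in galls:
        if seen & gall["nodes"]:
            return False
        seen |= gall["nodes"]
    return True


# Illustrative inputs (assumptions, not the paper's figures).
one_gall = [("root", "a"), ("a", "b"), ("a", "c"), ("b", "r"), ("c", "r")]
shared_node = one_gall + [("c", "e"), ("c", "f"), ("e", "s"), ("f", "s")]
```

In `shared_node` the second cycle begins at `c`, which is a chain node of the first cycle, so the network satisfies the galled network condition but not the galled tree condition.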

3 DAGmaps for Galled Trees

In this section we present techniques for drawing galled trees as DAGmaps.

3.1 Drawing Galled Trees as DAGmaps

We now present a three-step algorithm for drawing galled trees as DAGmaps. First, we transform the input galled tree into a tree by collapsing the two chains of each gall into a single chain. Then, we use treemap techniques to draw the tree. Finally, we expand the collapsed galls. Before giving the algorithm, we make two observations:

Fact 1

Any node of a galled tree has indegree at most two.

If a galled tree had a node with indegree greater than two, then this node would belong to more than one reticulation cycle. These cycles would share a node, which violates the galled tree condition.

Fact 2

Every galled tree is planar.

This is easy to realize considering that galled trees are almost like trees, but with some branches being made of two parallel chains, instead of one (see Fig. 3). Furthermore, this implies that the number of edges of a galled tree is O(n).

We now present an algorithm for constructing a DAGmap of a galled tree:

Fig. 3. Transformation of a galled tree (a) into a tree (b).

figure b

Step 1 of the above algorithm is illustrated in Fig. 3. The parallel chains have been united, and nodes \(g_{i_l}\), \(g_{i_r}\) have been replaced by node \(g_i\) while the subtrees \(t_{i_l}\) and \(t_{i_r}\) remain unchanged, \(i\in \{1, \ldots , k\}\).

The treemap of T, in Step 2, is drawn under the constraint that the \(g_i\) nodes (which represent the union of nodes \(g_{i_l}\) and \(g_{i_r}\) of the DAG) must always touch both \(t_{i_l}\) and \(t_{i_r}\), in the same direction. This means that if we choose to place \(g_{i_l}\) on the left and \(g_{i_r}\) on the right, where \(i\in \{1, \ldots , k-1\}\), then we will follow this convention for every \(i\in \{1, \ldots , k-1\}\) (see Fig. 4(a)). Drawing the treemap of T needs O(n) time, if we choose a linear time layout algorithm like the slice and dice layout.

Slice and dice [14] is a treemap drawing technique in which the initial rectangle is recursively divided. The direction of the subdivision alternates between horizontal and vertical from one level to the next.
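The slice and dice layout can be sketched in a few lines. The version below is an illustration under stated assumptions rather than the layout code used for the figures: the tree is a dictionary mapping each node to its list of children, every leaf has weight 1, the first split divides the width, and rectangles are `(x, y, width, height)` tuples without borders or padding. The names `leaf_weights` and `slice_and_dice` are hypothetical. It does not enforce the adjacency constraint on the \(g_i\) nodes described above.

```python
def leaf_weights(children, root):
    """Weight of a node = number of leaves below it (computed in one pass)."""
    weights = {}

    def visit(v):
        kids = children.get(v, [])
        weights[v] = 1 if not kids else sum(visit(k) for k in kids)
        return weights[v]

    visit(root)
    return weights


def slice_and_dice(children, root, rect):
    """Map every node to a rectangle nested inside its parent's rectangle."""
    weights = leaf_weights(children, root)
    layout = {}

    def place(v, rect, depth):
        layout[v] = rect
        x, y, w, h = rect
        offset = 0.0  # fraction of the parent already handed out
        for k in children.get(v, []):
            share = weights[k] / weights[v]
            if depth % 2 == 0:
                # even depth: divide the width among the children
                child = (x + offset * w, y, share * w, h)
            else:
                # odd depth: divide the height among the children
                child = (x, y + offset * h, w, share * h)
            place(k, child, depth + 1)
            offset += share

    place(root, rect, 0)
    return layout


# Illustrative tree (an assumption, not the tree of Fig. 3(b)).
tree = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e", "f"]}
layout = slice_and_dice(tree, "r", (0, 0, 4, 2))
```

Each node is visited once by `leaf_weights` and once by `place`, which matches the linear running time mentioned above for this layout.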

Fig. 4. (a) The treemap drawing of the tree shown in Fig. 3(b). (b) The DAGmap drawing of the gall shown in Fig. 3(a).

The output of Step 3 is shown in Fig. 4(b), where the unified nodes are split. Note that the reticulation node \(g_k\) lies on both \(g_{k-1_l}\) and \(g_{k-1_r}\). This step needs O(n) time, because in the worst case it traverses all the nodes of the graph.

From the above we conclude that:

Theorem 2

Every galled tree admits a DAGmap drawing, which can be computed in O(n) time.

In the next section we show that galled trees can be drawn as one-dimensional DAGmaps.

3.2 Drawing Galled Trees as One-Dimensional DAGmaps

A DAGmap is called one-dimensional if the initial rectangle is sliced only along the vertical (horizontal) direction. Since the height (width) of all the rectangles is constant and equal to the height (width) of the initial drawing rectangle, the problem is one-dimensional.
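In the one-dimensional setting every node reduces to an interval, so the containment property of Sect. 2.1 (the rectangle of a node lies in the union of the rectangles of its ancestors) can be checked with a simple sweep. The sketch below checks only this one property against the parents of each node, not the full DAGmap definition of [22]. It assumes closed intervals `(lo, hi)` of positive length, and the names `covered` and `satisfies_containment` are hypothetical.

```python
def covered(interval, covers):
    """True if `interval` lies inside the union of the intervals in `covers`."""
    lo, hi = interval
    reach = lo  # everything in [lo, reach] is known to be covered
    for a, b in sorted(covers):
        if a > reach:
            break  # gap before the next covering interval starts
        reach = max(reach, b)
        if reach >= hi:
            return True
    return reach >= hi


def satisfies_containment(edges, intervals):
    """Check that every non-root node lies within the union of its parents."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, []).append(u)
    return all(
        covered(intervals[v], [intervals[u] for u in ps])
        for v, ps in parents.items()
    )


# Illustrative gall (an assumption): r straddles its two parents b and c.
gall_edges = [("a", "b"), ("a", "c"), ("b", "r"), ("c", "r")]
good = {"a": (0, 4), "b": (0, 2), "c": (2, 4), "r": (1, 3)}
bad = {"a": (0, 4), "b": (0, 2), "c": (2, 4), "r": (1, 5)}
```

In `good`, the reticulation node `r` is covered by neither parent alone but by their union, which is the situation a gall creates.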

Next, we show that galled trees can be drawn as one-dimensional DAGmaps.

Theorem 3

Every galled tree can be drawn as a one-dimensional DAGmap.

Sketch of Proof

Let \(G = (V, E)\) be a proper layered DAG with vertex partition \(V = L_1\cup L_2 \cup \ldots \cup L_h\), where \(h > 1\), such that the source (root) is in \(L_h\) and the sinks are in \(L_1\). Tsiaras et al. [22] have shown that a DAG G admits a one-dimensional DAGmap if and only if it is layered planar. We will show that every galled tree is layered planar, using its tree-like structure.

We transform the galled tree G into a tree T, as shown in Fig. 3. We take the vertex partition of T: \(V_T = L_1\cup L_2 \cup \ldots \cup L_h\), where \(h > 1\), such that the source (root) is in \(L_h\) and the sinks are in \(L_1\). Then, we define the vertex partition of the galled tree \(V_G = L_1\cup L_2 \cup \ldots \cup L_h\), where \(h > 1\), such that every node of T which also belongs to G remains in the same layer. Moreover, for every node \(g_i\) of T which belongs to layer \(L_j\) of the partition and originates from the union of the nodes \(g_{i_l}\) and \(g_{i_r}\) of G, we place both \(g_{i_l}\) and \(g_{i_r}\) in layer \(L_j\) of the partition \(V_G\).

Since every tree is layered planar and we obtained the vertex partition of G from the vertex partition of T, we conclude that every galled tree admits a one-dimensional DAGmap.    \(\square \)

However, not every planar galled network admits a one-dimensional DAGmap.

Lemma 1

Not every planar galled network admits a one-dimensional DAGmap.

Sketch of Proof

Figure 5 shows an example of a planar galled network that does not admit a one-dimensional DAGmap. Node 16 cannot be drawn on the line of level 4 without edge crossings. However, as will be shown in the next section, this graph admits a DAGmap drawing.    \(\square \)

Fig. 5. An example of a galled network that does not admit a one-dimensional DAGmap.

4 DAGmaps for Galled Networks

In this section we investigate how to draw galled networks as DAGmaps. From Theorem 1 we have that this problem is NP-complete. Therefore, it is worth examining the problem of drawing planar galled networks as DAGmaps. In the following lemma we show that planar galled networks form a proper subset of the galled networks.

Lemma 2

Not every galled network is planar.

Sketch of Proof

This lemma can be proved by creating a family of galled networks that contain a subgraph homeomorphic to \(K_5\) [15]. Figure 6(a) depicts a galled network. It is nonplanar, since it is topologically the same as the network shown in Fig. 6(b), which is homeomorphic to \(K_5\).    \(\square \)

Fig. 6. An example of a nonplanar galled network. The network (a) is the same as the network (b), which is topologically equivalent to \(K_5\).

Since planar galled networks represent phylogenetic networks, it is clear that all edges flow in the same direction monotonically. This means that planar galled networks are upward (downward) planar graphs. Therefore, we have the following:

Fact 3

Planar galled networks are upward planar.

By definition, the phylogenetic networks are single source directed acyclic graphs. Therefore, we have the following:

Fact 4

Each planar galled network is a single source upward planar directed acyclic graph.

Fig. 7. Transformation of a galled network (a) to a galled tree (b).

In order to draw planar galled networks as DAGmaps, without node duplication, we will relax the rule for drawing DAGmaps, which states that every node is drawn as a rectangle. Specifically, we will allow nodes to be drawn as rectilinear cohesive polygons. Next, we present an algorithm that produces DAGmaps of planar galled networks.

figure c

Step 1 of the above algorithm is illustrated in Fig. 7. As shown, every node u that participates in k galls (\(k > 1\)) is replaced by k nodes \(u_i\), \(i\in \{1, \ldots , k\}\). Each node \(u_i\) participates in only one gall. Consequently, GT is a galled tree, because no gall shares nodes with any other gall. This step needs O(n) time, because in the worst case it traverses all the nodes of the graph, and the number of edges of a planar graph is O(n).

In Steps 2 and 3 we define the order of all subtrees of the galled tree \(GT\). The goal is to find an ordering such that all copies of a split node are neighbors. We observe that a proper nesting of the galls produces a planar embedding of \(G\). Thus, given a planar embedding \(\varGamma \) of \(G\), it is easy to find the correct order of all subtrees. Specifically, the order of the subtrees of \(GT\) is determined by the clockwise order of the incoming and outgoing edges of each node (to be split) in \(\varGamma \). Bertolazzi et al. [3] have shown that if a single source digraph is upward planar, then its drawing can be constructed in O(n) time. Thus, given Fact 4, we can produce an upward planar drawing of a planar galled network in linear time.

The drawings of the DAGmaps of the galls of GT (Step 4) are obtained by executing Algorithm 2. The running time of this algorithm is O(n). Finally, the unification of Step 5 needs O(n) time in the worst case, since it is the reverse procedure of Step 1. The output is shown in Fig. 8.

Fig. 8. The DAGmap drawing of the galled network of Fig. 7(a) produced by Algorithm 3.

Generally speaking, splitting a node triggers the duplication of all of its out-neighbors. Therefore, the transformation of a DAG into a tree leads to trees with (potentially exponentially) many more nodes than the original DAG. However, the node splitting of Step 1 does not have the exponential effects of ordinary node duplication, since all the duplicated nodes in this case are neighbors. From the above, we realize that Algorithm 4 takes O(n) time, and combining this with Algorithm 3, we conclude that:

Theorem 4

Every planar galled network admits a DAGmap drawing, which can be computed in O(n) time.

5 Conclusions and Future Work

DAGmaps, an extension of treemaps, are an effective space filling visualization method for displaying and analyzing hierarchical data. In this paper we have presented algorithms that use DAGmap drawings for the visualization of two categories of phylogenetic networks, galled trees and planar galled networks. Future work will cover the study of more categories of phylogenetic networks, as well as the question of whether one can minimize the number of node duplications performed during Step 1 of Algorithm 3 in the case of nonplanar galled networks. Furthermore, we intend to develop a visualization tool for processing phylogenetic networks and displaying them as DAGmaps.