Graphillion: software library for very large sets of labeled graphs
Abstract
Several graph libraries have been developed in the past few decades, and they were basically designed to work with a few graphs. However, there are many problems in which we have to consider all subgraphs satisfying certain constraints on a given graph. Since the number of subgraphs can increase exponentially with the graph size, explicitly representing these sets is infeasible. Hence, libraries concerned with efficiently representing a single graph instance are not suitable for such problems. In this paper, we develop Graphillion, a software library for very large sets of (vertex-)labeled graphs, based on zero-suppressed binary decision diagrams. Graphillion is not based on a traditional representation of graphs. Instead, a graph set is simply regarded as a “set of edge sets” ignoring vertices, which allows us to employ powerful tools of a “family of sets” (a set of sets) and permits large graph sets to be handled efficiently. We also utilize advanced graph enumeration algorithms, which enable the simple family tools to understand the graph structure. Graphillion is implemented as a Python library to encourage easy development of its applications, without introducing significant performance overheads. In experiments, we consider two case studies, a puzzle solver and a power network optimizer, in which several operations and heavy optimization have to be performed over very large sets of constrained graphs (i.e., cycles or forests with complicated conditions). The results show that Graphillion allows us to manage a huge number of graphs with very low development effort.
Keywords
Graph, Set, Software library, Family algebra, Frontier-based search, Binary decision diagram
1 Introduction
A graph is a representation of a set of edges, each of which connects a pair of vertices. Graphs are often used as a mathematical model for a variety of problems. Researchers have developed many sophisticated graph libraries, but the design focuses on handling a small number of graphs. Thus they cannot work with very large sets of graphs, even though the set can grow exponentially with graph size since a graph with \(N\) edges induces up to \(2^N\) subgraphs. A graph library that could efficiently manage very large and complex sets of graphs within a small amount of memory would provide a novel way for powerful graph operations; e.g., an optimizer that efficiently finds the best graph from a nonconvex graph set, and a graph database that can select all matched graphs from a very large set. To the best of our knowledge, there is no library that has been designed to handle such large sets of graphs.
In this paper, we introduce Graphillion, a software library optimized for very large sets of graphs. Traditional graph libraries maintain each graph individually, which leads to poor scalability, while Graphillion handles a set of graphs collectively without considering graphs individually. Graphillion concentrates on edge-induced subgraphs of a given (vertex-)labeled graph \(G = (V, E)\), and a set of graphs is reduced to a set of edge collections,^{1} or, more formally, a family of sets of edges; i.e., a set of graphs, \(\{G_1=(V,E_1), G_2=(V,E_2)\}\), is regarded as a set of edge collections, \(\{G_1=E_1, G_2=E_2\}\). This reduction loses the properties of each vertex, but allows programmers to apply the powerful theory of families of sets [8]. A set of collections can be represented in a compressed form by sharing common parts of similar collections, so a huge number of graphs can be stored in a small amount of memory. We also employ an efficient algebra called family algebra [6] in order to perform optimization (i.e., finding minimum or maximum weighted graphs), selection, and modification on very large graph sets; these operations are efficient because they can be executed without decompressing the data.
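The reduction from graphs to edge collections can be illustrated with a minimal sketch. It uses explicit Python sets rather than the compressed form used by Graphillion, and the helper name `to_edge_collections` is hypothetical.

```python
# Illustrative only: explicit Python sets stand in for the compressed
# representation, and every name in this block is hypothetical.
V = frozenset({1, 2, 3})

# Two edge-induced subgraphs of a triangle, written as (vertices, edges).
G1 = (V, frozenset({(1, 2), (2, 3)}))
G2 = (V, frozenset({(1, 2), (1, 3)}))


def to_edge_collections(graphs):
    """Drop the vertex part and keep only each graph's edge collection."""
    return {edges for _vertices, edges in graphs}


graph_set = to_edge_collections([G1, G2])

# Ordinary set queries now act on edge collections directly.
print(frozenset({(1, 2), (2, 3)}) in graph_set)  # membership query
print(len(graph_set))                            # number of graphs in the set
```

Because the vertex part is dropped, two graphs that differ only in isolated vertices collapse into the same edge collection, which is the loss of vertex properties mentioned above.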
This family theory, of course, is unconcerned with graph structure such as a tree or a path, since it considers a graph to be just an edge collection with no structure. We rectify this omission by employing the graph enumeration algorithm called frontier-based search [4, 5, 10, 13]. The algorithm lists all graphs that have a specified structure, and then the listed graphs (edge collections) are handled by family algebra. The number of graphs listed can, of course, be enormous, but a recent development in enumeration algorithms allows us to output the graphs in compressed form without enumerating them one by one. This compressed form is easily converted into the compressed form of the family theory [15], and so there is no difficulty in adopting family algebra.
Graphillion is implemented in the Python language because of its high productivity. Python is a high-level programming language with a rich set of libraries (or “modules” in Python terminology) including NumPy/SciPy (mathematical computation) [17] and NetworkX (network analysis) [2]. Moreover, Python can be extended by using C or C++ for high-performance numerical computation, and it is well suited to scientific and engineering code [12]. However, Python objects must be reinterpreted in every extended function call (e.g., Python’s built-in set object can reinterpret all elements in some function calls), and this overhead would be unacceptable if a very large graph set were involved. Graphillion, in contrast, deals with a whole graph set directly without considering individual graphs, and so only a reference to the set is reinterpreted regardless of the number of graphs in it. In this way, our graph set representation allows us to establish an efficient computation scheme for graph sets via Python’s extension mechanism.
We evaluate the performance and productivity of Graphillion in experiments. We first measure the performance on simple operations. The results show that Graphillion needs only 500 MB of memory to process a very large set of \(10^{37}\) trees in 10 s (just one second for some operations). We then present two case studies, a puzzle solver and a power network optimizer, and reveal that Graphillion reduces the lines of code by 90 % with an acceptable performance overhead. In the power network optimization, our optimizer, which only needs a thousand lines of code, searches a nonconvex set of \(10^{58}\) feasible graphs and finds the optimal graph in just 1 min.
The rest of this paper is organized as follows: Section 2 gives an overview of Graphillion. Sections 3 and 4 discuss the theoretical aspects of Graphillion. Section 5 describes its implementation, and Sect. 6 reports the experiments and results. Section 7 summarizes related work, and Sect. 8 concludes the paper.
2 Overview

High performance. Graphillion processes very large sets of graphs efficiently in terms of both space and time. It is implemented as a Python module with C++ extensions. A set of graphs is represented in compressed form as a C++ object, which is created by frontier search (Fig. 1a) and is manipulated by family algebra (Fig. 1b). Since only the reference to the set is exposed to the Python world, the function call overhead is very small and its impact is independent of the size of the C++ object. Only the graphs that are actually needed are extracted from the set through iterators, so there is no need to restore all the graphs in the object (Fig. 1c).

High productivity. Graphillion makes it easy to develop applications that deal with very large graph sets. Graphillion follows the programming interface of the built-in set class in Python (Fig. 1b, c), and so it is very easy for Python programmers to use. Since we redesign family algebra to suit graph sets, it is tractable to write complicated operations over graph sets, such as optimization, selection, and modification. Since Python is a general-purpose programming language with a rich set of modules, programmers can implement their tasks using Python alone and are freed from the need to coordinate multiple programs in different languages. In this paper, we evaluate productivity by the number of lines of code.
3 Representations of a graph and the set
3.1 Representation of a graph
Our graph model puts no restriction on edge type,^{2} but this paper treats only simple undirected edges with no self-loops for simplicity. Edges can be weighted.
3.2 Representation of a set of graphs
Table 1 Number of trees versus memory needed by ZDD
Grid size  Number of trees  Memory of ZDD [byte]

\(2\times 2\)  10  990
\(3\times 3\)  750  9,870
\(4\times 4\)  737,354  61,830
\(5\times 5\)  8,965,981,766  335,190
\(6\times 6\)  1,334,122,533,591,284  2,364,750
\(7\times 7\)  2,417,510,626,051,127,173,092  18,168,510
\(8\times 8\)  53,140,315,312,826,650,300,530,620,174  56,321,790
\(9\times 9\)  14,130,434,522,304,066,557,892,213,731,297,009,012  207,115,950
4 Creation and manipulation of a set of graphs
This section describes the creation of a graph set using frontier search and the use of family algebra to manipulate set contents.
4.1 Creation of a set of graphs
We build a ZDD representing a set of graphs by using a graph enumeration algorithm called frontier-based search [13].^{4} Frontier search finds all graphs that have a specified structure.
It outputs the enumerated graphs in a compressed form that is easily converted into a ZDD [15]. The time complexity is determined by the size of the compressed form (slightly larger than that of the corresponding ZDD), not by the number of graphs being output.
We briefly describe frontier search. Consider a tree that represents a set of graphs. In the tree, a node of depth \(i\) corresponds to the \(i\)th edge of the universe (\({{e}_{i} \,{\in }\, {E}_{u}}\)), and a branch leaving the node is labeled to indicate whether the \(i\)th edge is included in the collection (\({e_{i}\,\in \,{E}}\)). A path from the root to a leaf corresponds to an edge collection (\({{E}\,{\subseteq }\, {E}_{u}}\)), and the leaf indicates whether that collection is included in the set. Two tree nodes can be shared if their subtrees are identical, which compresses the tree into a directed acyclic graph. Frontier search constructs such a directed acyclic graph by examining the universe graph without backtracking. A branch is pruned if no path through it can lead to the specified structure.
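The tree and the sharing of identical subtrees can be made concrete with a small sketch. This is not frontier-based search: it visits every leaf by brute force and merges nodes afterwards, whereas frontier search builds the shared structure directly and prunes hopeless branches. All function names are hypothetical, depths are counted from zero, and the acceptance test is an assumption chosen for a triangle universe.

```python
from itertools import count


def build_shared_tree(universe, accept):
    """Return (root, nodes) for the decision tree over `universe`.

    Leaves are the integers 0 (rejected) and 1 (accepted). `nodes` maps
    (depth, child_if_excluded, child_if_included) to a node id, so two
    nodes with identical subtrees receive the same id and are shared.
    """
    nodes = {}
    fresh_ids = count(2)  # ids 0 and 1 are reserved for the two leaves

    def build(depth, chosen):
        if depth == len(universe):
            return 1 if accept(frozenset(chosen)) else 0
        excluded = build(depth + 1, chosen)
        included = build(depth + 1, chosen + [universe[depth]])
        key = (depth, excluded, included)
        if key not in nodes:
            nodes[key] = next(fresh_ids)
        return nodes[key]

    return build(0, []), nodes


def count_collections(root, nodes):
    """Count root-to-leaf paths that end in leaf 1, without listing them."""
    children = {node_id: (lo, hi) for (_depth, lo, hi), node_id in nodes.items()}
    memo = {0: 0, 1: 1}

    def walk(node):
        if node not in memo:
            lo, hi = children[node]
            memo[node] = walk(lo) + walk(hi)
        return memo[node]

    return walk(root)


# Universe: a triangle. Any two of its three edges form a spanning tree.
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
root, nodes = build_shared_tree(triangle, lambda edges: len(edges) == 2)
print(len(nodes), count_collections(root, nodes))
```

For this universe the unshared tree has seven internal nodes, while the shared structure has six, and the number of accepted collections is obtained by one pass over the shared nodes instead of by listing the collections.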
Creation methods for graph sets
Structure  Parameters 

Tree  A root vertex, spanning or not 
Forest  Root vertices, spanning or not 
Path  Terminal vertices, Hamiltonian or not 
Cycle  Hamiltonian or not 
Clique  Size 
Connected component  Vertices to be connected 
Some simple sets of graphs can be created by ZDD’s primitives without frontier search: e.g., the empty set and the power set are given by the ZDD’s primitives, and small graph sets can be created by explicitly specifying the graphs (edge collections).
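For small universes, the "Tree" creation method can be checked against a brute-force reference. The sketch below is not frontier-based search and does not use Graphillion; it tests every edge subset explicitly. It assumes that an \(n\times n\) grid means \(n\times n\) vertices and that the tree consisting of the root alone (the empty edge collection) is counted. The function names are hypothetical.

```python
from itertools import combinations


def grid_edges(n):
    """Edges of a grid with n x n vertices, each vertex labeled (row, col)."""
    edges = []
    for r in range(n):
        for c in range(n):
            if c + 1 < n:
                edges.append(((r, c), (r, c + 1)))
            if r + 1 < n:
                edges.append(((r, c), (r + 1, c)))
    return edges


def is_tree_containing(edge_subset, root):
    """True if the edges form one tree that touches `root`.

    The empty collection is accepted as the tree made of `root` alone.
    """
    if not edge_subset:
        return True
    vertices = {v for edge in edge_subset for v in edge}
    if root not in vertices or len(edge_subset) != len(vertices) - 1:
        return False
    # A connected graph with |V| - 1 edges is a tree; check connectivity.
    adjacent = {v: [] for v in vertices}
    for u, v in edge_subset:
        adjacent[u].append(v)
        adjacent[v].append(u)
    seen, stack = {root}, [root]
    while stack:
        for w in adjacent[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def rooted_trees(n, root=(0, 0)):
    """All edge collections of the n x n grid that are trees containing root."""
    edges = grid_edges(n)
    return {
        frozenset(subset)
        for k in range(len(edges) + 1)
        for subset in combinations(edges, k)
        if is_tree_containing(subset, root)
    }


print(len(rooted_trees(2)))
```

Under these assumptions the count for the \(2\times 2\) grid is 10, which agrees with the first row of the table of tree counts. The running time grows as \(2^{|E|}\), which is exactly the cost that the compressed construction avoids.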
4.2 Manipulation of a set of graphs
Family algebra defines several operations on sets of collections, and the operations can be efficiently performed over ZDDs [6]. Surprisingly, these operations can be executed directly on the compressed data, so they are highly efficient. In this subsection, we describe the operations for optimization, selection, and modification, in the context of graph sets.
Selection operations for graph sets
Operation  Definition

Union  \(\mathcal{G}_1\cup \mathcal{G}_2=\{G \mid G\in \mathcal{G}_1\vee G\in \mathcal{G}_2\}\)
Intersection  \(\mathcal{G}_1\cap \mathcal{G}_2=\{G \mid G\in \mathcal{G}_1\wedge G\in \mathcal{G}_2\}\)
Difference  \(\mathcal{G}_1{\setminus }\mathcal{G}_2=\{G \mid G\in \mathcal{G}_1\wedge G\not \in \mathcal{G}_2\}\)
Symmetric difference  \(\mathcal{G}_1\oplus \mathcal{G}_2=(\mathcal{G}_1{\setminus } \mathcal{G}_2)\cup (\mathcal{G}_2{\setminus } \mathcal{G}_1)\)
Subgraphs  \(\mathcal{G}_1{\curvearrowleft }\mathcal{G}_2=\{G_1\in \mathcal{G}_1 \mid \exists G_2\in \mathcal{G}_2 (G_1\subseteq G_2)\}\)
Supergraphs  \(\mathcal{G}_1{\curvearrowright }\mathcal{G}_2=\{G_1\in \mathcal{G}_1 \mid \exists G_2\in \mathcal{G}_2 (G_1\supseteq G_2)\}\)
Maximal graphs  \(\mathcal{G}^\uparrow =\{G_1\in \mathcal{G} \mid G_2\in \mathcal{G}\wedge G_1\subseteq G_2 \rightarrow G_1=G_2\}\)
Minimal graphs  \(\mathcal{G}^\downarrow =\{G_1\in \mathcal{G} \mid G_2\in \mathcal{G}\wedge G_1\supseteq G_2 \rightarrow G_1=G_2\}\)
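The last four selection operations can be read as the following reference semantics, sketched over explicit Python sets of frozensets. This is only a specification-level illustration with hypothetical function names; it examines every pair of collections, whereas the operations described in this section run on the compressed data.

```python
def subgraphs(gs1, gs2):
    """Graphs in gs1 that are contained in some graph of gs2."""
    return {g1 for g1 in gs1 if any(g1 <= g2 for g2 in gs2)}


def supergraphs(gs1, gs2):
    """Graphs in gs1 that contain some graph of gs2."""
    return {g1 for g1 in gs1 if any(g1 >= g2 for g2 in gs2)}


def maximal(gs):
    """Graphs in gs that are not a proper subset of another graph in gs."""
    return {g1 for g1 in gs if not any(g1 < g2 for g2 in gs)}


def minimal(gs):
    """Graphs in gs that are not a proper superset of another graph in gs."""
    return {g1 for g1 in gs if not any(g1 > g2 for g2 in gs)}


# Edge labels are placeholders for edges of some universe graph.
gs1 = {frozenset({"e1"}), frozenset({"e1", "e2"}), frozenset({"e3"})}
gs2 = {frozenset({"e1", "e2"})}

print(subgraphs(gs1, gs2))
print(supergraphs(gs1, gs2))
print(maximal(gs1))
print(minimal(gs1))
```

Union, intersection, difference, and symmetric difference coincide with the `|`, `&`, `-`, and `^` operators of Python's built-in set when graphs are stored as frozensets.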
Modification operations for graph sets
Operation  Definition

Graft (join \(\sqcup \))  \(\mathcal{G}\sqcup \{E\}=\{G\cup E \mid G\in \mathcal{G}\}\)
Remove (meet \(\sqcap \))  \(\mathcal{G}\sqcap \{E^c\}=\{G\cap E^c \mid G\in \mathcal{G}\}\)
Flip (delta \(\boxplus \))  \(\mathcal{G}\boxplus \{E\}=\{G\oplus E \mid G\in \mathcal{G}\}\)
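As a reference for the three modification operations, the definitions can be written over explicit Python sets of frozensets. The function names are hypothetical and the sketch touches every graph individually, which the compressed representation is designed to avoid.

```python
def graft(gs, edges):
    """Add `edges` to every graph in the set (join with {E})."""
    edges = frozenset(edges)
    return {g | edges for g in gs}


def remove(gs, edges):
    """Delete `edges` from every graph in the set (meet with {E^c})."""
    edges = frozenset(edges)
    return {g - edges for g in gs}


def flip(gs, edges):
    """Toggle the presence of `edges` in every graph in the set (delta)."""
    edges = frozenset(edges)
    return {g ^ edges for g in gs}


# Edge labels are placeholders for edges of some universe graph.
gs = {frozenset({"e1"}), frozenset({"e2", "e3"})}

print(graft(gs, {"e1"}))
print(remove(gs, {"e2"}))
print(flip(gs, {"e1"}))
```

Note that a modification can shrink the set: two different graphs may become identical after grafting or removing the same edges.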
Graphillion defines other operations like hitting sets [16], random sampling, and counting graphs in a set, but we do not describe them here due to limited space.
5 Implementation
This section describes the implementation of Graphillion. Frontier search and family algebra are implemented in C++, while the programming interface is written in Python. This interface is based on Python’s set; e.g., it provides the size query (the len function in Python), the membership query (the in operator), iterators (the for statement), and general set operations (e.g., union). We add graph-specific operations to this interface, such as supergraphs, graft, and the graph-weight optimizers. Our implementation consists of 14,965 lines of code in C++ and 2,251 lines in Python.
A graph set object in Python maintains a reference to the corresponding ZDD object of C++ (see Fig. 1). The graph set object is very lightweight, since it has no attribute other than the reference. The selection methods return a new graph set object that refers to the associated ZDD object. The modification methods just replace their reference with a reference to the new ZDD object. The optimizers are implemented as Python iterators, which run a loop step by step and yield the best graphs one at a time instead of extracting all of them at once.
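The iterator style of the optimizers can be sketched with a Python generator. This is an assumption-laden illustration: the function name `max_iter_sketch` is hypothetical, the graph set is an explicit Python set, and a binary heap over all graphs replaces the search that Graphillion performs on the ZDD.

```python
import heapq
from itertools import islice


def max_iter_sketch(graph_set, weights):
    """Yield (graph, total_weight) pairs in order of decreasing weight."""
    # The index i breaks ties so that frozensets are never compared.
    heap = [
        (-sum(weights[e] for e in g), i, g)
        for i, g in enumerate(graph_set)
    ]
    heapq.heapify(heap)
    while heap:
        negative_weight, _i, graph = heapq.heappop(heap)
        yield graph, -negative_weight


# Edge labels and weights are placeholders.
weights = {"e1": 1.0, "e2": 2.0, "e3": 4.0}
gs = {frozenset({"e1", "e2"}), frozenset({"e3"}), frozenset({"e1"})}

# The caller decides how many of the best graphs to extract.
for graph, total in islice(max_iter_sketch(gs, weights), 2):
    print(sorted(graph), total)
```

The caller pulls only as many graphs as it needs, so asking for the best two never requires the remaining graphs to be ordered or materialized by the consumer.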
Vertices and edges are simply indexed by integers in C++ to improve efficiency, while any hashable object can be used as a vertex in Python for better productivity^{5} (an edge is just a tuple of two vertex objects). Graphillion provides a transparent mechanism to convert between integers and objects by maintaining the mapping. The mapping is created automatically at universe registration, which must be done at the beginning of the code. If an edge not found in the universe is used, an exception is raised.
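A mapping of this kind might look like the following sketch. The class name `UniverseSketch`, its methods, and the choice of `KeyError` are all assumptions made for illustration; they are not Graphillion's interface.

```python
class UniverseSketch:
    """Map undirected edges between hashable vertices to integer indices."""

    def __init__(self, edges):
        self._index = {}   # frozenset of endpoints -> integer index
        self._edges = []   # integer index -> edge tuple as registered
        for u, v in edges:
            key = frozenset((u, v))
            if key not in self._index:
                self._index[key] = len(self._edges)
                self._edges.append((u, v))

    def to_int(self, edge):
        """Return the index of `edge`, regardless of endpoint order."""
        try:
            return self._index[frozenset(edge)]
        except KeyError:
            raise KeyError(f"edge {edge!r} is not in the universe") from None

    def to_edge(self, index):
        """Return the edge tuple registered under `index`."""
        return self._edges[index]


# Any hashable objects can serve as vertices; strings are used here.
universe = UniverseSketch([("a", "b"), ("b", "c")])
print(universe.to_int(("b", "a")))
print(universe.to_edge(1))
```

Registering the universe once fixes the integer index of every edge, so later conversions are dictionary and list lookups.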
In order to enhance productivity further, any type of graph object (e.g., a NetworkX graph) can be used in Graphillion. A graph object is transparently converted into Graphillion’s internal representation (an edge collection) by user-defined converters. Programmers can use Graphillion as an enhancement tool for their favorite graph modules simply by registering the converters.
6 Experiments
In this section, we consider the performance of Graphillion’s operations. We then discuss two case studies, a puzzle solver and a power network optimizer, to examine the tradeoff between performance and productivity. All experiments were conducted with Python 2.7 and GCC 4.7 on Linux 2.6 using a single core of an Intel Xeon E3-1290 (3.60 GHz) with 32 GB of RAM.
6.1 Basic performance
We evaluate the performance by using a set of trees rooted at a corner on a grid graph. The set size is shown in Table 1. Creation performance is measured by building a set of the trees. Selection performance is evaluated by calculating the union of two sets of trees; trees in one set are rooted at a corner while those in the other set are rooted at the diagonally opposite corner. Modification performance is evaluated by grafting an edge to all trees. Finally, optimization performance is measured by finding the top-3 weighted trees with the maximization operation.
We measured the CPU time and the memory usage of these operations with and without Graphillion. In the implementation without Graphillion, graphs are created as NetworkX objects, and are stored in Python’s built-in set object (the union operation is provided by the built-in set, but the other operations were added by us). In order to evaluate Python’s overhead, we developed a pure C++ implementation of the operations just for the experiments.
6.2 Puzzle solver
Lines of code for Slitherlink solvers
Implementation  C++  Python 

w/o Graphillion  2,116  0 
w/ Graphillion  0  153 
When the problem has multiple solution cycles, we can obtain the top-\(k\) longest or shortest cycles with Graphillion’s iterators. It took just another 0.24 s to find the three longest cycles from among the 117,059,496 solutions to the modified problem.
6.3 Power network optimizer
Lines of code for power network optimizers
Implementation  C++  Python 

w/o Graphillion  6,856  1,221 
w/ Graphillion  0  1,164 
Additionally, Graphillion can be used as a graph database of feasible forests. We issued queries specifying an open or closed switch to select all the forests matching the queries, as shown in Fig. 3b. Graphillion processed the queries within just 1.5 s for a closed switch and within 0.5 s for an open switch in the largest network.
7 Related work
There are several existing graph libraries, including NetworkX [2] and Boost Graph Library [14], which are widely used for graph analysis. They are, however, designed for a small number of graphs or a simple power set of edges: that is, they can find a shortest path from just a power set of edges without constraints. In contrast, Graphillion can find shortest paths from a large and complex set of graphs; given a constrained set of graphs, which could be created by Graphillion operations, we first select paths from the constrained set (see Sect. 4.1) and then find minimum weighted paths from them (see Sect. 4.2).
General optimizers like CPLEX^{9} are often used for graph optimization. However, they require the constraints to be described in simple formulae, and many practical problems are too complicated to permit this. The algebraic approach provided by Graphillion sometimes works well, as shown by the power network optimization, which cannot be solved by general optimizers. In addition, general optimizers are not designed to search for multiple solutions, while Graphillion provides iterators that yield the top-\(k\) solutions.
Graph databases [1] store multiple graphs and provide selection methods based on graph structure. However, they cannot store as many graphs as Graphillion can, because they do not employ an efficient graph set representation.
VSOP [9] employs family algebra like Graphillion, but it provides an abstraction for combinatorial item sets, not graph sets. Frontier search is, of course, not implemented in VSOP, and so it does not create graph sets of a given structure efficiently. Since VSOP runs on its own interpreter, we cannot utilize Python’s rich collection of libraries.
8 Conclusions
In this paper, we have introduced Graphillion, which is a software library designed for very large sets of graphs. Our representation of a graph set allows us to utilize the theory of the “family of sets”, which can compress graph sets and manipulate them efficiently. Graphillion is implemented in Python and provides a sophisticated but easy-to-use interface. Experiments showed the excellent performance of Graphillion. Two case studies showed that programmers can handle very large graph sets with just a small number of lines of code. Graphillion can also be used for railway analysis.^{10}
Future work includes a plug-in mechanism for operation customization, a generalized design for directed graphs and hypergraphs, and an analysis of how the compression ratio depends on graph set characteristics.
Since we would like to discover more applications for which Graphillion works well, we make it publicly available online at Graphillion’s page^{11} and PyPI (the Python Package Index).^{12}
Footnotes
 1. In order to describe a set of sets without confusion, the word collection is used to indicate an “inner” set like an edge set, while set is used for an “outer” set like a graph set.
 2. Edges can be either directed or undirected. They can also be self-loops. Multiple edges can be placed between the same pair of vertices if they are distinguishable. Edges can be hyperedges, which can include any number of vertices.
 3. There is no rigorous theory that can estimate the compression ratio of binary decision diagrams, but it is believed that they will work well in most practical data applications [18].
 4. While this enumeration algorithm had no name originally, it was named in [10].
 5. This is analogous to Python’s built-in set, which accepts any hashable object as an element.
 6. Selection requires 500 MB of memory, which is slightly larger than double the theoretical value (207 MB), shown in Table 1, because of the unused slots in the hash table used to maintain ZDDs.
 7.
 8. We should be careful when comparing the numbers of lines between Python and C++, since they have different grammars and syntaxes: e.g., Python does not require opening and closing brackets while C++ does. However, in our experiments, the average line length in C++ is not significantly shorter than that in Python (26.0 characters per line in C++ compared with 32.2 characters in Python). Moreover, they both use the object-oriented programming model. We, therefore, believe that our evaluation roughly measures the productivity.
 9.
 10.
 11.
 12.
References
 1. Angles, R., Gutierrez, C.: Survey of graph database models. ACM Comput. Surv. 40(1), 1:1–1:39 (2008)
 2. Hagberg, A., Swart, P., Schult, D.: Exploring network structure, dynamics, and function using NetworkX. In: Proceedings of the 7th Python in Science Conference (SciPy 2008), pp. 11–16 (2008)
 3. Inoue, T., Takano, K., Watanabe, T., Kawahara, J., Yoshinaka, R., Kishimoto, A., Tsuda, K., Minato, S., Hayashi, Y.: Distribution loss minimization with guaranteed error bound. IEEE Trans. Smart Grid 5(1), 102–111 (2014)
 4. Iwashita, H., Minato, S.: Efficient top-down ZDD construction techniques using recursive specifications. Tech. rep., Hokkaido University, Division of Computer Science, TCS Technical Reports, TCSTRA1369 (2013). http://wwwalg.ist.hokudai.ac.jp/tra.html
 5. Kawahara, J., Inoue, T., Iwashita, H., Minato, S.: Frontier-based search for enumerating all constrained subgraphs with compressed representation. Tech. rep., Hokkaido University, Division of Computer Science, TCS Technical Reports, TCSTRA1476 (2014). http://wwwalg.ist.hokudai.ac.jp/tra.html
 6. Knuth, D.E.: 7.1.4 Binary Decision Diagrams, vol. 4A. Addison-Wesley, USA (2011)
 7. Lavaei, J., Rantzer, A., Low, S.: Power flow optimization using positive quadratic programming. In: Proceedings of the 18th IFAC World Congress (2011)
 8. Minato, S.: Zero-suppressed BDDs for set manipulation in combinatorial problems. In: Proceedings of Conference on Design Automation, pp. 272–277 (1993)
 9. Minato, S.: VSOP (valued-sum-of-products) calculator for knowledge processing based on zero-suppressed BDDs. Federation over the Web. Lecture Notes in Computer Science, vol. 3847, pp. 40–58. Springer, Berlin, Heidelberg (2006)
 10. Minato, S.: Techniques of BDD/ZDD: brief history and recent activity. IEICE Trans. Inf. & Syst. E96-D(7) (2013)
 11. Nikoli: Slitherlink 1 (1992)
 12. Oliphant, T.E.: Python for scientific computing. Comput. Sci. Eng. 9(3), 10–20 (2007)
 13. Sekine, K., Imai, H., Tani, S.: Computing the Tutte polynomial of a graph of moderate size. In: Algorithms and Computations, Lecture Notes in Computer Science, vol. 1004, pp. 224–233. Springer (1995)
 14. Siek, J.G., Lee, L.Q., Lumsdaine, A.: The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley Professional, USA (2001)
 15. Sieling, D., Wegener, I.: Reduction of OBDDs in linear time. Inform. Process. Lett. 48(3), 139–144 (1993)
 16. Toda, T.: Hypergraph transversal computation with binary decision diagrams. In: Experimental Algorithms, Lecture Notes in Computer Science, vol. 7933, pp. 91–102. Springer (2013)
 17. Walt, S.V.D., Colbert, S., Varoquaux, G.: The NumPy array: a structure for efficient numerical computation. Comput. Sci. Eng. 13(2), 22–30 (2011)
 18. Yoshinaka, R., Kawahara, J., Denzumi, S., Arimura, H., Minato, S.: Counterexamples to the long-standing conjecture on the complexity of BDD binary operations. Inform. Process. Lett. 112(16), 636–640 (2012)
 19. Yoshinaka, R., Saitoh, T., Kawahara, J., Tsuruma, K., Iwashita, H., Minato, S.: Finding all solutions and instances of numberlink and slitherlink by ZDDs. Algorithms 5(2), 176–213 (2012)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.