On computational complexity of graph inference from counting

Natural Computing

Abstract

In de novo drug design, chemical compounds are quantified as real-valued vectors called chemical descriptors, and an optimization algorithm runs on known drug-like chemical compounds in a database and outputs an optimal chemical descriptor. Since structural information is needed for chemical synthesis, we must infer chemical graphs from the obtained descriptor. This is formalized as the problem of inferring a graph from a real-valued vector. By generalizing subword history, which was originally introduced in formal language theory to extract numerical information of words and languages based on counting, we propose a comprehensive framework to investigate the computational complexity of chemical graph inference. We also propose a (pseudo-)polynomial-time algorithm for inferring graphs in a class of practical importance from spectrums.


Abbreviations

  • \(\Upsigma\): Alphabet

  • \({\overrightarrow{{\mathcal{G}}}}\): The class of (\(\Upsigma\)-labeled, loopless, (weakly-)connected) directed multigraphs

  • \({{\mathcal{G}}}\): The class of (\(\Upsigma\)-labeled, loopless, connected) undirected multigraphs

  • d(v): The degree of a vertex v

  • T: A tree

  • h(T): The height of tree T

  • T K: The frontier vector of T of level K

  • tw(G): The tree-width of G

  • \(\Upupsilon\): The class of trees

  • \(\Upupsilon_h\): The class of trees of height at most h

  • \({{\mathcal{SPG}}}\): The class of series-parallel graphs

  • \({{\mathcal{PLG}}}\): The class of planar graphs

  • \(\mathcal{TW}(w)\): The class of graphs of tree-width at most w

  • \({{\overrightarrow{\mathcal{SSG}}}}\): The class of scattered subword graphs

  • \({{\overrightarrow{\mathcal{CSG}}}}\): The class of continuous subword graphs

  • WH: A walk history

  • \({{\mathcal{SWH}}}\): The class of systems of walk histories

  • \({{\mathcal{SLWH}}}\): The class of systems of linear walk histories

  • \({{\mathcal{COUNT}}}\): The class of counting systems

  • \(\mathcal{WH}\): The class of systems of single walk history

  • \(\mathcal{LWH}\): The class of systems of single linear walk history

  • A–F algorithm: Akutsu–Fukagawa algorithm

References

  • Akutsu T, Fukagawa D (2005) Inferring a graph from path frequency. In: Apostolico A, Crochemore M, Park K (eds) CPM 2005. Lecture notes in computer science, vol 3537. Springer, New York, pp 371–382

  • Bakir GH, Weston J, Schölkopf B (2004a) Learning to find pre-images. In: Advances in neural information processing systems, pp 449–456

  • Bakir GH, Zien A, Tsuda K (2004b) Learning to find graph pre-images. In: Proceedings of the 26th DAGM symposium. Lecture notes in computer science, vol 3175. Springer, New York, pp 253–261

  • Bodlaender H (1998) A partial k-arboretum of graphs with bounded treewidth. Theor Comput Sci 209(1–2):1–45

  • Diestel R (2010) Graph theory, 4th edn. Springer, New York

  • Fraigniaud P, Nisse N (2006) Connected treewidth and connected graph searching. In: LATIN 2006. Lecture notes in computer science, vol 3887. Springer, New York, pp 479–490

  • Fujiwara H et al (2008) Enumerating treelike chemical graphs with given path frequency. J Chem Inf Model 48:1345–1357

  • Garey MR, Johnson DS (1979) Computers and intractability. A guide to the theory of NP-completeness. W. H. Freeman and Co, New York

  • Goto S et al (2002) LIGAND: Database of chemical compounds and reactions in biological pathways. Nucleic Acids Res 30:402–404

  • Ibarra OH (1978) Reversal-bounded multicounter machines and their decision problems. J ACM 25:116–133

  • Leslie C, Eskin E, Noble WS (2002) The spectrum kernel: a string kernel for SVM protein classification. In: Proceedings of the 7th Pacific symposium on biocomputing, pp 564–575

  • Mateescu A, Salomaa A, Yu S (2004) Subword histories and Parikh matrices. J Comput Syst Sci 68:1–21

  • Matiyasevich Y (1970) Solution of the tenth problem of Hilbert. Matematikai Lapok 21:83–87

  • Matiyasevich Y (1993) Hilbert’s tenth problem. MIT Press, Cambridge

  • Nagamochi H (2009) A detachment algorithm for inferring a graph from path frequency. Algorithmica 53:207–224

  • Parikh RJ (1966) On context-free languages. J Assoc Comput Mach 13:570–581

  • Robertson N, Seymour PD (1986) Graph minors. II. Algorithmic aspects of tree-width. J Algor 7:309–322

  • Rozenberg G, Salomaa A (eds) (1997) Handbook of formal languages, vol 1. Springer, New York

  • Seki S (2011) Absoluteness of subword inequality is undecidable. Theor Comput Sci 418:116–120

  • Shannon CE, Weaver W (1949) The mathematical theory of communication. The University of Illinois Press, Urbana

  • Yamaguchi A, Aoki KF, Mamitsuka H (2003) Graph complexity of chemical compounds in biological pathways. Genome Inform 14:376–377

Acknowledgements

We wish to express our gratitude to the anonymous referees for carefully and thoroughly reviewing the earlier version of this manuscript and for giving valuable comments and suggestions on it. Shinnosuke Seki expresses his sincere gratitude to Professor Mark Daley, Professor Oscar H. Ibarra, Professor Helmut Jürgensen, Professor Lila Kari, and Professor Arto Salomaa for the creative discussions with them on the research topic of this paper. This research was carried out with the financial support of the JSPS Postdoctoral Fellowship P10827 to Szilárd Zsolt Fazekas, of the Funding Program for Next Generation World-Leading Researchers (NEXT program) to Yasushi Okuno, and of the Kyoto University Start-up Grant-in-Aid for Young Scientists, No. 021530, to Shinnosuke Seki. The work of Shinnosuke Seki was also financially supported by the Department of Information and Computer Science, Aalto University.

Author information

Corresponding author

Correspondence to Shinnosuke Seki.

Appendix: Proof of Theorem 3


We first establish two results that are useful in constructing a pseudo-polynomial-time transformation from 3-PARTITION to \(\textsc{Solvability}(\boldsymbol{S}_2, \mathcal{PLG})\). The graphs considered from now on are assumed to be undirected. A vertex \(v_1\) of a graph is singly connected with another vertex \(v_2\) if there exists exactly one edge between them.

Lemma 12

For an undirected graph \(G\), if \(G\) contains exactly one \(a\)-vertex and \(|G|_{ab} = |G|_{aba} = n\) for some \(n\), then there exist \(n\) \(b\)-vertices that are singly connected with the \(a\)-vertex.

Proof

Let \(v\) be the \(a\)-vertex of \(G\). Each \(ab\)-walk contributes 1 to \(|G|_{aba}\): starting from the \(a\)-vertex, we arrive at the \(b\)-vertex via the edge on this walk and return via the same edge. Hence, \(|G|_{ab} \le |G|_{aba}\) holds.

Suppose that a \(b\)-vertex \(v_1\) is connected with \(v\) by two edges \(e_1, e_2\). Then, apart from the \(aba\)-walks explained above, we have two extra \(aba\)-walks, namely \(v e_1 v_1 e_2 v\) and \(v e_2 v_1 e_1 v\). Then \(G\) would contain at least \(n + 2\) \(aba\)-walks, a contradiction.

Since \(G\) contains \(n\) \(ab\)-walks, exactly \(n\) \(b\)-vertices are singly connected with the \(a\)-vertex. \(\square\)
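The counting argument in this proof can be checked on small examples. The following is a minimal sketch, not the authors' code: the function name `count_walks` and the graph encoding (a label dictionary plus an edge list) are assumptions made for illustration. It counts walks as sequences of vertices and edges, which is the reading suggested by the proof above, and it assumes a loopless multigraph.

```python
from collections import defaultdict


def count_walks(labels, edges, word):
    """Count walks v0 e1 v1 ... ek vk whose vertex labels spell `word`.

    labels: dict mapping each vertex to its label (hypothetical encoding)
    edges:  list of undirected edges (u, v); repeated pairs are parallel edges
    word:   sequence of labels, e.g. "aba" or ["a", "b", "a"]
    """
    # adj[u][v] = number of parallel edges between u and v
    adj = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        adj[u][v] += 1
        adj[v][u] += 1

    # ways[v] = number of walks spelling the prefix read so far and ending at v
    ways = {v: 1 for v, lab in labels.items() if lab == word[0]}
    for symbol in word[1:]:
        nxt = defaultdict(int)
        for u, cnt in ways.items():
            for v, mult in adj[u].items():
                if labels[v] == symbol:
                    nxt[v] += cnt * mult  # one extension per parallel edge
        ways = nxt
    return sum(ways.values())


# One a-vertex singly connected with three b-vertices: |G|_ab = |G|_aba = 3.
star_labels = {0: "a", 1: "b", 2: "b", 3: "b"}
star_edges = [(0, 1), (0, 2), (0, 3)]
print(count_walks(star_labels, star_edges, "ab"),
      count_walks(star_labels, star_edges, "aba"))  # prints "3 3"

# A doubled edge to one b-vertex breaks the equality: |G|_ab = 3, |G|_aba = 5.
multi_labels = {0: "a", 1: "b", 2: "b"}
multi_edges = [(0, 1), (0, 1), (0, 2)]
print(count_walks(multi_labels, multi_edges, "ab"),
      count_walks(multi_labels, multi_edges, "aba"))  # prints "3 5"
```

In the second graph the doubled edge adds exactly the two extra \(aba\)-walks used in the proof, so \(|G|_{ab} = |G|_{aba}\) fails as the lemma predicts.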

Lemma 13

For an undirected graph \(G\), if \(G\) contains \(n\) \(a\)-vertices and \(n\) \(b\)-vertices and \(|G|_{ab} = |G|_{aba} = |G|_{bab} = n\) for some \(n\), then there exist \(n\) pairwise distinct pairs of an \(a\)-vertex and a \(b\)-vertex that are singly connected.

Our proof of the following theorem borrows some basic terminology from topology. Let \(G\) be a planar multigraph, that is, \(G\) can be embedded in the plane \(\mathbb{R}^2\). The regions of \(\mathbb{R}^2 \setminus G\) are called the faces of \(G\). Since we can lay \(G\) inside some sufficiently large disc \(D\), exactly one of its faces is unbounded, namely the face that contains \(\mathbb{R}^2 \setminus D\). This face is called the outer face of \(G\), and the others are called its inner faces.

We are now ready to prove Theorem 3.

Proof

The basic idea is from (Akutsu and Fukagawa 2005): a pseudo-polynomial-time transformation from 3-PARTITION, which is defined as follows. Given a set \(X\) that consists of \(3m\) elements \(x_1, \ldots, x_{3m}\) along with their integer weights \(w(x_i)\) and a positive integer \(B\) such that \(B/4 < w(x_i) < B/2\) for \(1 \le i \le 3m\), find a partition of \(X\) into \(m\) (disjoint) sets \(A_1, \ldots, A_m\) of cardinality 3 such that \(A_j = \{x_{j,1}, x_{j,2}, x_{j,3}\}\) and \(w(x_{j,1}) + w(x_{j,2}) + w(x_{j,3}) = B\) for \(1 \le j \le m\), where \(x_{j,1}, x_{j,2}, x_{j,3} \in X\).
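To make the source problem concrete, here is a minimal brute-force sketch of 3-PARTITION as defined above. It is for illustration on tiny instances only: the function name `three_partition` and the list-of-weights encoding are assumptions, and the exhaustive search takes exponential time.

```python
from itertools import combinations


def three_partition(weights, B):
    """Return a list of index triples whose weights each sum to B, or None.

    weights[i] plays the role of w(x_{i+1}); indices are 0-based here.
    Raises ValueError if some weight violates B/4 < w < B/2.
    """
    if any(not (B < 4 * w and 2 * w < B) for w in weights):
        raise ValueError("every weight must satisfy B/4 < w < B/2")
    n = len(weights)
    if n % 3 != 0 or sum(weights) != (n // 3) * B:
        return None

    def solve(remaining):
        # Always place the first unused element; this avoids trying the
        # same partition in several orders.
        if not remaining:
            return []
        first, rest = remaining[0], remaining[1:]
        for j, k in combinations(rest, 2):
            if weights[first] + weights[j] + weights[k] == B:
                left = [i for i in rest if i not in (j, k)]
                tail = solve(left)
                if tail is not None:
                    return [(first, j, k)] + tail
        return None

    return solve(list(range(n)))


print(three_partition([5, 5, 6, 5, 5, 6], 16))  # prints [(0, 1, 2), (3, 4, 5)]
print(three_partition([5, 5, 5, 5, 5, 7], 16))  # prints None
```

The bound \(B/4 < w(x_i) < B/2\) is what forces every set summing to \(B\) to have exactly three elements, which is why the check is enforced before searching.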

Let \(\Upsigma = X \cup \{a_1, \ldots, a_m\} \cup \{a, b, c, d, f_1, f_2\}\). From a given instance of 3-PARTITION, we construct a feature vector \(v\) of level 2 specified as follows; we write \(x_i = 1\) to indicate that the \(x_i\)-coordinate of \(v\) has value 1. For any \(u\), if the value of the \(u\)-coordinate of \(v\) is not mentioned below, then it is 0, that is, no \(u\)-walk is found in the target graph. For \(1 \le i \le 3m\) and \(1 \le h \le m\),

  • VERTICES \(x_i = 1\), \(a = Bm\), \(b = 3m\), \(c = 3m + 1\), \(d = 1\), \(f_1 = f_2 = 3m\), and \(a_h = 1\);

  • WALKS-C(enter) \(d a_h = 1\), \(a_h b = a_h b a_h = 3\), and \(a_1 a_2 = a_2 a_3 = \cdots = a_{m-1} a_m = 1\);

  • WALKS-B(lock) for \(s \in \{1, 2\}\), \(b f_s = b f_s b = f_s b f_s = 3m\), \(x_i f_s = 1\), \(ba = bab = Bm\), \(x_i a = x_i a x_i = w(x_i)\);

  • WALKS-BC \(x_i d = 1\), \(f_1 c f_2 = 3m - 1\), \(a_h c = a_h c a_h = 3\), \(f_1 c a_1 = 3\), \(f_2 c a_1 = 2\), \(f_1 c a_\ell = 3\), \(f_2 c a_\ell = 3\) for \(1 < \ell < m\), \(f_1 c a_m = 3\), \(f_2 c a_m = 4\), and \(a_h b a = B\);

  • WALKS-I(nhibited) for any \(1 \le j, k \le m\) with \(j \ne k\), \(x_j f_1 x_k = x_j f_2 x_k = a_j b a_k = a_j c a_k = 0\).

For example, \(x_i = 1\) in VERTICES means that a target graph must contain exactly one \(x_i\)-vertex.
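The specification above can be transcribed mechanically, which also shows why the transformation is only pseudo-polynomial: coordinates such as \(a = Bm\) grow with the numeric value of \(B\). The sketch below is an illustration, not the authors' code. The name `build_feature_vector`, the string labels such as `"x1"` and `"f1"`, and the tuple-of-labels keys are assumptions, and the subscript in the coordinates \(f_1 c a_\ell\) and \(f_2 c a_\ell\) is read as \(\ell\) from the stated range \(1 < \ell < m\).

```python
from collections import defaultdict


def build_feature_vector(weights, B):
    """Transcribe the level-2 feature vector of the reduction as a dict.

    Keys are tuples of vertex labels (the label sequence of a walk). Any
    coordinate that is never set reads as 0, which matches the convention
    in the text and also covers the WALKS-I entries.
    Assumes len(weights) == 3*m with m >= 2 (for m = 1, a_1 and a_m coincide).
    """
    m = len(weights) // 3
    if len(weights) != 3 * m or m < 2:
        raise ValueError("expected 3*m weights with m >= 2")
    xs = [f"x{i}" for i in range(1, 3 * m + 1)]
    a_hs = [f"a{h}" for h in range(1, m + 1)]
    v = defaultdict(int)

    # VERTICES
    for x in xs:
        v[(x,)] = 1
    v[("a",)], v[("b",)], v[("c",)], v[("d",)] = B * m, 3 * m, 3 * m + 1, 1
    v[("f1",)] = v[("f2",)] = 3 * m
    for a_h in a_hs:
        v[(a_h,)] = 1

    # WALKS-C(enter)
    for a_h in a_hs:
        v[("d", a_h)] = 1
        v[(a_h, "b")] = v[(a_h, "b", a_h)] = 3
    for h in range(m - 1):
        v[(a_hs[h], a_hs[h + 1])] = 1

    # WALKS-B(lock)
    for f in ("f1", "f2"):
        v[("b", f)] = v[("b", f, "b")] = v[(f, "b", f)] = 3 * m
        for x in xs:
            v[(x, f)] = 1
    v[("b", "a")] = v[("b", "a", "b")] = B * m
    for x, w in zip(xs, weights):
        v[(x, "a")] = v[(x, "a", x)] = w

    # WALKS-BC
    for x in xs:
        v[(x, "d")] = 1
    v[("f1", "c", "f2")] = 3 * m - 1
    for a_h in a_hs:
        v[(a_h, "c")] = v[(a_h, "c", a_h)] = 3
        v[(a_h, "b", "a")] = B
        v[("f1", "c", a_h)] = v[("f2", "c", a_h)] = 3  # value for 1 < l < m
    v[("f2", "c", a_hs[0])] = 2   # f_2 c a_1 = 2
    v[("f2", "c", a_hs[-1])] = 4  # f_2 c a_m = 4
    return v


v = build_feature_vector([5, 5, 6, 5, 5, 6], 16)
print(v[("a",)], v[("f1", "c", "f2")], v[("x1", "f1", "x2")])  # prints "32 5 0"
```

A quick consistency check on this transcription: the weights sum to \(Bm\), which equals the \(ba\)-coordinate, so every \(a\)-vertex can be attached to exactly one \(b\)-vertex and to exactly one \(x_i\)-vertex.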

Let us give a topological characterization of graphs \(G \in \mathcal{PLG}\) that satisfy \(|G|_{\boldsymbol{S}_2} = v\). Indeed, we shall see that \(v\) uniquely determines a structure that consists of the center graph (an \(m\)-star whose center is the \(d\)-vertex and whose leaves are the \(a_h\)-vertices (\(1 \le h \le m\)), to each of which 3 \(b\)-vertices are singly connected) and \(3m\) rhombuses, each bounded by the cycle \(b f_1 x_i f_2 b\) (the \(x_i\)-rhombus), within which exactly \(w(x_i)\) \(a\)-vertices are forced to be fenced and singly connected with the \(b\)-vertex on the rhombus (see Fig. 9). Once confirmed, this structure and its uniqueness enable us to conclude that the given instance of 3-PARTITION has a solution if and only if there exists a planar graph whose feature vector of level 2 is \(v\); this is because \(a_h b a = B\) must be satisfied for \(1 \le h \le m\). Note that our construction of the system of inequalities is a pseudo-polynomial-time transformation.

Fig. 9 Reduction from 3-PARTITION to \(\textsc{Solvability}(\boldsymbol{S}_2, \mathcal{PLG})\)

First of all, VERTICES, WALKS-C, and WALKS-I determine the center graph. Due to Lemma 12, \(a_h b = a_h b a_h = 3\) forces exactly 3 \(b\)-vertices to be singly connected with each \(a_h\)-vertex, and \(a_j b a_k = 0\) in WALKS-I inhibits a \(b\)-vertex from being connected with more than one of the \(a_1, \ldots, a_m\)-vertices. For \(1 \le \ell < m\), the \(a_\ell\)-vertex has to be connected with the \(a_{\ell+1}\)-vertex in order to satisfy \(a_\ell a_{\ell+1} = 1\).

We now shift our focus to the \(x_i\)-rhombus and the \(w(x_i)\) \(a\)-vertices. As above, but using Lemma 13, one can easily see that exactly one of the \(3m\) \(f_1\)-vertices (and likewise one of the \(3m\) \(f_2\)-vertices) must be singly connected to each of the \(3m\) \(b\)-vertices of the center graph in a one-to-one manner. To each of these \(f_1\)-vertices, a distinct \(x_i\)-vertex must be singly connected, because we need exactly one \(x_i f_1\)-walk and the \(x_j f_1 x_k\)-walk is inhibited whenever \(j \ne k\). This fact allows us to index the \(f_1\)-vertex and the \(b\)-vertex on the walk from the \(x_i\)-vertex to the \(d\)-vertex by the subscript \(i\) as \(f_{1,i}\) and \(b_i\), and the \(f_2\)-vertex that is connected to the \(b_i\)-vertex is thus indexed as \(f_{2,i}\); this indexing is only for ease of explanation. In the following, we denote the three of the \(x_1, \ldots, x_{3m}\)-vertices that have been thus connected with the \(a_1\)-vertex by \(x_{1,1}\), \(x_{1,2}\), and \(x_{1,3}\) for convenience (see Fig. 9), but note that we do not, and should not, know which of \(x_1, \ldots, x_{3m}\) is \(x_{1,1}\).

The extended center graph built so far is still a tree, and hence has only one face. Now we draw edges from each of these \(x_i\)-vertices to the \(d\)-vertex, but due to the \(a_1 a_2 \cdots a_m\)-walk, these edges must pass between the \(a_1\)-vertex and the \(a_m\)-vertex as shown in Fig. 9. These edges separate the face into \(3m + 1\) faces, that is, the face bounded by the \(d a_1 b f_1 x_{1,1}\)-walk, one bounded by the \(d x_{1,1} f_1 b a_1 b f_1 x_{1,2} d\)-walk, and so on. Since the \(x_j f_2 x_k\)-walk is inhibited whenever \(j \ne k\), each \(x_i\)-vertex must be singly connected with a distinct \(f_2\)-vertex. The edges from \(x_i\) to \(d\) now force the \(x_i\)-vertex to be connected in this way with the \(f_{2,i}\)-vertex. As a result, the \(x_i\)-rhombus has been formed.

Now we fence in \(w(x_i)\) \(a\)-vertices in the \(x_i\)-rhombus. To this end, we connect the \(3m\) \(f_1\)-vertices and \(f_2\)-vertices via \(3m - 1\) \(f_1 c f_2\)-walks. The reader should by now be familiar enough with the technique based on Lemma 12 and WALKS-I to check that exactly 3 \(c\)-vertices are singly connected with each \(a_h\)-vertex, and that these are distinct. This means that the \(c\)-vertex on any of these \(f_1 c f_2\)-walks must be connected with the \(a_h\)-vertex for some \(h\), and hence no two of these walks can share their \(f_1\)-vertex and \(f_2\)-vertex. It is left to the reader to check that the way illustrated in Fig. 9 is the only way to draw \(3m - 1\) \(f_1 c f_2\)-walks so as to satisfy all of these requirements together with \(f_1 c a_1 = 3\), \(f_2 c a_1 = 2\), \(f_1 c a_\ell = 3\), \(f_2 c a_\ell = 3\) for \(1 < \ell < m\), \(f_1 c a_m = 3\), and \(f_2 c a_m = 4\). These newly added structures prevent an \(a\)-vertex from being connected both with a \(b\)-vertex and with an \(x_i\)-vertex unless it is placed in the \(x_i\)-rhombus. Check that each \(a\)-vertex must be singly connected with exactly one \(b\)-vertex, and that \(w(x_i)\) \(a\)-vertices must be singly connected with the \(x_i\)-vertex. Thus, the \(x_i\)-rhombus must contain exactly \(w(x_i)\) \(a\)-vertices, and they have to be singly connected with the \(b_i\)-vertex. \(\square\)

Cite this article

Fazekas, S.Z., Ito, H., Okuno, Y. et al. On computational complexity of graph inference from counting. Nat Comput 12, 589–603 (2013). https://doi.org/10.1007/s11047-012-9349-2
