# On computational complexity of graph inference from counting

## Abstract

In de novo drug design, chemical compounds are quantified as real-valued vectors called chemical descriptors; an optimization algorithm runs on known drug-like compounds in a database and outputs an optimal chemical descriptor. Since structural information is needed for chemical synthesis, chemical graphs must then be inferred from the obtained descriptor. This is formalized as the problem of inferring a graph from a real-valued vector. By generalizing subword histories, which were originally introduced in formal language theory to extract numerical information about words and languages by counting, we propose a comprehensive framework for investigating the computational complexity of chemical graph inference. We also propose a (pseudo-)polynomial-time algorithm for inferring graphs in a class of practical importance from spectra.
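The two kinds of counting mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names `scattered_subword_count` and `walk_label_counts`, the single-character vertex labels, and the toy graph are assumptions made here for illustration. The first function counts the occurrences of one word in another as a scattered subword, the quantity on which subword histories are built; the second counts the label sequences of walks in a labeled graph, which is one simple example of a counting-based feature vector from which a graph would have to be inferred.

```python
from collections import Counter


def scattered_subword_count(word, sub):
    """Number of ways `sub` occurs in `word` as a scattered subword."""
    # ways[j] = number of embeddings of sub[:j] into the part of `word` read so far
    ways = [1] + [0] * len(sub)
    for ch in word:
        # Go right to left so one letter of `word` extends each embedding at most once.
        for j in range(len(sub), 0, -1):
            if sub[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(sub)]


def walk_label_counts(labels, edges, length):
    """Count label sequences over all walks with `length` vertices.

    `labels` maps vertex -> label; `edges` lists undirected edges
    (repeat an edge to model a multigraph). Hypothetical helper.
    """
    adj = {v: [] for v in labels}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # frontier maps (end vertex, label sequence) -> number of walks
    frontier = Counter({(v, (labels[v],)): 1 for v in labels})
    for _ in range(length - 1):
        nxt = Counter()
        for (v, seq), n in frontier.items():
            for w in adj[v]:
                nxt[(w, seq + (labels[w],))] += n
        frontier = nxt
    counts = Counter()
    for (_, seq), n in frontier.items():
        counts[seq] += n
    return counts


# Toy example: the heavy-atom skeleton C-C-O.
toy_labels = {0: "C", 1: "C", 2: "O"}
toy_edges = [(0, 1), (1, 2)]
print(scattered_subword_count("abab", "ab"))
print(walk_label_counts(toy_labels, toy_edges, 2))
```

Graph inference runs in the opposite direction: given such counts, decide whether a graph in the target class realizes them. The sketch only shows the forward map, which is the easy part.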

## Keywords

Computational complexity, Counting, de novo drug design, Graph inference, Tree-decomposition, Spectrum, Walk history

## List of symbols

- \(\Upsigma\)
Alphabet

- \({\overrightarrow{{\mathcal{G}}}}\)
The class of (\(\Upsigma\)-labeled, loopless, (weakly-)connected) directed multigraphs

- \({{\mathcal{G}}}\)
The class of (\(\Upsigma\)-labeled, loopless, connected) undirected multigraphs

- *d*(*v*)
The degree of a vertex *v*

- *T*
A tree

- *h*(*T*)
The height of tree *T*

- \(\partial T_K\)
The frontier vector of *T* of level *K*

- tw(*G*)
The tree-width of *G*

- \(\Upupsilon\)
The class of trees

- \(\Upupsilon_h\)
The class of trees of height at most *h*

- \({{\mathcal{SPG}}}\)
The class of series-parallel graphs

- \({{\mathcal{PLG}}}\)
The class of planar graphs

- \(\mathcal{TW}(w)\)
The class of graphs of tree-width at most *w*

- \({{\overrightarrow{\mathcal{SSG}}}}\)
The class of scattered subword graphs

- \({{\overrightarrow{\mathcal{CSG}}}}\)
The class of continuous subword graphs

- *WH*
A walk history

- \({{\mathcal{SWH}}}\)
The class of systems of walk histories

- \({{\mathcal{SLWH}}}\)
The class of systems of linear walk histories

- \({{\mathcal{COUNT}}}\)
The class of counting systems

- \(\mathcal{WH}\)
The class of systems of single walk history

- \(\mathcal{LWH}\)
The class of systems of single linear walk history

- A–F algorithm
Akutsu–Fukagawa algorithm

## Notes

### Acknowledgements

We wish to express our gratitude to the anonymous referees for their careful and thorough review of the earlier version of this manuscript and for their valuable comments and suggestions. Shinnosuke Seki expresses his sincere gratitude to Professor Mark Daley, Professor Oscar H. Ibarra, Professor Helmut Jürgensen, Professor Lila Kari, and Professor Arto Salomaa for the creative discussions on the research topic of this paper. This research was carried out with the financial support of the JSPS Postdoctoral Fellowship P10827 to Szilárd Zsolt Fazekas, of the Funding Program for Next Generation World-Leading Researchers (NEXT program) to Yasushi Okuno, and of the Kyoto University Start-up Grant-in-Aid for Young Scientists, No. 021530, to Shinnosuke Seki. The work of Shinnosuke Seki was also financially supported by the Department of Information and Computer Science, Aalto University.

## References

- Akutsu T, Fukagawa D (2005) Inferring a graph from path frequency. In: Apostolico A, Crochemore M, Park K (eds) CPM 2005. Lecture notes in computer science, vol 3537. Springer, New York, pp 371–382
- Bakir GH, Weston J, Schölkopf B (2004a) Learning to find pre-images. In: Advances in neural information processing systems, pp 449–456
- Bakir GH, Zien A, Tsuda K (2004b) Learning to find graph pre-images. In: Proceedings of the 26th DAGM symposium. Lecture notes in computer science, vol 3175. Springer, New York, pp 253–261
- Bodlaender H (1998) A partial *k*-arboretum of graphs with bounded treewidth. Theor Comput Sci 209(1–2):1–45
- Diestel R (2010) Graph theory, 4th edn. Springer, New York
- Fraigniaud P, Nisse N (2006) Connected treewidth and connected graph searching. In: LATIN 2006. Lecture notes in computer science, vol 3887. Springer, New York, pp 479–490
- Fujiwara H et al (2008) Enumerating treelike chemical graphs with given path frequency. J Chem Inf Model 48:1345–1357
- Garey MR, Johnson DS (1979) Computers and intractability. A guide to the theory of NP-completeness. W. H. Freeman and Co, New York
- Goto S et al (2002) LIGAND: Database of chemical compounds and reactions in biological pathways. Nucleic Acids Res 30:402–404
- Ibarra OH (1978) Reversal-bounded multicounter machines and their decision problems. J ACM 25:116–133
- Leslie C, Eskin E, Noble WS (2002) The spectrum kernel: a string kernel for SVM protein classification. In: Proceedings of the 7th Pacific symposium on biocomputing, pp 564–575
- Mateescu A, Salomaa A, Yu S (2004) Subword histories and Parikh matrices. J Comput Syst Sci 68:1–21
- Matiyasevich Y (1970) Solution of the tenth problem of Hilbert. Matematikai Lapok 21:83–87
- Matiyasevich Y (1993) Hilbert’s tenth problem. MIT Press, Cambridge
- Nagamochi H (2009) A detachment algorithm for inferring a graph from path frequency. Algorithmica 53:207–224
- Parikh RJ (1966) On context-free languages. J Assoc Comput Mach 13:570–581
- Robertson N, Seymour PD (1986) Graph minors. II. Algorithmic aspects of tree-width. J Algor 7:309–322
- Rozenberg G, Salomaa A (eds) (1997) Handbook of formal languages, vol 1. Springer, New York
- Seki S (2011) Absoluteness of subword inequality is undecidable. Theor Comput Sci 418:116–120
- Shannon CE, Weaver W (1949) The mathematical theory of communication. The University of Illinois Press, Urbana
- Yamaguchi A, Aoki KF, Mamitsuka H (2003) Graph complexity of chemical compounds in biological pathways. Genome Inform 14:376–377