Comparison of Approaches for Querying Chemical Compounds

  • Vojtěch Šípek
  • Irena Holubová
  • Martin Svoboda
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11721)

Abstract

Chemical compounds form a database with specific features that can be exploited for more efficient query processing. However, no existing study compares the performance and memory usage of the most efficient approaches on the same data set. In this paper, we address this gap with an unbiased benchmark of the most popular index-building methods for subgraph querying of chemical databases. In addition, we compare the results with the performance of an SQL database and a graph database, for which various unconfirmed hypotheses about their efficiency exist.

Keywords

Chemical database, Subgraph querying, Graph database, Subgraph isomorphism

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Vojtěch Šípek (1)
  • Irena Holubová (1)
  • Martin Svoboda (1, corresponding author)
  1. Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
