Parallel Structural Graph Clustering

  • Madeleine Seeland
  • Simon A. Berger
  • Alexandros Stamatakis
  • Stefan Kramer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6913)

Abstract

We address the problem of clustering large graph databases according to scaffolds (i.e., large structural overlaps) that are shared between cluster members. In previous work, an online algorithm was proposed for this task that produces overlapping (non-disjoint) and non-exhaustive clusterings. In this paper, we parallelize this algorithm to take advantage of high-performance parallel hardware and further improve it in three ways: (1) a refined cluster membership test based on a set abstraction of graphs; (2) sorting graphs according to size, so that some cluster membership tests can be avoided altogether; and (3) the definition of a cluster representative once the cluster scaffold is unique, which avoids comparisons with all cluster members. In experiments on a large database of chemical structures, we show that running times can be reduced by a large factor for one parameter setting used in previous work. For harder parameter settings, it was possible to obtain results within reasonable time for 300,000 structures, compared to 10,000 structures in previous work. This shows that structural, scaffold-based clustering of smaller libraries for virtual screening is already feasible.
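The abstract does not spell out the set abstraction, so the following Python sketch is only one plausible reading of the idea, not the authors' method. It assumes a graph is given as a list of labeled edges `(atom_label, bond_label, atom_label)` and that a scaffold must contain at least `min_edges` edges; the function names, this representation, and the `min_edges` parameter are all hypothetical. Ignoring topology, the multiset of edge labels two graphs have in common bounds the size of any common subgraph from above, so a cheap comparison can rule out pairs before an expensive structural test is run.

```python
from collections import Counter

def edge_multiset(edges):
    """Set abstraction (assumed form): multiset of edge label triples, topology ignored."""
    # Order the two atom labels so that (O, 1, C) and (C, 1, O) count as the same edge type.
    return Counter((min(a, b), bond, max(a, b)) for a, bond, b in edges)

def common_edge_bound(g1, g2):
    """Upper bound on the number of edges in any common subgraph of g1 and g2."""
    # Counter '&' is multiset intersection: the minimum count per edge type.
    return sum((edge_multiset(g1) & edge_multiset(g2)).values())

def may_share_scaffold(g1, g2, min_edges):
    """Necessary (not sufficient) condition for a common scaffold of min_edges edges."""
    return common_edge_bound(g1, g2) >= min_edges

# Toy molecules as edge lists (hypothetical data, hydrogens omitted).
ethanol = [("C", "1", "C"), ("O", "1", "C")]
propylamine = [("C", "1", "C"), ("C", "1", "C"), ("C", "1", "N")]

# Only one C-C edge can be matched, so a two-edge scaffold is impossible.
print(common_edge_bound(ethanol, propylamine))
print(may_share_scaffold(ethanol, propylamine, 2))
```

A pair that fails this filter can be skipped safely; a pair that passes still needs the full structural membership test, because matching label counts do not guarantee a connected common subgraph.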

Keywords

Virtual Screening; Cluster Member; Cluster Membership; Graph Database; Graph Object
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Aggarwal, C.C., Ta, N., Wang, J., Feng, J., Zaki, M.: XProj: a framework for projected structural clustering of XML documents. In: KDD 2007: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 46–55. ACM, New York (2007)
  2. Chen, J., Swamidass, S.J., Dou, Y., Baldi, P.: ChemDB: a public database of small molecules and related chemoinformatics resources. Bioinf. 21, 4133–4139 (2005)
  3. Chen, J.H., Linstead, E., Swamidass, S.J., Wang, D., Baldi, P.: ChemDB update - full-text search and virtual chemical space. Bioinf. 23, 2348–2351 (2007)
  4. Hossain, M.S., Angryk, R.A.: GDClust: A graph-based document clustering technique. In: Proceedings of the Seventh IEEE International Conference on Data Mining Workshops, ICDMW 2007, pp. 417–422. IEEE Computer Society, Washington, DC, USA (2007)
  5. McGregor, M.J., Pallai, P.V.: Clustering of large databases of compounds: Using the MDL “keys” as structural descriptors. Journal of Chemical Information and Computer Sciences 37(3), 443–448 (1997)
  6. Raymond, J.W., Blankley, C.J., Willett, P.: Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures. J. Mol. Graph. Model. 21(5), 421–433 (2003)
  7. Raymond, J.W., Willett, P.: Effectiveness of graph-based and fingerprint-based similarity measures for virtual screening of 2D chemical structure databases. J. Comput. Aided. Mol. Des. 16(1), 59–71 (2002)
  8. Seeland, M., Girschick, T., Buchwald, F., Kramer, S.: Online structural graph clustering using frequent subgraph mining. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010. LNCS, vol. 6323, pp. 213–228. Springer, Heidelberg (2010)
  9. Stahl, M., Mauser, H.: Database clustering with a combination of fingerprint and maximum common substructure methods. J. Chem. Inf. Model. 45, 542–548 (2005)
  10. Tsuda, K., Kudo, T.: Clustering graphs by weighted substructure mining. In: ICML 2006: Proceedings of the 23rd International Conference on Machine Learning, pp. 953–960. ACM, New York (2006)
  11. Yan, X., Han, J.: gSpan: Graph-based substructure pattern mining. In: Proceedings of the 2002 IEEE International Conference on Data Mining, pp. 721–724 (2002)
  12. Yoshida, T., Shoda, R., Motoda, H.: Graph clustering based on structural similarity of fragments. In: Jantke, K.P., Lunzer, A., Spyratos, N., Tanaka, Y. (eds.) Federation over the Web. LNCS (LNAI), vol. 3847, pp. 97–114. Springer, Heidelberg (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Madeleine Seeland (1)
  • Simon A. Berger (2)
  • Alexandros Stamatakis (2)
  • Stefan Kramer (1)

  1. Institut für Informatik/I12, Technische Universität München, Garching b. München, Germany
  2. Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
