DS 2016: Discovery Science, pp. 67–82

Min-Hashing for Probabilistic Frequent Subtree Feature Spaces

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9956)

Abstract

We propose a fast algorithm for approximating graph similarities. For its advantageous semantic and algorithmic properties, we define the similarity between two graphs by the Jaccard-similarity of their images in a binary feature space spanned by the set of frequent subtrees generated for some training dataset. Since the feature space embedding is computationally intractable, we use a probabilistic subtree isomorphism operator based on a small sample of random spanning trees and approximate the Jaccard-similarity by min-hash sketches. The partial order on the feature set defined by subgraph isomorphism allows for a fast calculation of the min-hash sketch, without explicitly performing the feature space embedding. Experimental results on real-world graph datasets show that our technique results in a fast algorithm. Furthermore, the approximated similarities are well-suited for classification and retrieval tasks in large graph datasets.
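The min-hash estimate of the Jaccard similarity mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each graph's image in the binary feature space is already available as a set of integer feature ids; the paper's point is precisely to avoid computing this embedding explicitly, so this is not the authors' algorithm. The function names, the number of permutations, and the toy feature sets are all hypothetical.

```python
import random


def jaccard(a, b):
    """Exact Jaccard similarity of two feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def make_permutations(num_features, k, seed=0):
    """Draw k random orderings of the feature ids 0..num_features-1."""
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        order = list(range(num_features))
        rng.shuffle(order)
        perms.append(order)
    return perms


def minhash_sketch(features, perms):
    """For each ordering, keep the first feature id that is present.

    Returns None for an ordering if the feature set is empty.
    """
    features = set(features)
    return [next((f for f in order if f in features), None) for order in perms]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of sketch positions on which the two sketches agree."""
    agree = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return agree / len(sketch_a)


if __name__ == "__main__":
    # Toy feature sets (assumed): ids of frequent subtrees matching each graph.
    g1 = set(range(0, 60))
    g2 = set(range(30, 90))
    perms = make_permutations(num_features=100, k=1000, seed=42)
    s1, s2 = minhash_sketch(g1, perms), minhash_sketch(g2, perms)
    print("exact:", jaccard(g1, g2))
    print("estimate:", estimate_jaccard(s1, s2))
```

For a uniformly random ordering, the probability that two sets share the same first element equals their Jaccard similarity, so the agreement rate over k orderings is an unbiased estimate whose variance shrinks as k grows. Scanning each ordering for the first present feature is also where a partial order on the features could be used to skip membership tests, as the abstract describes.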

Keywords

Feature Space; Span Tree; Tree Pattern; Subgraph Isomorphism; Naive Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Pascal Welke (1)
  • Tamás Horváth (1, 2)
  • Stefan Wrobel (1, 2)

  1. Department of Computer Science, University of Bonn, Bonn, Germany
  2. Fraunhofer IAIS, Sankt Augustin, Germany
