Abstract
A fundamental problem in graph-structured databases is searching for substructures. One issue in optimizing such searches is the ability to estimate the frequency of substructures within a query graph. In this work, we present and evaluate two techniques for estimating the frequency of subgraphs from a summary of the data graph. In the first technique, we assume that edge occurrences on edge sequences are position independent and summarize only the most informative dependencies. In the second technique, we prune small subgraphs using a valuation scheme that blends information about their importance and estimation power. In both techniques, we assume conditional independence to estimate the frequencies of larger subgraphs. We validate the effectiveness of our techniques through experiments on real and synthetic datasets.
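The conditional independence assumption can be illustrated with a minimal sketch, restricted to edge-label paths rather than general subgraphs. It is not the summary structure or estimator from the paper: the function names `count_paths` and `estimate_path_frequency`, the triple format `(source, label, target)`, and the choice of storing exact counts for all paths of up to two edges are assumptions made for illustration. A longer path is estimated by chaining the stored two-edge counts and dividing by the count of each shared edge label.

```python
from collections import Counter


def count_paths(edges, max_len):
    """Count edge-label sequences along directed paths of 1..max_len edges.

    `edges` is a list of (source, label, target) triples (hypothetical format).
    """
    out = {}
    for s, label, t in edges:
        out.setdefault(s, []).append((label, t))

    counts = Counter()

    def walk(node, labels):
        if labels:
            counts[tuple(labels)] += 1
        if len(labels) == max_len:
            return
        for label, nxt in out.get(node, []):
            walk(nxt, labels + [label])

    nodes = {s for s, _, _ in edges} | {t for _, _, t in edges}
    for node in nodes:
        walk(node, [])
    return counts


def estimate_path_frequency(summary, labels):
    """Estimate the frequency of a label path from counts of paths of length <= 2.

    Assumes each edge label depends only on the label just before it:
    f(l1..ln) ~= f(l1,l2) * prod_i f(l_i,l_{i+1}) / f(l_i), for i = 2..n-1.
    """
    labels = tuple(labels)
    if len(labels) <= 2:
        return float(summary.get(labels, 0))
    est = float(summary.get(labels[:2], 0))
    for i in range(1, len(labels) - 1):
        overlap = summary.get(labels[i:i + 1], 0)
        if overlap == 0:
            return 0.0
        est *= summary.get(labels[i:i + 2], 0) / overlap
    return est


# Toy data graph: one a-b-c chain plus an extra, unrelated b edge.
edges = [(1, "a", 2), (2, "b", 3), (3, "c", 4), (5, "b", 6)]
summary = count_paths(edges, max_len=2)
print(estimate_path_frequency(summary, ["a", "b", "c"]))
```

On this toy graph the estimate for the path a, b, c is 0.5, while the true count is 1: the unrelated b edge dilutes the estimate, which shows why the choice of which dependencies to keep in the summary matters.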
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maduko, A., Anyanwu, K., Sheth, A., Schliekelman, P. (2008). Graph Summaries for Subgraph Frequency Estimation. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds) The Semantic Web: Research and Applications. ESWC 2008. Lecture Notes in Computer Science, vol 5021. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68234-9_38
DOI: https://doi.org/10.1007/978-3-540-68234-9_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68233-2
Online ISBN: 978-3-540-68234-9
eBook Packages: Computer Science (R0)