Counting Graph Matches with Adaptive Statistics Collection

  • Jianhua Feng
  • Qian Qian
  • Yuguo Liao
  • Lizhu Zhou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4016)


Efficient query processing over large-scale graph-structured data demands high-quality statistics collection and selectivity estimation. Precise and succinct statistics about graph-structured data are crucial for estimating the selectivity of graph queries. In this paper, we propose SMT (Succinct Markov Table), an approach that achieves high precision in selectivity estimation while consuming little memory. SMT rests on four core notions: constructing, refining, compressing and estimating. The efficient algorithm SMTBuilder builds an adaptive statistics model in the form of an SMT. SMT refining introduces versatile optimization rules that investigate local bi-directional reachability. SMT compressing introduces effective grouping techniques. Statistical methods based on SMT are used to estimate the selectivity of various graph queries, especially twig queries. Through a thorough experimental study, we demonstrate SMT’s advantages in accuracy and space over a previously known alternative, and we identify the optimization rules and compressing techniques that best suit different real-life data.
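For readers unfamiliar with Markov-table statistics, the following minimal sketch illustrates the general idea of estimating path selectivity from counts of short label paths. It is not the SMT structure or the SMTBuilder algorithm, whose details the abstract does not give. It assumes a plain table of label paths of length 1 and 2 and the usual Markov assumption that each step depends only on the preceding label. The names build_markov_table and estimate_path and the toy data are hypothetical.

```python
from collections import Counter

def build_markov_table(labels, edges):
    """Count label paths of length 1 (nodes) and length 2 (edges).

    labels: dict mapping node id -> label (hypothetical input format)
    edges:  iterable of (source id, target id) pairs
    """
    table = Counter()
    for label in labels.values():
        table[(label,)] += 1
    for src, dst in edges:
        table[(labels[src], labels[dst])] += 1
    return table

def estimate_path(table, path):
    """Estimate how many matches a simple label path has.

    Paths of length 1 or 2 are looked up directly. Longer paths use
    f(t1/t2) * product over i of f(ti/ti+1) / f(ti), i.e. the assumption
    that each step depends only on the preceding label.
    """
    path = tuple(path)
    if len(path) <= 2:
        return float(table[path])
    estimate = float(table[path[:2]])
    for i in range(1, len(path) - 1):
        denom = table[(path[i],)]
        if denom == 0:
            return 0.0
        estimate *= table[(path[i], path[i + 1])] / denom
    return estimate

# Toy data: one 'a' node with two 'b' children, which have three 'c' children in total.
labels = {1: "a", 2: "b", 3: "b", 4: "c", 5: "c", 6: "c"}
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]
table = build_markov_table(labels, edges)
estimate = estimate_path(table, ["a", "b", "c"])
print(estimate)  # 2 * (3 / 2) = 3.0
```

On this toy data the estimate equals the true count of a/b/c matches. In general the product is only an approximation, which is why refinement and compression of the stored statistics matter for accuracy and space.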


Suffix Tree, Query Pattern, Optimization Rule, Path Query, Graph Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jianhua Feng (1)
  • Qian Qian (1)
  • Yuguo Liao (1)
  • Lizhu Zhou (1)
  1. Department of Computer Science and Technology, Tsinghua University, Beijing, China
