Data Mining and Knowledge Discovery, Volume 18, Issue 3, pp 472–516

Incremental sequence-based frequent query pattern mining from XML queries

  • Guoliang Li
  • Jianhua Feng
  • Jianyong Wang
  • Lizhu Zhou

DOI: 10.1007/s10618-009-0126-5

Cite this article as:
Li, G., Feng, J., Wang, J. et al. Data Min Knowl Disc (2009) 18: 472. doi:10.1007/s10618-009-0126-5

Abstract

Existing algorithms for mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy, which involves expensive candidate enumeration and costly tree-containment checking. Further, most existing methods periodically recompute the frequencies of candidate query patterns from scratch by scanning the entire transaction database, which consists of XQPs derived from user query logs. Maintaining the discovered frequent patterns in real XML databases is not straightforward, however, because frequent updates may both invalidate some existing frequent query patterns and generate new ones. Existing methods are therefore inefficient when the transaction database evolves. To address these problems, this paper proposes an efficient algorithm, ESPRIT, to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i+, to incrementally mine frequent XQPs. We devise several novel optimization techniques for query rewriting, cache lookup, and cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency and scalability and significantly outperform state-of-the-art methods.
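
The abstract does not spell out ESPRIT's mapping or its incremental bookkeeping, so the following is only an illustrative sketch of the two general ideas it names: an invertible tree-to-sequence encoding and support counts that are updated per batch instead of recomputed. The preorder (depth, label) encoding, the nested-tuple tree representation, and the names `encode`, `decode`, and `IncrementalSupport` are all assumptions made for this example, not the paper's definitions. The sketch also counts only whole queries, whereas a real miner would count sub-patterns.

```python
from collections import Counter

# Assumed representation: a query tree is (label, [children]), children ordered.

def encode(tree, depth=0):
    """Return the preorder list of (depth, label) pairs for an ordered tree."""
    label, children = tree
    seq = [(depth, label)]
    for child in children:
        seq.extend(encode(child, depth + 1))
    return seq

def decode(seq):
    """Rebuild the tree from its (depth, label) sequence (inverse of encode)."""
    root = (seq[0][1], [])
    stack = [root]  # stack[d] holds the most recent node seen at depth d
    for depth, label in seq[1:]:
        node = (label, [])
        del stack[depth:]                 # drop nodes at this depth or deeper
        stack[depth - 1][1].append(node)  # attach to the current parent
        stack.append(node)
    return root

class IncrementalSupport:
    """Keeps per-sequence counts so new queries update them in place."""

    def __init__(self, min_support):
        self.min_support = min_support
        self.counts = Counter()
        self.total = 0

    def add_batch(self, queries):
        # Only the new queries are scanned; earlier ones are never revisited.
        for query in queries:
            self.counts[tuple(encode(query))] += 1
            self.total += 1

    def frequent(self):
        if self.total == 0:
            return set()
        return {s for s, c in self.counts.items()
                if c / self.total >= self.min_support}
```

Because the counts persist between calls, a later batch can both demote a sequence that was frequent and promote one that was not, which is the maintenance problem the abstract describes.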

Keywords

XML query patterns; Frequent query patterns; XML frequent pattern mining; Incremental mining; Sequential pattern mining

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Guoliang Li (1)
  • Jianhua Feng (1)
  • Jianyong Wang (1)
  • Lizhu Zhou (1)
  1. Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China
