
Efficient Mining of Frequent Closed XML Query Pattern

Abstract

Previous research has argued convincingly that a frequent pattern mining algorithm should mine only the closed frequent patterns rather than all frequent patterns, because the closed patterns form a more compact yet complete result set and can be mined more efficiently. Once frequent closed XML query patterns are discovered, indexing and caching can be applied effectively to improve query performance. Most previous algorithms for finding frequent patterns follow a straightforward generate-and-test strategy. In this paper, we present SOLARIA*, an efficient algorithm for mining frequent closed XML query patterns without candidate maintenance or costly tree-containment checking. SOLARIA* uses an efficient sequence-mining algorithm to discover frequent tree-structured patterns, replacing expensive containment testing with cheap parent-child checking on sequences. It also aggressively prunes irrelevant parts of the search space during frequent pattern enumeration by enforcing a parent-child relationship constraint. A thorough experimental study on various real-life datasets demonstrates that SOLARIA* is more efficient and scalable than the previously known alternative, and that it scales linearly with the size of the XML queries.
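The difference between mining all frequent patterns and mining only the closed ones can be made concrete with a small sketch. It assumes, purely for illustration, that each XML query has been flattened into a set of root-to-node paths (the `queries` data below is hypothetical) and treats patterns as itemsets; this discards the tree structure that SOLARIA* actually works with. The helper names `frequent_itemsets` and `closed_only` are also hypothetical. The first is a naive level-wise generate-and-test miner, the kind of strategy the abstract contrasts SOLARIA* with; the second keeps a pattern only if no proper superset has the same support, which is the standard definition of a closed pattern.

```python
from itertools import combinations

# Hypothetical workload: each query flattened to a set of root-to-node paths.
# (A simplification; real XML query patterns are trees, not flat sets.)
queries = [
    frozenset({"/book", "/book/title", "/book/author"}),
    frozenset({"/book", "/book/title", "/book/author"}),
    frozenset({"/book", "/book/title"}),
    frozenset({"/book", "/book/price"}),
]


def frequent_itemsets(transactions, min_support):
    """Naive level-wise generate-and-test mining of all frequent itemsets."""
    items = sorted(set().union(*transactions))
    frequent = {}
    level = [frozenset([item]) for item in items]
    while level:
        # Test: count the support of every candidate by scanning all transactions.
        survivors = []
        for candidate in level:
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                frequent[candidate] = support
                survivors.append(candidate)
        # Generate: join pairs of frequent k-patterns whose union has k+1 items.
        next_level = set()
        for a, b in combinations(survivors, 2):
            union = a | b
            if len(union) == len(a) + 1:
                next_level.add(union)
        level = sorted(next_level, key=sorted)
    return frequent


def closed_only(frequent):
    """Keep a pattern only if no proper superset has exactly the same support."""
    return {
        pattern: support
        for pattern, support in frequent.items()
        if not any(pattern < other and support == frequent[other] for other in frequent)
    }


frequent = frequent_itemsets(queries, min_support=2)
closed = closed_only(frequent)
for pattern, support in sorted(closed.items(), key=lambda kv: -kv[1]):
    print(sorted(pattern), support)
```

On this toy input, seven patterns are frequent at a minimum support of 2, but only three are closed: {/book} with support 4, {/book, /book/title} with support 3, and {/book, /book/title, /book/author} with support 2. The support of every other frequent pattern equals the largest support among its closed supersets, so nothing is lost by keeping only the closed set, which is the compactness argument in miniature.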

Author information

Correspondence to Jian-Hua Feng.

Additional information

This work is supported in part by the National Natural Science Foundation of China under Grant No. 60573094, the National Grand Fundamental Research 973 Program of China under Grant No. 2006CB303103, the National High Technology Development 863 Program of China under Grant No. 2006AA01A101, and Tsinghua Basic Research Foundation under Grant No. JCqn2005022.

About this article

Cite this article

Feng, J., Qian, Q., Wang, J. et al. Efficient Mining of Frequent Closed XML Query Pattern. J Comput Sci Technol 22, 725–735 (2007). https://doi.org/10.1007/s11390-007-9081-z

Keywords

  • computer software
  • frequent closed pattern
  • data mining
  • XML
  • XPath