Abstract
Semistructured patterns can be formally modeled as graph patterns. The most important problem in mining large semistructured datasets is the scalability of the method. With the successful development of efficient and scalable algorithms for mining frequent itemsets and sequences, it is natural to extend the scope of study to a more general pattern mining problem: mining frequent semistructured patterns, or graph patterns. In this paper, we extend the pattern-growth methodology and develop a novel algorithm called CLS (Canonical Labeling System), which discovers frequent connected subgraphs efficiently using either a depth-first or a breadth-first search strategy.
A novel canonical labeling system and search order are devised to support efficient pattern growth. CLS is simpler and more efficient than other methods because it combines pattern growing and pattern checking into one procedure. Based on CLS, we develop CLS Close to mine closed frequent graphs, which not only eliminates redundant patterns but also substantially increases mining efficiency, especially in the presence of large graph patterns.
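To illustrate why a canonical label helps during pattern growth, here is a minimal Python sketch. It is not the CLS labeling itself, which the abstract does not specify; it is a brute-force stand-in that assumes small, undirected graphs with string labels on vertices and edges. The function name `canonical_label` and the input layout are hypothetical. The point is only that two isomorphic candidate patterns receive the same label, so a duplicate reached along a different growth path can be detected by comparing labels.

```python
from itertools import permutations


def canonical_label(vertex_labels, edges):
    """Brute-force canonical label of a small undirected labeled graph.

    Illustrative stand-in only, NOT the CLS labeling from the paper.
    vertex_labels: list of string labels, indexed by vertex id.
    edges: dict mapping frozenset({u, v}) to a string edge label.
    Runs in O(n!) time, so it is only suitable for tiny graphs.
    """
    n = len(vertex_labels)
    best = None
    for perm in permutations(range(n)):
        # perm[i] is the original vertex placed at position i.
        code = [vertex_labels[v] for v in perm]
        # Append the upper triangle of the relabeled adjacency matrix,
        # using "-" where no edge exists.
        for i in range(n):
            for j in range(i + 1, n):
                code.append(edges.get(frozenset((perm[i], perm[j])), "-"))
        code = tuple(code)
        # Keep the lexicographically smallest code over all orderings.
        if best is None or code < best:
            best = code
    return best


# Two drawings of the same pattern: C -s- C -d- O
g1 = (["C", "C", "O"], {frozenset((0, 1)): "s", frozenset((1, 2)): "d"})
g2 = (["O", "C", "C"], {frozenset((0, 1)): "d", frozenset((1, 2)): "s"})
# A different pattern: both edges leave the same carbon and are single.
g3 = (["C", "C", "O"], {frozenset((0, 1)): "s", frozenset((0, 2)): "s"})

same = canonical_label(*g1) == canonical_label(*g2)
different = canonical_label(*g1) != canonical_label(*g3)
```

Trying every vertex ordering is factorial in the number of vertices, which is why practical miners rely on a carefully designed labeling and search order rather than exhaustive enumeration.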
Copyright information
© 2008 Springer Science+Business Media B.V.
About this paper
Cite this paper
Gaol, F.L., Widjaja, B.H. (2008). CLS and CLS Close: The Scalable Method for Mining the Semi Structured Data Set. In: Elleithy, K. (eds) Innovations and Advanced Techniques in Systems, Computing Sciences and Software Engineering. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-8735-6_35
DOI: https://doi.org/10.1007/978-1-4020-8735-6_35
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-8734-9
Online ISBN: 978-1-4020-8735-6
eBook Packages: Computer Science, Computer Science (R0)