
Abstract

Semistructured patterns can be formally modeled as graph patterns. The most important problem in mining large semistructured datasets is the scalability of the method. With the successful development of efficient and scalable algorithms for mining frequent itemsets and sequences, it is natural to extend the scope of study to a more general pattern mining problem: mining frequent semistructured patterns, or graph patterns. In this paper, we extend the pattern-growth methodology and develop a novel algorithm called CLS (Canonical Labeling System), which discovers frequent connected subgraphs efficiently using either a depth-first or a breadth-first search strategy.
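To illustrate what a canonical label provides in general, the following is a minimal sketch, not the CLS labeling itself, whose details are not given in this abstract. It assumes small undirected graphs with string labels on vertices and edges, and it finds a canonical code by brute force over all vertex orderings; the function name `canonical_label` and the input layout are hypothetical.

```python
from itertools import permutations


def canonical_label(vertex_labels, edges):
    """Return a canonical code for a small undirected labeled graph.

    Assumed (hypothetical) input layout:
      vertex_labels: list of str, where the index is the vertex id
      edges: list of (u, v, edge_label) triples

    Brute force over all vertex orderings, so only usable for tiny graphs.
    """
    n = len(vertex_labels)
    best = None
    for perm in permutations(range(n)):
        # perm[old] is the new id assigned to vertex `old`.
        relabeled_vertices = [None] * n
        for old, new in enumerate(perm):
            relabeled_vertices[new] = vertex_labels[old]
        relabeled_edges = sorted(
            (min(perm[u], perm[v]), max(perm[u], perm[v]), label)
            for u, v, label in edges
        )
        code = (tuple(relabeled_vertices), tuple(relabeled_edges))
        # Keep the lexicographically smallest code over all orderings.
        if best is None or code < best:
            best = code
    return best


# Two drawings of the same graph (an O vertex between two C vertices).
g1 = (["C", "O", "C"], [(0, 1, "s"), (1, 2, "d")])
g2 = (["O", "C", "C"], [(0, 1, "d"), (0, 2, "s")])
# A different graph: a C vertex sits in the middle instead.
g3 = (["C", "C", "O"], [(0, 1, "s"), (1, 2, "d")])

same = canonical_label(*g1) == canonical_label(*g2)
different = canonical_label(*g1) != canonical_label(*g3)
```

Because isomorphic graphs map to the same code, a miner can detect duplicate candidate patterns by comparing codes instead of running pairwise isomorphism tests. The cost of this sketch grows factorially with the number of vertices, which is why practical systems rely on a carefully designed labeling and search order.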

A novel canonical labeling system and search order are devised to support efficient pattern growth. CLS is simpler and more efficient than other methods because it combines pattern growing and pattern checking into one procedure. Based on CLS, we develop CLS Close to mine closed frequent graphs, which not only eliminates redundant patterns but also substantially increases mining efficiency, especially in the presence of large graph patterns.
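The notion of a closed frequent pattern can be sketched as follows. This is an illustration of the definition only, not the CLS Close procedure: a pattern is closed if no proper super-pattern has the same support. As a simplifying assumption, each pattern is represented here as a `frozenset` of edge names and containment is plain subset testing; real graph mining needs subgraph isomorphism instead. The function name and the support values are hypothetical.

```python
def closed_patterns(support):
    """Keep only closed patterns.

    support: dict mapping frozenset (a pattern, simplified to a set of
             edge names) to its support count. Hypothetical layout.
    A pattern is dropped if some proper superset has the same support.
    """
    closed = {}
    for pattern, count in support.items():
        absorbed = any(
            pattern < other and count == support[other]  # `<` is proper subset
            for other in support
        )
        if not absorbed:
            closed[pattern] = count
    return closed


# Illustrative support counts, not taken from the paper.
support = {
    frozenset({"ab"}): 3,
    frozenset({"bc"}): 4,
    frozenset({"ab", "bc"}): 3,
    frozenset({"ab", "bc", "cd"}): 2,
}
result = closed_patterns(support)
```

In this example the pattern `{"ab"}` is dropped because `{"ab", "bc"}` occurs just as often, so reporting both would be redundant. The remaining three patterns still determine the support of every frequent pattern, which is the sense in which closed mining removes redundancy without losing information.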

Copyright information

© 2008 Springer Science+Business Media B.V.

About this paper

Cite this paper

Gaol, F.L., Widjaja, B.H. (2008). CLS and CLS Close: The Scalable Method for Mining the Semi Structured Data Set. In: Elleithy, K. (eds) Innovations and Advanced Techniques in Systems, Computing Sciences and Software Engineering. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-8735-6_35

  • DOI: https://doi.org/10.1007/978-1-4020-8735-6_35

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-8734-9

  • Online ISBN: 978-1-4020-8735-6

  • eBook Packages: Computer Science, Computer Science (R0)
