Knowledge and Information Systems, Volume 23, Issue 2, pp 199–224

POTMiner: mining ordered, unordered, and partially-ordered trees

  • Aída Jiménez
  • Fernando Berzal
  • Juan-Carlos Cubero
Regular Paper

Abstract

Non-linear data structures are becoming more and more common in data mining problems. Trees, in particular, are amenable to efficient mining techniques. In this paper, we introduce a scalable and parallelizable algorithm to mine partially-ordered trees. Our algorithm, POTMiner, is able to identify both induced and embedded subtrees in such trees. As special cases, it can also handle both completely ordered and completely unordered trees.
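
The distinction between induced and embedded subtrees can be illustrated with a small brute-force check. This is only a sketch and not the POTMiner algorithm: it assumes the common definitions in which an induced subtree preserves parent-child edges and an embedded subtree preserves ancestor-descendant relationships, it treats all siblings as unordered, and it additionally assumes that sibling pattern nodes must map into disjoint branches of the data tree. The names `node`, `matches_at`, and `occurs` are hypothetical helpers introduced for illustration.

```python
from itertools import permutations

def node(label, *children):
    """A tree is a (label, children) pair; children is a tuple of trees."""
    return (label, tuple(children))

def descendants(tree, path=()):
    """Yield (path, subtree) for every proper descendant of `tree`."""
    for i, child in enumerate(tree[1]):
        child_path = path + (i,)
        yield child_path, child
        yield from descendants(child, child_path)

def is_prefix(a, b):
    """True if path `a` is a prefix of path `b` (a is an ancestor of b, or equal)."""
    return a == b[:len(a)]

def matches_at(pattern, tree, embedded):
    """True if `pattern` occurs with its root mapped onto the root of `tree`."""
    if pattern[0] != tree[0]:
        return False
    p_children = pattern[1]
    if not p_children:
        return True
    if embedded:
        # Embedded: a pattern child may map to any proper descendant.
        candidates = list(descendants(tree))
    else:
        # Induced: a pattern child must map to a direct child.
        candidates = [((i,), c) for i, c in enumerate(tree[1])]
    # Try every assignment of pattern children to distinct candidates
    # (exponential, acceptable only for tiny examples).
    for choice in permutations(candidates, len(p_children)):
        paths = [p for p, _ in choice]
        disjoint = all(
            not is_prefix(paths[i], paths[j])
            for i in range(len(paths))
            for j in range(len(paths))
            if i != j
        )
        if disjoint and all(
            matches_at(pc, sub, embedded)
            for pc, (_, sub) in zip(p_children, choice)
        ):
            return True
    return False

def occurs(pattern, tree, embedded):
    """True if `pattern` occurs anywhere in `tree`."""
    if matches_at(pattern, tree, embedded):
        return True
    return any(matches_at(pattern, sub, embedded) for _, sub in descendants(tree))

# Data tree A(B(C, D)) and pattern A(C, D), which skips the intermediate node B.
data = node("A", node("B", node("C"), node("D")))
pattern = node("A", node("C"), node("D"))

print(occurs(pattern, data, embedded=False))  # False: C and D are not children of A
print(occurs(pattern, data, embedded=True))   # True: C and D are descendants of A
```

Every induced occurrence is also an embedded one under these definitions, so the embedded setting can only find more patterns. Handling ordered or partially-ordered siblings would require restricting which assignments of pattern children are allowed, which this sketch does not attempt.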

Keywords

Data mining, Frequent patterns, Partially-ordered trees, Induced and embedded subtrees

Copyright information

© Springer-Verlag London Limited 2009

Authors and Affiliations

  • Aída Jiménez (1)
  • Fernando Berzal (2)
  • Juan-Carlos Cubero (1)

  1. Department of Computer Science and Artificial Intelligence, ETSIIT, University of Granada, Granada, Spain
  2. Department of Computer Science and Artificial Intelligence, ETSIIT, University of Granada, Granada, Spain
