IMB3-Miner: Mining Induced/Embedded Subtrees by Constraining the Level of Embedding

  • Henry Tan
  • Tharam S. Dillon
  • Fedja Hadzic
  • Elizabeth Chang
  • Ling Feng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3918)

Abstract

Tree mining has recently attracted considerable interest in areas such as bioinformatics, XML mining, and Web mining. We are mainly concerned with mining frequent induced and embedded subtrees. Mining embedded subtrees can yield more interesting patterns, but mining such embedding relationships can be very costly. In this paper, we propose an efficient approach that tackles the complexity of mining embedded subtrees by using a novel Embedding List representation and Tree Model Guided enumeration, and by introducing the Level of Embedding constraint. When mining all frequent embedded subtrees is too costly, one can gradually decrease the level of embedding constraint down to 1, at which point all the frequent subtrees obtained are induced subtrees. Our experiments on both synthetic and real datasets, comparing against two known algorithms for mining induced and embedded subtrees, FREQT and TreeMiner, demonstrate the effectiveness and efficiency of the technique.
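
The effect of the level of embedding constraint can be illustrated with a small sketch. It is not the IMB3-Miner algorithm and does not use the Embedding List or Tree Model Guided enumeration; it only assumes that the level of embedding between an ancestor and a descendant is the length of the path joining them. The function name `embedded_pairs`, the parent-map tree encoding, and the example tree are hypothetical, chosen for illustration.

```python
from typing import Dict, Optional, Set, Tuple


def embedded_pairs(
    parent: Dict[str, Optional[str]], max_level: int
) -> Set[Tuple[str, str]]:
    """Return (ancestor, descendant) pairs whose path length is <= max_level.

    Assumption: the level of embedding of a pair is the number of edges
    on the path from the ancestor down to the descendant.
    """
    pairs: Set[Tuple[str, str]] = set()
    for node in parent:
        ancestor = parent[node]
        level = 1
        # Walk up towards the root, stopping once the constraint is exceeded.
        while ancestor is not None and level <= max_level:
            pairs.add((ancestor, node))
            ancestor = parent[ancestor]
            level += 1
    return pairs


# Hypothetical tree: a -> b -> c -> d, plus a -> e (child mapped to parent).
tree = {"a": None, "b": "a", "c": "b", "d": "c", "e": "a"}

induced = embedded_pairs(tree, 1)   # parent-child relationships only
embedded = embedded_pairs(tree, 3)  # all ancestor-descendant relationships
```

With the constraint set to 1, only parent-child pairs remain, which is the relationship that induced subtrees preserve. Raising the constraint admits progressively more distant ancestor-descendant pairs, so the space of candidate relationships grows with the level.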

Keywords

Minimum Support, Association Rule Mining, Depth First Search, Extension Point, Tree Structure Data
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Henry Tan (1)
  • Tharam S. Dillon (1)
  • Fedja Hadzic (1)
  • Elizabeth Chang (2)
  • Ling Feng (3)

  1. Faculty of Information Technology, University of Technology Sydney, Sydney, Australia
  2. School of Information System, Curtin University of Technology, Perth, Australia
  3. Department of Computer Science, University of Twente, Enschede, Netherlands