Discovering Frequent Substructures in Large Unordered Trees

  • Tatsuya Asai
  • Hiroki Arimura
  • Takeaki Uno
  • Shin-ichi Nakano
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2843)


In this paper, we study a frequent substructure discovery problem in semi-structured data. We present an efficient algorithm, Unot, that computes all labeled unordered trees appearing in a large collection of data trees with frequency above a user-specified threshold. The keys of the algorithm are efficient enumeration of all unordered trees in canonical form and incremental computation of their occurrences. We then show that Unot discovers each frequent pattern T in $O(kb^2 m)$ time per pattern, where k is the size of T, b is the branching factor of the data trees, and m is the total number of occurrences of T in the data trees.
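
The abstract does not spell out what the canonical form looks like. As a minimal sketch of why one is needed, the Python below maps a labeled rooted tree to an order-independent key, so that two sibling orderings of the same unordered tree are counted as one pattern. The nested `(label, children)` representation and the function name `canonical` are assumptions made for illustration; this is not the Unot enumeration procedure or its occurrence computation.

```python
from typing import Tuple

# Hypothetical representation: a tree is (label, [child trees]).
Tree = Tuple[str, list]


def canonical(tree: Tree) -> tuple:
    """Return an encoding that is identical for every sibling ordering of the tree."""
    label, children = tree
    # Encode each child recursively, then sort the encodings so that the
    # order in which siblings were listed no longer affects the result.
    encoded = sorted((canonical(child) for child in children), reverse=True)
    return (label, tuple(encoded))


# Two ordered trees that represent the same unordered tree.
t1 = ("a", [("b", []), ("c", [("d", [])])])
t2 = ("a", [("c", [("d", [])]), ("b", [])])
same_pattern = canonical(t1) == canonical(t2)
```

Using such a key as a dictionary index lets a miner count each unordered pattern once, rather than once per ordering of its children.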





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Tatsuya Asai (1)
  • Hiroki Arimura (1)
  • Takeaki Uno (2)
  • Shin-ichi Nakano (3)
  1. Kyushu University, Fukuoka, Japan
  2. National Institute of Informatics, Tokyo, Japan
  3. Gunma University, Kiryu-shi, Gunma, Japan