Finding Trees from Unordered 0–1 Data

  • Hannes Heikinheimo
  • Heikki Mannila
  • Jouni K. Seppänen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4213)

Abstract

Tree structures are a natural way of describing occurrence relationships between attributes in a dataset. We define a new class of tree patterns for unordered 0–1 data and consider the problem of discovering frequently occurring members of this pattern class. Intuitively, a tree T occurs in a row u of the data if the attributes of T that occur in u form a subtree of T containing the root. We show that this definition has advantageous properties: only shallow trees have a significant probability of occurring in random data, and the definition allows a simple levelwise algorithm for mining all frequently occurring trees. We demonstrate with empirical results that the method is feasible and that it discovers interesting trees in real data.
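
The occurrence condition stated in the abstract can be illustrated with a small sketch. This is only one reading of the intuitive definition, not the authors' implementation: the tree encoding (a dict mapping each attribute to its parent, with the root mapped to None) and the names tree_occurs and frequency are assumptions introduced here for illustration.

```python
def tree_occurs(parent, row):
    """Check whether a tree occurs in a row of 0-1 data.

    parent: dict mapping each attribute of the tree to its parent
            attribute; the root maps to None (hypothetical encoding).
    row:    set of attributes that have value 1 in the row.
    """
    # Attributes of the tree that are present in the row.
    present = [a for a in parent if a in row]
    # The root must be among them.
    if not any(parent[a] is None for a in present):
        return False
    # Every present non-root attribute needs its parent present too,
    # so the present attributes form a connected subtree at the root.
    return all(parent[a] is None or parent[a] in row for a in present)


def frequency(parent, rows):
    """Fraction of rows in which the tree occurs."""
    return sum(tree_occurs(parent, r) for r in rows) / len(rows)


# A path-shaped tree A -> B -> C, with A as the root.
tree = {"A": None, "B": "A", "C": "B"}
rows = [
    {"A", "B"},            # occurs: {A, B} is a subtree containing A
    {"A", "C"},            # does not occur: C is present without B
    {"B", "C"},            # does not occur: the root A is missing
    {"A"},                 # occurs: the root alone
    {"A", "B", "C", "D"},  # occurs: D is not part of the tree
    {"D"},                 # does not occur
]
print(frequency(tree, rows))
```

Under this reading, attributes outside the tree are ignored, and a row containing only the root counts as an occurrence. Whether the paper imposes further conditions is not stated in the abstract.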

Keywords

Association Rule, Tree Pattern, Frequent Itemsets, Find Tree, Interesting Tree

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hannes Heikinheimo (1)
  • Heikki Mannila (1)
  • Jouni K. Seppänen (1)
  1. HIIT Basic Research Unit, Lab. Computer and Information Science, Helsinki University of Technology, Finland