Don’t Be Afraid of Simpler Patterns

  • Björn Bringmann
  • Albrecht Zimmermann
  • Luc De Raedt
  • Siegfried Nijssen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4213)

Abstract

This paper investigates the trade-off between the expressiveness of the pattern language and the performance of the pattern miner in structured data mining. We study this trade-off in the context of correlated pattern mining, which is concerned with finding the k best patterns according to a convex criterion, for the pattern languages of itemsets, multi-itemsets, sequences, trees and graphs. The criteria used in our investigation are the typical ones in data mining, computational cost and predictive accuracy, and the domain is that of mining molecular graph databases. More specifically, we provide empirical answers to two questions: how does the expressive power of the language affect the computational cost, and what is the trade-off between the expressiveness of the pattern language and the predictive accuracy of the learned model? While answering the first question, we also introduce a novel stepwise approach to correlated pattern mining in which the results of mining a simpler pattern language are employed as a starting point for mining in a more complex one. This stepwise approach typically leads to significant speed-ups (up to a factor of 1000) for mining graphs.
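
As a rough illustration of the stepwise idea, the following sketch mines the k best itemsets under a convex criterion and lets a threshold obtained from a simpler language seed the search in a richer one. It is not the authors' implementation. It rests on several assumptions: chi-square is chosen as the convex criterion (the abstract names none), single items stand in for the "simpler" language and arbitrary itemsets for the "more complex" one, and all names (`chi2`, `chi2_bound`, `mine_top_k`, `seed_threshold`, `DATA`, `LABELS`) and the toy data are hypothetical.

```python
import heapq
from typing import List, Optional, Sequence, Set, Tuple

def chi2(p: int, n: int, P: int, N: int) -> float:
    """Chi-square of a pattern covering p of P positives and n of N negatives."""
    total, covered = P + N, p + n
    denom = covered * (total - covered) * P * N
    if denom == 0:
        return 0.0
    return total * (p * (N - n) - n * (P - p)) ** 2 / denom

def chi2_bound(p: int, n: int, P: int, N: int) -> float:
    """Upper bound for all specialisations: chi2 is convex in (p, n), so its
    maximum over 0 <= p' <= p, 0 <= n' <= n is reached at an extreme point."""
    return max(chi2(p, 0, P, N), chi2(0, n, P, N))

def mine_top_k(data: Sequence[Set[str]], labels: Sequence[int], k: int,
               max_len: Optional[int] = None,
               seed_threshold: float = 0.0) -> Tuple[List[Tuple[float, Tuple[str, ...]]], int]:
    """Depth-first top-k search; returns (best patterns, number of patterns evaluated)."""
    P = sum(labels)
    N = len(labels) - P
    items = sorted({i for t in data for i in t})
    heap: List[Tuple[float, Tuple[str, ...]]] = []  # min-heap of the k best so far
    visited = 0

    def threshold() -> float:
        # Score a pattern must reach to matter: the seed, or the current k-th best.
        return max(seed_threshold, heap[0][0]) if len(heap) == k else seed_threshold

    def expand(prefix: Tuple[str, ...], start: int, rows: List[int]) -> None:
        nonlocal visited
        for idx in range(start, len(items)):
            covered = [r for r in rows if items[idx] in data[r]]
            if not covered:
                continue
            visited += 1
            p = sum(labels[r] for r in covered)
            n = len(covered) - p
            pattern = prefix + (items[idx],)
            score = chi2(p, n, P, N)
            if score >= threshold():
                if len(heap) < k:
                    heapq.heappush(heap, (score, pattern))
                elif score > heap[0][0]:
                    heapq.heapreplace(heap, (score, pattern))
            can_grow = max_len is None or len(pattern) < max_len
            # Prune the branch when no specialisation can reach the threshold.
            if can_grow and chi2_bound(p, n, P, N) >= threshold():
                expand(pattern, idx + 1, covered)

    expand((), 0, list(range(len(data))))
    return sorted(heap, reverse=True), visited

# Toy data (hypothetical): each row is a set of symbols with a class label.
DATA = [{"C", "N", "O"}, {"C", "N"}, {"C", "N", "S"}, {"C", "O"},
        {"N", "S"}, {"O", "S"}, {"C", "S"}, {"N", "O"}]
LABELS = [1, 1, 1, 0, 0, 0, 0, 0]
K = 2

# Step 1: mine the simpler language (single items only).
simple, _ = mine_top_k(DATA, LABELS, K, max_len=1)
seed = simple[-1][0] if len(simple) == K else 0.0
# Step 2: mine the richer language, seeded with the k-th best simple score.
stepwise, visited_seeded = mine_top_k(DATA, LABELS, K, seed_threshold=seed)
# Baseline: mine the richer language from scratch.
full, visited_full = mine_top_k(DATA, LABELS, K)
print(stepwise, visited_seeded, visited_full)
```

Seeding is safe in this sketch only because every single item is itself an itemset, so the k-th best score in the simpler language cannot exceed the k-th best score in the richer one. Whether a comparable guarantee holds between other pairs of pattern languages depends on how patterns of one language map into the other, which this sketch does not address.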

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Björn Bringmann (1)
  • Albrecht Zimmermann (1)
  • Luc De Raedt (1)
  • Siegfried Nijssen (1)

  1. Institute of Computer Science, Machine Learning Lab, Albert-Ludwigs-University Freiburg, Freiburg, Germany
