Using and Learning Semantics in Frequent Subgraph Mining

  • Bettina Berendt
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4198)


The search for frequent subgraphs is becoming increasingly important in many application areas including Web mining and bioinformatics. Any use of graph structures in mining, however, should also take into account that it is essential to integrate background knowledge into the analysis, and that patterns must be studied at different levels of abstraction. To capture these needs, we propose to use taxonomies in mining and to extend frequency / support measures by the notion of context-induced interestingness. The AP-IP mining problem is to find all frequent abstract patterns and the individual patterns that constitute them and are therefore interesting in this context (even though they may be infrequent). The paper presents the fAP-IP algorithm that uses a taxonomy to search for the abstract and individual patterns, and that supports graph clustering to discover further structure in the individual patterns. Semantics are used as well as learned in this process. fAP-IP is implemented as an extension of Gaston (Nijssen & Kok, 2004), and it is complemented by the AP-IP visualization tool that allows the user to navigate through detail-and-context views of taxonomy context, pattern context, and transaction context. A case study of a real-life Web site shows the advantages of the proposed solutions.

ACM categories and subject descriptors and keywords: H.2.8 [Database Management]: Database Applications—data mining; H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia —navigation, user issues; graph mining; Web mining; background knowledge and semantics in mining.


Association Rule Canonical Form Frequent Pattern Edge Label Node Label 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Berendt, B.: Using site semantics to analyze, visualize and support navigation. Data Mining and Knowledge Discovery 6(1), 37–59 (2002)CrossRefMathSciNetGoogle Scholar
  2. 2.
    Berendt, B., Brenstein, E.: Visualizing Individual Differences in Web Navigation: STRATDYN, a Tool for Analyzing Navigation Patterns. BRMIC 33, 243–257 (2001)Google Scholar
  3. 3.
    Berendt, B., Hotho, A., Stumme, G.: Usage mining for and on the semantic web. In: Kargupta, H., et al. (eds.) Data Mining: Next Generation Challenges and Future Directions, pp. 461–480. AAAI/MIT Press, Menlo Park (2004)Google Scholar
  4. 4.
    Berendt, B., Kralisch, A.: Analysing and visualising logfiles: the Individualised SiteMap tool ISM. In: Proc. GOR 2005 (2005)Google Scholar
  5. 5.
    Berendt, B., Spiliopoulou, M.: Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal 9(1), 56–75 (2000)CrossRefGoogle Scholar
  6. 6.
    Borgelt, C.: On Canonical Forms for Frequent Graph Mining. In: Proc. of Workshop on Mining Graphs, Trees, and Sequences (MGTS 2005 at PKDD 2005), pp. 1–12 (2005)Google Scholar
  7. 7.
    Dai, H., Mobasher, B.: Using ontologies to discover domain-level web usage profiles. In: Proc. 2nd Semantic Web Mining Workshop at PKDD 2001 (2001)Google Scholar
  8. 8.
    Dupret, G., Piwowarski, B.: Deducing a term taxonomy from term similarities. In: Proc. Knowledge Discovery and Ontologies Workshop at PKDD 2005, pp. 11-22 (2005)Google Scholar
  9. 9.
    Eirinaki, M., Vazirgiannis, M., Varlamis, I.: Sewep: Using site semantics and a taxonomy to enhance the web personalization process. In: Proc. SIGKDD 2003, pp. 99–108 (2003)Google Scholar
  10. 10.
    Hofer, H., Borgelt, C., Berthold, M.R.: Large scale mining of molecular fragments with wildcards. In: Advances in Intelligent Data Analysis V, pp. 380–389 (2003)Google Scholar
  11. 11.
    Huan, J., Wang, W., Prins, J.: Efficient mining of frequent subgraphs in the presence of isomorphisms. In: Proc. ICDM, pp. 549–552 (2003)Google Scholar
  12. 12.
    Inokuchi, A.: Mining generalized substructures from a set of labeled graphs. In: Proc. ICDM 2004, pp. 414–418 (2004)Google Scholar
  13. 13.
    Inokuchi, I., Washio, T., Motoda, H.: An apriori-based algorithm for mining frequent substructures from graph data. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 13–23. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  14. 14.
    Inokuchi, I., Washio, T., Nishimura, K., Motoda, H.: A fast algorithm for mining frequent connected subgraphs. Research Report, IBM Research, Tokyo (2002)Google Scholar
  15. 15.
    Jin, R., Wang, C., Polshakov, D., Parthasarathy, S., Agrawal, G.: Discovering frequent topological structures from graph datasets. In: Proc. SIGKDD 2005, pp. 606–611 (2005)Google Scholar
  16. 16.
    Jin, X., Zhou, Y., Mobasher, B.: A maximum entropy web recommendation system: Combining collaborative and content features. In: Proc. SIGKDD 2005, pp. 612–617 (2005)Google Scholar
  17. 17.
    Kralisch, A., Berendt, B.: Language sensitive search behaviour and the role of domain knowledge. New Review of Hypermedia and Multimedia 11, 221–246 (2005)CrossRefGoogle Scholar
  18. 18.
    Kralisch, A., Eisend, M., Berendt, B.: Impact of culture on website navigation behaviour. In: Proc. HCI-International (2005)Google Scholar
  19. 19.
    Kuramochi, M., Karypis, G.: Frequent subgraph discovery. In: Proc. ICDM, pp. 313–320 (2001)Google Scholar
  20. 20.
    McEneaney, J.E.: Graphic and numerical methods to assess navigation in hypertext. Int. J. of Human-Computer Studies 55, 761–786 (2001)MATHCrossRefGoogle Scholar
  21. 21.
    Meinl, T., Borgelt, C., Berthold, M.R.: Mining fragments with fuzzy chains in molecular databases. In: Proc. Worksh. Mining Graphs, Trees & Sequences at PKDD 2004, pp. 49–60 (2004)Google Scholar
  22. 22.
    Meo, R., Lanzi, P.L., Matera, M., Esposito, R.: Integrating web conceptual modeling and web usage mining. In: Mobasher, B., Nasraoui, O., Liu, B., Masand, B. (eds.) WebKDD 2004. LNCS (LNAI), vol. 3932, pp. 135–148. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  23. 23.
    Nijssen, S., Kok, J.N.: A quickstart in frequent structure mining can make a difference. In: Proc. SIGKDD 2004, pp. 647–652 (2004), extended version: LIACS, Leiden Univ., Leiden, The Netherlands, Tech. Report (April 2004),
  24. 24.
    Oberle, D., Berendt, B., Hotho, A., Gonzalez, J.: Conceptual user tracking. In: Menasalvas, E., Segovia, J., Szczepaniak, P.S. (eds.) AWIC 2003. LNCS (LNAI), vol. 2663, pp. 155–164. Springer, Heidelberg (2003)CrossRefGoogle Scholar
  25. 25.
    Srikant, R., Agrawal, R.: Mining generalized association rules. In: Proc. 21st VLDB Conference, pp. 407–419 (1995)Google Scholar
  26. 26.
    Srikant, R., Agrawal, R.: Mining sequential patterns: Generalizations and performance improvements. In: Apers, P.M.G., Bouzeghoub, M., Gardarin, G. (eds.) EDBT 1996. LNCS, vol. 1057, pp. 3–17. Springer, Heidelberg (1996)CrossRefGoogle Scholar
  27. 27.
    Wörlein, M., Meinl, T., Fischer, I., Philippsen, M.: A quantitative comparison of the subgraph miners MoFa, gSpan, FFSM, and Gaston. In: Jorge, A.M., Torgo, L., Brazdil, P.B., Camacho, R., Gama, J. (eds.) PKDD 2005. LNCS (LNAI), vol. 3721, pp. 392–403. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  28. 28.
    Yan, X., Han, J.: gSpan: Graph-based substructure pattern mining. In: Proc. ICDM, pp. 51–58 (2002)Google Scholar
  29. 29.
    Yan, X., Zhou, X.J., Han, J.: Mining closed relational graphs with connectivity constraints. In: Proc. SIGKDD 2005, pp. 324–333 (2005)Google Scholar
  30. 30.
    Zaïane, O.R., Han, J.: Discovering web access patterns and trends by applying OLAP and data mining technology on web logs. In: Proc. ADL 1998, pp. 19–29 (1998)Google Scholar
  31. 31.
    Zaki, M.J.: Efficiently mining trees in a forest. In: Proc. SIGKDD 2002, pp. 71–80 (2002)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Bettina Berendt
    • 1
  1. 1.Institute of Information SystemsHumboldt University BerlinBerlinGermany

Personalised recommendations