Data Mining and Knowledge Discovery, Volume 24, Issue 1, pp 1–39

Using trees to mine multirelational databases

  • Aída Jiménez
  • Fernando Berzal
  • Juan-Carlos Cubero
Article

Abstract

This paper proposes a new approach to mine multirelational databases. Our approach is based on the representation of multirelational databases as sets of trees, for which we propose two alternative representation schemes. Tree mining techniques can thus be applied as the basis for multirelational data mining techniques, such as multirelational classification or multirelational clustering. We analyze the differences between identifying induced and embedded tree patterns in the proposed tree-based representation schemes and we study the relationships among the sets of tree patterns that can be discovered in each case. This paper also describes how these frequent tree patterns can be used, for instance, to mine association rules in multirelational databases.

Keywords

Multirelational databases, Frequent itemset mining, Association rules, Tree pattern mining

Copyright information

© The Author(s) 2011

Authors and Affiliations

  • Aída Jiménez (1)
  • Fernando Berzal (1)
  • Juan-Carlos Cubero (1)
  1. Department of Computer Science and Artificial Intelligence, ETSIIT, University of Granada, Granada, Spain
