Multirelational classification: a multiple view approach

  • Regular Paper
Knowledge and Information Systems

Abstract

Multirelational classification aims to discover useful patterns across multiple interconnected tables (relations) in a relational database. Many traditional learning techniques, however, assume a single table or flat file as input (the so-called propositional algorithms). Existing multirelational classification approaches either “upgrade” mature propositional learning methods to handle the relational representation or extensively “flatten” multiple tables into a single flat file, which is then mined by propositional algorithms. This article reports a multiple view strategy for mining relational databases that requires neither “upgrading” nor “flattening”. Our approach learns from multiple views (feature sets) of a relational database and then integrates the information acquired by the individual view learners to construct a final model. Our empirical studies show that the method compares well with the classifiers induced by the majority of multirelational mining systems, in terms of both the accuracy obtained and the running time needed. The paper explores the implications of this finding for multirelational research and applications. In addition, the method has practical significance: it is appropriate for directly mining many real-world databases.
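As a rough illustration of the multiple view idea described above, the Python sketch below builds one view per background table by aggregating the rows that reference the same target tuple, trains a separate learner on each view, and merges their outputs. Everything here is an assumption for illustration only: the toy tables and the names `TARGET`, `BACKGROUND`, `build_view`, `fit_centroids`, `train_multiview`, and `classify` are hypothetical, and count/mean aggregation, nearest-centroid view learners, and an accuracy-weighted vote (weights from leave-one-out accuracy) stand in for whatever view construction, view learners, and integration step the article itself uses.

```python
from collections import defaultdict
from math import dist
from statistics import mean

# Hypothetical toy database: the target table maps a tuple id to its class label.
TARGET = {1: "good", 2: "good", 3: "bad", 4: "bad", 5: "good", 6: "bad"}

# Hypothetical background tables; every row points to a target tuple via "tid".
BACKGROUND = {
    "transaction": [
        {"tid": 1, "amount": 120.0}, {"tid": 1, "amount": 80.0},
        {"tid": 2, "amount": 150.0}, {"tid": 3, "amount": 20.0},
        {"tid": 3, "amount": 10.0}, {"tid": 4, "amount": 30.0},
        {"tid": 5, "amount": 110.0}, {"tid": 6, "amount": 15.0},
    ],
    "card": [
        {"tid": 1, "limit": 5.0}, {"tid": 2, "limit": 4.0},
        {"tid": 3, "limit": 1.0}, {"tid": 4, "limit": 2.0},
        {"tid": 5, "limit": 3.5}, {"tid": 6, "limit": 1.5},
    ],
}


def build_view(rows, ids):
    """Turn one background table into a feature vector per target tuple.

    Assumed aggregation: row count plus the mean of each numeric column
    (zeros when a target tuple has no matching rows).
    """
    columns = sorted({key for row in rows for key in row if key != "tid"})
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["tid"]].append(row)
    view = {}
    for tid in ids:
        group = grouped.get(tid, [])
        features = [float(len(group))]
        features += [mean(r[c] for r in group) if group else 0.0 for c in columns]
        view[tid] = features
    return view


def fit_centroids(view, labels, ids):
    """Nearest-centroid view learner: the mean feature vector of each class."""
    members = defaultdict(list)
    for tid in ids:
        members[labels[tid]].append(view[tid])
    return {c: [mean(col) for col in zip(*vecs)] for c, vecs in members.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest (Euclidean) to x."""
    return min(centroids, key=lambda c: dist(centroids[c], x))


def loo_accuracy(view, labels):
    """Leave-one-out accuracy of a view learner, used as its voting weight."""
    ids = list(labels)
    hits = 0
    for held_out in ids:
        train = [tid for tid in ids if tid != held_out]
        centroids = fit_centroids(view, labels, train)
        hits += predict(centroids, view[held_out]) == labels[held_out]
    return hits / len(ids)


def train_multiview(target, background):
    """Train one learner per view (one view per background table)."""
    ids = list(target)
    learners = []
    for name, rows in background.items():
        view = build_view(rows, ids)
        learners.append((name, fit_centroids(view, target, ids), loo_accuracy(view, target)))
    return learners


def classify(learners, background, tid):
    """Integrate the view learners with an accuracy-weighted vote."""
    votes = defaultdict(float)
    for name, centroids, weight in learners:
        x = build_view(background[name], [tid])[tid]
        votes[predict(centroids, x)] += weight
    return max(votes, key=votes.get)


if __name__ == "__main__":
    learners = train_multiview(TARGET, BACKGROUND)
    print({name: weight for name, _, weight in learners})  # {'transaction': 1.0, 'card': 1.0}
    # A new target tuple (id 7) together with its related background rows.
    extended = {
        "transaction": BACKGROUND["transaction"] + [{"tid": 7, "amount": 140.0}],
        "card": BACKGROUND["card"] + [{"tid": 7, "limit": 4.5}],
    }
    print(classify(learners, extended, 7))  # good
```

Weighting each view by its held-out accuracy is one simple way to let the more informative views dominate the final decision; a stacked meta-learner trained on the view learners' predictions is a common alternative integration step.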

Author information

Correspondence to Hongyu Guo.

About this article

Cite this article

Guo, H., Viktor, H.L. Multirelational classification: a multiple view approach. Knowl Inf Syst 17, 287–312 (2008). https://doi.org/10.1007/s10115-008-0127-5

  • DOI: https://doi.org/10.1007/s10115-008-0127-5
