Multirelational classification: a multiple view approach

Guo, Hongyu; Viktor, Herna L.

doi:10.1007/s10115-008-0127-5

Regular Paper
Published: 26 February 2008

Knowledge and Information Systems, volume 17, pages 287–312 (2008)

Abstract

Multirelational classification aims at discovering useful patterns across multiple inter-connected tables (relations) in a relational database. Many traditional learning techniques, however, assume a single table or a flat file as input (the so-called propositional algorithms). Existing multirelational classification approaches either “upgrade” mature propositional learning methods to deal with relational representations or extensively “flatten” multiple tables into a single flat file, which is then processed by propositional algorithms. This article reports a multiple view strategy, in which neither “upgrading” nor “flattening” is required, for mining relational databases. Our approach learns from multiple views (feature sets) of a relational database, and then integrates the information acquired by the individual view learners to construct a final model. Our empirical studies show that the method compares well with the classifiers induced by the majority of multirelational mining systems, in terms of both accuracy obtained and running time needed. The paper explores the implications of this finding for multirelational research and applications. In addition, the method has practical significance: it is appropriate for directly mining many real-world databases.
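
The general idea of learning one model per view and then combining the view learners can be illustrated with a minimal Python sketch. The abstract does not say how views are constructed or how their outputs are integrated, so everything below is an assumption made for illustration: the toy tables (`loans`, `accounts`, `payments`), the helper names, the simple value-to-majority-class learner, and the unweighted majority vote used to combine the views. It is not the authors' algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical toy database: a target relation with class labels and two
# background relations that reference it through the key loan_id.
loans = {1: "good", 2: "good", 3: "bad", 4: "bad"}  # loan_id -> class label
accounts = [(1, "monthly"), (2, "monthly"), (3, "weekly"), (4, "weekly")]
payments = [(1, "on_time"), (1, "on_time"), (2, "on_time"),
            (3, "late"), (3, "on_time"), (4, "late")]


def train_view(rows, labels):
    """Learn a value -> majority-class table from one view.

    rows are (key, value) pairs. Each row inherits the label of the target
    tuple it references, so the tables are never joined into one flat file.
    """
    counts = defaultdict(Counter)
    for key, value in rows:
        counts[value][labels[key]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def predict_view(model, rows, key):
    """Predict the class of one target tuple from its rows in a single view."""
    votes = Counter(model[v] for k, v in rows if k == key and v in model)
    return votes.most_common(1)[0][0] if votes else None


def predict(views, key):
    """Combine view learners by unweighted majority vote (an assumption)."""
    votes = Counter()
    for model, rows in views:
        label = predict_view(model, rows, key)
        if label is not None:
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None


# Train one learner per view, then classify an unseen loan (id 5).
account_model = train_view(accounts, loans)
payment_model = train_view(payments, loans)
new_accounts = [(5, "weekly")]
new_payments = [(5, "late"), (5, "late"), (5, "on_time")]
prediction = predict([(account_model, new_accounts),
                      (payment_model, new_payments)], 5)
```

Majority voting is used here only because it is the simplest way to merge several classifiers; a trained meta-learner over the view outputs would be another option under the same structure.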

Author information

Authors and Affiliations

School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada
Hongyu Guo & Herna L. Viktor

Corresponding author

Correspondence to Hongyu Guo.

About this article

Cite this article

Guo, H., Viktor, H.L. Multirelational classification: a multiple view approach. Knowl Inf Syst 17, 287–312 (2008). https://doi.org/10.1007/s10115-008-0127-5

Received: 03 August 2007
Revised: 09 January 2008
Accepted: 19 January 2008
Published: 26 February 2008
Issue Date: December 2008
DOI: https://doi.org/10.1007/s10115-008-0127-5
