
CrossMine: Efficient Classification Across Multiple Database Relations

  • Xiaoxin Yin
  • Jiawei Han
  • Jiong Yang
  • Philip S. Yu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3848)

Abstract

Most of today’s structured data is stored in relational databases. Such a database consists of multiple relations that are linked together conceptually via entity-relationship links in the design of relational database schemas. Multi-relational classification can be widely used in many disciplines, including financial decision making and medical research. However, most classification approaches work only on single “flat” data relations. It is usually difficult to convert multiple relations into a single flat relation without either introducing a huge “universal relation” or losing essential information. Previous work using Inductive Logic Programming approaches (also known as Relational Mining) has proven effective, achieving high accuracy in multi-relational classification. Unfortunately, these approaches fail to achieve high scalability w.r.t. the number of relations in a database because they repeatedly join different relations to search for good literals.

In this paper we propose CrossMine, an efficient and scalable approach to multi-relational classification. CrossMine employs tuple ID propagation, a novel method for virtually joining relations, which enables flexible and efficient search across multiple relations. CrossMine also uses aggregated information to provide essential statistics for classification. A selective sampling method is used to achieve high scalability w.r.t. the number of tuples in the database. Our comprehensive experiments on both real and synthetic databases demonstrate the high scalability and accuracy of CrossMine.
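The core idea of tuple ID propagation can be illustrated with a small sketch. Instead of materializing a physical join between the target relation and a background relation, each background tuple carries the set of target-tuple IDs (with their class labels) it joins with, so predicates on the background relation can be scored directly. The relation names (Loan, Account), attributes, and data below are hypothetical examples, not taken from the paper:

```python
# Sketch of tuple ID propagation, assuming a toy schema:
# target relation Loan(loan_id, account_id, class_label) and
# background relation Account(account_id, district).
from collections import defaultdict

loans = [(1, "a", "+"), (2, "a", "-"), (3, "b", "+")]
accounts = {"a": {"district": "urban"}, "b": {"district": "rural"}}

# Propagate (loan_id, label) pairs to each account via the foreign key,
# in one pass over the target relation -- no join is materialized.
prop = defaultdict(set)
for loan_id, acct_id, label in loans:
    prop[acct_id].add((loan_id, label))

# A candidate literal on Account (here: district = 'urban') is scored by
# counting the propagated positive and negative target IDs it covers.
pos = sum(1 for acct_id, attrs in accounts.items()
          if attrs["district"] == "urban"
          for _, label in prop[acct_id] if label == "+")
neg = sum(1 for acct_id, attrs in accounts.items()
          if attrs["district"] == "urban"
          for _, label in prop[acct_id] if label == "-")
```

Here the literal `district = 'urban'` covers one positive and one negative loan, computed entirely from the propagated ID sets.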

Keywords

Class Label · Inductive Logic Programming · Target Relation · Multiple Relation · Clause Generation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Xiaoxin Yin (1)
  • Jiawei Han (1)
  • Jiong Yang (1)
  • Philip S. Yu (2)
  1. University of Illinois at Urbana-Champaign, Urbana, USA
  2. IBM T.J. Watson Research Center, Yorktown Heights, USA
