FARS: A Multi-relational Feature and Relation Selection Approach for Efficient Classification

  • Bo Hu
  • Hongyan Liu
  • Jun He
  • Xiaoyong Du
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5139)

Abstract

Feature selection is an essential data processing step that removes irrelevant and redundant attributes to achieve shorter learning time, better accuracy, and better comprehensibility. A number of algorithms have been proposed in both the data mining and machine learning areas. These algorithms are usually used in a single-table environment, where data are stored in one relational table or one flat file. They are not suitable for a multi-relational environment, where data are stored in multiple tables joined to each other by semantic relationships. To solve this problem, in this paper we propose a novel approach called FARS to do both feature and relation selection for efficient multi-relational classification. With this approach, we not only extend traditional feature selection to select relevant features from multiple relations, but also develop a new method to reconstruct the multi-relational database schema and get rid of irrelevant tables to further improve classification performance. Results of experiments conducted on several real databases show that FARS can effectively choose a small set of relevant features, enhancing the classification efficiency significantly and improving prediction accuracy.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Bo Hu (1)
  • Hongyan Liu (2)
  • Jun He (1)
  • Xiaoyong Du (1)

  1. Key Labs of Data Engineering and Knowledge Engineering, MOE, China; Information School, Renmin University of China, Beijing, China
  2. School of Economics and Management, Tsinghua University, Beijing, China
