FARS: A Multi-relational Feature and Relation Selection Approach for Efficient Classification

  • Bo Hu
  • Hongyan Liu
  • Jun He
  • Xiaoyong Du
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5139)

Abstract

Feature selection is an essential data preprocessing step that removes irrelevant and redundant attributes to achieve shorter learning time, better accuracy, and better comprehensibility. A number of algorithms have been proposed in both the data mining and machine learning areas. These algorithms are usually designed for a single-table environment, where data are stored in one relational table or one flat file. They are not suitable for a multi-relational environment, where data are stored in multiple tables joined to one another by semantic relationships. To solve this problem, in this paper we propose a novel approach called FARS that performs both feature and relation selection for efficient multi-relational classification. With this approach, we not only extend traditional feature selection methods to select relevant features from multiple relations, but also develop a new method that reconstructs the multi-relational database schema and discards irrelevant tables to further improve classification performance. Results of experiments conducted on several real databases show that FARS can effectively choose a small set of relevant features, significantly enhancing classification efficiency and improving prediction accuracy.
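
To make the idea concrete, below is a minimal sketch of one common filter-style building block for multi-relational feature selection: after the class label has been propagated from the target table to a joined table, each attribute of that table is scored against the label, and weak attributes (or whole tables containing no strong attribute) can be discarded. The symmetrical-uncertainty measure and all table, column, and function names here are illustrative assumptions; the abstract does not specify FARS's actual scoring function or its schema-reconstruction procedure.

```python
# Sketch of per-table feature ranking after label propagation.
# All names (rank_table, the toy rows, the 0.1 threshold) and the
# symmetrical-uncertainty filter are illustrative assumptions, not
# the paper's actual FARS scoring or schema-reconstruction steps.
import math
from collections import Counter

def entropy(values):
    """Shannon entropy of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    hxy = entropy(list(zip(xs, ys)))  # joint entropy H(X, Y)
    ig = hx + hy - hxy                # mutual information I(X; Y)
    return 2 * ig / (hx + hy) if hx + hy else 0.0

def rank_table(rows, class_labels, threshold=0.1):
    """Score every attribute of one label-propagated table against the
    class and keep those above the threshold. A table whose best score
    falls below the threshold could be pruned from the schema entirely."""
    scores = {}
    for attr in rows[0]:
        xs = [r[attr] for r in rows]
        scores[attr] = symmetrical_uncertainty(xs, class_labels)
    return {a: s for a, s in scores.items() if s >= threshold}

# Toy usage: a non-target table already joined to the target's class label.
rows = [{"type": "gold", "region": "north"},
        {"type": "gold", "region": "south"},
        {"type": "basic", "region": "north"},
        {"type": "basic", "region": "south"}]
labels = ["good", "good", "bad", "bad"]
print(rank_table(rows, labels))  # keeps 'type' (SU 1.0); drops 'region' (SU 0.0)
```

In this toy run, `type` perfectly predicts the class and is kept, while `region` carries no information and is dropped; a table in which every attribute scored near zero would be a candidate for removal during schema reconstruction.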

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Bo Hu (1)
  • Hongyan Liu (2)
  • Jun He (1)
  • Xiaoyong Du (1)
  1. Key Labs of Data Engineering and Knowledge Engineering, MOE, China; Information School, Renmin University of China, Beijing, China
  2. School of Economics and Management, Tsinghua University, Beijing, China
