Machine Learning, Volume 53, Issue 1–2, pp. 23–69

Theoretical and Empirical Analysis of ReliefF and RReliefF

  • Marko Robnik-Šikonja
  • Igor Kononenko

Abstract

Relief algorithms are general and successful attribute estimators. They are able to detect conditional dependencies between attributes and provide a unified view of attribute estimation in regression and classification. In addition, their quality estimates have a natural interpretation. While they have commonly been viewed as feature subset selection methods applied in a preprocessing step before a model is learned, they have actually been used successfully in a variety of settings, e.g., to select splits or to guide constructive induction in the building phase of decision or regression tree learning, as an attribute weighting method, and also in inductive logic programming.
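
The quality estimate W[A] that Relief assigns to an attribute A approximates the difference of two probabilities, P(different value of A | nearest instance from a different class) and P(different value of A | nearest instance from the same class); this is the natural interpretation referred to above. The sketch below (Python; a minimal illustration, not the authors' implementation, with the Manhattan distance and the scaling of attributes to [0, 1] as assumed conventions) shows the basic two-class Relief update from which ReliefF and RReliefF are derived.

    # A minimal sketch of the basic Relief weight update (two-class case);
    # attribute values are assumed to be numeric and scaled to [0, 1].
    import numpy as np

    def relief(X, y, m=100, seed=0):
        """Estimate attribute quality weights W from m randomly sampled instances."""
        rng = np.random.default_rng(seed)
        n, a = X.shape
        W = np.zeros(a)
        for _ in range(m):
            i = rng.integers(n)
            dist = np.abs(X - X[i]).sum(axis=1)     # Manhattan distance to every instance
            dist[i] = np.inf                        # never pick the sampled instance itself
            same = (y == y[i])
            hit = np.argmin(np.where(same, dist, np.inf))    # nearest neighbour of the same class
            miss = np.argmin(np.where(~same, dist, np.inf))  # nearest neighbour of the other class
            # reward attributes that separate the nearest miss,
            # penalise attributes that separate the nearest hit
            W += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / m
        return W

ReliefF extends this update to multi-class problems by averaging the contributions of k nearest hits and of k nearest misses from each other class, weighted by the prior probabilities of the classes, while RReliefF replaces the hit/miss dichotomy with the probability that the predicted values of neighbouring instances differ, which makes the same estimate applicable in regression.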

Such a broad spectrum of successful uses calls for an especially careful investigation of the various properties of Relief algorithms. In this paper we theoretically and empirically investigate and discuss how and why they work, their theoretical and practical properties, their parameters, what kind of dependencies they detect, how they scale up to large numbers of examples and features, how to sample data for them, how robust they are to noise, how irrelevant and redundant attributes influence their output, and how different metrics influence them.

Keywords: attribute evaluation, feature selection, Relief algorithm, classification, regression

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Marko Robnik-Šikonja (1)
  • Igor Kononenko (1)

  1. Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia