Abstract
The accuracy of a Nearest Neighbor classifier depends heavily on the weight assigned to each feature in its distance metric. In this paper, two new methods, FW-EBNA (Feature Weighting by Estimation of Bayesian Network Algorithm) and FW-EGNA (Feature Weighting by Estimation of Gaussian Network Algorithm), inspired by the Estimation of Distribution Algorithm (EDA) approach, are used together with a wrapper evaluation scheme to learn accurate feature weights for the Nearest Neighbor algorithm. While FW-EBNA restricts each weight to one of three discrete values, FW-EGNA searches over a continuous range of weights. Both methods are compared, on a set of natural and artificial domains, with two sequential search algorithms and a Genetic Algorithm.
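To make the scheme the abstract describes concrete, the sketch below illustrates wrapper-based feature weighting for 1-NN with a continuous EDA. It is not the authors' FW-EGNA: it replaces the Gaussian network with independent per-feature Gaussians (a univariate continuous EDA), assumes weights clipped to [0, 1], and scores candidates by leave-one-out 1-NN accuracy. All function names and parameter values are illustrative, not taken from the chapter.

```python
import numpy as np

def weighted_1nn_accuracy(X, y, w):
    """Leave-one-out accuracy of a 1-NN classifier under the weighted
    Euclidean metric d(a, b) = sqrt(sum_i w_i * (a_i - b_i)^2)."""
    n = len(X)
    correct = 0
    for i in range(n):
        dist = np.sqrt(((X - X[i]) ** 2 * w).sum(axis=1))
        dist[i] = np.inf                      # exclude the query point itself
        correct += int(y[np.argmin(dist)] == y[i])
    return correct / n

def eda_feature_weights(X, y, pop=50, top=10, gens=20, seed=0):
    """Wrapper-style weight learning with a continuous univariate EDA.
    NOTE: a simplification -- FW-EGNA learns a full Gaussian network over
    the weights, whereas here each weight gets an independent Gaussian
    refit from the elite sample at every generation."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    mu, sigma = np.full(d, 0.5), np.full(d, 0.25)  # initial search distribution
    for _ in range(gens):
        # Sample candidate weight vectors, clipped to an assumed [0, 1] range.
        W = np.clip(rng.normal(mu, sigma, size=(pop, d)), 0.0, 1.0)
        # Wrapper evaluation: score each candidate by LOO 1-NN accuracy.
        scores = np.array([weighted_1nn_accuracy(X, y, w) for w in W])
        elite = W[np.argsort(scores)[-top:]]
        # Re-estimate the distribution from the elite; the small floor on
        # sigma keeps the search from collapsing prematurely.
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mu
```

Under this wrapper objective one would expect irrelevant features, such as those in the artificial domains the abstract mentions, to be driven toward near-zero weights, since down-weighting them directly improves the leave-one-out accuracy being optimized.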
Copyright information
Ā© 2002 Springer Science+Business Media New York
About this chapter
Cite this chapter
Inza, I., LarraƱaga, P., Sierra, B. (2002). Feature Weighting for Nearest Neighbor by Estimation of Distribution Algorithms. In: LarraƱaga, P., Lozano, J.A. (eds) Estimation of Distribution Algorithms. Genetic Algorithms and Evolutionary Computation, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1539-5_14
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5604-2
Online ISBN: 978-1-4615-1539-5