Feature Weighting for Nearest Neighbor by Estimation of Distribution Algorithms

Chapter in: Estimation of Distribution Algorithms

Part of the book series: Genetic Algorithms and Evolutionary Computation (GENA, volume 2)

Abstract

The accuracy of a Nearest Neighbor classifier depends heavily on the weight given to each feature in its distance metric. In this paper, two new methods, FW-EBNA (Feature Weighting by Estimation of Bayesian Network Algorithm) and FW-EGNA (Feature Weighting by Estimation of Gaussian Network Algorithm), inspired by the Estimation of Distribution Algorithm (EDA) approach, are combined with a wrapper evaluation scheme to learn accurate feature weights for the Nearest Neighbor algorithm. While FW-EBNA chooses each weight from a set of three possible discrete values, FW-EGNA searches a continuous range of weights. Both methods are compared with two sequential algorithms and one Genetic Algorithm on a set of natural and artificial domains.
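
For intuition, here is a minimal sketch (in Python with NumPy) of the general idea the chapter builds on: a wrapper scores each candidate weight vector by the validation accuracy of a weighted Nearest Neighbor classifier, and an EDA evolves the weights by repeatedly fitting and sampling a probability distribution over the selected individuals. The sketch factorizes that distribution as one independent Gaussian per weight (a UMDA-style simplification); FW-EBNA and FW-EGNA instead learn a Bayesian or Gaussian network over the weight variables, so every name and parameter below is illustrative, not the chapter's implementation.

```python
import numpy as np

def weighted_nn_accuracy(weights, X_train, y_train, X_val, y_val):
    # Wrapper score: 1-NN accuracy on a held-out split, using the
    # feature-weighted Euclidean distance sqrt(sum_j w_j * (a_j - b_j)^2).
    diffs = X_val[:, None, :] - X_train[None, :, :]      # (n_val, n_train, d)
    dists = np.sqrt((weights * diffs ** 2).sum(axis=2))  # weighted distances
    preds = y_train[dists.argmin(axis=1)]                # nearest neighbor's label
    return float((preds == y_val).mean())

def eda_feature_weighting(X_train, y_train, X_val, y_val,
                          pop_size=30, top_k=10, generations=20, seed=0):
    # Toy continuous EDA: sample weight vectors from one Gaussian per feature,
    # keep the best individuals under the wrapper score, refit the Gaussians.
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    mu, sigma = np.full(d, 0.5), np.full(d, 0.25)
    best_w, best_acc = mu.copy(), -1.0
    for _ in range(generations):
        pop = np.clip(rng.normal(mu, sigma, size=(pop_size, d)), 0.0, 1.0)
        scores = np.array([weighted_nn_accuracy(w, X_train, y_train, X_val, y_val)
                           for w in pop])
        elite = pop[np.argsort(scores)[-top_k:]]         # truncation selection
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3
        if scores.max() > best_acc:
            best_acc, best_w = float(scores.max()), pop[scores.argmax()].copy()
    return best_w, best_acc
```

On data with irrelevant features, the learned weights should shrink toward zero on the noisy dimensions, which is exactly the effect that motivates feature weighting for Nearest Neighbor.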

Copyright information

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

Inza, I., LarraƱaga, P., Sierra, B. (2002). Feature Weighting for Nearest Neighbor by Estimation of Distribution Algorithms. In: LarraƱaga, P., Lozano, J.A. (eds) Estimation of Distribution Algorithms. Genetic Algorithms and Evolutionary Computation, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1539-5_14

  • DOI: https://doi.org/10.1007/978-1-4615-1539-5_14

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5604-2

  • Online ISBN: 978-1-4615-1539-5

  • eBook Packages: Springer Book Archive
