Data Characterization for Effective Prototype Selection
The Nearest Neighbor (NN) classifier is one of the most popular supervised classification methods. It is simple, intuitive, and accurate in a wide variety of real-world applications. Despite its simplicity and effectiveness, its practical use has historically been limited by its high storage requirements and computational cost, as well as by the presence of outliers. To overcome these drawbacks, a suitable prototype selection scheme can be employed: it reduces storage and computing time and usually also yields some increase in classification accuracy. Nevertheless, in some practical cases prototype selection may even degrade the classifier's effectiveness, and from an empirical point of view it is still difficult to know a priori when the method will behave appropriately. The present paper attempts to predict how suitable a prototype selection algorithm will be when applied to a particular problem by characterizing the data with a set of complexity measures.
Keywords: Complexity Measure; Training Instance; Lower Error Rate; Neighbor Rule; Prototype Selection
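The abstract combines three ingredients: the 1-NN rule, a prototype selection scheme, and complexity measures that characterize a data set. The sketch below illustrates how these pieces fit together on a toy problem. It is not the paper's method: the abstract names neither the selection algorithm nor the measures, so this example assumes Wilson's editing (ENN) as the prototype selection scheme and a leave-one-out 1-NN error rate as the single complexity measure. All function names (`nearest_indices`, `nn_classify`, `wilson_editing`, `loo_nn_error`) are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_indices(X: Sequence[Point], x: Point, k: int, skip: int = -1) -> List[int]:
    """Indices of the k training points closest to x (Euclidean), optionally skipping one index."""
    candidates = [i for i in range(len(X)) if i != skip]
    candidates.sort(key=lambda i: math.dist(X[i], x))
    return candidates[:k]


def nn_classify(X: Sequence[Point], y: Sequence[int], x: Point) -> int:
    """1-NN rule: return the label of the single closest stored prototype."""
    return y[nearest_indices(X, x, 1)[0]]


def wilson_editing(X: Sequence[Point], y: Sequence[int], k: int = 3) -> Tuple[List[Point], List[int]]:
    """Assumed prototype selection scheme (Wilson's editing, ENN): drop every
    instance whose label disagrees with the majority vote of its k nearest
    neighbours among the remaining instances."""
    keep_X: List[Point] = []
    keep_y: List[int] = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        votes = Counter(y[j] for j in nearest_indices(X, xi, k, skip=i))
        if votes.most_common(1)[0][0] == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


def loo_nn_error(X: Sequence[Point], y: Sequence[int]) -> float:
    """Assumed complexity measure: leave-one-out error rate of the 1-NN rule,
    i.e. the fraction of points whose nearest other point has a different label."""
    errors = sum(y[nearest_indices(X, X[i], 1, skip=i)[0]] != y[i] for i in range(len(X)))
    return errors / len(X)


if __name__ == "__main__":
    # Toy two-class data; (0.15, 0.15) is a class-1 outlier inside the class-0 cluster.
    X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
         (1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.05, 1.05)]
    y = [0, 0, 0, 1, 1, 1, 1, 1]

    eX, ey = wilson_editing(X, y, k=3)
    print(len(X), "->", len(eX), "prototypes")        # 8 -> 7 prototypes
    print(loo_nn_error(X, y), loo_nn_error(eX, ey))   # 0.5 0.0
    print(nn_classify(X, y, (0.16, 0.16)),            # 1 (misled by the outlier)
          nn_classify(eX, ey, (0.16, 0.16)))          # 0 (after editing)
```

On this toy set the outlier inflates the leave-one-out error and misleads the 1-NN rule near the class-0 cluster; editing removes it and both effects disappear. On other data the same editing step could discard useful boundary points and hurt accuracy, which is the kind of case the abstract argues should be anticipated from data characteristics before selection is applied.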