# Noise reduction for instance-based learning with a local maximal margin approach

## Abstract

To some extent, the problem of noise reduction in machine learning has been finessed by the development of noise-tolerant learning techniques. However, it is difficult to make instance-based learning noise-tolerant, and noise reduction still plays an important role in *k*-nearest neighbour classification. There are also other motivations for noise reduction: for instance, eliminating noise may result in simpler models, or data cleansing may be an end in itself. In this paper we present a novel approach to noise reduction based on local Support Vector Machines (LSVM), which brings the benefits of maximal margin classifiers to bear on noise reduction. This provides a more robust alternative to the majority rule on which almost all existing noise reduction techniques are based. Roughly speaking, for each training example an SVM is trained on its neighbourhood, and if the SVM classification for the central example disagrees with its actual class, there is evidence in favour of removing it from the training set. We provide an empirical evaluation on 15 real datasets showing improved classification accuracy when using training data edited with our method, as well as specific experiments in the spam filtering application domain. We present a further evaluation on two artificial datasets where we analyse two different types of noise (Gaussian feature noise and mislabelling noise) and the influence of different class densities. The conclusion is that LSVM noise reduction is significantly better than the other analysed algorithms for real datasets and for artificial datasets perturbed by Gaussian noise and in the presence of uneven class densities.
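The editing rule sketched in the abstract (train an SVM on the neighbourhood of each training example and flag the example when the SVM disagrees with its label) can be illustrated with a minimal, self-contained Python sketch. It is not the implementation evaluated in the paper. Everything specific below is an assumption made for illustration: a linear kernel, a simple subgradient-descent solver, binary labels in {+1, -1}, Euclidean neighbourhoods that exclude the central example, and the hypothetical names `train_linear_svm`, `local_svm_label` and `lsvm_edit` with their parameters `k`, `lam`, `step` and `epochs`.

```python
def train_linear_svm(points, labels, lam=0.01, step=0.1, epochs=200):
    """Soft-margin linear SVM fitted by full-batch subgradient descent.

    Minimises lam/2 * ||w||^2 + mean hinge loss; labels must be +1 or -1.
    (Illustrative solver only; a real system would use a proper SVM library.)
    """
    n, dim = len(points), len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score < 1.0:  # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def local_svm_label(index, points, labels, k):
    """Label predicted for points[index] by an SVM trained on its k nearest neighbours."""
    centre = points[index]
    # Assumption: the central example is left out of its own neighbourhood.
    others = [i for i in range(len(points)) if i != index]
    others.sort(key=lambda i: sum((a - c) ** 2 for a, c in zip(points[i], centre)))
    hood = others[:k]
    hood_labels = [labels[i] for i in hood]
    if len(set(hood_labels)) == 1:  # single-class neighbourhood: nothing to train
        return hood_labels[0]
    w, b = train_linear_svm([points[i] for i in hood], hood_labels)
    score = sum(wj * xj for wj, xj in zip(w, centre)) + b
    return 1 if score >= 0 else -1


def lsvm_edit(points, labels, k):
    """Indices of the training examples kept after one editing pass."""
    return [i for i in range(len(points))
            if local_svm_label(i, points, labels, k) == labels[i]]


# Toy 1-D data: two clusters, with index 2 deliberately mislabelled.
points = [[-1.2], [-1.1], [-1.0], [-0.9], [-0.8], [0.8], [0.9], [1.0], [1.1], [1.2]]
labels = [-1, -1, 1, -1, -1, 1, 1, 1, 1, 1]
kept = lsvm_edit(points, labels, k=7)
print(kept)
```

On this toy data the mislabelled example sits inside the negative cluster, so the local maximal margin boundary places it on the negative side and the pass drops it, while its clean neighbours are retained even though their neighbourhoods contain the noisy point. A majority-rule editor would reach the same decision here; the sketch only shows the mechanics of the local SVM test, not the robustness comparison reported in the paper.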

### Keywords

Noise reduction, Editing techniques, *k*-NN, SVM, Locality
