Journal of Intelligent Information Systems, Volume 35, Issue 2, pp 301–331

Noise reduction for instance-based learning with a local maximal margin approach

  • Nicola Segata
  • Enrico Blanzieri
  • Sarah Jane Delany
  • Pádraig Cunningham

Abstract

To some extent the problem of noise reduction in machine learning has been finessed by the development of learning techniques that are noise-tolerant. However, it is difficult to make instance-based learning noise-tolerant, and noise reduction still plays an important role in k-nearest neighbour classification. There are also other motivations for noise reduction: for instance, the elimination of noise may result in simpler models, or data cleansing may be an end in itself. In this paper we present a novel approach to noise reduction based on local Support Vector Machines (LSVM), which brings the benefits of maximal margin classifiers to bear on noise reduction. This provides a more robust alternative to the majority rule on which almost all existing noise reduction techniques are based. Roughly speaking, for each training example an SVM is trained on its neighbourhood, and if the SVM classification of the central example disagrees with its actual class, there is evidence in favour of removing it from the training set. We provide an empirical evaluation on 15 real datasets showing improved classification accuracy when using training data edited with our method, as well as specific experiments in the spam filtering application domain. We present a further evaluation on two artificial datasets in which we analyse two different types of noise (Gaussian feature noise and mislabelling noise) and the influence of different class densities. The conclusion is that LSVM noise reduction is significantly better than the other algorithms analysed, both on real datasets and on artificial datasets perturbed by Gaussian noise and in the presence of uneven class densities.
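To make the editing rule concrete, the following is a minimal sketch of the procedure described above, written in Python with scikit-learn. This is an assumption-laden illustration, not the authors' implementation (their software is FaLKM-lib); the function name edit_with_local_svm, the neighbourhood size k and the RBF kernel with C=1.0 are illustrative choices not taken from the paper.

    # A minimal sketch of LSVM-style editing, assuming scikit-learn is available.
    # For each training example, fit an SVM on its k nearest neighbours and flag
    # the example for removal if the local SVM disagrees with its recorded label.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors
    from sklearn.svm import SVC

    def edit_with_local_svm(X, y, k=15):
        """Return the indices of the training examples kept after editing."""
        # Ask for k + 1 neighbours because the nearest neighbour of each
        # point in X is the point itself.
        _, neigh = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
        keep = []
        for i, idx in enumerate(neigh):
            X_loc, y_loc = X[idx[1:]], y[idx[1:]]  # neighbourhood minus the point
            if np.unique(y_loc).size < 2:
                # Single-class neighbourhood: the local SVM degenerates to
                # predicting that class, exactly as the majority rule would.
                pred = y_loc[0]
            else:
                # Train an SVM on the neighbourhood, classify the centre point.
                pred = SVC(kernel="rbf", C=1.0).fit(X_loc, y_loc).predict(X[i:i + 1])[0]
            if pred == y[i]:    # local maximal-margin decision agrees with
                keep.append(i)  # the recorded label, so the example is kept
        return np.array(keep)

A k-NN classifier would then be trained on X[keep], y[keep] rather than on the full training set, which is the sense in which the training data is "edited".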

Keywords

Noise reduction · Editing techniques · k-NN · SVM · Locality

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Nicola Segata (1)
  • Enrico Blanzieri (1)
  • Sarah Jane Delany (2)
  • Pádraig Cunningham (3)
  1. Dipartimento di Ingegneria e Scienza dell'Informazione, University of Trento, Trento, Italy
  2. Dublin Institute of Technology, Dublin, Ireland
  3. Computer Science, University College Dublin, Dublin, Ireland
