Decontamination of Training Samples for Supervised Pattern Recognition Methods

  • Ricardo Barandela
  • Eduardo Gasca
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1876)


The present work discusses what have been called ‘imperfectly supervised situations’: pattern recognition applications in which the assumption of label correctness does not hold for all elements of the training sample. A methodology is presented for contending with these practical situations and avoiding their negative impact on the performance of supervised methods. The methodology can be regarded as a cleaning process that removes some suspicious instances from the training sample or corrects the class labels of others while retaining them. It was conceived for classification with the Nearest Neighbor rule, a supervised nonparametric classifier that combines conceptual simplicity with an asymptotic error rate bounded in terms of the optimal Bayes error. However, initial experiments on the learning phase of a Multilayer Perceptron (not reported in the present work) seem to indicate a broader applicability. Results with both simulated and real data sets are presented to support the methodology and to clarify the ideas behind it. Related works are briefly reviewed, and some issues deserving further research are outlined.
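The abstract does not spell out the cleaning procedure, so the following is only a minimal sketch of how a remove-or-relabel step built on nearest neighbors might look. It assumes a voting scheme in which each training instance is compared with its k nearest neighbors: if at least k_prime of them share a class, the instance is kept with that class label; otherwise it is discarded as suspicious. The function name `clean_training_sample`, the parameters `k` and `k_prime`, the Euclidean metric, and the toy data are all illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
import math


def clean_training_sample(points, labels, k=3, k_prime=2):
    """Relabel or discard training instances using their k nearest neighbors.

    Illustrative sketch only: k, k_prime and the Euclidean metric are
    assumptions, not values or choices taken from the paper.
    """
    if not (k / 2 < k_prime <= k):
        raise ValueError("k_prime must satisfy k/2 < k_prime <= k")
    cleaned_points, cleaned_labels = [], []
    for i, p in enumerate(points):
        # Distances from instance i to every other instance (leave-one-out).
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        # Votes always use the original labels, not labels corrected so far.
        neighbor_labels = [labels[j] for _, j in others[:k]]
        winner, votes = Counter(neighbor_labels).most_common(1)[0]
        if votes >= k_prime:
            # Enough local agreement: keep the instance with the majority label.
            cleaned_points.append(p)
            cleaned_labels.append(winner)
        # Otherwise the instance is treated as suspicious and dropped.
    return cleaned_points, cleaned_labels


# Toy data: two well-separated clusters; (1, 1) carries a wrong label "b".
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ["a", "a", "a", "b", "b", "b", "b", "b"]

kept_points, kept_labels = clean_training_sample(points, labels, k=3, k_prime=2)
```

With these toy values, a lenient threshold (`k_prime=2`) keeps every instance and corrects the label of `(1, 1)`, whereas a strict threshold (`k_prime=3`) also discards the instances whose neighborhoods contain the mislabeled point. This illustrates the trade-off between correcting and removing instances that a cleaning process of this kind has to settle.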


Keywords: Supervised methods, Nearest neighbor classifier, learning, depuration methodology, generalized edition



Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Ricardo Barandela (1)
  • Eduardo Gasca (1)
  1. Lab for Pattern Recognition, Instituto Tecnologico de Toluca, Mexico
