Abstract
We propose a new statistical approach for characterizing the degree of class separability in R^p. The approach is based on a nonparametric statistic called the "cut edge weight". This paper presents the principle behind this statistic and its experimental applications. First, we build a geometrical connected graph, such as Toussaint's Relative Neighborhood Graph, on all examples of the learning set. Second, we cut every edge joining two examples of different classes. Third, we compute the relative weight of these cut edges. If the relative weight of the cut edges falls within the interval expected under a random distribution of the labels over the vertices of the neighborhood graph, then no neighborhood-based method can yield a reliable prediction model; in that case we say that the classes to predict are non-separable.
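The three steps above can be sketched in Python. This is a minimal illustration under simplifying assumptions — Euclidean distance, uniform edge weights, and a permutation analogue of the random-labels interval test; the function names are ours and the paper's exact statistic may be weighted differently.

```python
import numpy as np

def relative_neighborhood_graph(X):
    """Toussaint's RNG: (i, j) is an edge iff no third point k is
    simultaneously closer to both i and j than they are to each other."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    n = len(X)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if not any(max(D[i, k], D[k, j]) < D[i, j]
                       for k in range(n) if k != i and k != j):
                edges.append((i, j))
    return edges

def cut_edge_weight_ratio(edges, y):
    """Share of edges joining examples of different classes
    (uniform edge weights assumed in this sketch)."""
    cut = sum(1 for i, j in edges if y[i] != y[j])
    return cut / len(edges)

def separability_test(X, y, n_perm=500, seed=0):
    """Compare the observed cut-edge ratio with its distribution under
    random relabelings of the same graph's vertices."""
    rng = np.random.default_rng(seed)
    edges = relative_neighborhood_graph(X)
    observed = cut_edge_weight_ratio(edges, y)
    null = [cut_edge_weight_ratio(edges, rng.permutation(y))
            for _ in range(n_perm)]
    lo, hi = np.percentile(null, [2.5, 97.5])
    return observed, (lo, hi)
```

If the observed ratio falls inside the interval (lo, hi), the labels look randomly distributed on the graph and the classes are declared non-separable; an observed ratio well below lo indicates separable classes.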
References
S. Aivazian, I. Enukov, and L. Mechalkine. Eléments de modélisation et traitement primaire des données. MIR, Moscou, 1986.
C. L. Blake and C. J. Merz. UCI repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science [http://www.ics.uci.edu/~mlearn/MLRepository.html], 1998.
J. L. Chandon and S. Pinson. Analyse Typologique, Théories et Applications. Masson, 1981.
A. D. Cliff and J. K. Ord. Spatial Processes: Models and Applications. Pion Limited, London, 1986.
F. N. David. Measurement of diversity. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, pages 109–136, Berkeley, USA, 1971.
F. Esposito, D. Malerba, V. Tamma, and H. H. Bock. Similarity and dissimilarity measures: classical resemblance measures. In H. H. Bock and E. Diday, editors, Analysis of Symbolic data, pages 139–152. Springer-Verlag, 2000.
R. C. Geary. The contiguity ratio and statistical mapping. The Incorporated Statistician, 5:115–145, 1954.
A. K. Jain and R. C. Dubes. Algorithms for clustering data. Prentice Hall, 1988.
P. V. A. Krishna Iyer. The first and second moments of some probability distributions arising from points on a lattice, and their applications. Biometrika, 36:135–141, 1949.
S. Lallich, F. Muhlenbach, and D. A. Zighed. Improving classification by removing or relabeling mislabeled instances. In Proceedings of the XIIIth Int. Symposium on Methodologies for Intelligent Systems (ISMIS), 2002. To appear in LNAI.
L. Lebart. Contiguity analysis and classification. In W. Gaul, O. Opitz, and M. Schader, editors, Data Analysis, pages 233–244, Berlin, 2000. Springer.
T. Mitchell. Machine Learning. McGraw Hill, 1997.
A. Mood. The distribution theory of runs. Ann. of Math. Statist., 11:367–392, 1940.
P. A. P. Moran. The interpretation of statistical maps. Journal of the Royal Statistical Society, Series B, pages 246–251, 1948.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
C. R. Rao. Linear Statistical Inference and Its Applications. Wiley, New York, 1972.
F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–408, 1958.
M. Sebban. Modèles théoriques en reconnaissance des formes et architecture hybride pour machine perceptive. PhD thesis, Université Lyon 2, 1996.
G. Toussaint. The relative neighborhood graph of a finite planar set. Pattern Recognition, 12:261–268, 1980.
V. Vapnik. Statistical Learning Theory. John Wiley, NY, 1998.
A. Wald and J. Wolfowitz. On a test whether two samples are from the same population. Ann. of Math. Statist., 11:147–162, 1940.
D. A. Zighed, J. P. Auray, and G. Duru. SIPINA: Méthode et logiciel. Lacassagne, 1992.
D. A. Zighed and M. Sebban. Sélection et validation statistique de variables et de prototypes. In M. Sebban and G. Venturini, editors, Apprentissage automatique. Hermès Science, 1999.
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Zighed, D.A., Lallich, S., Muhlenbach, F. (2002). Separability Index in Supervised Learning. In: Elomaa, T., Mannila, H., Toivonen, H. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2002. Lecture Notes in Computer Science, vol 2431. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45681-3_39
Print ISBN: 978-3-540-44037-6
Online ISBN: 978-3-540-45681-0