Abstract
We propose a new statistical approach for characterizing the degree of class separability in R^p. The approach is based on a nonparametric statistic called the "cut edge weight". This paper presents the principle behind this statistic and its experimental applications. First, we build a geometrical connected graph, such as Toussaint's Relative Neighborhood Graph, on all examples of the learning set. Second, we cut every edge joining two examples of different classes. Third, we compute the relative weight of these cut edges. If the relative weight of the cut edges falls within the interval expected under a random distribution of the labels over the vertices of the neighborhood graph, then no neighborhood-based method can yield a reliable prediction model; in that case we say that the classes to predict are non-separable.
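The three steps above can be sketched in Python. This is a minimal illustration under simplifying assumptions — Euclidean distance, uniform edge weights, and a permutation analogue of the random-labels interval test; the function names are ours and the paper's exact statistic may be weighted differently.

```python
import numpy as np

def relative_neighborhood_graph(X):
    """Toussaint's RNG: (i, j) is an edge iff no third point k is
    simultaneously closer to both i and j than they are to each other."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    n = len(X)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if not any(max(D[i, k], D[k, j]) < D[i, j]
                       for k in range(n) if k != i and k != j):
                edges.append((i, j))
    return edges

def cut_edge_weight_ratio(edges, y):
    """Share of edges joining examples of different classes
    (uniform edge weights assumed in this sketch)."""
    cut = sum(1 for i, j in edges if y[i] != y[j])
    return cut / len(edges)

def separability_test(X, y, n_perm=500, seed=0):
    """Compare the observed cut-edge ratio with its distribution under
    random relabelings of the same graph's vertices."""
    rng = np.random.default_rng(seed)
    edges = relative_neighborhood_graph(X)
    observed = cut_edge_weight_ratio(edges, y)
    null = [cut_edge_weight_ratio(edges, rng.permutation(y))
            for _ in range(n_perm)]
    lo, hi = np.percentile(null, [2.5, 97.5])
    return observed, (lo, hi)
```

If the observed ratio falls inside the interval (lo, hi), the labels look randomly distributed on the graph and the classes are declared non-separable; an observed ratio well below lo indicates separable classes.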
References
S. Aivazian, I. Enukov, and L. Mechalkine. Eléments de modélisation et traitement primaire des données. MIR, Moscou, 1986.
C. L. Blake and C. J. Merz. UCI repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science [http://www.ics.uci.edu/~mlearn/MLRepository.html], 1998.
J. L. Chandon and S. Pinson. Analyse Typologique, Théories et Applications. Masson, 1981.
A. D. Cliff and J. K. Ord. Spatial Processes: Models and Applications. Pion Limited, London, 1986.
F. N. David. Measurement of diversity. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, pages 109–136, Berkeley, USA, 1971.
F. Esposito, D. Malerba, V. Tamma, and H. H. Bock. Similarity and dissimilarity measures: classical resemblance measures. In H. H. Bock and E. Diday, editors, Analysis of Symbolic data, pages 139–152. Springer-Verlag, 2000.
R. C. Geary. The contiguity ratio and statistical mapping. The Incorporated Statistician, 5:115–145, 1954.
A. K. Jain and R. C. Dubes. Algorithms for clustering data. Prentice Hall, 1988.
P. V. A. Krishna Iyer. The first and second moments of some probability distributions arising from points on a lattice, and their applications. Biometrika, 36:135–141, 1949.
S. Lallich, F. Muhlenbach, and D. A. Zighed. Improving classification by removing or relabeling mislabeled instances. In Proceedings of the XIIIth Int. Symposium on Methodologies for Intelligent Systems (ISMIS), 2002. To appear in LNAI.
L. Lebart. Contiguity analysis and classification. In W. Gaul, O. Opitz, and M. Schader, editors, Data Analysis, pages 233–244, Berlin, 2000. Springer.
T. Mitchell. Machine Learning. McGraw Hill, 1997.
A. Mood. The distribution theory of runs. Ann. of Math. Statist., 11:367–392, 1940.
P. A. P. Moran. The interpretation of statistical maps. Journal of the Royal Statistical Society, Series B, pages 246–251, 1948.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
C. R. Rao. Linear Statistical Inference and Its Applications. Wiley, New York, 1972.
F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–408, 1958.
M. Sebban. Modèles théoriques en reconnaissance des formes et architecture hybride pour machine perceptive. PhD thesis, Université Lyon 2, 1996.
G. Toussaint. The relative neighborhood graph of a finite planar set. Pattern Recognition, 12:261–268, 1980.
V. Vapnik. Statistical Learning Theory. John Wiley, NY, 1998.
A. Wald and J. Wolfowitz. On a test whether two samples are from the same population. Ann. of Math. Statist., 11:147–162, 1940.
D. A. Zighed, J. P. Auray, and G. Duru. SIPINA: Méthode et logiciel. Lacassagne, 1992.
D. A. Zighed and M. Sebban. Sélection et validation statistique de variables et de prototypes. In M. Sebban and G. Venturini, editors, Apprentissage automatique. Hermès Science, 1999.
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Zighed, D.A., Lallich, S., Muhlenbach, F. (2002). Separability Index in Supervised Learning. In: Elomaa, T., Mannila, H., Toivonen, H. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2002. Lecture Notes in Computer Science, vol 2431. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45681-3_39
Print ISBN: 978-3-540-44037-6
Online ISBN: 978-3-540-45681-0