Abstract
This paper studies the Iterative Double Clustering (IDC) meta-clustering algorithm, an extension of the recent Double Clustering (DC) method of Slonim and Tishby, which exhibited impressive performance on text categorization tasks [1]. Using synthetically generated data, we empirically demonstrate that whenever the DC procedure succeeds in recovering some of the structure hidden in the data, the extended IDC procedure can incrementally compute a dramatically better classification with minor additional computational resources. We demonstrate that the IDC algorithm is especially advantageous when the data exhibits high attribute noise. Our simulation results also show the effectiveness of IDC in text categorization problems. Surprisingly, this unsupervised procedure can be competitive with a (supervised) SVM trained on a small training set. Finally, we propose a natural extension of IDC for (semi-supervised) transductive learning, where we are given both labeled and unlabeled examples, and present preliminary empirical results showing the plausibility of the extended method in a semi-supervised setting.
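The IDC loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: plain k-means (with farthest-point initialization) stands in for the information-bottleneck clustering subroutine used in the paper, and the function names `kmeans` and `iterative_double_clustering` are illustrative. Each round clusters words by their distribution over the current document clusters, then re-represents documents over the word clusters and clusters the documents.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means, a stand-in here for the information-bottleneck
    clustering subroutine of the paper (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization keeps the initial centers spread out.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min(((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def iterative_double_clustering(counts, k_words, k_docs, rounds=3):
    """Hypothetical rendering of the IDC loop on a docs-by-words count
    matrix: alternately cluster words by their distribution over (clusters
    of) documents, then cluster documents represented over word clusters."""
    n_docs, n_words = counts.shape
    eps = 1e-12
    # First round: represent each word by its distribution over raw documents.
    word_repr = (counts / np.maximum(counts.sum(axis=0, keepdims=True), eps)).T
    doc_labels = np.zeros(n_docs, dtype=int)
    for r in range(rounds):
        word_labels = kmeans(word_repr, k_words, seed=r)
        # Represent each document over the word clusters, normalized.
        doc_repr = np.zeros((n_docs, k_words))
        for j in range(k_words):
            doc_repr[:, j] = counts[:, word_labels == j].sum(axis=1)
        doc_repr /= np.maximum(doc_repr.sum(axis=1, keepdims=True), eps)
        doc_labels = kmeans(doc_repr, k_docs, seed=r)
        # Subsequent rounds: represent each word over the new doc clusters.
        word_repr = np.zeros((n_words, k_docs))
        for c in range(k_docs):
            word_repr[:, c] = counts[doc_labels == c, :].sum(axis=0)
        word_repr /= np.maximum(word_repr.sum(axis=1, keepdims=True), eps)
    return doc_labels
```

On a two-topic co-occurrence matrix with attribute noise, the returned document labels should recover the topic partition; the iteration re-estimates the word clusters from progressively cleaner document clusters, which is the intuition behind IDC's robustness to noise.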
References
Noam Slonim and Naftali Tishby. Document clustering using word clusters via the information bottleneck method. In ACM SIGIR 2000, 2000.
A.K. Jain and R.C. Dubes. Algorithms for Clustering Data. Prentice-Hall, New Jersey, 1988.
N. Tishby, F.C. Pereira, and W. Bialek. The information bottleneck method. In Proceedings of the 37th Allerton Conference on Communication, Control, and Computing, 1999.
N. Slonim and N. Tishby. Agglomerative information bottleneck. In NIPS99, 1999.
L. D. Baker and A. K. McCallum. Distributional clustering of words for text classification. In Proceedings of SIGIR’98, 1998.
N. Slonim and N. Tishby. The power of word clustering for text classification. To appear in the European Colloquium on IR Research, ECIR, 2001.
T.M. Cover and J.A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc., 1991.
K. Rose. Deterministic annealing for clustering, compression, classification, regression and related optimization problems. Proceedings of the IEEE, 86(11):2210–2238, 1998.
J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145–151, 1991.
R. El-Yaniv, S. Fine, and N. Tishby. Agnostic classification of Markovian sequences. In NIPS97, 1997.
I.D. Guedalia, M. London, and M. Werman. A method for on-line clustering of non-stationary data. Neural Computation, 11:521–540, 1999.
The 20 Newsgroups data set. http://www.ai.mit.edu/jrennie/20_newsgroups/.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
El-Yaniv, R., Souroujon, O. (2003). Iterative Double Clustering for Unsupervised and Semi-supervised Learning. In: De Raedt, L., Flach, P. (eds) Machine Learning: ECML 2001. ECML 2001. Lecture Notes in Computer Science(), vol 2167. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44795-4_11
Print ISBN: 978-3-540-42536-6
Online ISBN: 978-3-540-44795-5