
Data Clustering with Partial Supervision

Data Mining and Knowledge Discovery

Abstract

Clustering with partial supervision finds its application in situations where data is neither entirely nor accurately labeled. This paper discusses a semi-supervised clustering algorithm based on a modified version of the fuzzy C-Means (FCM) algorithm. The objective function of the proposed algorithm consists of two components: the first performs traditional unsupervised clustering, while the second tracks the relationship between classes (available labels) and the clusters generated by the first component. The balance between the two components is tuned by a scaling factor. Comprehensive experimental studies are presented. First, the discriminative capability of the proposed algorithm is discussed, and then its reformulation as a classifier is addressed. The induced classifier is evaluated on completely labeled data and validated by comparison against fully supervised classifiers, namely support vector machines and neural networks. It is then evaluated and compared against three semi-supervised algorithms in the context of learning from partly labeled data. In addition, the behavior of the algorithm is discussed and the relation between classes and clusters is investigated using a linear regression model. Finally, the complexity of the algorithm is briefly discussed.
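The abstract does not spell out the objective function. As a rough point of reference only, the following is a minimal NumPy sketch of a generic partially supervised FCM with fuzzifier m = 2. It assumes one cluster per class and an objective of the form J = Σ_i Σ_k u_ik² d_ik² + α Σ_i Σ_k (u_ik − b_k f_ik)² d_ik², where b_k = 1 marks a labeled point and f_ik is its target membership. This is an assumed, simpler formulation, not the modified objective proposed in this paper; the function name `ssfcm`, its parameters, and the random initialization are hypothetical. The prototype and membership updates below are the closed-form minimizers of this assumed J under the constraint Σ_i u_ik = 1.

```python
import numpy as np


def ssfcm(X, F, labeled, c, alpha=1.0, n_iter=100, tol=1e-6, seed=0):
    """Hypothetical partially supervised FCM sketch (m = 2), NOT the paper's objective.

    Assumed objective:
        J = sum_ik u_ik^2 d_ik^2 + alpha * sum_ik (u_ik - b_k f_ik)^2 d_ik^2
    X: (N, p) data; F: (c, N) target memberships (columns of labeled points sum to 1);
    labeled: (N,) boolean mask playing the role of b_k; alpha: scaling factor.
    Returns memberships U (c, N) and prototypes V (c, p).
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    G = F * labeled[np.newaxis, :]              # b_k * f_ik, zero for unlabeled points
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)           # random initial fuzzy partition
    V = None
    for _ in range(n_iter):
        # Prototype update: weighted mean with w_ik = u_ik^2 + alpha * (u_ik - g_ik)^2
        W = U**2 + alpha * (U - G) ** 2
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        # Squared Euclidean distances d_ik^2, floored to avoid division by zero
        D2 = ((X[np.newaxis, :, :] - V[:, np.newaxis, :]) ** 2).sum(axis=2)
        D2 = np.maximum(D2, 1e-12)
        # Closed-form membership update (Lagrange multiplier for sum_i u_ik = 1)
        inv = 1.0 / D2
        fcm_part = inv / inv.sum(axis=0, keepdims=True)   # = 1 / sum_j (d_ik / d_jk)^2
        scale = 1.0 + alpha * (1.0 - G.sum(axis=0, keepdims=True))
        U_new = (scale * fcm_part + alpha * G) / (1.0 + alpha)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V


# Usage on synthetic data: two Gaussian blobs, only four labeled points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(4.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
labeled = np.zeros(100, dtype=bool)
labeled[[0, 1, 50, 51]] = True
F = np.zeros((2, 100))
F[y[labeled], np.flatnonzero(labeled)] = 1.0    # crisp targets for labeled points
U, V = ssfcm(X, F, labeled, c=2, alpha=1.0)
pred = U.argmax(axis=0)                          # cluster index doubles as class index
print(f"agreement with true labels: {(pred == y).mean():.2f}")
```

With alpha = 0 the updates reduce to standard FCM with m = 2; increasing alpha pulls the memberships of labeled points toward their targets, loosely analogous to the scaling factor described in the abstract. The paper's second component instead relates classes to the generated clusters, which this one-cluster-per-class sketch does not attempt to model.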





Author information

Correspondence to Abdelhamid Bouchachia.


About this article

Cite this article

BOUCHACHIA, A., PEDRYCZ, W. Data Clustering with Partial Supervision. Data Min Knowl Disc 12, 47–78 (2006). https://doi.org/10.1007/s10618-005-0019-1



  • DOI: https://doi.org/10.1007/s10618-005-0019-1
