Guided Cluster Discovery with Markov Model

Applied Intelligence

Abstract

Cluster discovery is an essential part of many data mining applications. Although the cluster discovery process is mainly unsupervised, it can often be aided by a small amount of labeled data. A probabilistic model of the clustering structure is adopted, and a novel unified energy equation for clustering that incorporates both labeled and unlabeled data is introduced. This formulation is inspired by a force-field model that integrates labeling constraints on the labeled data with similarity information on the unlabeled data for joint estimation. Experimental results show that good clusters can be identified using a small amount of labeled data.
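The abstract does not state the paper's energy equation, so the following is only an illustrative sketch of the general idea, not the author's formulation. It assumes a generic Markov-random-field (Potts-style) energy with two terms: a pairwise term that penalizes placing similar points in different clusters (the similarity information on unlabeled data), and a penalty weighted by `lam` for disagreeing with a given label (the labeling constraint). The Gaussian similarity, the parameters `sigma` and `lam`, the choice of iterated conditional modes (ICM) as the optimizer, and the names `gaussian_weight`, `energy`, and `guided_cluster` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def gaussian_weight(a: Point, b: Point, sigma: float) -> float:
    """Assumed similarity: Gaussian kernel on squared Euclidean distance."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def energy(points: Sequence[Point], z: List[int], labeled: Dict[int, int],
           sigma: float = 1.0, lam: float = 10.0) -> float:
    """Illustrative (hypothetical) energy, not the paper's equation:
    E(z) = sum_{i<j} w_ij * [z_i != z_j] + lam * sum_{i in labeled} [z_i != y_i]
    """
    n = len(points)
    e = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if z[i] != z[j]:
                e += gaussian_weight(points[i], points[j], sigma)
    e += lam * sum(1 for i, y in labeled.items() if z[i] != y)
    return e


def guided_cluster(points: Sequence[Point], labeled: Dict[int, int], k: int,
                   sigma: float = 1.0, lam: float = 10.0,
                   max_iter: int = 50) -> List[int]:
    """Locally minimize energy() with iterated conditional modes (ICM)."""
    n = len(points)
    # Precompute the symmetric pairwise similarity matrix.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = gaussian_weight(points[i], points[j], sigma)

    # Initialization: a labeled point starts at its given label; an unlabeled
    # point copies the label of its most similar labeled point.
    z: List[int] = []
    for i in range(n):
        if i in labeled:
            z.append(labeled[i])
        else:
            nearest = max(labeled, key=lambda j: w[i][j])
            z.append(labeled[nearest])

    for _ in range(max_iter):
        changed = False
        for i in range(n):
            # The part of E(z) that depends on z_i if cluster c is chosen.
            def local_cost(c: int) -> float:
                cost = sum(w[i][j] for j in range(n) if j != i and z[j] != c)
                if i in labeled and c != labeled[i]:
                    cost += lam
                return cost

            best = min(range(k), key=local_cost)
            if best != z[i]:
                z[i] = best
                changed = True
        if not changed:  # no single-point move lowers the energy
            break
    return z


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
           (10.0, 10.0), (10.5, 10.0), (10.0, 10.5)]
    # One labeled point per blob; the rest are unlabeled.
    print(guided_cluster(pts, labeled={0: 0, 3: 1}, k=2))  # [0, 0, 0, 1, 1, 1]
```

Under these assumptions, the pairwise term pulls similar points into the same cluster while `lam` controls how strongly the few labels anchor the clusters. Because ICM only reaches a local minimum, the initialization from the most similar labeled point matters.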




About this article

Cite this article

Li, C. Guided Cluster Discovery with Markov Model. Applied Intelligence 22, 37–46 (2005). https://doi.org/10.1023/B:APIN.0000047382.74353.8f


