Machine Learning

, Volume 53, Issue 3, pp 199–233 | Cite as

Learning from Cluster Examples

  • Toshihiro Kamishima
  • Fumio Motoyoshi


Learning from cluster examples (LCE) is a hybrid task combining features of two common grouping tasks: learning from examples and clustering. In LCE, each training example is a partition of objects. The task is then to learn from a training set, a rule for partitioning unseen object sets. A general method for learning such partitioning rules is useful in any situation where explicit algorithms for deriving partitions are hard to formalize, while individual examples of correct partitions are easy to specify. In the past, clustering techniques have been applied to such problems, despite being essentially unsuited to the task. We present a technique that has qualitative advantages over standard clustering approaches. We demonstrate these advantages by applying our method to problems in two domains; one with dot patterns and one with more realistic vector-data images.

learning from examples clustering dot pattern image segmentation 


  1. Bensaid, A. M., Hall, L. O., Bezdek, J. C., &; Clarke, L. P. (1996). Partially supervised clustering for image segmentation. Pattern Recognition, 29:5, 859–871.Google Scholar
  2. Breiman, L., Friedman, J. H., Olshen, R. A., &; Stone, C. J. (1984). Classification and Regression Trees. Wadsworth Inc.Google Scholar
  3. Burset, M., &; Guigó, R. (1996). Evaluation of gene structure prediction programs. Genomics, 34, 353–367.Google Scholar
  4. Cheeseman, P., &; Stutz, J. (1996). Bayesian classification (AutoClass): Theory and results. In U. M. Fayyad, G. Diatetsky-Shapiro, P. Smyth, &; R. Uthurusamy (Eds.), Advances in knowledge discovery and data mining (pp. 153–180). AAAI Press/The MIT Press, Chapt. 6.Google Scholar
  5. Dempster, A. P., Laird, N. M., &; Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society (B), 39:1, 1–38.Google Scholar
  6. Emde, W. (1994). Inductive learning of characteristic concept descriptions from small sets of classified examples. In Proc. of European Conference of Macine Learning, (pp. 103–121). [LNAI 784].Google Scholar
  7. Everitt, B. S. (1993). Cluster Analysis. Edward Arnold, 3rd ed.Google Scholar
  8. Fisher, D. H. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2, 139–172.Google Scholar
  9. Itoh, S. (1992). Application of MDL principle to pattern classification problems. J. of Japanese Society for Artificial Intelligence, 7:4, 608–614 (in Japanese).Google Scholar
  10. Jain, A. K., &; Dubes, R. C. (1988). Algorithms for Clustering Data. Prentice Hall.Google Scholar
  11. Kamishima, T., Minoh, M., &; Ikeda, K. (1995). Rule formulation based on inductive learning for extraction and classification of diagram symbols. Transactions of The Information Processing Society of Japan, 36:3, 614–626 (in Japanese).Google Scholar
  12. McCallum, A., Nigam, K., &; Ungar, L. H. (2000). Efficient clustering of high-dimensional data sets with application to reference matching. In Proc. of ACM SIGKDD (pp. 169–178).Google Scholar
  13. Michalski, R. S. (1993). Inferential theory of learning as a conceptual basis for multistrategy learning. Machine Learning, 11, 111–151.Google Scholar
  14. Pagallo, G., &; Haussler, D. (1990). Boolean feature discovery in empirical learning. Machine Learning, 5, 71–99.Google Scholar
  15. Pavlidis, T. (1992). Why progress in machine vision is so slow. Pattern Recognition Letters, 13, 221–225.Google Scholar
  16. Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81–106.Google Scholar
  17. Quinlan, J. R., &; Rivest, R. L. (1989). Inferring decision trees using the minimum description length principle. Information and Computation, 80, 227–248.Google Scholar
  18. Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. J. of the American Statistical Association, 66, 846–850.Google Scholar
  19. Rissanen, J. (1983). A universal prior for integers and estimation by minimum description length. The Annals of Statistics, 11:2, 416–431.Google Scholar
  20. Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton University Press.Google Scholar
  21. Urquhart, R. (1982). Graph theoretical clustering based on limited neghbourhood sets. Pattern Recognition, 15:3, 173–187.Google Scholar
  22. Wallace, C. S., &; Patrick, J. D. (1993). Coding decision trees. Machine Learning, 11, 7–22.Google Scholar
  23. Wong, M. A., &; Lane, T. (1983). A kth nearest neighbour clustering procedure. Journal of the Royal Statistical Society (B), 45:3, 362–368.Google Scholar
  24. Yamanishi, K. (1992). A learning criterion for stochastic rules. Machine Learning, 9, 165–203.Google Scholar
  25. Yamanishi, K., &; Han, T. (1992). Introduction to MDL from viewpoints of information theory. J. of Japanese Society for Artificial Intelligence, 7:3, 427–434 (in Japanese).Google Scholar
  26. Zahn, C. T. (1971). Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. on Computers, 20:1, 68–86.Google Scholar

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Toshihiro Kamishima
    • 1
  • Fumio Motoyoshi
    • 1
  1. 1.National Institue of Advanced Industrial Science and Technology (AIST)IbarakiJapan

Personalised recommendations