Data Clustering: A User’s Dilemma

  • Anil K. Jain
  • Martin H. C. Law
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3776)

Abstract

Cluster analysis deals with the automatic discovery of the grouping of a set of patterns. Despite more than 40 years of research, there are still many challenges in data clustering from both theoretical and practical viewpoints. In this paper, we describe several recent advances in data clustering: clustering ensemble, feature selection, and clustering with constraints.
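
The abstract names clustering ensembles but does not show how several partitions are combined into one. The following is a minimal sketch of one well-known consensus scheme, the co-association ("evidence accumulation") idea associated with Fred and Jain. It makes several assumptions. The base clusterer is a plain-Python Lloyd's k-means, run several times with different seeds and values of k. The sketch counts how often each pair of patterns falls in the same cluster, then groups patterns that co-occur in more than half of the runs. The function names (`kmeans`, `co_association`, `consensus_labels`), the 0.5 threshold and the toy data are hypothetical choices for illustration, not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, seed: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means (hypothetical base clusterer); one label per point."""
    rng = random.Random(seed)
    # Initialise the k centres with k randomly chosen input points.
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each non-empty cluster's centre to its mean.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(partitions: List[List[int]]) -> List[List[float]]:
    """C[i][j] = fraction of partitions that put patterns i and j together."""
    n, m = len(partitions[0]), len(partitions)
    return [
        [sum(1 for labels in partitions if labels[i] == labels[j]) / m for j in range(n)]
        for i in range(n)
    ]


def consensus_labels(coassoc: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Connected components of the graph linking i and j when C[i][j] > threshold."""
    n = len(coassoc)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over pairs that co-occur often enough.
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


if __name__ == "__main__":
    # Toy data (illustrative only): two well-separated groups of 2-D patterns.
    points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    # Ensemble of 10 k-means runs with varying seeds and k in {2, 3}.
    partitions = [kmeans(points, k=2 + s % 2, seed=s) for s in range(10)]
    print(consensus_labels(co_association(partitions)))
```

Linking pairs above a fixed threshold and taking connected components amounts to a single-link cut on the distances 1 - C. Varying k across runs is meant to let the co-association matrix reflect structure that any single k-means partition might split or merge.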

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Anil K. Jain (1)
  • Martin H. C. Law (1)
  1. Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
