Abstract
Cluster analysis deals with the automatic discovery of the grouping of a set of patterns. Despite more than 40 years of research, there are still many challenges in data clustering from both theoretical and practical viewpoints. In this paper, we describe several recent advances in data clustering: clustering ensemble, feature selection, and clustering with constraints.
Research supported by the U.S. ONR grant no. N000140410183.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jain, A.K., Law, M.H.C. (2005). Data Clustering: A User’s Dilemma. In: Pal, S.K., Bandyopadhyay, S., Biswas, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2005. Lecture Notes in Computer Science, vol 3776. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11590316_1
DOI: https://doi.org/10.1007/11590316_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30506-4
Online ISBN: 978-3-540-32420-1
eBook Packages: Computer Science, Computer Science (R0)