Abstract
The phrase “data mining” was coined in the late 1980s to describe the activity of extracting interesting patterns from data. Since then, data mining and knowledge discovery has become one of the hottest topics in both academia and industry. It uncovers valuable business and scientific intelligence hidden in large amounts of historical data.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wu, J. (2012). Cluster Analysis and K-means Clustering: An Introduction. In: Advances in K-means Clustering. Springer Theses. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29807-3_1
DOI: https://doi.org/10.1007/978-3-642-29807-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29806-6
Online ISBN: 978-3-642-29807-3
eBook Packages: Computer Science (R0)