Bioinformatics Methods in Clinical Research, pp 81-107
Overview on Techniques in Cluster Analysis
Abstract
Clustering is the unsupervised, semisupervised, and supervised classification of patterns into groups. The clustering problem has been addressed in many contexts and disciplines. Cluster analysis encompasses different methods and algorithms for grouping similar objects into categories. In this chapter, we describe a number of methods and algorithms for cluster analysis in a stepwise framework. The steps of a typical clustering analysis are, in sequence: pattern representation, the choice of the similarity measure, the choice of the clustering algorithm, the assessment of the output, and the representation of the clusters.
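The five steps listed above can be illustrated with a minimal sketch in plain Python. It is not taken from the chapter: the toy data, the function names, and the specific choices (z-score scaling, Euclidean distance, k-means seeded with the first k points, and within-cluster sum of squares as the assessment) are all assumptions made for illustration only.

```python
import math

def standardize(rows):
    """Step 1, pattern representation: scale each feature to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a feature is constant, to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def euclidean(a, b):
    """Step 2, similarity measure: Euclidean distance (one choice among many)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iterations=100):
    """Step 3, clustering algorithm: plain k-means, seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

def within_cluster_ss(points, labels, centroids):
    """Step 4, assessment: within-cluster sum of squared distances (lower is tighter)."""
    return sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def summarize(labels, k):
    """Step 5, representation: the member indices of each cluster."""
    return {j: [i for i, lab in enumerate(labels) if lab == j] for j in range(k)}

# Hypothetical toy data: two well-separated groups in two dimensions.
data = [[1.0, 1.1], [5.0, 5.2], [1.2, 0.9], [5.1, 4.9], [0.9, 1.0], [4.8, 5.0]]
scaled = standardize(data)
labels, centroids = kmeans(scaled, k=2)
clusters = summarize(labels, 2)
wss = within_cluster_ss(scaled, labels, centroids)
print(clusters, round(wss, 4))
```

Each function stands in for a whole family of alternatives, so swapping the distance function or the algorithm leaves the other steps unchanged. Seeding k-means with the first k points keeps the sketch deterministic; in practice the result depends on the initialization.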
Key words
Clustering algorithm, feature selection, feature extraction, similarity measure, cluster tendency, cluster validity, cluster stability, relevance networks, dendrogram