Skip to main content

Clustering Procedures and Module Detection

  • 3211 Accesses

Abstract

Detecting clusters (also referred to as groups or modules) of closely related objects is an important problem in data mining in general. Network modules are often defined as clusters. Partitioning-around-medoids (PAM) clustering and hierarchical clustering are often used in network applications. Partitioning-around-medoids (aka. k-medoid clustering) leads to relatively robust clusters but requires that the user specify the number k of clusters. Hierarchical clustering is attractive in network applications since (a) it does not require the specification of the number of clusters and (b) it works well when there are many singleton clusters and when cluster sizes vary greatly. But hierarchical clustering requires the user to determine how to cut branches of the resulting cluster tree. Toward this end, one can use the dynamicTreeCut method and R library. The dynamic hybrid method combines the advantages of hierarchical clustering and partitioning-around-medoids clustering. Network concepts are useful for defining cluster quality statistics (e.g., to measure the density or separability of clusters). To determine whether the cluster structure is preserved in another data sets, one can use cross-tabulation-based preservation statistics. To measure the agreement between two clusterings, one can use the Rand index and other cross-tabulation-based statistics.

Keywords

  • Dissimilarity Measure
  • Rand Index
  • Cluster Assignment
  • Adjusted Rand Index
  • Cluster Label

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

This is a preview of subscription content, access via your institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-1-4419-8819-5_8
  • Chapter length: 28 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
eBook
USD   169.00
Price excludes VAT (USA)
  • ISBN: 978-1-4419-8819-5
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   219.99
Price excludes VAT (USA)
Hardcover Book
USD   299.99
Price excludes VAT (USA)
Fig. 8.1
Fig. 8.2
Fig. 8.3
Fig. 8.4

References

  • Carlson M, Zhang B, Fang Z, Mischel P, Horvath S, Nelson SF (2006) Gene connectivity, function, and sequence conservation: Predictions from modular yeast co-expression networks. BMC Genomics 7(7):40

    PubMed  CrossRef  Google Scholar 

  • Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1(1):24

    PubMed  CrossRef  Google Scholar 

  • Dudoit S, Fridlyand J (2002) A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biol 3(7):RESEARCH0036

    Google Scholar 

  • Gargalovic PS, Imura M, Zhang B, Gharavi NM, Clark MJ, Pagnon J, Yang WP, He A, Truong A, Patel S, Nelson SF, Horvath S, Berliner JA, Kirchgessner TG, Lusis AJ (2006) Identification of inflammatory gene modules based on variations of human endothelial cell responses to oxidized lipids. Proc Natl Acad Sci USA 103(34):12741–12746

    PubMed  CrossRef  CAS  Google Scholar 

  • Ghazalpour A, Doss S, Zhang B, Plaisier C, Wang S, Schadt EE, Thomas A, Drake TA, Lusis AJ, Horvath S (2006) Integrating genetics and network analysis to characterize genes related to mouse weight. PloS Genet 2(2):8

    CrossRef  Google Scholar 

  • Hastie T, Tibshirani R, Friedman J (2001) The elements of statistcal learning: Data mining, inference, and prediction. Springer, New York

    Google Scholar 

  • Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218

    CrossRef  Google Scholar 

  • Kapp AV, Tibshirani R (2007) Are clusters found in one dataset present in another dataset? Biostat 8(1):9–31

    CrossRef  Google Scholar 

  • Kaufman L, Rousseeuw PJ (1990) Finding groups in data: An introduction to cluster analysis. Wiley, New York

    CrossRef  Google Scholar 

  • Langfelder P, Horvath S (2011) Fast R functions for robust correlations and hierarchical clustering. J Stat Software. In press

    Google Scholar 

  • Langfelder P, Zhang B, Horvath S (2007) Defining clusters from a hierarchical cluster tree: The Dynamic Tree Cut library for R. Bioinformatics 24(5):719–720

    PubMed  CrossRef  Google Scholar 

  • Langfelder P, Luo R, Oldham MC, Horvath S (2011) Is my network module preserved and reproducible? Plos Comput Biol 7(1):e1001057

    PubMed  CrossRef  CAS  Google Scholar 

  • Oldham MC, Langfelder P, Horvath S (2011) Sample networks for enhancing cluster analysis of genomic data: Application to huntington’s disease. Technical Report

    Google Scholar 

  • Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850

    CrossRef  Google Scholar 

  • Sokal RR, Rohlf FJ (1962) The comparison of dendrograms by objective methods. Taxon 11:33–40

    CrossRef  Google Scholar 

  • Tibshirani R, Walther G (2005) Cluster validation by prediction strength. J Comput Graph Stat 14:511–528

    CrossRef  Google Scholar 

  • Yip A, Horvath S (2007) Gene network interconnectedness and the generalized topological overlap measure. BMC Bioinform 8(8):22

    CrossRef  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Steve Horvath .

Rights and permissions

Reprints and Permissions

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Horvath, S. (2011). Clustering Procedures and Module Detection. In: Weighted Network Analysis. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-8819-5_8

Download citation