Acta Biotheoretica

Volume 54, Issue 3, pp 219–233

The Isolation Principle of Clustering: Structural Characteristics and Implementation

  • Hans-Rolf Gregorius
Article

Abstract

The isolation principle rests on defining internal and external differentiation for each subset of at least two objects. Subsets with larger external than internal differentiation form isolated groups in the sense that they are internally cohesive and externally isolated. Objects that do not belong to any isolated group are termed solitary. The collection of all isolated groups and solitary objects forms a hierarchical (encaptic) structure. This ubiquitous characteristic of biological organization provides the motivation to identify universally applicable practical methods for the detection of such structure, to distinguish primary types of structure, to quantify their distinctiveness, and to simplify interpretation of structural aspects. A method implementing the isolation principle (by generating all isolated groups and solitary objects) is proven to be specified by single-linkage clustering. Basically, the absence of structure can be stated if no isolated groups exist, the condition for which is provided. Structures that allow for classifications in the sense of complete partitioning into disjoint isolated groups are characterized, and measures of distinctiveness of classification are developed. Among other primary types of structure, chaining (complete nesting) and ties (isolated groups without internal structure) are considered in more detail. Some biological examples for the interpretation of structure resulting from application of the isolation principle are outlined.
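
The criterion stated in the abstract can be illustrated with a brute-force sketch. The abstract does not spell out how internal and external differentiation are computed, so the definitions below are assumptions chosen to be consistent with the stated single-linkage result: internal differentiation is taken as the largest edge of a minimum spanning tree of the subset, and external differentiation as the smallest distance between a member and a non-member. Only proper subsets are examined, which is also an assumption. The function names and the example data are hypothetical.

```python
from itertools import combinations


def internal_diff(group, d):
    """Assumed definition: largest edge of a minimum spanning tree of the group."""
    group = list(group)
    in_tree = {group[0]}
    largest = 0.0
    # Prim's algorithm: repeatedly attach the closest object not yet in the tree.
    while len(in_tree) < len(group):
        weight, nxt = min(
            (d[i][j], j) for i in in_tree for j in group if j not in in_tree
        )
        largest = max(largest, weight)
        in_tree.add(nxt)
    return largest


def external_diff(group, d):
    """Assumed definition: smallest distance from a member to a non-member."""
    outside = [k for k in range(len(d)) if k not in group]
    return min(d[i][k] for i in group for k in outside)


def isolated_groups(d):
    """Enumerate proper subsets of size >= 2 with external > internal differentiation."""
    n = len(d)
    found = []
    for size in range(2, n):  # proper subsets only (an assumption)
        for g in combinations(range(n), size):
            if external_diff(g, d) > internal_diff(g, d):
                found.append(frozenset(g))
    return found


# Hypothetical example: five objects placed on a line, distance = absolute difference.
positions = [0, 1, 5, 6, 20]
d = [[abs(a - b) for b in positions] for a in positions]

groups = isolated_groups(d)
solitary = set(range(len(d))) - set().union(*groups)
print(sorted(sorted(g) for g in groups))
print(solitary)
```

The exhaustive search grows exponentially with the number of objects and is meant only to make the criterion concrete. Under the assumed definitions, the groups found in this example are nested or disjoint, which matches the hierarchical structure the abstract describes.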

Key Words:

isolation principle, internal differentiation, external differentiation, encapsis, hierarchical structure, cluster method, single linkage, classification, measure of clustering structure, degree of cluster isolation

Copyright information

© Springer Science + Business Media, Inc. 2006

Authors and Affiliations

  • Hans-Rolf Gregorius, Institut für Forstgenetik und Forstpflanzenzüchtung, Universität Göttingen, Göttingen, Germany