
Fuzzification of Agglomerative Hierarchical Crisp Clustering Algorithms

  • Conference paper
  • Published in: Challenges at the Interface of Data Analysis, Computer Science, and Optimization

Abstract

User-generated content from forums, weblogs and other social networks is a very fast-growing data source, and information extraction algorithms can make it conveniently accessible. Hierarchical clustering algorithms are used to identify the topics covered in this data at different levels of abstraction. In recent years, some research has applied hierarchical fuzzy algorithms to handle comments that address several topics at once rather than a single one. The variants of the well-known fuzzy c-means algorithm used in that work are nondeterministic, so their clustering results cannot be reproduced. In this work, we present a deterministic algorithm that fuzzifies existing agglomerative hierarchical crisp clustering algorithms and thereby allows arbitrary multi-assignments. We show how well-studied linkage metrics can be reused and analyze the monotonic behavior of each of them. The proposed algorithm is evaluated on collections from the RCV1 and RCV2 corpora.
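The abstract builds on standard agglomerative crisp clustering, on reusable linkage metrics, and on their monotonic behavior. The sketch below illustrates only that crisp baseline, not the fuzzified algorithm proposed in the paper, whose details are not given here. It assumes the input is a symmetric dissimilarity matrix given as nested Python lists and applies the classical Lance-Williams update d(k, i+j) = a_i*d(k, i) + a_j*d(k, j) + b*d(i, j) + g*|d(k, i) - d(k, j)| for single, complete, average and centroid linkage. The helper names `lance_williams_coeffs`, `agglomerate` and `is_monotone` are hypothetical, and the centroid coefficients are only meaningful for squared Euclidean distances. A linkage behaves monotonically when merge heights never decrease, which is what `is_monotone` checks on the returned merge sequence.

```python
from itertools import combinations


def lance_williams_coeffs(method, ni, nj):
    """Hypothetical helper: (alpha_i, alpha_j, beta, gamma) for merging
    clusters i and j of sizes ni and nj."""
    s = ni + nj
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":  # UPGMA
        return ni / s, nj / s, 0.0, 0.0
    if method == "centroid":  # assumes squared Euclidean dissimilarities
        return ni / s, nj / s, -ni * nj / s**2, 0.0
    raise ValueError(f"unknown linkage: {method}")


def agglomerate(dist, method="single"):
    """Crisp agglomerative clustering of a symmetric dissimilarity matrix.

    Returns the merges in order as (members_i, members_j, height)."""
    n = len(dist)
    clusters = {i: frozenset([i]) for i in range(n)}  # active id -> members
    # Dissimilarity between active clusters, keyed by (smaller_id, larger_id).
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    merges, next_id = [], n
    while len(clusters) > 1:
        # Closest pair; min() keeps the earliest-inserted key on ties,
        # so repeated runs on the same input give the same result.
        (i, j), h = min(d.items(), key=lambda item: item[1])
        ai, aj, beta, gamma = lance_williams_coeffs(
            method, len(clusters[i]), len(clusters[j]))
        for k in clusters:
            if k in (i, j):
                continue
            dki = d.pop((min(k, i), max(k, i)))
            dkj = d.pop((min(k, j), max(k, j)))
            # Lance-Williams update: distance from k to the merged cluster.
            d[(k, next_id)] = ai * dki + aj * dkj + beta * h + gamma * abs(dki - dkj)
        del d[(i, j)]
        merges.append((sorted(clusters[i]), sorted(clusters[j]), h))
        clusters[next_id] = clusters.pop(i) | clusters.pop(j)
        next_id += 1
    return merges


def is_monotone(merges):
    """True if merge heights never decrease (no inversions in the dendrogram)."""
    heights = [h for _, _, h in merges]
    return all(a <= b for a, b in zip(heights, heights[1:]))


if __name__ == "__main__":
    # Corners of a unit equilateral triangle: all squared distances are 1.
    tri = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    for m in ("single", "complete", "average", "centroid"):
        merges = agglomerate(tri, m)
        print(m, [h for _, _, h in merges], is_monotone(merges))
```

On the triangle example, single, complete and average linkage merge the third point at height 1, whereas centroid linkage merges it at 0.75, below the first merge, so `is_monotone` reports an inversion. Because ties are broken by dictionary insertion order, the sketch is deterministic, unlike the nondeterministic fuzzy c-means variants mentioned above.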


Notes

  1. http://trec.nist.gov/data/reuters/reuters.html


Author information

Correspondence to Mathias Bank.


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bank, M., Schwenker, F. (2012). Fuzzification of Agglomerative Hierarchical Crisp Clustering Algorithms. In: Gaul, W., Geyer-Schulz, A., Schmidt-Thieme, L., Kunze, J. (eds) Challenges at the Interface of Data Analysis, Computer Science, and Optimization. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24466-7_1
