RandomLink – Avoiding Linkage-Effects by Employing Random Effects for Clustering

Sluiter, Gert; Schelling, Benjamin; Plant, Claudia

doi:10.1007/978-3-030-59003-1_15


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12391)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

We present a new parameter-free clustering algorithm that imposes no assumptions on the data. Based solely on the premise that close data points are more likely to belong to the same cluster, it creates clusters autonomously; neither the number of clusters nor their shape has to be known. The algorithm resembles SingleLink in that it connects clusters depending on the distances between data points, but while SingleLink is deterministic, RandomLink makes use of random effects. These effects help RandomLink overcome the SingleLink-effect (or chain-effect), from which SingleLink suffers because it always connects the closest data points. RandomLink is likely to connect close data points but is not forced to, so it can sever chains between clusters. We explain in more detail how this negates the SingleLink-effect and how the use of random effects helps overcome the stiffness of parameters in different distance-based algorithms. We show that the principle of the algorithm is sound by testing it on different data sets and comparing it with standard clustering algorithms, focusing especially on hierarchical clustering methods.
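
The contrast drawn in the abstract between deterministic and randomized linking can be illustrated with a minimal sketch. This is not the RandomLink algorithm itself, whose details the abstract does not give. The function name `link`, the stopping criterion `n_clusters`, and the sampling weight `1 / distance**power` are all assumptions introduced only for illustration (the paper's method is described as parameter-free). With `rng=None` the sketch always joins the closest pair of points from different clusters, as single-link clustering does; with a random generator it samples the joining pair so that close pairs are likely, but not certain, to be chosen.

```python
import math
import random
from itertools import combinations


def _find(parent, i):
    # Union-find lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def link(points, n_clusters, rng=None, power=4.0):
    """Merge clusters until n_clusters remain (hypothetical illustration).

    rng=None: single-link style, always join the closest pair of points
              that lie in different clusters.
    rng given: randomized variant (an assumption, not the paper's rule),
               sample the joining pair with weight 1 / distance**power.
    """
    edges = [(math.dist(points[a], points[b]), a, b)
             for a, b in combinations(range(len(points)), 2)]
    parent = list(range(len(points)))
    remaining = len(points)
    while remaining > max(n_clusters, 1):
        # Only pairs that connect two different clusters are candidates.
        cand = [e for e in edges
                if _find(parent, e[1]) != _find(parent, e[2])]
        if rng is None:
            _, a, b = min(cand)  # deterministic: closest pair wins
        else:
            # Small constant avoids division by zero for duplicate points.
            weights = [1.0 / (e[0] ** power + 1e-12) for e in cand]
            _, a, b = rng.choices(cand, weights=weights, k=1)[0]
        parent[_find(parent, a)] = _find(parent, b)
        remaining -= 1
    # Each point is labeled by the root of its cluster.
    return [_find(parent, i) for i in range(len(points))]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
single = link(points, n_clusters=2)
randomized = link(points, n_clusters=2, rng=random.Random(0))
```

Because the randomized branch may skip the closest pair, a thin chain of points between two groups is not guaranteed to be followed, which is the intuition the abstract gives for severing chains. The price in this sketch is that results depend on the seed and on the assumed weight function.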

G. Sluiter and B. Schelling contributed equally to the paper and share first authorship.

Author information

Authors and Affiliations

Faculty of Computer Science, University of Vienna, Vienna, Austria
Gert Sluiter, Benjamin Schelling & Claudia Plant
MCML, Munich, Germany
Benjamin Schelling
Ludwig-Maximilians-Universität München, Munich, Germany
Benjamin Schelling
ds:UniVie, Vienna, Austria
Claudia Plant

Corresponding author

Correspondence to Benjamin Schelling.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Johannes Kepler University of Linz, Linz, Austria
Josef Küng
Johannes Kepler University of Linz, Linz, Austria
Gabriele Kotsis
IFS, Vienna University of Technology, Vienna, Wien, Austria
A Min Tjoa
Johannes Kepler University of Linz, Linz, Austria
Ismail Khalil

About this paper

Cite this paper

Sluiter, G., Schelling, B., Plant, C. (2020). RandomLink – Avoiding Linkage-Effects by Employing Random Effects for Clustering. In: Hartmann, S., Küng, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2020. Lecture Notes in Computer Science, vol 12391. Springer, Cham. https://doi.org/10.1007/978-3-030-59003-1_15

DOI: https://doi.org/10.1007/978-3-030-59003-1_15
Published: 14 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59002-4
Online ISBN: 978-3-030-59003-1
eBook Packages: Computer Science, Computer Science (R0)
