Abstract
Previously, we showed that dividing 2D datasets into grid boxes gives a satisfactory estimate of cluster count by detecting local maxima in data density relative to neighboring grid boxes. The algorithm was robust for datasets with clusters of different sizes and with distributions deviating from the Gaussian to a moderate degree.
Given the difficulty of estimating cluster count in higher-dimensional datasets by visualization, our goal was to extend the method to higher dimensions and to improve the speed of the implementation.
The improved algorithm yielded satisfactory results by examining data density in a hypercube grid, which points towards possible approaches for addressing the curse of dimensionality. In addition, adopting a generalized version of quadratic binary search gave a six-fold improvement in the average run speed of the implementation.
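To make the grid-density idea above concrete, here is a minimal sketch, not the authors' exact algorithm: it bins the data into a hypercube grid and counts the cells whose point count is a local maximum relative to their immediate neighbours. The function name, the number of bins per dimension, the minimum-count threshold, and the neighbourhood rule are illustrative assumptions.

```python
# Illustrative sketch only: hypercube-grid binning with local-maximum counting.
# The bin count, threshold, and neighbourhood rule are assumptions, not the
# paper's exact parameters.
import numpy as np
from itertools import product

def estimate_cluster_count(data, bins_per_dim=8, min_points=1):
    """Estimate cluster count for `data` of shape (n_samples, n_dims)."""
    counts, _ = np.histogramdd(data, bins=bins_per_dim)
    maxima = 0
    for idx in product(*(range(s) for s in counts.shape)):
        c = counts[idx]
        if c < min_points:
            continue
        # A cell is a local maximum if no neighbouring cell holds more points.
        is_max = True
        for offset in product((-1, 0, 1), repeat=counts.ndim):
            if all(o == 0 for o in offset):
                continue
            nb = tuple(i + o for i, o in zip(idx, offset))
            if all(0 <= n < s for n, s in zip(nb, counts.shape)) and counts[nb] > c:
                is_max = False
                break
        if is_max:
            maxima += 1
    return maxima

# Example: three well-separated Gaussian blobs in a 4-D feature space.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(loc=m, scale=0.5, size=(200, 4)) for m in (0.0, 5.0, 10.0)])
print(estimate_cluster_count(blobs, bins_per_dim=6))  # expected: about 3
```

Note that ties between adjacent cells are counted as separate maxima in this sketch; a full implementation would need a tie-breaking or merging rule.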
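The six-fold speedup is attributed above to a generalized version of quadratic binary search; the paper's exact scheme is not reproduced here. The sketch below shows one generic k-ary relative of binary search for locating a value's bin among sorted grid-cell edges, purely to illustrate the idea of probing several split points per step; `k_ary_bin_index`, its parameter `k`, and the bin-lookup use case are assumptions.

```python
# Illustrative sketch only: a k-ary relative of binary search for bin lookup.
# This is not the quadratic-search variant used in the paper.
import bisect

def k_ary_bin_index(edges, value, k=4):
    """Return i with edges[i] <= value < edges[i + 1].

    Assumes `edges` is sorted and edges[0] <= value < edges[-1].
    """
    lo, hi = 0, len(edges) - 1
    while hi - lo > 1:
        # Probe up to k-1 evenly spaced pivots plus the upper bound, and keep
        # the sub-interval that contains `value`.
        step = max((hi - lo) // k, 1)
        for p in list(range(lo + step, hi, step)) + [hi]:
            if value < edges[p]:
                hi = p
                break
            lo = p
    return lo

# Agrees with the standard-library binary search on an in-range query.
edges = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
assert k_ary_bin_index(edges, 2.5) == bisect.bisect_right(edges, 2.5) - 1
```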
Keywords
- k-means clustering
- Unsupervised clustering
- Multidimensional analysis
Acknowledgements
I must restate my deep gratitude towards Mr. Monte Hancock for recruiting me into the Sirius project, and for how the project turned my life around.
Inexpressible thanks go to each of my team members: Jishnu for efficient coding and sharp mistake spotting, Markus for meticulous eyes on the overall structure and wording of the paper, Alexis for rigorous critique and proofreading, and Suraj for saving the paper when I most needed help on formatting and for putting the pieces together into a complete whole. Many thanks to the other members of the Sirius team who worked on bits and pieces of this paper; it would not be complete without all of your help. Big thumbs up to Lesley the EPM (Executive/Epic Project Manager) for amazing coordination.