
Facilitating Cluster Counting in Multi-dimensional Feature Space by Intermediate Information Grouping


Part of the Lecture Notes in Computer Science book series (LNAI, volume 11580)

Abstract

Previously, we showed that dividing 2D datasets into grid boxes could give a satisfactory estimate of the cluster count by detecting local maxima in data density relative to neighboring grid boxes. The algorithm was robust for datasets with clusters of different sizes and with distributions that deviate moderately from a Gaussian.
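
A minimal sketch of the grid-box idea, written for readability rather than speed. It assumes a uniform cell of side `cell_size`, treats an occupied cell as a density peak when its point count strictly exceeds that of every touching cell, and optionally ignores cells with fewer than `min_count` points. The names `grid_counts`, `count_density_peaks` and `min_count` are hypothetical, and the chapter's actual neighborhood definition and thresholds are not given here. Because cells are keyed by integer index tuples, the same code runs unchanged for d > 2, where each cell becomes a hypercube.

```python
from collections import Counter
from itertools import product
from typing import Sequence


def grid_counts(points: Sequence[Sequence[float]], cell_size: float) -> Counter:
    """Count points per grid cell; a cell is keyed by its integer index tuple."""
    # Floor division maps each coordinate to the index of the cell containing it
    # (it also floors negative coordinates correctly, e.g. -0.5 // 1.0 == -1.0).
    return Counter(tuple(int(c // cell_size) for c in p) for p in points)


def count_density_peaks(points: Sequence[Sequence[float]], cell_size: float,
                        min_count: int = 1) -> int:
    """Count occupied cells whose point count strictly exceeds all neighbors'."""
    counts = grid_counts(points, cell_size)
    if not counts:
        return 0
    dim = len(next(iter(counts)))
    # All 3**dim - 1 offsets to the cells touching a cell (faces, edges, corners).
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]
    peaks = 0
    for cell, n in counts.items():
        if n < min_count:
            continue  # skip sparse cells (hypothetical noise threshold)
        # Empty neighboring cells are absent from the Counter and count as 0.
        if all(n > counts.get(tuple(c + d for c, d in zip(cell, o)), 0)
               for o in offsets):
            peaks += 1
    return peaks
```

For example, with two dense cells that each have one sparser neighbor plus a single isolated point, `count_density_peaks(points, 1.0)` reports three peaks, and `min_count=2` reduces that to the two real clusters. Two design consequences are worth noting: adjacent cells with tied counts suppress each other under the strict comparison, and the neighbor loop grows as 3^d, which is one concrete face of the curse of dimensionality.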

Because the cluster count is hard to estimate by visualization in higher-dimensional datasets, the goal was to extend the method to higher dimensions and to speed up its implementation.

The improved algorithm yielded satisfactory results by examining data density in a hypercube grid, which points towards possible approaches for addressing the curse of dimensionality. In addition, adopting a generalized version of quadratic binary search gave a six-fold increase in the average run speed of the implementation.
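
The abstract credits the speed-up to a generalized quadratic binary search but does not describe the generalization. As a rough illustration only, the sketch below assumes the basic idea of probing the quarter points of a sorted range (1/4, 1/2 and 3/4) instead of only the midpoint, so that each round keeps one quarter of the range. `quarter_search` is a hypothetical name, and this is not presented as the authors' implementation.

```python
from typing import Sequence


def quarter_search(a: Sequence[float], key: float) -> int:
    """Return an index i with a[i] == key in sorted sequence a, or -1 if absent.

    Each round probes the 1/4, 1/2 and 3/4 points of the current range
    [lo, hi] and keeps only the quarter that can still contain key.
    """
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        q1 = lo + (hi - lo) // 4
        mid = lo + (hi - lo) // 2
        q3 = lo + 3 * (hi - lo) // 4
        # Check the three probe points for an exact match first.
        if a[q1] == key:
            return q1
        if a[mid] == key:
            return mid
        if a[q3] == key:
            return q3
        # Otherwise narrow to the single quarter that can still hold key.
        if key < a[q1]:
            hi = q1 - 1
        elif key < a[mid]:
            lo, hi = q1 + 1, mid - 1
        elif key < a[q3]:
            lo, hi = mid + 1, q3 - 1
        else:
            lo = q3 + 1
    return -1
```

One plausible place for such a search in a grid-based method is locating the interval a coordinate falls in among sorted cell boundaries; that application is an assumption here. Each round spends up to three comparisons to shrink the range four-fold, so whether it beats plain binary search in practice depends on the data and hardware.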

Keywords

  • k-means clustering
  • Unsupervised clustering
  • Multidimensional analysis

  • DOI: 10.1007/978-3-030-22419-6_20
  • Chapter length: 15 pages

Acknowledgements

I must restate my deep gratitude to Mr. Monte Hancock for recruiting me into the Sirius project, which turned my life around.

Inexpressible thanks go to each of my team members: Jishnu for efficient coding and sharp mistake spotting, Markus for meticulous attention to the overall structure and wording of the paper, Alexis for rigorous critique and proofreading, and Suraj for saving the paper when I most needed help with formatting and with assembling the pieces into a complete paper. Many thanks to the other members of the Sirius team who worked on parts of this paper. This paper would not be complete without all of your help. A big thumbs up to Lesley, the EPM (Executive/Epic Project Manager), for amazing coordination.

Author information

Corresponding author

Correspondence to Suraj Sood.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Lo, C.C.-W., et al. (2019). Facilitating Cluster Counting in Multi-dimensional Feature Space by Intermediate Information Grouping. In: Schmorrow, D., Fidopiastis, C. (eds) Augmented Cognition. HCII 2019. Lecture Notes in Computer Science, vol 11580. Springer, Cham. https://doi.org/10.1007/978-3-030-22419-6_20

  • DOI: https://doi.org/10.1007/978-3-030-22419-6_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22418-9

  • Online ISBN: 978-3-030-22419-6

  • eBook Packages: Computer Science, Computer Science (R0)