Clustering for Binary Featured Datasets

Peter Taraba
Conference paper

Abstract

Clustering is one of the most important concepts in unsupervised machine learning. Although numerous clustering algorithms already exist, many of them, including the popular k-means algorithm, require the number of clusters to be specified in advance, which is a major drawback. Some studies use the silhouette coefficient to determine the optimal number of clusters. In this study, we introduce a novel algorithm called Powered Outer Probabilistic Clustering, show how it works through back-propagation (starting with many clusters and ending with an optimal number of clusters), and show that the algorithm converges to the expected (optimal) number of clusters on theoretical examples.
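
The drawback described above, and the silhouette-based workaround that some studies use, can be made concrete with a small sketch. The sketch clusters the data for several candidate k and keeps the k with the highest mean silhouette coefficient. It shows only that baseline. It is not Powered Outer Probabilistic Clustering, whose details are not given here. Because the features are binary, the sketch assumes Hamming distance and replaces k-means' mean centroids with per-feature majority-vote centres, a k-modes-style variant that keeps the centres binary. All function names and the toy dataset are hypothetical.

```python
# Baseline sketch (NOT POPC): choose k by the mean silhouette coefficient.
# Assumptions: Hamming distance, majority-vote (k-modes-style) binary centres.

def hamming(a, b):
    """Number of positions at which two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point whose Hamming distance to its nearest chosen centre is largest."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(hamming(p, c) for c in centres)))
    return centres


def k_modes(points, k, max_iter=100):
    """Lloyd-style loop: assign each point to its nearest centre, then reset
    each centre to the per-feature majority vote of its members."""
    centres = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: hamming(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(int(2 * sum(col) > len(members)) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (Rousseeuw) with Hamming distance.
    Singletons score 0; a single cluster returns 0.0 (silhouette undefined)."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        own = [hamming(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            continue  # singleton cluster: s(i) = 0
        a = sum(own) / len(own)  # mean distance within its own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(hamming(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def best_k(points, k_values):
    """Cluster for each candidate k and keep the k with the highest silhouette."""
    scores = {k: silhouette(points, k_modes(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


def flip(v, i):
    """Copy of binary vector v with bit i inverted (used to add noise)."""
    return tuple(1 - x if j == i else x for j, x in enumerate(v))


# Hypothetical toy data: three prototypes plus one-bit-flip variants of each.
PROTOS = [(1, 1, 1, 0, 0, 0, 0, 0, 0),
          (0, 0, 0, 1, 1, 1, 0, 0, 0),
          (0, 0, 0, 0, 0, 0, 1, 1, 1)]
data = [p if i is None else flip(p, i) for p in PROTOS for i in (None, 0, 4, 8)]

if __name__ == "__main__":
    k, scores = best_k(data, range(2, 6))
    print(k, {kk: round(s, 3) for kk, s in scores.items()})
```

On this toy data the search over k = 2..5 should select k = 3, which matches the three prototypes. The search depends on fixing a candidate range for k beforehand and rerunning the clustering for every candidate. The sketch uses farthest-first seeding only to keep runs deterministic. Random restarts, as in common k-means implementations, would be an equally reasonable choice.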

Keywords

Binary-valued features; Clustering; Emails; k-Means; Optimal number of clusters; Probabilities

Acknowledgements

The authors would like to thank David James Brunner for many fruitful discussions on knowledge workers' information overload, as well as for proofreading the first draft. The authors would also like to thank the anonymous reviewers for providing feedback that led to significant improvements to this chapter.

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

Berkeley, USA
