
A Novel Clustering Algorithm Based on a Non-parametric “Anti-Bayesian” Paradigm

  • Hugo Lewi Hammer
  • Anis Yazidi
  • B. John Oommen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9101)

Abstract

The problem of clustering, or unsupervised classification, has been tackled by a myriad of techniques, all of which depend, either directly or implicitly, on the Bayesian principle of optimal classification. More specifically, within a Bayesian paradigm, if one is to compare the testing sample with only a single point in the feature space from each class, the optimal Bayesian strategy is to base the comparison on the distance to the corresponding means or central points of the respective distributions. When this principle is applied to clustering, one assigns an unassigned sample to the cluster whose mean is closest, and this can be done in either a bottom-up or a top-down manner. This paper pioneers clustering achieved in an “Anti-Bayesian” manner, building on the breakthrough classification paradigm pioneered by Oommen et al. That paradigm takes a radically different approach, classifying data points based on the non-central quantiles of the distributions. Surprisingly and counter-intuitively, it turns out to perform as well as, or nearly as well as, an optimal supervised Bayesian scheme, which naturally invites its extension to the unexplored arena of clustering. Our algorithm can be seen as the Anti-Bayesian counterpart of the well-known \(k\)-means algorithm (The fundamental Anti-Bayesian paradigm need not be restricted to the \(k\)-means principle. Rather, we hypothesize that it can be adapted to any of the scores of techniques that are indirectly based on the Bayesian paradigm.), in which points are assigned to clusters using quantiles rather than the clusters’ centroids. Extensive experimentation (This paper contains the prima facie results of experiments done on one- and two-dimensional data. The extensions to multi-dimensional data are omitted in the interest of space; they would use the corresponding multi-dimensional Anti-Naïve-Bayes classification rules given in [1].) demonstrates that our Anti-Bayesian clustering converges fast and achieves precision competitive with that of \(k\)-means clustering.
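The abstract describes the method only at the level of “k-means, but with quantiles instead of centroids”. As a rough illustration, here is a minimal one-dimensional sketch built entirely on our own assumptions, not the authors' published algorithm: clusters are kept ordered along the line; the boundary between neighbouring clusters is placed midway between the left cluster's upper p-quantile and the right cluster's lower (1 - p)-quantile, so that for well-separated neighbours a point lying between them goes to the cluster whose facing quantile is nearer; p = 2/3 is an assumed default; quantiles use linear interpolation (type 7 in the taxonomy of Hyndman and Fan [10]); and the names quantile and anti_bayesian_kmeans_1d are hypothetical.

```python
import bisect
import math


def quantile(sorted_xs, p):
    """Linear-interpolation sample quantile (Hyndman-Fan type 7) of a sorted, non-empty list."""
    n = len(sorted_xs)
    h = (n - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)
    return sorted_xs[lo] + (h - lo) * (sorted_xs[hi] - sorted_xs[lo])


def anti_bayesian_kmeans_1d(data, k, p=2 / 3, max_iter=100):
    """Hypothetical 1-D k-means variant that places cluster boundaries using
    non-central quantiles instead of centroids (an illustrative assumption).

    Returns (boundaries, labels): labels[i] is the cluster of data[i], and
    cluster j covers the interval [boundaries[j-1], boundaries[j]).
    """
    xs = sorted(data)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(data)")
    if not 0.5 < p < 1:
        raise ValueError("p must lie in (0.5, 1)")

    # Assumed initial partition: k contiguous, nearly equal-sized blocks of the sorted data.
    labels = [i * k // n for i in range(n)]
    boundaries = []
    for _ in range(max_iter):
        # Labels are non-decreasing over xs, so each cluster is a sorted, contiguous run
        # and clusters[j] lies to the left of clusters[j + 1].
        clusters = [[] for _ in range(k)]
        for x, lab in zip(xs, labels):
            clusters[lab].append(x)
        if any(not c for c in clusters):
            break  # simple guard against an emptied cluster: keep the previous boundaries

        # Boundary between neighbours j and j+1: midway between the upper p-quantile of
        # the left cluster and the lower (1-p)-quantile of the right cluster.
        boundaries = sorted(
            (quantile(clusters[j], p) + quantile(clusters[j + 1], 1 - p)) / 2
            for j in range(k - 1)
        )
        new_labels = [bisect.bisect_right(boundaries, x) for x in xs]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels

    # Map the final boundaries back onto the caller's original ordering.
    return boundaries, [bisect.bisect_right(boundaries, x) for x in data]
```

For instance, anti_bayesian_kmeans_1d([0, 0.5, 1, 10, 10.5, 11, 20, 20.5, 21], k=3) returns labels [0, 0, 0, 1, 1, 1, 2, 2, 2] with boundaries close to 5.5 and 15.5. Replacing the two facing quantiles with the cluster means would put each boundary at the midpoint of adjacent means, which in one dimension is exactly the nearest-centroid assignment step of ordinary k-means.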

Keywords

Cluster Algorithm, Cluster Performance, Cluster Strategy, Monte Carlo Error, Bayesian Paradigm


References

  1. Thomas, A., Oommen, B.J.: Order statistics-based parametric classification for multi-dimensional distributions. Pattern Recognition 46(12), 3472–3482 (2013)
  2. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs (1988)
  3. Xu, R., Wunsch II, D.: Survey of clustering algorithms. IEEE Transactions on Neural Networks 16(3), 645–678 (2005)
  4. Ankerst, M., Breunig, M.M., Kriegel, H.-P., Sander, J.: OPTICS: ordering points to identify the clustering structure, pp. 49–60. ACM Press (1999)
  5. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise, pp. 226–231. AAAI Press (1996)
  6. Murtagh, F., Contreras, P.: Methods of hierarchical clustering. CoRR abs/1105.0121 (2011)
  7. Sibson, R.: SLINK: An optimally efficient algorithm for the single-link cluster method. The Computer Journal 16(1), 30–34 (1973)
  8. Thomas, A., Oommen, B.J.: The fundamental theory of optimal “Anti-Bayesian” parametric pattern classification using order statistics criteria. Pattern Recognition 46(1), 376–388 (2013)
  9. Oommen, B.J., Thomas, A.: Anti-Bayesian parametric pattern classification using order statistics criteria for some members of the exponential family. Pattern Recognition 47(1), 40–55 (2014)
  10. Hyndman, R.J., Fan, Y.: Sample quantiles in statistical packages. American Statistician 50, 361–365 (1996)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Hugo Lewi Hammer (1)
  • Anis Yazidi (1)
  • B. John Oommen (2)
  1. Department of Computer Science, Oslo and Akershus University College of Applied Sciences, Oslo, Norway
  2. School of Computer Science, Carleton University, Ottawa, Canada
