Convex clustering for binary data
- 86 Downloads
We present a new clustering algorithm for multivariate binary data. The new algorithm is based on the convex relaxation of hierarchical clustering, which is achieved by considering the binomial likelihood as a natural distribution for binary data and by formulating convex clustering using a pairwise penalty on prototypes of clusters. Under convex clustering, we show that the typical \(\ell _1\) pairwise fused penalty results in ineffective cluster formation. In an attempt to promote the clustering performance and select the relevant clustering variables, we propose the penalized maximum likelihood estimation with an \(\ell _2\) fused penalty on the fusion parameters and an \(\ell _1\) penalty on the loading matrix. We provide an efficient algorithm to solve the optimization by using majorization-minimization algorithm and alternative direction method of multipliers. Numerical studies confirmed its good performance and real data analysis demonstrates the practical usefulness of the proposed method.
KeywordsBinary data Convex clustering Dimension reduction Fused penalty
Mathematics Subject Classification62H30
Hosik Choi was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2017R1D1A1B05028565). Seokho Lee was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2017R1D1A1B04030695).
- Finch H (2005) Comparison of distance measures in cluster analysis with dichotomous data. J Data Sci 3:85–100Google Scholar
- Hallac D, Leskovec J, Boyd S (2015) Network lasso: clustering and optimization in large graphs. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp 387–396Google Scholar
- Hocking TD, Joullin A, Bach F, Vert J-P (2011) Cluterpath: an algorithm for clustering using convex fusion penalties. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp 754–762Google Scholar
- Lichman M (2013) UCI machine learning repository [http://archive.ics.uci.edu/ml]. University of California, School of Information and Computer Science, Irvine
- Yang Y, Guan X, You J (2002) CLOPE: a fast and effective clustering algorithm for transactional data. In: SIGKDD ’02 Edmonton, Alberta, Canada, pp 682–687Google Scholar
- Zhang Z, Li T, Ding C, Zhang X (2007) Binary matrix factorization with applications. In: IEEE international conference on data mining, pp 391–400Google Scholar