Abstract
Recently, Bandeira (C R Math, 2015) introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of k-means clustering. First, we prove that Peng and Wei’s semidefinite relaxation of k-means (SIAM J Optim 18(1):186–205, 2007) is tight with high probability under a distribution of planted clusters called the stochastic ball model. Our proof follows from a new dual certificate for integral solutions of this semidefinite program. Next, we show how to test the optimality of a proposed k-means solution using this dual certificate in quasilinear time. Finally, we analyze a version of spectral clustering from Peng and Wei (SIAM J Optim 18(1):186–205, 2007) that is designed to solve k-means in the case of two clusters. In particular, we show that this quasilinear-time method typically recovers planted clusters under the stochastic ball model.
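The setting of the main results can be illustrated with a minimal numerical sketch of the stochastic ball model with two planted clusters. The code below is not the authors' implementation, and all parameter choices (dimension, separation, sample size) are hypothetical; it only demonstrates that well-separated unit balls make the planted clustering easy to recover and to compare against alternatives via the k-means objective.

```python
import numpy as np

def stochastic_ball_model(centers, n, rng):
    """Draw n points uniformly from the unit ball around each center."""
    m = centers.shape[1]
    points, labels = [], []
    for a, c in enumerate(centers):
        # uniform sampling from the unit m-ball: random direction times
        # a radius distributed as U^(1/m)
        d = rng.standard_normal((n, m))
        d /= np.linalg.norm(d, axis=1, keepdims=True)
        r = rng.random(n) ** (1.0 / m)
        points.append(c + d * r[:, None])
        labels.append(np.full(n, a))
    return np.vstack(points), np.concatenate(labels)

def kmeans_value(X, labels):
    """k-means objective: total squared distance to cluster centroids."""
    val = 0.0
    for a in np.unique(labels):
        P = X[labels == a]
        val += ((P - P.mean(axis=0)) ** 2).sum()
    return val

rng = np.random.default_rng(0)
centers = np.array([[-3.0, 0.0], [3.0, 0.0]])  # separation 6 > 2: disjoint balls
X, planted = stochastic_ball_model(centers, n=50, rng=rng)

# With disjoint balls, thresholding the first coordinate recovers the
# planted clusters exactly.
recovered = (X[:, 0] > 0).astype(int)
assert np.array_equal(recovered, planted)
```

Moving any single point to the wrong cluster strictly increases `kmeans_value`, which is the kind of comparison an optimality certificate makes rigorous for all alternative clusterings at once.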
References
Abbe, E., Bandeira, A.S., Hall, G.: Exact recovery in the stochastic block model. IEEE Trans. Inf. Theory 62(1), 471–487 (2016)
Abbe, E., Sandon, C.: Community detection in general stochastic block models: fundamental limits and efficient algorithms for recovery. In: IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, pp. 670–688, 17–20 October 2015
Arthur, D., Vassilvitskii, S.: k-Means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (2007)
Awasthi, P., Bandeira, A.S., Charikar, M., Krishnaswamy, R., Villar, S., Ward, R.: Relax, no need to round: integrality of clustering formulations. In: Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, pp. 191–200. ACM (2015)
Bandeira, A.S.: A note on probably certifiably correct algorithms. C. R. Math. 354(3), 329–333 (2015)
Chen, H., Peng, J.: 0–1 Semidefinite programming for graph-cut clustering: modelling and approximation. In: Data Mining and Mathematical Programming. CRM Proceedings and Lecture Notes of the American Mathematical Society, pp. 15–40 (2008)
Dhillon, I.S., Guan, Y., Kulis, B.: Kernel k-means: spectral clustering and normalized cuts. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 551–556. ACM (2004)
Dhillon, I.S., Guan, Y., Kulis, B.: Weighted graph cuts without eigenvectors a multilevel approach. IEEE Trans. Pattern Anal. Mach. Intell. 29(11), 1944–1957 (2007)
Elhamifar, E., Sapiro, G., Vidal, R.: Finding exemplars from pairwise dissimilarities via simultaneous sparse recovery. In: Advances in Neural Information Processing Systems, pp. 19–27 (2012)
Golub, G.H., Van Loan, C.F.: Matrix Computations, vol. 3. JHU Press, Baltimore (2012)
Grant, M., Boyd, S., Ye, Y.: Graph implementations for nonsmooth convex programs. In: Blondel, V., Boyd, S., Kimura, H. (eds.) Recent Advances in Learning and Control. Lecture Notes in Control and Information Sciences, pp. 95–110. Springer, London (2008)
Grant, M., Boyd, S.: CVX: MATLAB software for disciplined convex programming, version 2.1 (2014). http://cvxr.com/cvx
Iguchi, T., Mixon, D.G., Peterson, J., Villar, S.: On the tightness of an SDP relaxation of k-means. arXiv preprint arXiv:1505.04778 (2015)
Jain, K., Mahdian, M., Saberi, A.: A new greedy approach for facility location problems. In: Proceedings of the 34th Annual ACM Symposium on Theory of Computing (2002)
Laurent, B., Massart, P.: Adaptive estimation of a quadratic functional by model selection. Ann. Stat. 28, 1302–1338 (2000)
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
Mallat, S.G., Zhang, Z.: Matching pursuits with time-frequency dictionaries. IEEE Trans. Sig. Process. 41(12), 3397–3415 (1993)
Mixon, D.G.: Cone programming cheat sheet. Short, Fat Matrices (weblog) (2015)
Nellore, A., Ward, R.: Recovery guarantees for exemplar-based clustering. Inf. Comput. 245, 165–180 (2015)
Nesterov, Y., Nemirovskii, A.: Interior-Point Polynomial Algorithms in Convex Programming, vol. 13. SIAM, Philadelphia (1994). doi:10.1137/1.9781611970791
Ostrovsky, R., Rabani, Y., Schulman, L., Swamy, C.: The effectiveness of Lloyd-type methods for the k-means problem. In: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (2006)
Peng, J., Wei, Y.: Approximating k-means-type clustering via semidefinite programming. SIAM J. Optim. 18(1), 186–205 (2007)
Tropp, J.A.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12(4), 389–434 (2012)
Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. arXiv:1011.3027v7 (2011)
Vinayak, R.K., Hassibi, B.: Similarity clustering in the presence of outliers: Exact recovery via convex program. In: IEEE International Symposium on Information Theory, ISIT 2016, Barcelona, Spain, pp. 91–95, 10–15 July 2016
Wang, H., Song, M.: Ckmeans.1d.dp: optimal k-means clustering in one dimension by dynamic programming. R J. 3(2), 29–33 (2011)
Acknowledgements
The authors thank the anonymous referees, whose suggestions significantly improved this paper’s presentation and literature review. The authors also thank Afonso S. Bandeira and Nicolas Boumal for interesting discussions and valuable comments on an earlier version of this manuscript, and Xiaodong Li and Yang Li for interesting comments on our dual certificate. DGM was supported by an AFOSR Young Investigator Research Program award, NSF Grant No. DMS-1321779, and AFOSR Grant No. F4FGA05076J002. SV was supported by Rachel Ward’s NSF CAREER award and AFOSR Young Investigator Research Program award. The views expressed in this article are those of the authors and do not reflect the official policy or position of the United States Air Force, Department of Defense, or the U.S. Government.
Appendices
Appendix 1: Proof of Proposition 5
Proof
(a) \(\Leftrightarrow \) (b): By complementary slackness, (a) is equivalent to having both
and
Since \(Q\succeq 0\), we have
with equality if and only if \(Q1_a=0\) for every \(a\in \{1,\ldots ,k\}\). Next, we recall that \(y=z\oplus \alpha \oplus (-\beta )\), \(b-A(X)\in L=0\oplus 0\oplus \mathbb {R}_{\ge 0}^{N(N+1)/2}\), and \(b=k\oplus 1\oplus 0\). As such, (22) is equivalent to \(\beta \) having disjoint support with \(\{\langle X,\frac{1}{2}(e_ie_j^\top +e_je_i^\top )\rangle \}_{i,j=1,i\le j}^N\), i.e., \(\beta ^{(a,a)}=0\) for every cluster a.
(b) \(\Rightarrow \) (c): Take any solution to the dual SDP (8), and note that
where the 1 vectors in the second line are \(n_a\)-dimensional (instead of N-dimensional, as in the first line), and similarly for \(e_i\) (instead of \(e_{t,i}\)). We now consider each entry of \(Q^{(a,a)}1\), which is zero by assumption:
As one might expect, these \(n_a\) linear equations determine the variables \(\{\alpha _{a,i}\}_{i\in a}\). To solve this system, we first observe
and so rearranging gives
We use this identity to continue (23):
and rearranging yields the desired formula for \(\alpha _{a,r}\).
(c) \(\Rightarrow \) (a): Take any solution to the dual SDP (8). Then by assumption, the dual objective at this point is given by
i.e., the primal objective (3) evaluated at X. Since X is feasible in the primal SDP, we conclude that X is optimal by strong duality. \(\square \)
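For context, the weak-duality computation behind this last step can be sketched in generic conic form (the symbols below are generic placeholders, not the paper's (3) and (8)): for a primal \(\min \{\langle C,X\rangle : \mathcal {A}(X)=b,\ X\succeq 0\}\) and any dual-feasible \(y\) with \(C-\mathcal {A}^*(y)\succeq 0\),
\[\langle C,X\rangle - \langle b,y\rangle = \langle C,X\rangle - \langle \mathcal {A}(X),y\rangle = \langle C-\mathcal {A}^*(y),\,X\rangle \ge 0,\]
so exhibiting a dual point whose objective matches the primal value of \(X\) certifies that \(X\) is optimal.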
Appendix 2: Proof of Corollary 8
It suffices to have
We will bound the terms in (24) separately and then combine the bounds to derive a sufficient condition for Theorem 7. To bound the first term in (24), let \(\nu \) be the \(N\times 1\) vector whose (a, i)th entry is \(\Vert x_{a,i}\Vert _2^2\), and let \(\Phi \) be the \(m\times N\) matrix whose (a, i)th column is \(x_{a,i}\). Then
meaning \(D=\nu 1^\top -2\Phi ^\top \Phi +1\nu ^\top \). With this, we appeal to the blockwise definition of M (10):
For the second term in (24), we first write the decomposition
where \(H_{(a,b)}:\mathbb {R}^{n_a\times n_b}\rightarrow \mathbb {R}^{N\times N}\) produces a matrix whose (a, b)th block is the input matrix, and is otherwise zero. Then
and so the triangle inequality gives
where the last equality can be verified by considering the spectrum of the square:
At this point, we use the definition of B (13) to get
Recalling the definition of \(u_{(a,b)}\) (13) and combining these estimates then produces the result.
Appendix 3: Proof of Theorem 9
In this section, we apply the certificate from Corollary 8 to the \((\mathcal {D},\gamma ,n)\)-stochastic ball model (see Definition 2) to prove our main result. We will prove Theorem 9 with the help of several lemmas.
Lemma 16
Denote
Then the \((\mathcal {D},\gamma ,n)\)-stochastic ball model satisfies the following estimates:
Proof
Since \(\mathbb {E}r=0\) and \(\Vert r\Vert _2^2\le 1\) almost surely, one may lift
and apply the Matrix Hoeffding inequality [23] to conclude that
Taking \(t:=\epsilon n\) then gives (25). For (26) and (27), notice that the random variables in each sum are iid and confined to an interval almost surely, and so the result follows from Hoeffding’s inequality. \(\square \)
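For reference, the scalar form of Hoeffding's inequality invoked here states: if \(X_1,\ldots ,X_n\) are independent random variables with \(X_i\in [a_i,b_i]\) almost surely, then for every \(t>0\),
\[\mathbb {P}\bigg (\Big |\sum _{i=1}^n (X_i-\mathbb {E}X_i)\Big |\ge t\bigg )\le 2\exp \bigg (-\frac{2t^2}{\sum _{i=1}^n (b_i-a_i)^2}\bigg ),\]
which yields exponentially small failure probabilities once \(t\) is taken proportional to \(n\), as above.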
Lemma 17
Under the \((\mathcal {D},\gamma ,n)\)-stochastic ball model, we have \(D^{(a,b)}1-D^{(a,a)}1=4np+q\), where
and \(|q_i|\le (6+2\Delta _{ab})n\epsilon \) with probability \(1-e^{-\Omega _{m,\Delta _{ab},\epsilon }(n)}\).
Proof
Add and subtract \(O_{ab}\) and then expand the squares to get
Add and subtract \(\gamma _a-\gamma _b\) to \(c_a-c_b\) and distribute over the resulting sum to obtain
Distributing and identifying \(\Vert \gamma _a-O_{ab}\Vert _2^2=\Delta _{ab}^2/4\) explains the definition of p. To show \(|q_i|\le (6+2\Delta _{ab})n\epsilon \), apply triangle and Cauchy–Schwarz to obtain
To finish the argument, apply (25) to the first term while adding and subtracting
from the second and apply (27). \(\square \)
Lemma 18
Under the \((\mathcal {D},\gamma ,n)\)-stochastic ball model, we have
Proof
Add and subtract \(\gamma _a\) and expand the square to get
The triangle and Cauchy–Schwarz inequalities then give
where the last step occurs with probability \(1-e^{-\Omega _{\Delta _{ab},\epsilon }(n)}\) by a union bound over (26) and (25). \(\square \)
Lemma 19
Under the \((\mathcal {D},\gamma ,n)\)-stochastic ball model, we have
Proof
Lemma 17 gives
Cauchy–Schwarz along with (25) then gives the result. \(\square \)
Lemma 20
Under the \((\mathcal {D},\gamma ,n)\)-stochastic ball model, there exists \(C=C(\gamma )\) such that
where \(\displaystyle {\Delta :=\mathop {\mathop {\min }\limits _{a,b\in \{1,\ldots ,k\}}}\limits _{a\ne b}\Delta _{ab}}\).
Proof
Fix a and b. Then by Lemma 17, the following holds with probability \(1-e^{-\Omega _{m,\Delta _{ab},\epsilon }(n)}\):
where the last step is by Cauchy–Schwarz. Taking a union bound with Lemma 18 then gives
with probability \(1-e^{-\Omega _{\Delta _{ab},\epsilon }(n)}\). The result then follows from a union bound over a and b. \(\square \)
Lemma 21
Suppose \(\epsilon \le 1\). Then there exists \(C=C(\Delta _{ab},m)\) such that under the \((\mathcal {D},\gamma ,n)\)-stochastic ball model, we have
with probability \(1-e^{-\Omega _{m,\Delta _{ab},\epsilon }(n)}\).
Proof
First, a quick calculation reveals
from which it follows that
As such, we have
To bound the first term, we apply the triangle inequality over Lemma 17:
We proceed by bounding \(\Vert p\Vert _2\). To this end, note that the \(p_i\)’s are iid random variables whose outcomes lie in a finite interval (of width determined by \(\Delta _{ab}\)) with probability 1. As such, Hoeffding’s inequality gives
With this, we then have
in the same event. To determine \(\mathbb {E}p_1^2\), first take \(r_1:=e_1^\top r\). Then since the distribution of r is rotation invariant, we may write
where the second equality above is equality in distribution. We then have
We also note that \(1\ge \mathbb {E}\Vert r\Vert _2^2=m\mathbb {E}r_1^2\) by linearity of expectation, and so
Combining (29), (30), (31) and (32) then gives
To bound the second term of (28), first note that
Lemma 19 then gives
with probability \(1-e^{-\Omega _{m,\Delta _{ab},\epsilon }(n)}\). Using (28) to combine (33) with (34) and (35) then gives the result. \(\square \)
Lemma 22
There exists \(C=C(\gamma )\) such that under the \((\mathcal {D},\gamma ,n)\)-stochastic ball model, we have
Proof
Recall from (13) that
To bound the first term, we leverage Lemma 19:
with probability \(1-e^{-\Omega _{m,\Delta _{ab},\epsilon }(n)}\). To bound the second term in (36), note from Lemma 18 that
with probability \(1-e^{-\Omega _{\Delta _{ab},\epsilon }(n)}\). Next, Lemma 17 gives
By assumption, we know \(\Vert r\Vert _2\ge 1-\epsilon \) with positive probability regardless of \(\epsilon >0\). It then follows that
with some (\(\epsilon \)-dependent) positive probability. As such, we may conclude that
Combining these estimates then gives
Performing a union bound over a and b then gives
Combining these estimates then gives the result. \(\square \)
Lemma 23
Under the \((\mathcal {D},\gamma ,n)\)-stochastic ball model, we have
where \(\sigma ^2:=\mathbb {E}\Vert r\Vert _2^2\) for \(r\sim \mathcal {D}\).
Proof
Let R denote the matrix whose (a, i)th column is \(r_{a,i}\). Then
and so the triangle inequality gives
where the last estimate passes to the Frobenius norm. For the first term, since \(\mathcal {D}\) is rotation invariant, we may apply Theorem 5.41 in [24]:
For the second term, apply (25). The union bound then gives the result. \(\square \)
Proof of Theorem 9
First, we combine Lemmas 21, 22 and 23: For every \(\delta >0\), there exists an \(\epsilon >0\) such that
with probability \(1-e^{-\Omega _{\mathcal {D},\gamma ,\epsilon }(n)}\). Next, the uniform bound \(\Delta _{ab}\ge \Delta \) implies
Combining this with (37) and considering Lemma 20, it then suffices to have
Rearranging then gives
which is implied by the hypothesis since \(\Delta \ge 2\). \(\square \)
Appendix 4: Proof of Theorem 14
Put \(g=\gamma /\Vert \gamma \Vert _2\) and let z have unit 2-norm. Since \(\Vert \Phi _0^\top z\Vert _2\ge \Vert \Phi _0^\top g\Vert _2\), then considering Lemma 15, it suffices to show that the containment
holds with probability \(1-e^{-\Omega _{m,\Delta }(N)}\). To this end, we will first show that each \(v\in S_1\) is also a member of \(S_2\) with high probability, and then we will perform a union bound over an \(\epsilon \)-net of \(S_1\).
We start by considering \(\Vert \Phi ^\top v\Vert _2\) and \(\Vert \Phi ^\top g\Vert _2\). Decompose \(x_i\) as either \(\gamma +r_i\) or \(-\gamma +r_i\) depending on whether \(x_i\) belongs to the ball centered at \(\gamma \) or \(-\gamma \). Let w with \(\Vert w\Vert _2=1\) be arbitrary. Then
and so \(\mathbb {E}(x_i^\top w)^2=(\gamma ^\top w)^2+\mathbb {E}(e_1^\top r)^2\). Linearity of expectation then gives
Since \(|(x_i^\top g)^2-(x_i^\top v)^2|\le 2(1+\Delta /2)^2\) almost surely, we may apply Hoeffding’s inequality to get
For a properly chosen t, rearranging gives that \(\Vert \Phi ^\top v\Vert _2<\Vert \Phi ^\top g\Vert _2\). Instead, we will use (38) to prove the closely related inequality \(\Vert \Phi _0^\top v\Vert _2<\Vert \Phi _0^\top g\Vert _2\). Letting \(\mu \) denote the centroid of the columns of \(\Phi \), we know by (25) that \(\Vert \mu \Vert _2\le \delta \) with probability \(1-e^{-\Omega _{m,\delta }(N)}\). In this event, every w with \(\Vert w\Vert _2=1\) satisfies
Furthermore,
where the last inequality follows from Cauchy–Schwarz along with the fact that \(\Vert x_i\Vert _2\le \Delta /2+1\) for every i. Taking a supremum over w then gives
In (38), pick \(s=(N/2)(1-4/\Delta ^2)=:c_1(\Delta )N\). Then taking a union bound with (39) gives
with probability \(1-e^{-\Omega _{m,\Delta ,\delta }(N)}\). Expanding both sides and rearranging then gives
where the last step follows from (40). Thus, picking \(\delta =\delta (\Delta )\) sufficiently small ensures \(c_2(\Delta )>0\). Since \(c_2(\Delta )N\le \Vert \Phi _0^\top g\Vert _2^2-\Vert \Phi _0^\top v\Vert _2^2=(\Vert \Phi _0^\top g\Vert _2+\Vert \Phi _0^\top v\Vert _2)(\Vert \Phi _0^\top g\Vert _2-\Vert \Phi _0^\top v\Vert _2)\), we further have
where the last inequality takes \(c_3(\Delta ):=c_2(\Delta )/(\Delta /2+1+\delta )\), following (40).
At this point, we know that if \(v\in S_1\), then \(v\in S_2\) with probability \(1-e^{-\Omega _{m,\Delta }(N)}\). It remains to perform a union bound over an \(\epsilon \)-net of \(S_1\) to conclude that \(S_1\subseteq S_2\) with high probability. To this end, pick \(\epsilon <c_3(\Delta )/(\Delta /2+1+\delta )\), consider an \(\epsilon \)-net \(\mathcal {N}_\epsilon \) of \(S_1\), and suppose
Then for every \(x\in S_1\), there exists \(v\in \mathcal {N}_\epsilon \) such that \(\Vert x-v\Vert _2\le \epsilon \), and so (40) gives
as desired. To measure the probability of the success event (41), a standard volume comparison argument establishes the existence of an \(\epsilon \)-net of size \(|\mathcal {N}_\epsilon |\le (1+2/\epsilon )^m\); see Lemma 5.2 in [24]. As such, the union bound gives that (41) occurs with probability \(1-e^{-\Omega _{m,\Delta }(N)}\).
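The volume comparison behind the bound \(|\mathcal {N}_\epsilon |\le (1+2/\epsilon )^m\) runs as follows (a standard sketch, not the cited lemma verbatim): take \(\mathcal {N}_\epsilon \) to be a maximal \(\epsilon \)-separated subset of the unit sphere. Maximality makes it an \(\epsilon \)-net, while separation makes the balls of radius \(\epsilon /2\) about its points disjoint and contained in the ball of radius \(1+\epsilon /2\). Comparing volumes in \(\mathbb {R}^m\) gives
\[|\mathcal {N}_\epsilon |\cdot (\epsilon /2)^m \le (1+\epsilon /2)^m, \quad \text {i.e.,}\quad |\mathcal {N}_\epsilon |\le (1+2/\epsilon )^m.\]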
Iguchi, T., Mixon, D.G., Peterson, J. et al. Probably certifiably correct k-means clustering. Math. Program. 165, 605–642 (2017). https://doi.org/10.1007/s10107-016-1097-0