
Modified Subspace Constrained Mean Shift Algorithm


Abstract

A subspace constrained mean shift (SCMS) algorithm is a non-parametric iterative technique for estimating principal curves. Principal curves, as a nonlinear generalization of principal component analysis (PCA), are smooth curves (or surfaces) that pass through the middle of a data set and provide a compact low-dimensional representation of the data. The SCMS algorithm combines the mean shift (MS) algorithm with a projection step to estimate principal curves and surfaces. The MS algorithm is a simple iterative method for locating modes of an unknown probability density function (pdf) obtained via a kernel density estimate. Modes of a pdf can be interpreted as zero-dimensional principal curves, and they can also be used for clustering the input data. The SCMS algorithm generalizes the MS algorithm to estimate higher order principal curves and surfaces. Although both algorithms have been widely used in many real-world applications, their convergence for widely used kernels (e.g., the Gaussian kernel) has not yet been shown. In this paper, we first introduce a modified version of the MS algorithm and then combine it with different variations of the SCMS algorithm to estimate the underlying low-dimensional principal curve embedded in a high-dimensional space. The different variations of the SCMS algorithm are obtained via modification of the projection step in the original SCMS algorithm. We show that the modification of the MS algorithm guarantees its convergence and also implies the convergence of the different variations of the SCMS algorithm. The performance and effectiveness of the proposed modified versions in estimating an underlying principal curve are demonstrated through simulations on synthetic data.
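To make the two updates concrete, the following minimal Python sketch (not the implementation used in the paper) performs a single MS step with a Gaussian kernel and the corresponding modified MS step, which maps the estimate to the nearest input data point; the toy data, the bandwidth h = 0.5, and the function names are illustrative choices only.

```python
import numpy as np

def ms_update(y, X, h):
    """One standard mean shift step: move y to the kernel-weighted mean of the data.

    Gaussian profile k(t) = exp(-t/2), so the weights are exp(-||y - x_i||^2 / (2 h^2)).
    """
    w = np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2))
    return (w[:, None] * X).sum(axis=0) / w.sum()

def modified_ms_update(y, X, h):
    """Modified MS step: compute the usual update, then return the input
    data point closest to it, so every estimate is one of the data points."""
    y_tilde = ms_update(y, X, h)
    return X[np.argmin(np.sum((X - y_tilde) ** 2, axis=1))]

# Toy usage: two Gaussian blobs; one modified step starting from the first data point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(100, 2)),
               rng.normal(3.0, 0.3, size=(100, 2))])
print(modified_ms_update(X[0], X, h=0.5))
```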


Notes

  1. It is called a principal curve for D = 1 and becomes a mode of the pdf for D = 0.

  2. Note that for the special case of Gaussian distribution \(\hat {f}\sim N(\mu ,\boldsymbol {\Sigma })\), the local inverse covariance matrix in Eq. 6 becomes equal to the inverse covariance matrix, i.e., \(\hat {\boldsymbol {\Sigma }}^{-1}(\boldsymbol {x})=\boldsymbol {\Sigma }^{-1}\).

  3. The equality \(\|\boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}\|^{2}=0\) holds only once convergence occurs, which Theorem 1 guarantees for the modified MS algorithm.


Author information


Corresponding author

Correspondence to Youness Aliyari Ghassabeh.


Appendix


Proof of Theorem 1

Let \(\mathcal {X}=\{\boldsymbol {x}_{1},\ldots ,\boldsymbol {x}_{n}\}\) denote the input data set and assume \(\boldsymbol{y}_{j+1}\neq \boldsymbol{y}_{j}\). To prove part (i), we show that \(\hat {f}_{h,k}(\boldsymbol {y}_{j+1})>\hat {f}_{h,k}(\boldsymbol {y}_{j})\). From Eq. 2, we have

$$ \begin{array}{@{}rcl@{}} &&\hat{f}_{h,k}(\boldsymbol{y}_{j+1})-\hat{f}_{h,k}(\boldsymbol{y}_{j}) \\ &=&\frac{c_{k,D}}{nh^{D}}\left[\sum\limits_{i=1}^{n}k\left( \left\|\frac{\boldsymbol{y}_{j+1}-\boldsymbol{x}_{i}}{h}\right\|^{2}\right) -k\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2}\right)\right] \\ &\ge& \frac{c_{k,D}}{nh^{D+2}}\sum\limits_{i=1}^{n}k^{\prime}\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2}\right) \left( \|\boldsymbol{y}_{j+1}-\boldsymbol{x}_{i}\|^{2}-\|\boldsymbol{y}_{j}-\boldsymbol{x}_{i}\|^{2}\right), \end{array} $$
(10)

where the last inequality comes from the convexity of the profile function k, i.e., \(k(x_{2})-k(x_{1})\ge k^{\prime }(x_{1})(x_{2}-x_{1})\). By the triangle inequality, we have

$$ \|\boldsymbol{y}_{j+1}-\tilde{\boldsymbol{y}}_{j+1}\|\le \|\boldsymbol{y}_{j+1}-\boldsymbol{x}_{i}\|+\|\tilde{\boldsymbol{y}}_{j+1}-\boldsymbol{x}_{i}\|, i=1,2, \ldots, n, $$
(11)

where \(\tilde {\boldsymbol {y}}_{j+1}\) is given in Eq. 8. Using Eqs. 10 and 11, we obtain

$$ \begin{array}{@{}rcl@{}} &&\hat{f}_{h,k}(\boldsymbol{y}_{j+1})-\hat{f}_{h,k}(\boldsymbol{y}_{j})\ge \frac{c_{k,D}}{nh^{D+2}}\sum\limits_{i=1}^{n}k^{\prime}\left( \|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\|^{2}\right) \\ &&\left( \|\boldsymbol{y}_{j+1}-\tilde{\boldsymbol{y}}_{j+1}\|^{2}-\|\tilde{\boldsymbol{y}}_{j+1}-\boldsymbol{x}_{i}\|^{2}\right.\\ && \left. -2\|\boldsymbol{y}_{j+1}-\tilde{\boldsymbol{y}}_{j+1}\|\|\tilde{\boldsymbol{y}}_{j+1}-\boldsymbol{x}_{i}\|-\|\boldsymbol{y}_{j}-\boldsymbol{x}_{i}\|^{2}\right). \end{array} $$
(12)

Since the modified MS algorithm assigns \(\boldsymbol{y}_{j+1}\) to the data point closest to \(\tilde {\boldsymbol {y}}_{j+1}\), we have \(\|\boldsymbol {y}_{j+1}-\tilde {\boldsymbol {y}}_{j+1}\|^{2}-\|\tilde {\boldsymbol {y}}_{j+1}-\boldsymbol {x}_{i}\|^{2}\le 0\) for all xi ∈{x1,…,xn}, and as a result we have

$$ \begin{array}{@{}rcl@{}} &&\sum\limits_{i=1}^{n}k^{\prime}\left( \|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\|^{2}\right)\\ &&\left( \|\boldsymbol{y}_{j+1}-\tilde{\boldsymbol{y}}_{j+1}\|^{2}-\|\tilde{\boldsymbol{y}}_{j+1}-\boldsymbol{x}_{i}\|^{2} \right)>0, \end{array} $$
(13)

where the above inequality is true since the profile k is a strictly decreasing function and \(k^{\prime }(x)<0\). Furthermore, we have

$$ \begin{array}{@{}rcl@{}} &&\sum\limits_{i=1}^{n}k^{\prime}\left( \|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\|^{2}\right)\\ &&\left( -2\|\boldsymbol{y}_{j+1}-\tilde{\boldsymbol{y}}_{j+1}\|\|\tilde{\boldsymbol{y}}_{j+1}-\boldsymbol{x}_{i}\|-\|\boldsymbol{y}_{j}-\boldsymbol{x}_{i}\|^{2}\right)>0. \end{array} $$
(14)

Combining Eqs. 12, 13, and 14, we obtain

$$ \hat{f}_{h,k}(\boldsymbol{y}_{j+1})-\hat{f}_{h,k}(\boldsymbol{y}_{j})>0, $$
(15)

which implies that the sequence \(\{\hat {f}_{h,k}(\boldsymbol {y}_{j})\}_{j=1,2,\ldots }\) is increasing. Since the modified MS algorithm assigns each estimate to one of the data points, yj ∈{x1,…,xn}, j = 1, 2,…, and since n is finite, \(\hat {f}_{h,k}(\boldsymbol {y}_{j})\), given in Eq. 2, is bounded. Thus, as long as \(\boldsymbol{y}_{j+1}\neq \boldsymbol{y}_{j}\), the sequence \(\{\hat {f}_{h,k}(\boldsymbol {y}_{j})\}_{j=1,2,\ldots }\) is bounded and strictly increasing, and these two conditions together imply the convergence of \(\{\hat {f}_{h,k}(\boldsymbol {y}_{j})\}\).

To prove part (ii), first note that the modified MS algorithm starts from one of the data points, and in each iteration the cluster center estimate is assigned to one of the data points. The algorithm stops when two consecutive estimates become equal, i.e., yj+ 1 = yj for some j ≥ 1. By part (i), each data point can be assigned to the cluster center estimate at most once; otherwise, \(\hat {f}_{h,k}(\boldsymbol {y}_{j+m})=\hat {f}_{h,k}(\boldsymbol {y}_{j})\) for some m ≥ 1, which contradicts part (i). Since the number of data samples, n, is finite, the sequence {yj} converges after a finite number of iterations. □
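As a numerical sanity check of the behaviour established in Theorem 1, the following Python sketch (a toy experiment, not the authors' code) runs the modified MS iteration with a Gaussian kernel on synthetic data, records \(\hat {f}_{h,k}(\boldsymbol {y}_{j})\) (up to its normalizing constant) along the way, and stops when two consecutive estimates coincide; the data, bandwidth, and iteration cap are arbitrary choices.

```python
import numpy as np

def kde(y, X, h):
    """Gaussian-kernel density estimate at y, up to the constant c_{k,D} / (n h^D) factor."""
    return np.mean(np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2)))

def modified_ms_step(y, X, h):
    """Standard MS update (kernel-weighted mean), then snap to the nearest data point."""
    w = np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2))
    y_tilde = (w[:, None] * X).sum(axis=0) / w.sum()
    return X[np.argmin(np.sum((X - y_tilde) ** 2, axis=1))]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))            # synthetic data set
h = 0.8
y = X[0]
values = [kde(y, X, h)]
for _ in range(1000):                    # part (ii): the loop exits long before this cap
    y_next = modified_ms_step(y, X, h)
    if np.array_equal(y_next, y):        # y_{j+1} = y_j: the sequence {y_j} has converged
        break
    y = y_next
    values.append(kde(y, X, h))

# Part (i): the recorded KDE values increase strictly until convergence.
print("strictly increasing:", all(b > a for a, b in zip(values, values[1:])))
print("number of distinct estimates visited:", len(values))
```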

Proof of Theorem 2

Let \(\mathcal {X}=\{\boldsymbol {x}_{1},\ldots ,\boldsymbol {x}_{n}\}, n\ge 2\), denote the set of observed data points. The subspace constrained mean shift sequence {yj} is defined recursively by

$$ \boldsymbol{y}_{j+1}=\boldsymbol{V}_{j}{\boldsymbol{V}^{T}_{j}}\boldsymbol{m}(\boldsymbol{y}_{j})+\boldsymbol{y}_{j}, $$
(16)

where

$$ \boldsymbol{m}(\boldsymbol{y}_{j})= \frac{{\sum}_{i=1}^{n}\boldsymbol{x}_{i}g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2}\right)} {{\sum}_{i=1}^{n}g\left( \left\|\frac{\boldsymbol{y}_{j} -\boldsymbol{x}_{i}}{h}\right\|^{2}\right)}-\boldsymbol{y}_{j}, $$
(17)

with \(\boldsymbol {y}_{j} \in \mathcal {X}\) being one of the input data points, as required by the modified MS algorithm. Here \(g(x)=-k^{\prime }(x)\), where k is the profile of the kernel K, and \(\boldsymbol{V}_{j}\) is the D × (D − d) matrix whose orthonormal columns are the eigenvectors corresponding to the largest eigenvalues of the local inverse covariance matrix \(\hat {\boldsymbol {\Sigma }}^{-1}\), defined in Eq. 6, evaluated at \(\boldsymbol{y}_{j}\).
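A minimal Python sketch of the update in Eqs. 16 and 17 is given below. Since Eq. 6 is not reproduced here, the sketch assumes the local inverse covariance is the negative Hessian of the log of the Gaussian kernel density estimate (consistent with Footnote 2 for the Gaussian case) and approximates it by finite differences; the final mapping of the estimate back to the nearest data point mirrors the modified MS algorithm and is only one possible reading of the modified SCMS variants. All names, the toy half-circle data, and the bandwidth are illustrative.

```python
import numpy as np

def log_kde(y, X, h):
    """Log of the Gaussian kernel density estimate at y (up to an additive constant)."""
    return np.log(np.mean(np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2))))

def mean_shift_vector(y, X, h):
    """m(y) from Eq. 17 with the Gaussian profile: weighted mean of the data minus y."""
    w = np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2))
    return (w[:, None] * X).sum(axis=0) / w.sum() - y

def local_inv_cov(y, X, h, eps=1e-3):
    """Assumed stand-in for Eq. 6: minus the Hessian of log f_hat, via central differences."""
    D = y.size
    H = np.zeros((D, D))
    for a in range(D):
        for b in range(D):
            ea, eb = np.eye(D)[a] * eps, np.eye(D)[b] * eps
            H[a, b] = (log_kde(y + ea + eb, X, h) - log_kde(y + ea - eb, X, h)
                       - log_kde(y - ea + eb, X, h) + log_kde(y - ea - eb, X, h)) / (4 * eps ** 2)
    return -(H + H.T) / 2.0                       # symmetrise the numerical Hessian

def modified_scms_step(y, X, h, d=1):
    """One update of Eq. 16, then (as an assumed variant) snap to the nearest data point."""
    evals, evecs = np.linalg.eigh(local_inv_cov(y, X, h))
    V = evecs[:, np.argsort(evals)[::-1][: X.shape[1] - d]]   # top D - d eigenvectors
    y_new = V @ V.T @ mean_shift_vector(y, X, h) + y
    return X[np.argmin(np.sum((X - y_new) ** 2, axis=1))]

# Toy data: noisy points along a half circle, a one-dimensional principal curve in R^2.
rng = np.random.default_rng(2)
t = rng.uniform(0.0, np.pi, 400)
X = np.c_[np.cos(t), np.sin(t)] + rng.normal(scale=0.05, size=(400, 2))
y = X[0]
for _ in range(30):
    y = modified_scms_step(y, X, h=0.3, d=1)
print("estimate pulled toward the principal curve:", y)
```

Setting d = 1 targets a principal curve; larger d would target higher-dimensional principal surfaces.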

Since the profile k is bounded, the sequence \(\{\hat {f}(\boldsymbol {y}_{j})\}\) is bounded, so to prove convergence it suffices to show that the sequence is non-decreasing. Since k is assumed to be convex, we have \(k(t_{2})-k(t_{1})\ge g(t_{1})(t_{1}-t_{2})\) for all \(t_{1},t_{2}\ge 0\), where \(g=-k^{\prime }\). Combining this with the definition of \(\hat {f}\) in Eq. 2 yields

$$ \begin{array}{@{}rcl@{}} \hat{f}(\boldsymbol{y}_{j+1})-\hat{f}(\boldsymbol{y}_{j})&=& \frac{c}{nh^{D}} \sum\limits_{i=1}^{n}\left( k\left( \left\|\frac{\boldsymbol{y}_{j+1}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right)- k\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right)\right)\\ &\ge & \frac{c}{nh^{D+2}}\sum\limits_{i=1}^{n} g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right)\left( \|\boldsymbol{y}_{j}-\boldsymbol{x}_{i}\|^{2}- \|\boldsymbol{y}_{j+1}-\boldsymbol{x}_{i}\|^{2}\right) \\ &=& C_{j} \sum\limits_{i=1}^{n} p_{j}(i) \left( \|\boldsymbol{y}_{j}-\boldsymbol{x}_{i}\|^{2}- \|\boldsymbol{y}_{j+1}-\boldsymbol{x}_{i}\|^{2}\right), \end{array} $$
(18)

where c is the normalization factor,

$$ p_{j}(i) = \frac{ g\left( \|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\|^{2}\right)}{{\sum}_{k=1}^{n} g\left( \|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{k}}{h}\|^{2}\right)}, \quad i=1,\ldots,n $$

and

$$ C_{j}= \frac{c}{nh^{D+2}} \sum\limits_{i=1}^{n} g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right). $$

Since by assumption k is strictly decreasing, \(g(t)=-k^{\prime }(t)>0\) for all t ≥ 0, so \(p_{j}(1),\ldots ,p_{j}(n)\) are well defined, positive, and sum to 1. Therefore, the mean shift vector in Eq. 17 can be rewritten as

$$ \boldsymbol{m}(\boldsymbol{y}_{j})= \sum\limits_{i=1}^{n} p_{j}(i)(\boldsymbol{x}_{i}-\boldsymbol{y}_{j})= E[\boldsymbol{Z}_{j}], $$

where Zj is a random vector in \(\mathbb {R}^{D}\) with discrete probability distribution given by \(\Pr (\boldsymbol {Z}_{j}= \boldsymbol {x}_{i}-\boldsymbol {y}_{j})= p_{j}(i), i=1,\ldots ,n\), and \(\boldsymbol {y}_{j} \in \mathcal {X}\), \(j=1,2,\ldots\). Thus, letting \(\boldsymbol {T}_{j}= \boldsymbol {V}_{j}{\boldsymbol {V}^{T}_{j}}\), the update step in the proposed modified SCMS algorithm can be rewritten as

$$ \boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}= \boldsymbol{T}_{j} \boldsymbol{m}(\boldsymbol{y}_{j})= \boldsymbol{T}_{j} E[\boldsymbol{Z}_{j}]. $$
(19)

Let \(\boldsymbol{W}_{j}\) be the D × D matrix representing the orthogonal projection onto the null space of \(\boldsymbol{T}_{j}\). Then \(\boldsymbol{x}=\boldsymbol{T}_{j}\boldsymbol{x}+\boldsymbol{W}_{j}\boldsymbol{x}\) for all \(\boldsymbol {x}\in \mathbb {R}^{D}\), and \(\boldsymbol{T}_{j}\boldsymbol{x}\) and \(\boldsymbol{W}_{j}\boldsymbol{y}\) are orthogonal for all \(\boldsymbol {x},\boldsymbol {y}\in \mathbb {R}^{D}\). We can rewrite the last sum in Eq. 18 as follows:

$$ \begin{array}{@{}rcl@{}} &&{\sum\limits_{i=1}^{n} p_{j}(i) \left( \|\boldsymbol{x}_{i}-\boldsymbol{y}_{j}\|^{2}- \|\boldsymbol{x}_{i}-\boldsymbol{y}_{j+1}\|^{2}\right)}\\ &=& E\left[\|\boldsymbol{Z}_{j}\|^{2} \right] - E\left[\left\|\boldsymbol{Z}_{j}- \boldsymbol{T}_{j}E[\boldsymbol{Z}_{j}]\right\|^{2} \right]\\ &=& E\left[\|\boldsymbol{W}_{j}\boldsymbol{Z}_{j}\|^{2}+\|\boldsymbol{T}_{j}\boldsymbol{Z}_{j}\|^{2} \right] - E\left[\|\boldsymbol{W}_{j}\boldsymbol{Z}_{j}\|^{2} + \left\|\boldsymbol{T}_{j} \boldsymbol{Z}_{j}- \boldsymbol{T}_{j}E[\boldsymbol{Z}_{j}]\right\|^{2} \right]\\ &=& E\left[\|\boldsymbol{T}_{j}\boldsymbol{Z}_{j}\|^{2} \right] - E\left[\left\|\boldsymbol{T}_{j} \boldsymbol{Z}_{j}- E[\boldsymbol{T}_{j}\boldsymbol{Z}_{j}]\right\|^{2} \right]\\ &=& \left\| E[\boldsymbol{T}_{j}\boldsymbol{Z}_{j}]\right\|^{2} = \|\boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}\|^{2}, \end{array} $$

where in the last equality we applied the identity \(E[Z^{2}]=\text{Var}[Z]+(E[Z])^{2}\), which is valid for real random variables with finite variance, to the components of \(\boldsymbol{T}_{j}\boldsymbol{Z}_{j}\). Combining this with Eq. 18, we obtain

$$ \hat{f}(\boldsymbol{y}_{j+1})-\hat{f}(\boldsymbol{y}_{j})\ge C_{j} \|\boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}\|^{2} , $$
(20)

where \(C_{j}>0\) and \(\|\boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}\|^{2}\ge 0\) (Footnote 3), which together imply that \(\{\hat {f}(\boldsymbol {y}_{j})\}\) is non-decreasing and thus convergent, proving part (i) of the theorem.
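The chain of equalities above rests only on \(\boldsymbol{T}_{j}\) being an orthogonal projector and on \(\boldsymbol{m}(\boldsymbol{y}_{j})=E[\boldsymbol{Z}_{j}]\). The short numerical check below (illustrative only, with arbitrary data, weights from the Gaussian profile, and a random rank D − d projector standing in for \(\boldsymbol {V}_{j}{\boldsymbol {V}^{T}_{j}}\)) confirms that the weighted sum equals \(\|\boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}\|^{2}\) up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(3)
n, D, d, h = 50, 3, 1, 0.7
X = rng.normal(size=(n, D))
y = X[0]

# Weights p_j(i) built from the Gaussian profile, as defined below Eq. 18.
w = np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2))
p = w / w.sum()

# Any rank D - d orthogonal projector works for the identity; V_j from Eq. 6 is one choice.
Q, _ = np.linalg.qr(rng.normal(size=(D, D - d)))
T = Q @ Q.T

m = (p[:, None] * (X - y)).sum(axis=0)    # m(y_j) = E[Z_j]
y_next = y + T @ m                        # Eq. 19

lhs = np.sum(p * (np.sum((X - y) ** 2, axis=1) - np.sum((X - y_next) ** 2, axis=1)))
rhs = np.sum((y_next - y) ** 2)           # ||y_{j+1} - y_j||^2
print(lhs, rhs)                           # the two values agree up to rounding error
```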

To prove part (ii), we note that k(x) > 0 for all x ≥ 0. Therefore, (2) implies that \(\hat {f}(\boldsymbol {y}_{1})>0, \boldsymbol {y}_{1} \in \mathcal {X}\), so part (i) yields \(\min \limits \{\hat {f}(\boldsymbol {y}_{j}): j\ge 1\}=\hat {f}(\boldsymbol {y}_{1})>0\). But this in turn implies that {yj} is a bounded sequence, since otherwise it would have a subsequence \(\{\boldsymbol {y}_{j_{k}}\}\) such that \(\lim _{k\to \infty } \|\boldsymbol {y}_{j_{k}}\|=\infty \) which, in view of \(\lim _{x\to \infty } k(x)=0\), would give \(\lim _{k\to \infty } \hat {f}(\boldsymbol {y}_{j_{k}})=0\), contradicting our uniform positive lower bound on the \(\hat {f}(\boldsymbol {y}_{j})\).

In view of the above, there exists R > 0 such that \(\|\boldsymbol{y}_{j}-\boldsymbol{x}_{i}\|\le R\) for all j ≥ 1 and i = 1,…,n. Since \(g=-k^{\prime }\) is non-increasing on \([0,\infty )\), we obtain

$$ C_{j}= \frac{c}{nh^{D+2}} \sum\limits_{k=1}^{n} g\left( \Big\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{k}}{h}\Big\|^{2}\right)\ge \frac{c}{h^{D+2}} g\left( \frac{R^{2}}{h^{2}}\right)=C, $$

where C > 0 since g(x) > 0 for all x ≥ 0. Thus, Eq. 20 implies

$$ \|\boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}\|^{2} \le C^{-1} \left( \hat{f}(\boldsymbol{y}_{j+1})-\hat{f}(\boldsymbol{y}_{j})\right), $$

and since \(\lim \limits _{j\to \infty } \left (\hat {f}(\boldsymbol {y}_{j+1})- \hat {f}(\boldsymbol {y}_{j})\right )=0\) by part (i), we obtain \(\lim \limits _{j\rightarrow \infty } \|\boldsymbol {y}_{j+1}-\boldsymbol {y}_{j}\|=0\), proving part (ii).

Finally, to show (iii), we note that by definition (2) of \(\hat {f}\),

$$ \begin{array}{@{}rcl@{}} \nabla\hat{f}(\boldsymbol{y}_{j})&=& \frac{2c}{nh^{D+2}}\sum\limits_{i=1}^{n}(\boldsymbol{x}_{i}-\boldsymbol{y}_{j}) g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right)\\ &=& \frac{2c}{nh^{D+2}}\left[\sum\limits_{i=1}^{n} g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right) \right] \left[ \frac{{\sum}_{i=1}^{n}\boldsymbol{x}_{i}g\left( \|\frac{\boldsymbol{x}_{i}-\boldsymbol{y}_{j}}{h}\|^{2}\right)} {{\sum}_{i=1}^{n}g\left( \|\frac{\boldsymbol{x}_{i}-\boldsymbol{y}_{j}}{h}\|^{2}\right)}-\boldsymbol{y}_{j}\right]\\ &=& \frac{2c}{nh^{D+2}}\left[\sum\limits_{i=1}^{n} g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right) \right] \boldsymbol{m}(\boldsymbol{y}_{j}). \end{array} $$

Therefore,

$$ \|{\boldsymbol{V}^{T}_{j}} \nabla\hat{f}(\boldsymbol{y}_{j})\| = \frac{2c}{nh^{D+2}}\left[\sum\limits_{i=1}^{n} g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right) \right] \| {\boldsymbol{V}^{T}_{j}} \boldsymbol{m}(\boldsymbol{y}_{j}) \|. $$

Since Vj has orthonormal columns and \(\boldsymbol {T}_{j}=\boldsymbol {V}_{j}{\boldsymbol {V}^{T}_{j}}\), we have \(\|\boldsymbol {T}_{j}\boldsymbol {m}(\boldsymbol {y}_{j})\|=\|{\boldsymbol {V}^{T}_{j}}\boldsymbol {m}(\boldsymbol {y}_{j})\|\). This and Eq. 19 yield

$$ \|{\boldsymbol{V}^{T}_{j}} \nabla\hat{f}(\boldsymbol{y}_{j})\| = \frac{2c}{nh^{D+2}}\left[\sum\limits_{i=1}^{n} g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right) \right] \|\boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}\| $$

so part (iii) follows from part (ii) and the fact that the conditions on k ensure that \(g=-k^{\prime }\) is bounded. □
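The gradient identity used in part (iii) can also be verified numerically. The sketch below (illustrative only) evaluates the Gaussian kernel density estimate of Eq. 2 with \(c_{k,D}=(2\pi )^{-D/2}\), forms the right-hand side \(\frac{2c}{nh^{D+2}}\left [{\sum }_{i} g(\cdot )\right ]\boldsymbol {m}(\boldsymbol {y})\) with \(g(t)=\frac{1}{2}e^{-t/2}\), and compares it with a finite-difference gradient of \(\hat {f}\); the data, the evaluation point, and the bandwidth are arbitrary.

```python
import numpy as np

def kde(y, X, h):
    """Gaussian KDE of Eq. 2 with c_{k,D} = (2*pi)^(-D/2) and profile k(t) = exp(-t/2)."""
    D = X.shape[1]
    c = (2.0 * np.pi) ** (-D / 2.0)
    return c / (len(X) * h ** D) * np.sum(np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2)))

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))
y, h, eps = np.array([0.3, -0.2]), 0.9, 1e-6
D = X.shape[1]
c = (2.0 * np.pi) ** (-D / 2.0)

# Right-hand side: (2c / (n h^{D+2})) * [sum_i g(.)] * m(y), with g(t) = -k'(t) = exp(-t/2)/2.
g = 0.5 * np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2))
m = (g[:, None] * X).sum(axis=0) / g.sum() - y
rhs = 2.0 * c / (len(X) * h ** (D + 2)) * g.sum() * m

# Left-hand side: gradient of the KDE by central differences.
lhs = np.array([(kde(y + e, X, h) - kde(y - e, X, h)) / (2.0 * eps) for e in np.eye(D) * eps])
print(lhs, rhs)   # the two vectors agree up to finite-difference error
```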


Cite this article

Ghassabeh, Y.A., Rudzicz, F. Modified Subspace Constrained Mean Shift Algorithm. J Classif 38, 27–43 (2021). https://doi.org/10.1007/s00357-019-09353-1
