
Modified Subspace Constrained Mean Shift Algorithm


Abstract

A subspace constrained mean shift (SCMS) algorithm is a non-parametric iterative technique for estimating principal curves. Principal curves, as a nonlinear generalization of principal component analysis (PCA), are smooth curves (or surfaces) that pass through the middle of a data set and provide a compact low-dimensional representation of the data. The SCMS algorithm combines the mean shift (MS) algorithm with a projection step to estimate principal curves and surfaces. The MS algorithm is a simple iterative method for locating modes of an unknown probability density function (pdf) obtained via a kernel density estimate. Modes of a pdf can be interpreted as zero-dimensional principal curves, and they can also be used for clustering the input data. The SCMS algorithm generalizes the MS algorithm to estimate higher order principal curves and surfaces. Although both algorithms have been widely used in many real-world applications, their convergence for widely used kernels (e.g., the Gaussian kernel) has not yet been shown. In this paper, we first introduce a modified version of the MS algorithm and then combine it with different variations of the SCMS algorithm to estimate the underlying low-dimensional principal curve embedded in a high-dimensional space. The different variations of the SCMS algorithm are obtained via modification of the projection step in the original SCMS algorithm. We show that the modification of the MS algorithm guarantees its convergence and also implies the convergence of the different variations of the SCMS algorithm. The performance and effectiveness of the proposed modified versions in estimating an underlying principal curve are demonstrated through simulations on synthetic data.
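To make the two updates concrete, the following minimal Python sketch (not the implementation used in the paper) performs a single MS step with a Gaussian kernel and the corresponding modified MS step, which maps the estimate to the nearest input data point; the toy data, the bandwidth h = 0.5, and the function names are illustrative choices only.

```python
import numpy as np

def ms_update(y, X, h):
    """One standard mean shift step: move y to the kernel-weighted mean of the data.

    Gaussian profile k(t) = exp(-t/2), so the weights are exp(-||y - x_i||^2 / (2 h^2)).
    """
    w = np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2))
    return (w[:, None] * X).sum(axis=0) / w.sum()

def modified_ms_update(y, X, h):
    """Modified MS step: compute the usual update, then return the input
    data point closest to it, so every estimate is one of the data points."""
    y_tilde = ms_update(y, X, h)
    return X[np.argmin(np.sum((X - y_tilde) ** 2, axis=1))]

# Toy usage: two Gaussian blobs; one modified step starting from the first data point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(100, 2)),
               rng.normal(3.0, 0.3, size=(100, 2))])
print(modified_ms_update(X[0], X, h=0.5))
```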


Notes

  1. It is called a principal curve for D = 1 and becomes a mode of the pdf for D = 0.

  2. Note that for the special case of Gaussian distribution \(\hat {f}\sim N(\mu ,\boldsymbol {\Sigma })\), the local inverse covariance matrix in Eq. 6 becomes equal to the inverse covariance matrix, i.e., \(\hat {\boldsymbol {\Sigma }}^{-1}(\boldsymbol {x})=\boldsymbol {\Sigma }^{-1}\).

  3. The equality \(\|\boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}\|^{2}=0\) holds only once convergence occurs, which Theorem 1 guarantees for the modified MS algorithm.


Author information


Corresponding author

Correspondence to Youness Aliyari Ghassabeh.


Appendix


Proof of Theorem 1

Let \(\mathcal {X}=\{\boldsymbol {x}_{1},\ldots ,\boldsymbol {x}_{n}\}\) denote the input data set and assume \(\boldsymbol{y}_{j+1}\neq \boldsymbol{y}_{j}\). To prove part (i), we show that \(\hat {f}_{h,k}(\boldsymbol {y}_{j+1})>\hat {f}_{h,k}(\boldsymbol {y}_{j})\). From Eq. 2, we have

$$ \begin{array}{@{}rcl@{}} &&\hat{f}_{h,k}(\boldsymbol{y}_{j+1})-\hat{f}_{h,k}(\boldsymbol{y}_{j}) \\ &=&\frac{c_{k,D}}{nh^{D}}\left[\sum\limits_{i=1}^{n}k\left( \left\|\frac{\boldsymbol{y}_{j+1}-\boldsymbol{x}_{i}}{h}\right\|^{2}\right) -k\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2}\right)\right] \\ &\ge& \frac{c_{k,D}}{nh^{D+2}}\sum\limits_{i=1}^{n}k^{\prime}\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2}\right) \left( \|\boldsymbol{y}_{j+1}-\boldsymbol{x}_{i}\|^{2}-\|\boldsymbol{y}_{j}-\boldsymbol{x}_{i}\|^{2}\right), \end{array} $$
(10)

where the last inequality comes from the convexity of the profile function k, i.e., \(k(x_{2})-k(x_{1})\ge k^{\prime }(x_{1})(x_{2}-x_{1})\). By the triangle inequality, we have

$$ \|\boldsymbol{y}_{j+1}-\tilde{\boldsymbol{y}}_{j+1}\|\le \|\boldsymbol{y}_{j+1}-\boldsymbol{x}_{i}\|+\|\tilde{\boldsymbol{y}}_{j+1}-\boldsymbol{x}_{i}\|, i=1,2, \ldots, n, $$
(11)

where \(\tilde {\boldsymbol {y}}_{j+1}\) is given in Eq. 8. Using Eqs. 10 and 11, we obtain

$$ \begin{array}{@{}rcl@{}} &&\hat{f}_{h,k}(\boldsymbol{y}_{j+1})-\hat{f}_{h,k}(\boldsymbol{y}_{j})\ge \frac{c_{k,D}}{nh^{D+2}}\sum\limits_{i=1}^{n}k^{\prime}\left( \|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\|^{2}\right) \\ &&\left( \|\boldsymbol{y}_{j+1}-\tilde{\boldsymbol{y}}_{j+1}\|^{2}-\|\tilde{\boldsymbol{y}}_{j+1}-\boldsymbol{x}_{i}\|^{2}\right.\\ && \left. -2\|\boldsymbol{y}_{j+1}-\tilde{\boldsymbol{y}}_{j+1}\|\|\tilde{\boldsymbol{y}}_{j+1}-\boldsymbol{x}_{i}\|-\|\boldsymbol{y}_{j}-\boldsymbol{x}_{i}\|^{2}\right). \end{array} $$
(12)

Since the modified MS algorithm assigns \(\boldsymbol{y}_{j+1}\) to the data point closest to \(\tilde {\boldsymbol {y}}_{j+1}\), we have \(\|\boldsymbol {y}_{j+1}-\tilde {\boldsymbol {y}}_{j+1}\|^{2}-\|\tilde {\boldsymbol {y}}_{j+1}-\boldsymbol {x}_{i}\|^{2}\le 0\) for all xi ∈{x1,…,xn}, and as a result we have

$$ \begin{array}{@{}rcl@{}} &&\sum\limits_{i=1}^{n}k^{\prime}\left( \|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\|^{2}\right)\\ &&\left( \|\boldsymbol{y}_{j+1}-\tilde{\boldsymbol{y}}_{j+1}\|^{2}-\|\tilde{\boldsymbol{y}}_{j+1}-\boldsymbol{x}_{i}\|^{2} \right)>0, \end{array} $$
(13)

where the above inequality is true since the profile k is a strictly decreasing function and \(k^{\prime }(x)<0\). Furthermore, we have

$$ \begin{array}{@{}rcl@{}} &&\sum\limits_{i=1}^{n}k^{\prime}\left( \|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\|^{2}\right)\\ &&\left( -2\|\boldsymbol{y}_{j+1}-\tilde{\boldsymbol{y}}_{j+1}\|\|\tilde{\boldsymbol{y}}_{j+1}-\boldsymbol{x}_{i}\|-\|\boldsymbol{y}_{j}-\boldsymbol{x}_{i}\|^{2}\right)>0. \end{array} $$
(14)

Combining Eqs. 12, 13, and 14, we obtain

$$ \hat{f}_{h,k}(\boldsymbol{y}_{j+1})-\hat{f}_{h,k}(\boldsymbol{y}_{j})>0, $$
(15)

which implies that the sequence \(\{\hat {f}_{h,k}(\boldsymbol {y}_{j})\}_{j=1,2,\ldots }\) is increasing. Since the modified MS algorithm assigns each estimate to one of the data points, yj ∈{x1,…,xn}, j = 1, 2,…, and since n is finite, \(\hat {f}_{h,k}(\boldsymbol {y}_{j})\), given in Eq. 2, is bounded. Thus, as long as \(\boldsymbol{y}_{j+1}\neq \boldsymbol{y}_{j}\), the sequence \(\{\hat {f}_{h,k}(\boldsymbol {y}_{j})\}_{j=1,2,\ldots }\) is bounded and strictly increasing, and these two conditions together imply the convergence of \(\{\hat {f}_{h,k}(\boldsymbol {y}_{j})\}\).

To prove part (ii), first note that the modified MS algorithm starts from one of the data points, and in each iteration the cluster center estimate is assigned to one of the data points. The algorithm stops when two consecutive estimates become equal, i.e., yj+ 1 = yj for some j ≥ 1. By part (i), each data point can be assigned to the cluster center estimate at most once; otherwise, \(\hat {f}_{h,k}(\boldsymbol {y}_{j+m})=\hat {f}_{h,k}(\boldsymbol {y}_{j})\) for some m ≥ 1, which contradicts part (i). Since the number of data samples, n, is finite, the sequence {yj} converges after a finite number of iterations. □
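As a numerical sanity check of the behaviour established in Theorem 1, the following Python sketch (a toy experiment, not the authors' code) runs the modified MS iteration with a Gaussian kernel on synthetic data, records \(\hat {f}_{h,k}(\boldsymbol {y}_{j})\) (up to its normalizing constant) along the way, and stops when two consecutive estimates coincide; the data, bandwidth, and iteration cap are arbitrary choices.

```python
import numpy as np

def kde(y, X, h):
    """Gaussian-kernel density estimate at y, up to the constant c_{k,D} / (n h^D) factor."""
    return np.mean(np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2)))

def modified_ms_step(y, X, h):
    """Standard MS update (kernel-weighted mean), then snap to the nearest data point."""
    w = np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2))
    y_tilde = (w[:, None] * X).sum(axis=0) / w.sum()
    return X[np.argmin(np.sum((X - y_tilde) ** 2, axis=1))]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))            # synthetic data set
h = 0.8
y = X[0]
values = [kde(y, X, h)]
for _ in range(1000):                    # part (ii): the loop exits long before this cap
    y_next = modified_ms_step(y, X, h)
    if np.array_equal(y_next, y):        # y_{j+1} = y_j: the sequence {y_j} has converged
        break
    y = y_next
    values.append(kde(y, X, h))

# Part (i): the recorded KDE values increase strictly until convergence.
print("strictly increasing:", all(b > a for a, b in zip(values, values[1:])))
print("number of distinct estimates visited:", len(values))
```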

Proof of Theorem 2

Let \(\mathcal {X}=\{\boldsymbol {x}_{1},\ldots ,\boldsymbol {x}_{n}\}, n\ge 2\), denote the set of observed data points. The subspace constrained mean shift sequence {yj} is defined recursively by

$$ \boldsymbol{y}_{j+1}=\boldsymbol{V}_{j}{\boldsymbol{V}^{T}_{j}}\boldsymbol{m}(\boldsymbol{y}_{j})+\boldsymbol{y}_{j}, $$
(16)

where

$$ \boldsymbol{m}(\boldsymbol{y}_{j})= \frac{{\sum}_{i=1}^{n}\boldsymbol{x}_{i}g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2}\right)} {{\sum}_{i=1}^{n}g\left( \left\|\frac{\boldsymbol{y}_{j} -\boldsymbol{x}_{i}}{h}\right\|^{2}\right)}-\boldsymbol{y}_{j}, $$
(17)

with \(\boldsymbol {y}_{j} \in \mathcal {X}\) being one of the input data points, as required by the modified MS algorithm. Here \(g(x)=-k^{\prime }(x)\), where k is the profile of the kernel K, and \(\boldsymbol{V}_{j}\) is the D × (D − d) matrix whose orthonormal columns are the eigenvectors corresponding to the largest eigenvalues of the local inverse covariance matrix \(\hat {\boldsymbol {\Sigma }}^{-1}\), defined in Eq. 6, evaluated at \(\boldsymbol{y}_{j}\).
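A minimal Python sketch of the update in Eqs. 16 and 17 is given below. Since Eq. 6 is not reproduced here, the sketch assumes the local inverse covariance is the negative Hessian of the log of the Gaussian kernel density estimate (consistent with Footnote 2 for the Gaussian case) and approximates it by finite differences; the final mapping of the estimate back to the nearest data point mirrors the modified MS algorithm and is only one possible reading of the modified SCMS variants. All names, the toy half-circle data, and the bandwidth are illustrative.

```python
import numpy as np

def log_kde(y, X, h):
    """Log of the Gaussian kernel density estimate at y (up to an additive constant)."""
    return np.log(np.mean(np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2))))

def mean_shift_vector(y, X, h):
    """m(y) from Eq. 17 with the Gaussian profile: weighted mean of the data minus y."""
    w = np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2))
    return (w[:, None] * X).sum(axis=0) / w.sum() - y

def local_inv_cov(y, X, h, eps=1e-3):
    """Assumed stand-in for Eq. 6: minus the Hessian of log f_hat, via central differences."""
    D = y.size
    H = np.zeros((D, D))
    for a in range(D):
        for b in range(D):
            ea, eb = np.eye(D)[a] * eps, np.eye(D)[b] * eps
            H[a, b] = (log_kde(y + ea + eb, X, h) - log_kde(y + ea - eb, X, h)
                       - log_kde(y - ea + eb, X, h) + log_kde(y - ea - eb, X, h)) / (4 * eps ** 2)
    return -(H + H.T) / 2.0                       # symmetrise the numerical Hessian

def modified_scms_step(y, X, h, d=1):
    """One update of Eq. 16, then (as an assumed variant) snap to the nearest data point."""
    evals, evecs = np.linalg.eigh(local_inv_cov(y, X, h))
    V = evecs[:, np.argsort(evals)[::-1][: X.shape[1] - d]]   # top D - d eigenvectors
    y_new = V @ V.T @ mean_shift_vector(y, X, h) + y
    return X[np.argmin(np.sum((X - y_new) ** 2, axis=1))]

# Toy data: noisy points along a half circle, a one-dimensional principal curve in R^2.
rng = np.random.default_rng(2)
t = rng.uniform(0.0, np.pi, 400)
X = np.c_[np.cos(t), np.sin(t)] + rng.normal(scale=0.05, size=(400, 2))
y = X[0]
for _ in range(30):
    y = modified_scms_step(y, X, h=0.3, d=1)
print("estimate pulled toward the principal curve:", y)
```

Setting d = 1 targets a principal curve; larger d would target higher-dimensional principal surfaces.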

Since the profile k is bounded, the sequence \(\{\hat {f}(\boldsymbol {y}_{j})\}\) is bounded, so to prove convergence it suffices to show that the sequence is non-decreasing. Since k is assumed to be convex, we have \(k(t_{2})-k(t_{1})\ge g(t_{1})(t_{1}-t_{2})\) for all \(t_{1},t_{2}\ge 0\), where \(g=-k^{\prime }\). Combining this with the definition of \(\hat {f}\) in Eq. 2 yields

$$ \begin{array}{@{}rcl@{}} \hat{f}(\boldsymbol{y}_{j+1})-\hat{f}(\boldsymbol{y}_{j})&=& \frac{c}{nh^{D}} \sum\limits_{i=1}^{n}\left( k\left( \left\|\frac{\boldsymbol{y}_{j+1}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right)- k\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right)\right)\\ &\ge & \frac{c}{nh^{D+2}}\sum\limits_{i=1}^{n} g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right)\left( \|\boldsymbol{y}_{j}-\boldsymbol{x}_{i}\|^{2}- \|\boldsymbol{y}_{j+1}-\boldsymbol{x}_{i}\|^{2}\right) \\ &=& C_{j} \sum\limits_{i=1}^{n} p_{j}(i) \left( \|\boldsymbol{y}_{j}-\boldsymbol{x}_{i}\|^{2}- \|\boldsymbol{y}_{j+1}-\boldsymbol{x}_{i}\|^{2}\right), \end{array} $$
(18)

where c is the normalization factor,

$$ p_{j}(i) = \frac{ g\left( \|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\|^{2}\right)}{{\sum}_{k=1}^{n} g\left( \|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{k}}{h}\|^{2}\right)}, \quad i=1,\ldots,n $$

and

$$ C_{j}= \frac{c}{nh^{D+2}} \sum\limits_{i=1}^{n} g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right). $$

Since by assumption k is strictly decreasing, \(g(t)=-k^{\prime }(t)>0\) for all t ≥ 0, so \(p_{j}(1),\ldots ,p_{j}(n)\) are well defined, positive, and sum to 1. Therefore, the mean shift vector in Eq. 17 can be rewritten as

$$ \boldsymbol{m}(\boldsymbol{y}_{j})= \sum\limits_{i=1}^{n} p_{j}(i)(\boldsymbol{x}_{i}-\boldsymbol{y}_{j})= E[\boldsymbol{Z}_{j}], $$

where Zj is a random vector in \(\mathbb {R}^{D}\) with discrete probability distribution given by \(\Pr (\boldsymbol {Z}_{j}= \boldsymbol {x}_{i}-\boldsymbol {y}_{j})= p_{j}(i), i=1,\ldots ,n\), and \(\boldsymbol {y}_{j} \in \mathcal {X}\), \(j=1,2,\ldots\). Thus, letting \(\boldsymbol {T}_{j}= \boldsymbol {V}_{j}{\boldsymbol {V}^{T}_{j}}\), the update step in the proposed modified SCMS algorithm can be rewritten as

$$ \boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}= \boldsymbol{T}_{j} \boldsymbol{m}(\boldsymbol{y}_{j})= \boldsymbol{T}_{j} E[\boldsymbol{Z}_{j}]. $$
(19)

Let \(\boldsymbol{W}_{j}\) be the D × D matrix representing the orthogonal projection onto the null space of \(\boldsymbol{T}_{j}\). Then \(\boldsymbol{x}=\boldsymbol{T}_{j}\boldsymbol{x}+\boldsymbol{W}_{j}\boldsymbol{x}\) for all \(\boldsymbol {x}\in \mathbb {R}^{D}\), and \(\boldsymbol{T}_{j}\boldsymbol{x}\) and \(\boldsymbol{W}_{j}\boldsymbol{y}\) are orthogonal for all \(\boldsymbol {x},\boldsymbol {y}\in \mathbb {R}^{D}\). We can rewrite the last sum in Eq. 18 as follows:

$$ \begin{array}{@{}rcl@{}} &&{\sum\limits_{i=1}^{n} p_{j}(i) \left( \|\boldsymbol{x}_{i}-\boldsymbol{y}_{j}\|^{2}- \|\boldsymbol{x}_{i}-\boldsymbol{y}_{j+1}\|^{2}\right)}\\ &=& E\left[\|\boldsymbol{Z}_{j}\|^{2} \right] - E\left[\left\|\boldsymbol{Z}_{j}- \boldsymbol{T}_{j}E[\boldsymbol{Z}_{j}]\right\|^{2} \right]\\ &=& E\left[\|\boldsymbol{W}_{j}\boldsymbol{Z}_{j}\|^{2}+\|\boldsymbol{T}_{j}\boldsymbol{Z}_{j}\|^{2} \right] - E\left[\|\boldsymbol{W}_{j}\boldsymbol{Z}_{j}\|^{2} + \left\|\boldsymbol{T}_{j} \boldsymbol{Z}_{j}- \boldsymbol{T}_{j}E[\boldsymbol{Z}_{j}]\right\|^{2} \right]\\ &=& E\left[\|\boldsymbol{T}_{j}\boldsymbol{Z}_{j}\|^{2} \right] - E\left[\left\|\boldsymbol{T}_{j} \boldsymbol{Z}_{j}- E[\boldsymbol{T}_{j}\boldsymbol{Z}_{j}]\right\|^{2} \right]\\ &=& \left\| E[\boldsymbol{T}_{j}\boldsymbol{Z}_{j}]\right\|^{2} = \|\boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}\|^{2}, \end{array} $$

where in the last equality we applied the identity \(E[Z^{2}]=\text{Var}[Z]+(E[Z])^{2}\), which is valid for real random variables with finite variance, to the components of \(\boldsymbol{T}_{j}\boldsymbol{Z}_{j}\). Combining this with Eq. 18, we obtain

$$ \hat{f}(\boldsymbol{y}_{j+1})-\hat{f}(\boldsymbol{y}_{j})\ge C_{j} \|\boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}\|^{2} , $$
(20)

where \(C_{j}>0\) and \(\|\boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}\|^{2}\ge 0\) (Footnote 3), which together imply that \(\{\hat {f}(\boldsymbol {y}_{j})\}\) is non-decreasing and thus convergent, proving part (i) of the theorem.
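The chain of equalities above rests only on \(\boldsymbol{T}_{j}\) being an orthogonal projector and on \(\boldsymbol{m}(\boldsymbol{y}_{j})=E[\boldsymbol{Z}_{j}]\). The short numerical check below (illustrative only, with arbitrary data, weights from the Gaussian profile, and a random rank D − d projector standing in for \(\boldsymbol {V}_{j}{\boldsymbol {V}^{T}_{j}}\)) confirms that the weighted sum equals \(\|\boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}\|^{2}\) up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(3)
n, D, d, h = 50, 3, 1, 0.7
X = rng.normal(size=(n, D))
y = X[0]

# Weights p_j(i) built from the Gaussian profile, as defined below Eq. 18.
w = np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2))
p = w / w.sum()

# Any rank D - d orthogonal projector works for the identity; V_j from Eq. 6 is one choice.
Q, _ = np.linalg.qr(rng.normal(size=(D, D - d)))
T = Q @ Q.T

m = (p[:, None] * (X - y)).sum(axis=0)    # m(y_j) = E[Z_j]
y_next = y + T @ m                        # Eq. 19

lhs = np.sum(p * (np.sum((X - y) ** 2, axis=1) - np.sum((X - y_next) ** 2, axis=1)))
rhs = np.sum((y_next - y) ** 2)           # ||y_{j+1} - y_j||^2
print(lhs, rhs)                           # the two values agree up to rounding error
```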

To prove part (ii), we note that k(x) > 0 for all x ≥ 0. Therefore, (2) implies that \(\hat {f}(\boldsymbol {y}_{1})>0, \boldsymbol {y}_{1} \in \mathcal {X}\), so part (i) yields \(\min \limits \{\hat {f}(\boldsymbol {y}_{j}): j\ge 1\}=\hat {f}(\boldsymbol {y}_{1})>0\). But this in turn implies that {yj} is a bounded sequence, since otherwise it would have a subsequence \(\{\boldsymbol {y}_{j_{k}}\}\) such that \(\lim _{k\to \infty } \|\boldsymbol {y}_{j_{k}}\|=\infty \) which, in view of \(\lim _{x\to \infty } k(x)=0\), would give \(\lim _{k\to \infty } \hat {f}(\boldsymbol {y}_{j_{k}})=0\), contradicting our uniform positive lower bound on the \(\hat {f}(\boldsymbol {y}_{j})\).

In view of the above, there exists R > 0 such that \(\|\boldsymbol{y}_{j}-\boldsymbol{x}_{i}\|\le R\) for all j ≥ 1 and i = 1,…,n. Since \(g=-k^{\prime }\) is non-increasing on \([0,\infty )\), we obtain

$$ C_{j}= \frac{c}{nh^{D+2}} \sum\limits_{k=1}^{n} g\left( \Big\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{k}}{h}\Big\|^{2}\right)\ge \frac{c}{h^{D+2}} g\left( \frac{R^{2}}{h^{2}}\right)=C, $$

where C > 0 since g(x) > 0 for all x ≥ 0. Thus, Eq. 20 implies

$$ \|\boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}\|^{2} \le C^{-1} \left( \hat{f}(\boldsymbol{y}_{j+1})-\hat{f}(\boldsymbol{y}_{j})\right), $$

and since \(\lim \limits _{j\to \infty } \left (\hat {f}(\boldsymbol {y}_{j+1})- \hat {f}(\boldsymbol {y}_{j})\right )=0\) by part (i), we obtain \(\lim \limits _{j\rightarrow \infty } \|\boldsymbol {y}_{j+1}-\boldsymbol {y}_{j}\|=0\), proving part (ii).

Finally, to show (iii), we note that by definition (2) of \(\hat {f}\),

$$ \begin{array}{@{}rcl@{}} \nabla\hat{f}(\boldsymbol{y}_{j})&=& \frac{2c}{nh^{D+2}}\sum\limits_{i=1}^{n}(\boldsymbol{x}_{i}-\boldsymbol{y}_{j}) g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right)\\ &=& \frac{2c}{nh^{D+2}}\left[\sum\limits_{i=1}^{n} g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right) \right] \left[ \frac{{\sum}_{i=1}^{n}\boldsymbol{x}_{i}g\left( \|\frac{\boldsymbol{x}_{i}-\boldsymbol{y}_{j}}{h}\|^{2}\right)} {{\sum}_{i=1}^{n}g\left( \|\frac{\boldsymbol{x}_{i}-\boldsymbol{y}_{j}}{h}\|^{2}\right)}-\boldsymbol{y}_{j}\right]\\ &=& \frac{2c}{nh^{D+2}}\left[\sum\limits_{i=1}^{n} g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right) \right] \boldsymbol{m}(\boldsymbol{y}_{j}). \end{array} $$

Therefore,

$$ \|{\boldsymbol{V}^{T}_{j}} \nabla\hat{f}(\boldsymbol{y}_{j})\| = \frac{2c}{nh^{D+2}}\left[\sum\limits_{i=1}^{n} g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right) \right] \| {\boldsymbol{V}^{T}_{j}} \boldsymbol{m}(\boldsymbol{y}_{j}) \|. $$

Since Vj has orthonormal columns and \(\boldsymbol {T}_{j}=\boldsymbol {V}_{j}{\boldsymbol {V}^{T}_{j}}\), we have \(\|\boldsymbol {T}_{j}\boldsymbol {m}(\boldsymbol {y}_{j})\|=\|{\boldsymbol {V}^{T}_{j}}\boldsymbol {m}(\boldsymbol {y}_{j})\|\). This and Eq. 19 yield

$$ \|{\boldsymbol{V}^{T}_{j}} \nabla\hat{f}(\boldsymbol{y}_{j})\| = \frac{2c}{nh^{D+2}}\left[\sum\limits_{i=1}^{n} g\left( \left\|\frac{\boldsymbol{y}_{j}-\boldsymbol{x}_{i}}{h}\right\|^{2} \right) \right] \|\boldsymbol{y}_{j+1}-\boldsymbol{y}_{j}\| $$

so part (iii) follows from part (ii) and the fact that the conditions on k ensure that \(g=-k^{\prime }\) is bounded. □
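The gradient identity used in part (iii) can also be verified numerically. The sketch below (illustrative only) evaluates the Gaussian kernel density estimate of Eq. 2 with \(c_{k,D}=(2\pi )^{-D/2}\), forms the right-hand side \(\frac{2c}{nh^{D+2}}\left [{\sum }_{i} g(\cdot )\right ]\boldsymbol {m}(\boldsymbol {y})\) with \(g(t)=\frac{1}{2}e^{-t/2}\), and compares it with a finite-difference gradient of \(\hat {f}\); the data, the evaluation point, and the bandwidth are arbitrary.

```python
import numpy as np

def kde(y, X, h):
    """Gaussian KDE of Eq. 2 with c_{k,D} = (2*pi)^(-D/2) and profile k(t) = exp(-t/2)."""
    D = X.shape[1]
    c = (2.0 * np.pi) ** (-D / 2.0)
    return c / (len(X) * h ** D) * np.sum(np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2)))

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))
y, h, eps = np.array([0.3, -0.2]), 0.9, 1e-6
D = X.shape[1]
c = (2.0 * np.pi) ** (-D / 2.0)

# Right-hand side: (2c / (n h^{D+2})) * [sum_i g(.)] * m(y), with g(t) = -k'(t) = exp(-t/2)/2.
g = 0.5 * np.exp(-np.sum((X - y) ** 2, axis=1) / (2.0 * h ** 2))
m = (g[:, None] * X).sum(axis=0) / g.sum() - y
rhs = 2.0 * c / (len(X) * h ** (D + 2)) * g.sum() * m

# Left-hand side: gradient of the KDE by central differences.
lhs = np.array([(kde(y + e, X, h) - kde(y - e, X, h)) / (2.0 * eps) for e in np.eye(D) * eps])
print(lhs, rhs)   # the two vectors agree up to finite-difference error
```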


Cite this article

Ghassabeh, Y.A., Rudzicz, F. Modified Subspace Constrained Mean Shift Algorithm. J Classif 38, 27–43 (2021). https://doi.org/10.1007/s00357-019-09353-1
