
Implicitly Weighted Methods in Robust Image Analysis

Journal of Mathematical Imaging and Vision


This paper is devoted to highly robust statistical methods with applications to image analysis. The methods of the paper exploit the idea of implicit weighting, which is inspired by the highly robust least weighted squares regression estimator. We use a correlation coefficient based on implicit weighting of individual pixels as a highly robust similarity measure between two images. The reweighted least weighted squares estimator is considered as an alternative regression estimator with a clear interpretation. We apply implicit weighting to dimension reduction by means of robust principal component analysis. Highly robust methods are exploited in tasks of face localization and face detection in a database of 2D images. In this context we investigate a method for outlier detection and a filter for image denoising based on implicit weighting.



References


  1. Arya, K.V., Gupta, P., Kalra, P.K., Mitra, P.: Image registration using robust M-estimators. Pattern Recognit. Lett. 28, 1957–1968 (2007)

  2. Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 711–720 (1997)

  3. Böhringer, S., Vollmar, T., Tasse, C., Würtz, R.P., Gillessen-Kaesbach, G., Horsthemke, B., Wieczorek, D.: Syndrome identification based on 2D analysis software. Eur. J. Hum. Genet. 14, 1082–1089 (2006)

  4. Chai, X., Shan, S., Chen, X., Gao, W.: Locally linear regression for pose-invariant face recognition. IEEE Trans. Image Process. 16(7), 1716–1725 (2007)

  5. Chambers, J.M.: Software for Data Analysis: Programming with R. Springer, New York (2008)

  6. Chen, J.-H., Chen, C.-S., Chen, Y.-S.: Fast algorithm for robust template matching with M-estimators. IEEE Trans. Signal Process. 51(1), 230–243 (2003)

  7. Čížek, P.: Robust estimation with discrete explanatory variables. In: Härdle, W., Rönz, B. (eds.) COMPSTAT 2002, Proceedings in Computational Statistics, pp. 509–514. Physica-Verlag, Heidelberg (2002)

  8. Čížek, P.: Semiparametrically weighted robust estimation of regression models. Comput. Stat. Data Anal. 55(1), 774–788 (2011)

  9. Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3D transform-domain collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)

  10. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition CVPR, pp. 886–893. IEEE Computer Society, Washington (2005)

  11. Davies, P.L., Gather, U.: Breakdown and groups. Ann. Stat. 33(3), 977–1035 (2005)

  12. Davies, P.L., Kovac, A.: Local extremes, runs, strings and multiresolution. Ann. Stat. 29(1), 1–65 (2001)

  13. Donoho, D.L., Huber, P.J.: The notion of breakdown point. In: Bickel, P.J., Doksum, K., Hodges, J.L.J. (eds.) A Festschrift for Erich L. Lehmann, pp. 157–184. Wadsworth, Belmont (1983)

  14. Ellis, S.P., Morgenthaler, S.: Leverage and breakdown in L1 regression. J. Am. Stat. Assoc. 87(417), 143–148 (1992)

  15. Fidler, S., Skočaj, D., Leonardis, A.: Combining reconstructive and discriminative subspace methods for robust classification and regression by subsampling. IEEE Trans. Pattern Anal. Mach. Intell. 28(3), 337–350 (2006)

  16. Franceschi, E., Odone, F., Smeraldi, F., Verri, A.: Finding objects with hypothesis testing. In: Proceedings of ICPR 2004, Workshop on Learning for Adaptable Visual Systems, Cambridge, 2004. IEEE Computer Society, Los Alamitos (2004)

  17. Fried, R., Einbeck, J., Gather, U.: Weighted repeated median smoothing and filtering. J. Am. Stat. Assoc. 102(480), 1300–1308 (2007)

  18. Gervini, D., Yohai, V.J.: A class of robust and fully efficient regression estimators. Ann. Stat. 30(2), 583–616 (2002)

  19. Hájek, J., Šidák, Z., Sen, P.K.: Theory of Rank Tests, 2nd edn. Academic Press, San Diego (1999)

  20. Härdle, W.K., Simar, L.: Applied Multivariate Statistical Analysis. Springer, Heidelberg (2007)

  21. He, X., Portnoy, S.: Reweighted LS estimators converge at the same rate as the initial estimator. Ann. Stat. 20(4), 2161–2167 (1992)

  22. Hillebrand, M., Müller, C.: Outlier robust corner-preserving methods for reconstructing noisy images. Ann. Stat. 35(1), 132–165 (2007)

  23. Hotz, T., Marnitz, P., Stichtenoth, R., Davies, P.L., Kabluchko, Z., Munk, A.: Locally adaptive image denoising by a statistical multiresolution criterion. Preprint statistical regularization and qualitative constraints 8/2009, University of Göttingen (2009)

  24. Huang, L.-L., Shimizu, A.: Combining classifiers for robust face detection. In: Lecture Notes in Computer Science, vol. 3972, pp. 116–121 (2006)

  25. Hubert, M., Rousseeuw, P.J., van Aelst, S.: High-breakdown robust multivariate methods. Stat. Sci. 23(1), 92–119 (2008)

  26. Kalina, J.: Asymptotic Durbin-Watson test for robust regression. Bull. Int. Stat. Inst. 62, 3406–3409 (2007)

  27. Kalina, J.: Robust image analysis of faces for genetic applications. Eur. J. Biomed. Inform. 6(2), 6–13 (2010)

  28. Kalina, J.: On multivariate methods in robust econometrics. Prague Econ. Pap. 1(2012), 69–82 (2012)

  29. Kleihorst, R.P.: Noise filtering of image sequences. Dissertation, Technical University Delft (1997)

  30. Lin, Z., Davis, L.S., Doermann, D.S., DeMenthon, D.: Hierarchical part-template matching for human detection and segmentation. In: Proceedings of the Eleventh IEEE International Conference on Computer Vision ICCV 2007, pp. 1–8. IEEE Computer Society, Washington (2007)

  31. Mairal, J., Elad, M., Sapiro, G.: Sparse representation for color image restoration. IEEE Trans. Image Process. 17(1), 53–69 (2008)

  32. Maronna, R.A., Martin, R.D., Yohai, V.J.: Robust Statistics: Theory and Methods. Wiley, Chichester (2006)

  33. Meer, P., Mintz, D., Rosenfeld, A., Kim, D.Y.: Robust regression methods for computer vision: A review. Int. J. Comput. Vis. 6(1), 59–70 (1991)

  34. Müller, C.: Redescending M-estimators in regression analysis, cluster analysis and image analysis. Discuss. Math., Probab. Stat. 24(1), 59–75 (2004)

  35. Naseem, I., Togneri, R., Bennamoun, M.: Linear regression for face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 32(11), 2106–2112 (2010)

  36. Pitas, I., Venetsanopoulos, A.N.: Nonlinear Digital Filters. Kluwer, Dordrecht (1990)

  37. Plát, P.: The least weighted squares estimator. In: Antoch, J. (ed.) COMPSTAT 2004, Proceedings in Computational Statistics, pp. 1653–1660. Physica-Verlag, Heidelberg (2004)

  38. Portilla, J., Strela, V., Wainwright, M.J., Simoncelli, E.P.: Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. Image Process. 12(11), 1338–1351 (2003)

  39. Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. Wiley, New York (1987)

  40. Rousseeuw, P.J., van Driessen, K.: Computing LTS regression for large data sets. Data Min. Knowl. Discov. 12(1), 29–45 (2006)

  41. Rowley, H., Baluja, S., Kanade, T.: Neural network-based face detection. IEEE Trans. Pattern Anal. Mach. Intell. 20(1), 23–38 (1998)

  42. Salibián-Barrera, M.: The asymptotics of MM-estimators for linear regression with fixed designs. Metrika 63, 283–294 (2006)

  43. Schettlinger, K., Fried, R., Gather, U.: Real time signal processing by adaptive repeated median filters. Int. J. Adapt. Control Signal Process. 24(5), 346–362 (2010)

  44. Shevlyakov, G.L., Vilchevski, N.O.: Robustness in Data Analysis: Criteria and Methods. VSP, Utrecht (2002)

  45. Tableman, M.: The influence functions for the least trimmed squares and the least trimmed absolute deviations estimators. Stat. Probab. Lett. 19, 329–337 (1994)

  46. Torralba, A., Murphy, K.P., Freeman, W.T.: Sharing visual features for multiclass and multiview object detection. IEEE Trans. Pattern Anal. Mach. Intell. 29(5), 854–869 (2007)

  47. Tuzel, O., Porikli, F., Meer, P.: Human detection via classification on Riemannian manifolds. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition CVPR 2007. IEEE Computer Society, Washington (2007)

  48. Viola, P., Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vis. 57, 137–154 (2004)

  49. Víšek, J.A.: The least weighted squares II. Consistency and asymptotic normality. Bull. Czech Econom. Soc. 9(16), 1–28 (2002)

  50. Víšek, J.A.: Robust error-term-scale estimate. In: Nonparametrics and Robustness in Modern Statistical Inference and Time Series Analysis. Institute of Mathematical Statistics Collections, vol. 7, pp. 254–267 (2010)

  51. Víšek, J.A.: Consistency of the least weighted squares under heteroscedasticity. Kybernetika 47(2), 179–206 (2011)

  52. Wang, M., Lai, C.-H.: A Concise Introduction to Image Processing Using C++. CRC Press, Boca Raton (2008)

  53. Wang, X., Tang, X.: Subspace analysis using random mixture models. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition CVPR 2005, pp. 574–580. IEEE Computer Society, Washington (2005)

  54. Wong, Y., Sanderson, C., Lovell, B.C.: Regression based non-frontal face synthesis for improved identity verification. In: Jiang, X., Petkov, N. (eds.) Computer Analysis of Images and Patterns, pp. 116–124. Springer, Heidelberg (2010)

  55. Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009)

  56. Yang, M.-H., Kriegman, D.J., Ahuja, N.: Detecting faces in images: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 24(1), 34–58 (2002)



This research is fully supported by the project 1M06014 of the Ministry of Education, Youth and Sports of the Czech Republic. The author is grateful to two anonymous referees for providing valuable suggestions.

Author information


Corresponding author

Correspondence to Jan Kalina.

Appendix: Technical Details

Definition 9

(Weight function)

Let a function ψ:[0,1]→[0,1] be non-increasing and continuous on [0,1], with ψ(0)=1 and ψ(1)=0. Moreover, we assume that both one-sided derivatives of ψ exist at all points of (0,1) and are bounded by a common constant, and that a finite right derivative exists at 0 and a finite left derivative exists at 1. Then the function ψ is called a weight function.
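As an illustration of Definition 9, the following Python sketch defines two functions that satisfy its conditions and checks the boundary values, range and monotonicity on a finite grid. The grid check is only a necessary condition. Both functions, the breakpoints 0.5 and 0.75, and the helper names are hypothetical choices made for illustration; they are not necessarily the weights (7) and (8). Evaluating ψ at (2i−1)/(2n) mirrors the argument \(G_n(\cdot) - \frac{1}{2n}\) that appears in Definition 10.

```python
def psi_linear(t):
    """Linearly decreasing weight function with psi(0) = 1 and psi(1) = 0."""
    return 1.0 - t


def psi_trimmed_linear(t, a=0.5, b=0.75):
    """Equal to 1 on [0, a], linear on [a, b], and 0 on [b, 1] (hypothetical breakpoints)."""
    if t <= a:
        return 1.0
    if t >= b:
        return 0.0
    return (b - t) / (b - a)


def is_weight_function(psi, grid_size=1000, tol=1e-12):
    """Check boundary values, range and monotonicity of psi on a grid over [0, 1]."""
    values = [psi(k / grid_size) for k in range(grid_size + 1)]
    boundary_ok = abs(values[0] - 1.0) <= tol and abs(values[-1]) <= tol
    range_ok = all(-tol <= v <= 1.0 + tol for v in values)
    monotone_ok = all(v2 <= v1 + tol for v1, v2 in zip(values, values[1:]))
    return boundary_ok and range_ok and monotone_ok


def implicit_weights(psi, n):
    """Weights for the n ordered squared residuals, taken as psi((2i - 1) / (2n))."""
    return [psi((2 * i - 1) / (2 * n)) for i in range(1, n + 1)]


if __name__ == "__main__":
    print(is_weight_function(psi_linear))
    print(is_weight_function(psi_trimmed_linear))
    print(implicit_weights(psi_linear, 4))
```

The smallest squared residual receives the largest weight, so the weights are assigned implicitly by the ranks of the residuals rather than chosen by the user for individual observations.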

Definition 10

(Least weighted squares with adaptive weights)

In the model (1), let \(\mathbf{b}_{0}\) denote an initial robust estimator of \(\boldsymbol{\beta}\) and let \(\hat{\sigma}_{0}^{2}\) denote a corresponding initial robust estimator of \(\sigma^{2}\). Let \(F_{\chi}\) denote the distribution function of the \(\chi^{2}_{1}\) distribution. The least weighted squares estimator of \(\boldsymbol{\beta}\) with adaptive weights is defined as

$$ \arg\min_{\mathbf{b} \in\mathbb{R}^p} \sum_{i=1}^n \hat{w}_n \biggl[ G_n\bigl(u_i^2( \mathbf{b})\bigr) - \frac{1}{2n} \biggr] u_i^2( \mathbf{b}), $$


where

$$ \hat{w}_n(t) = \left \{ \begin{array}{l@{\quad}l} \hat{\sigma}_0 \frac{F_\chi^{-1}(\max\{t, c_n\})}{(G_n^0)^{-1} (\max\{t, c_n\})}, & \mbox{if } t < 1-d_n, \\ 0, & \mbox{otherwise}, \end{array} \right . $$

\(G_{n}\) is the empirical distribution function of \(u_{i}^{2}(\mathbf{b})\), \(G_{n}^{0}\) is the empirical distribution function of \(u_{i}^{2}(\mathbf{b}_{0})\),

$$ c_n = \min \biggl\{ \frac{m}{n}; u^2_{(m)}( \mathbf{b}_0)>0 \biggr\} $$

is used to avoid dividing by zero,

$$ d_n = \sup_{t \geq c} \max \bigl\{ 0, \hat{\sigma}_0 F_\chi(t) - G_n^0(t) \bigr\}, $$

\(c=F_{\chi}^{-1}(q)\) and q∈[0.9999,1) is a chosen constant.
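The constant \(c_n\) from Definition 10 can be computed directly from the squared residuals of the initial fit. The sketch below is a minimal illustration; the function name `c_n` and the choice to raise an error when every residual is zero are assumptions and are not taken from the paper.

```python
def c_n(squared_residuals):
    """Return min{m/n : u^2_(m) > 0} for the ordered squared residuals of the initial fit."""
    n = len(squared_residuals)
    ordered = sorted(squared_residuals)
    for m, value in enumerate(ordered, start=1):
        if value > 0:
            return m / n
    # Assumption: the definition is not meaningful if all residuals vanish.
    raise ValueError("all squared residuals are zero")


if __name__ == "__main__":
    # Two exact fits (zero residuals) among four observations.
    print(c_n([0.0, 1.2, 0.0, 0.3]))
```

Because \(\max\{t, c_n\}\) replaces \(t\) in the quotient defining \(\hat{w}_n\), the empirical quantile in the denominator is never evaluated at a level whose ordered squared residual is zero.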

Assumptions \(\mathcal{A}\)

We assume a sequence of non-random vectors \(\{\mathbf{X}_{n}\}_{n=1}^{\infty}\) with values in \(\mathbb{R}^{p}\) and a sequence of independent and identically distributed random variables \(\{e_{n}\}_{n=1}^{\infty}\) with values in \(\mathbb{R}\), which form the model (1) for each n. The distribution function F(z) of the random error \(e_{1}\) is symmetric and absolutely continuous with a bounded density f(z), which is decreasing on \(\mathbb{R}^{+}\). The density is positive on (−∞,∞) and its second derivative is bounded. Moreover,

$$ \sum_{i=1}^n \Vert\mathbf{X}_i \Vert^3 = \mathcal{O}(n) \quad\mbox{\textit{and}}\quad \mathsf{E}e_1^2 \in(0, \infty). $$


$$ \lim_{n\to\infty} \hat{\mathbf{Q}}_n = \mathbf{Q} $$

in probability, where \(\mathbf{Q}\) is a regular matrix. There exists a distribution function H(x) for \(x \in \mathbb{R}^{p}\) such that


Let \(B(\boldsymbol {\beta },\delta) = \{ \tilde{\boldsymbol {\beta }}\in\mathbb{R}^{p}; ||\tilde {\boldsymbol {\beta }}-\boldsymbol {\beta }||<\delta \}\) for an arbitrary fixed δ>0. For any compact set W with \(W \cap B(\boldsymbol {\beta },\delta) \neq \emptyset\), there exists \(\gamma_{\delta} > 0\) such that

$$ \inf_{\boldsymbol {\omega }\in W\backslash B(\boldsymbol {\beta },\delta)} \frac{1}{n}\sum_{i=1}^n \bigl( \mathbf{X}_i^T(\boldsymbol {\omega }- \boldsymbol {\beta }) \bigr)^2 > \gamma_\delta. $$


(Theorem 1) Let us define

$$ s_{LWS} = \frac{1}{n-2} \bigl( \mathbf{u}^{LWS} \bigr)^T \mathbf{u}^{LWS}, $$

where \(\mathbf{u}^{LWS} = (u_{1}^{LWS}, \ldots, u_{n}^{LWS})^{T}\) are residuals of the least weighted squares estimator. We can express

$$ T=\frac{b^{LWS}_1}{s_{LWS}} \sqrt{\sum_{i=1}^n (X_i-\bar{X})^2} $$

and the statement follows from Theorem 4 for the linear regression context. □


(Theorem 2) This is a consequence of a general result of [50], who derived (19) as a consistent estimator of \(\sigma^{2}\). The constant γ in (20) is independent of n and can be approximated by numerical integration as

$$ \sum_{i=2}^m (t_{i} - t_{i-1}) \psi \bigl( F(|t_i|) \bigr) t_i^2 f(t_i) $$

using a partition \(-\infty < t_{1} < t_{2} < \cdots < t_{m} < \infty\) of the real line into intervals. A normal distribution \(N(0,\sigma^{2})\) of the errors is assumed. Without loss of generality, we use the N(0,1) distribution for the numerical integration for the examples of weights in (7) and (8). □
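The numerical integration described in this proof can be sketched in Python as a Riemann sum over an equidistant partition, with interval widths taken as the positive differences \(t_i - t_{i-1}\). The linear weight function ψ(t)=1−t, the truncation of the real line to [−8, 8] and the number of grid points are assumptions made for illustration only; the weights (7) and (8) are not reproduced here.

```python
import math


def normal_pdf(t):
    """Density f of the N(0, 1) distribution."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def normal_cdf(t):
    """Distribution function F of the N(0, 1) distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def psi_linear(t):
    """Hypothetical weight function psi(t) = 1 - t (an assumption, not (7) or (8))."""
    return 1.0 - t


def approximate_gamma(psi, lower=-8.0, upper=8.0, m=4001):
    """Riemann sum of psi(F(|t|)) * t^2 * f(t) over t_1 < ... < t_m on [lower, upper]."""
    step = (upper - lower) / (m - 1)
    grid = [lower + k * step for k in range(m)]
    total = 0.0
    for t_prev, t_cur in zip(grid, grid[1:]):
        width = t_cur - t_prev  # positive interval width
        total += width * psi(normal_cdf(abs(t_cur))) * t_cur ** 2 * normal_pdf(t_cur)
    return total


if __name__ == "__main__":
    print(approximate_gamma(psi_linear))
```

Refining the partition (a larger `m`) changes the result only marginally, which is a simple practical check that the truncation and step size are adequate.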


(Theorem 3) The LWS estimator (5) in the model (27) is defined as the minimum of \(\sum_{i=1}^{n} w_{i} u^{2}_{(i)}(\hat{\mu})\) over \(\hat{\mu} \in{\mathbb{R}}\) and over all permutations of the weights with magnitudes \(w_{1},\ldots,w_{n}\). The location model (27) is a special case of linear regression and therefore the solution has the form of a weighted mean \(\hat{\mu} = \sum_{i=1}^{n} w^{*}_{i} Y_{i}\), where \(w_{1}^{*},\ldots,w_{n}^{*}\) are permuted values of \(w_{1},\ldots,w_{n}\). Therefore, we can express the LWS estimator as

$$ \arg\min\sum_{i=1}^n w_i \Biggl(\mathbf{Y}-\sum_{j=1}^n w_j Y_j \Biggr)^2_{(i)}, $$

where the minimum is considered over all permutations of the weights with magnitudes \(w_{1},\ldots,w_{n}\) and the notation

$$ \Biggl(\mathbf{Y}-\sum_{j=1}^n w_j Y_j \Biggr)_{(1)} \leq\cdots\leq \Biggl( \mathbf{Y}-\sum_{j=1}^n w_j Y_j \Biggr)_{(n)} $$

is used for ordered coordinates of

$$ \Biggl(Y_1 - \sum_{j=1}^n w_j Y_j, \ldots, Y_n - \sum _{j=1}^n w_j Y_j \Biggr)^T. $$

However, (53) minimizes the weighted variance \(S^{2}_{w}(\mathbf{Y})\) (28), which concludes the proof. □
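For a very small sample, the argument of this proof can be checked by brute force. The sketch below enumerates all permutations of the weights, computes the weighted mean for each, and keeps the permutation with the smallest weighted sum of squares; it then compares the result with the LWS objective evaluated directly on ordered squared residuals. The data, the weights and the function names are hypothetical, and the enumeration is feasible only for small n.

```python
from itertools import permutations


def lws_objective(mu, y, w):
    """LWS objective: the largest weights are paired with the smallest squared residuals."""
    squared = sorted((yi - mu) ** 2 for yi in y)
    w_desc = sorted(w, reverse=True)
    return sum(wi * ri for wi, ri in zip(w_desc, squared))


def lws_location(y, w):
    """Minimize the weighted sum of squares about the weighted mean over all permutations."""
    best_mu, best_value = None, float("inf")
    for perm in permutations(w):
        total = sum(perm)  # normalization, in case the weights do not sum to one
        mu = sum(wi * yi for wi, yi in zip(perm, y)) / total
        value = sum(wi * (yi - mu) ** 2 for wi, yi in zip(perm, y))
        if value < best_value:
            best_mu, best_value = mu, value
    return best_mu, best_value


if __name__ == "__main__":
    y = [1.0, 1.2, 0.9, 1.1, 10.0]   # hypothetical data with one outlier
    w = [0.4, 0.3, 0.2, 0.1, 0.0]    # hypothetical weight magnitudes
    mu_hat, value = lws_location(y, w)
    print(mu_hat, value, lws_objective(mu_hat, y, w))
```

In this example the zero weight is assigned to the outlying observation, so the estimate stays within the range of the remaining four values.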


(Theorem 4) The first part follows immediately from the asymptotic normality of \(\mathbf{b}^{LWS}\) established in [37]. The independence between the numerator and the denominator is assured asymptotically in probability, because \(\mathbf{b}^{LWS}\) is asymptotically equivalent in probability with \((\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{Y}\) and \((\mathbf{u}^{LWS})^{T}\mathbf{u}^{LWS}\) is asymptotically equivalent with \(\mathbf{e}^{T}\mathbf{M}\mathbf{e}\) in probability [49], where \(\mathbf{M} = \mbox{\boldmath$\mathcal{I}$}_{n} - \mathbf{H}\), \(\mathbf{H}=\mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\) and \(\mbox{\boldmath$\mathcal{I}$}_{n}\) denotes an identity matrix of dimension n.

The second part follows from the asymptotic representation for the LWS estimator [49]. The asymptotic representation holds in the form


for n→∞, where

$$ \hat{\mathbf{Q}}_n^{(1)}= \hat{\mathbf{Q}}_n \biggl(1-\int_0^1 \alpha d\psi(\alpha) - 2\int _0^1 u_\alpha f(u_\alpha)d\psi( \alpha) \biggr), $$

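ψ is the weight function defining the LWS estimator and the coordinates of \(\boldsymbol{\eta} = (\eta_{1},\ldots,\eta_{p})^{T}\) are of order \(o_{P}(1)\). Let us denote \(\boldsymbol {\tau }= - \frac{1}{\sqrt{n}}\mathbf{X}\boldsymbol {\eta }\), \(\boldsymbol{\kappa} = \mathbf{u}^{LWS} - \boldsymbol{\tau}\) and \(\boldsymbol{\varPsi}(\mathbf{e}) = (\psi_{1}(\mathbf{e}),\ldots,\psi_{n}(\mathbf{e}))^{T}\) with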



$$ \boldsymbol {\phi }= \mathbf{H}\mathbf{e} - \mathbf{X}\bigl(\mathbf{X}^T\mathbf{X} \bigr)^{-1} \mathbf{X}^T\boldsymbol {\varPsi }(\mathbf{e})= \mathbf{H}\bigl[ \mathbf{e}- \boldsymbol {\varPsi }(\mathbf{e})\bigr]. $$

It holds that \(\boldsymbol{\kappa} = \mathbf{M}\mathbf{e} + \boldsymbol{\phi}\) and

$$ \boldsymbol {\kappa }^T\boldsymbol {\kappa }= \mathbf{e}^T \mathbf{M} \mathbf{e} + \bigl[ \mathbf{e}- \boldsymbol {\varPsi }(\mathbf{e}) \bigr]^T \mathbf{H}\bigl[\mathbf{e}- \boldsymbol {\varPsi }( \mathbf{e})\bigr]. $$

To complete the proof of the second part we can repeat the steps of [26], who considered the test statistic of the Durbin-Watson test. The residual sum of squares \((\mathbf{u}^{LWS})^{T}\mathbf{u}^{LWS}\) is asymptotically equivalent in probability with \(\boldsymbol{\kappa}^{T}\boldsymbol{\kappa}\) and \(\mathbf{e}^{T}\mathbf{M}\mathbf{e}\), which is the test statistic computed with least squares residuals. At the same time, \(\mathbf{e}^{T}\mathbf{M}\mathbf{e}/\sigma^{2}\) follows the \(\chi^{2}_{n-p}\) distribution. The third part of the theorem is proven by analogous reasoning. □
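The decomposition of \(\boldsymbol{\kappa}^T\boldsymbol{\kappa}\) used in this proof relies only on the projection properties of \(\mathbf{H}\) and \(\mathbf{M}\): both are symmetric and idempotent and \(\mathbf{M}\mathbf{H} = \mathbf{0}\), so the cross term vanishes. The following sketch verifies the identity numerically for a design with an intercept and a single regressor. The data and the helper names are hypothetical, and the vector standing in for \(\boldsymbol{\varPsi}(\mathbf{e})\) is an arbitrary placeholder, since the identity holds for any such vector.

```python
def hat_matrix(x):
    """Hat matrix H = X (X^T X)^{-1} X^T for a design with intercept and one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [[1.0 / n + (xi - x_bar) * (xj - x_bar) / s_xx for xj in x] for xi in x]


def mat_vec(a, v):
    """Matrix-vector product."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def decomposition_gap(x, e, psi_e):
    """Return kappa^T kappa - (e^T M e + d^T H d), where d = e - psi_e."""
    h = hat_matrix(x)
    h_e = mat_vec(h, e)
    m_e = [ei - hi for ei, hi in zip(e, h_e)]    # M e = (I - H) e
    d = [ei - pi for ei, pi in zip(e, psi_e)]    # e - Psi(e)
    phi = mat_vec(h, d)                          # phi = H [e - Psi(e)]
    kappa = [a + b for a, b in zip(m_e, phi)]    # kappa = M e + phi
    return dot(kappa, kappa) - (dot(e, m_e) + dot(d, phi))


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    e = [0.3, -1.1, 0.4, 2.0, -0.7]
    psi_e = [0.2, -0.5, 0.1, 0.9, -0.3]          # arbitrary placeholder for Psi(e)
    print(decomposition_gap(x, e, psi_e))        # zero up to rounding error
```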


(Corollary 1) Starting with (34), we exploit the scale-invariance of the denominator. \(\sigma^{2}_{\psi}\) is estimated consistently following [37] to obtain (39). □


Cite this article

Kalina, J. Implicitly Weighted Methods in Robust Image Analysis. J Math Imaging Vis 44, 449–462 (2012).
