
Asymptotic properties of \(M\)-estimators in linear and nonlinear multivariate regression models


Abstract

We consider the (possibly nonlinear) regression model in \(\mathbb{R}^q\) with shift parameter \(\alpha \) in \(\mathbb{R}^q\) and other parameters \(\beta \) in \(\mathbb{R}^p\). Residuals are assumed to be from an unknown distribution function (d.f.). Let \(\widehat{\phi }\) be a smooth \(M\)-estimator of \(\phi = \binom{\beta }{\alpha }\) and \(T(\phi )\) a smooth function. We obtain the asymptotic normality, covariance, bias and skewness of \(T(\widehat{\phi })\) and an estimator of \(T(\phi )\) with bias \(\sim n^{-2}\) requiring \(\sim n\) calculations. (In contrast, the jackknife and bootstrap estimators require \(\sim n^2\) calculations.) For a linear regression with random covariates of low skewness, if \(T(\phi ) = \nu \beta \), then \(T(\widehat{\phi })\) has bias \(\sim n^{-2}\) (not \(n^{-1}\)) and skewness \(\sim n^{-3}\) (not \(n^{-2}\)), and the usual approximate one-sided confidence interval (CI) for \(T(\phi )\) has error \(\sim n^{-1}\) (not \(n^{-1/2}\)). These results extend to random covariates.
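As a concrete, much-simplified illustration of the kind of estimator studied here (this sketch is not the paper's construction; the Huber tuning constant \(k = 1.345\), the MAD scale estimate, and the IRLS solver are standard textbook choices assumed for the example), a Huber \(M\)-estimate of a shift \(\alpha \) and slope \(\beta \) in a scalar linear model can be computed as follows:

```python
import numpy as np

def huber_m_fit(X, y, k=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of phi = (alpha, beta) for y = X @ phi + error,
    computed by iteratively reweighted least squares (IRLS)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]    # least-squares start
    for _ in range(max_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12  # MAD scale estimate
        u = np.abs(r / s)
        w = np.minimum(1.0, k / np.maximum(u, 1e-12))  # Huber IRLS weights
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])    # columns: shift alpha, slope beta
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[:10] += 20.0                          # a few gross outliers
phi_hat = huber_m_fit(X, y)             # robust (alpha, beta) estimate
```

Because the Huber \(\psi \)-function bounds each observation's influence, the ten outliers barely move \(\widehat{\phi }\), whereas they would shift the least-squares intercept by about 0.4.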


References

  • Andrews DF, Bickel PJ, Hampel FR, Huber PJ, Rogers WH, Tukey JW (1972) Robust estimates of location: survey and advances. Princeton University Press, Princeton

  • Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA (1993) Efficient and adaptive estimation for semiparametric models. Johns Hopkins University Press, Baltimore

  • Bierens HJ (1981) Robust methods and asymptotic theory in nonlinear econometrics. Springer, New York

  • Cai T, Duan S, Ren T, Liu F (2010) A robust parametric method for power harmonic estimation based on M-estimators. Measurement 43:67–77

  • Carroll RJ, Ruppert D (1982) Robust estimation in heteroscedastic linear models. Ann Stat 10:429–441

  • Chang X-W (2006) Computation of Huber’s M-estimates for a block-angular regression problem. Comput Stat Data Anal 50:5–20

  • Chao J-C, Douglas SC (2007) A robust complex fast ICA algorithm using the Huber M-estimator cost function. Lect Notes Comput Sci 4666:152–160

  • Chen J, Li DG, Lin ZY (2011) Asymptotic expansion for nonparametric M-estimator in a nonlinear regression model with long-memory errors. J Stat Plan Inference 141:3035–3046

  • Deergha Rao K, Raju BVSSN (2006) Improved robust multiuser detection in non-Gaussian channels using a new M-estimator and spatiotemporal chaotic spreading sequences. In: Proceedings of the IEEE Asia Pacific conference on circuits and systems, pp 1729–1732

  • Douglas SC, Chao J-C (2007) Simple, robust, and memory-efficient fast ICA algorithms using the Huber M-estimator cost function. J VLSI Signal Process 48:143–159

  • El-Yamany NA, Papamichalis PE (2008) Robust color image superresolution: an adaptive M-estimation framework. EURASIP J Image Video Process, Article ID 763254

  • Faurie F, Giremus A (2010) Combining generalized likelihood ratio and M-estimation for the detection/compensation of GPS measurement biases. In: Proceedings of the 2010 IEEE international conference on acoustics, speech, and signal processing, pp 4178–4181

  • Fouad MM, Dansereau RM, Whitehead AD (2011) Two-step super-resolution technique using bounded total variation and bisquare M-estimator under local illumination changes. In: Proceedings of the 18th international conference on image processing, pp 1381–1384

  • Fraiman R (1983) General \(M\)-estimators and applications to bounded influence estimation for non-linear regression. Commun Stat Theory Methods 12:2617–2631

  • Hajek J, Sidak Z (1967) Theory of rank tests. Academic Press, New York

  • Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA (1986) Robust statistics: the approach based on influence functions. Wiley, New York

  • Hassaïne Y, Delourme B, Panciatici P, Walter E (2005) M-Arctan estimator based on the trust-region method. Int J Electr Power Energy Syst 28:590–598

  • Hoseinnezhad R, Bab-Hadiashar A (2011) An M-estimator for high breakdown robust estimation in computer vision. Comput Vis Image Underst 115:1145–1156

  • Huber PJ (1981) Robust statistics. Wiley, New York

  • Huber PJ (1996) Robust statistical procedures, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia

  • Huber PJ, Ronchetti EM (2009) Robust statistics, 2nd edn. Wiley, New York

  • Kalyani S, Giridhar K (2007) Mitigation of error propagation in decision directed OFDM channel tracking using generalized M estimators. IEEE Trans Signal Process 55:1659–1672

  • Katkovnik V (1999) Robust M-estimates of the frequency and amplitude of a complex-valued harmonic. Signal Process 77:71–84

  • Katkovnik V, Lee M-S, Kim Y-H (2008) Robust M-estimation techniques for non-Gaussian CDMA wireless channels with phased array antenna. Signal Process 88:670–684

  • Kawamura K, Hasegawa K, Yamashita O, Sato Y, Ikeuchi K (1999) Object recognition using local EGI and 3D models with M-estimators. In: Proceedings of the international conference on multisensor fusion and integration for intelligent systems, pp 80–86

  • Koul HL (1992) Weighted empiricals and linear models. Institute of Mathematical Statistics, Hayward

  • Koul HL (2002) Weighted empirical processes in dynamic nonlinear models, 2nd edn. Institute of Mathematical Statistics, Hayward

  • Lee M-J (2010) Micro-econometrics: methods of moments and limited dependent variables, 2nd edn. Springer, New York

  • Liese F, Miescke K-J (2008) Statistical decision theory: estimation, testing, and selection. Springer, New York

  • Marazzi A (1993) Algorithms, routines, and S functions for robust statistics. Wadsworth and Brooks/Cole Advanced Books and Software, Pacific Grove

  • Maronna RA, Yohai VJ (1981) Asymptotic behaviour of general \(M\)-estimates for regression and scale with random carriers. Probab Theory Relat Fields 58:7–20

  • Maronna RA, Martin RD, Yohai VJ (2006) Robust statistics: theory and methods. Wiley, Chichester

  • Mitra S, Mitra A, Kundu D (2011) Genetic algorithm and M-estimator based robust sequential estimation of parameters of nonlinear sinusoidal signals. Commun Nonlinear Sci Numer Simul 16:2796–2809

  • Nguyen N-V, Shevlyakov G, Shin V (2010) Alternative to M-estimates in multisensor data fusion. World Acad Sci Eng Technol 46:1034–1038

  • Park Y, Kim D, Kim S (2012) Robust regression using data partitioning and M-estimation. Commun Stat Simul Comput 41:1282–1300

  • Peracchi F (2001) Econometrics. Wiley, Chichester

  • Pfanzagl J (1994) Parametric statistical theory. Walter de Gruyter and Company, Berlin

  • Pham DS, Leung YH, Zoubir A, Brcic R (2004) Sequential M-estimation. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing, pp 697–700

  • Powell JL (1984) Least absolute deviations estimation for the censored regression model. J Econom 25:303–325

  • Prakasa Rao BLS (1999) Statistical inference for diffusion type processes. Edward Arnold, London

  • Randles RH, Wolfe DA (1979) Introduction to the theory of nonparametric statistics. Wiley, New York

  • Rao CR, Toutenburg H (1995) Linear models: least squares and alternatives. Springer, New York

  • Rey WJJ (1978) Robust statistical methods. Springer, Berlin

  • Rieder H (1994) Robust asymptotic statistics. Springer, New York

  • Serfling RJ (1980) Approximation theorems of mathematical statistics. Wiley, New York

  • Staudte RG, Sheather SJ (1990) Robust estimation and testing. Wiley, New York

  • Sutarno D (2008) Constrained robust estimation of magnetotelluric impedance functions based on a bounded-influence regression M-estimator and the Hilbert transform. Nonlinear Process Geophys 15:287–293

  • van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York

  • van de Geer SA (2000) Applications of empirical process theory. Cambridge University Press, Cambridge

  • Venetsanopoulos AN, Zervakis ME (1990) M-estimators in robust nonlinear image restoration. Opt Eng 29:455–470

  • Verboon P (1994) A robust approach to nonlinear multivariate analysis. D.S.W.O. Press, Leiden

  • Withers CS (1982a) The distribution and quantiles of a function of parameter estimates. Ann Inst Stat Math A 34:55–68

  • Withers CS (1982b) Second order inference for asymptotically normal random variables. Sankhyā 44:19–27

  • Withers CS (1983) Expansions for the distribution and quantiles of a regular functional of the empirical distribution with applications to nonparametric confidence intervals. Ann Stat 11:577–587

  • Withers CS (1987) Bias reduction by Taylor series. Commun Stat Theory Methods 16:2369–2384

  • Withers CS (1989) Accurate confidence intervals when nuisance parameters are present. Commun Stat Theory Methods 18:4229–4259

  • Withers CS, Nadarajah S (2007) \(M\)-estimates for stationary and scaled residuals. Random Oper Stoch Equ 15:287–296

  • Withers CS, Nadarajah S (2011) Expansions for the distribution of M-estimates with applications to the multi-tone problem. ESAIM: Probab Stat 15:139–167

  • Xiong B, Yin Z (2011) Structural similar patches for nonlocal-means with modified robust M-estimator and residual images. In: Proceedings of the 2011 IEEE international conference on mechatronics and automation, pp 709–714


Acknowledgments

The authors would like to thank the Editor and the referee for careful reading and for their comments which greatly improved the paper.

Author information

Corresponding author

Correspondence to Saralees Nadarajah.

Appendices

Appendix A

Here, we illustrate how to obtain the derivatives \(\delta _{i, j, \ldots } (\theta ) = \partial _i \partial _j \cdots \delta (S_R)\) at \(S_R = \theta _R\), where \(\theta _R = \mathbb E [ S_R ]\) for \(S_R\) of (3.11), and where \(i, j, \ldots \) range over \(1, \ldots , \dim (S_R)\). The derivatives of order up to \(2r\) are required to obtain \(C_r\) of

$$\begin{aligned} \mathbb E \left[ \widehat{\phi } \right] \approx \phi + \sum _{r = 1}^\infty n^{-r} C_r, \end{aligned}$$

or more generally \(D_r\) of \(\mathbb E [ t (\widehat{\phi }) ] \approx t (\phi ) + \sum _{r = 1}^\infty n^{-r} D_r\). The derivatives of order up to \(2r\) are also required to obtain an estimator of \(\phi \) or \(t (\phi )\) with bias \(O (n^{-r-1})\). Theorem 9.1 gives formulae for them up to order four, so allowing bias reduction to \(O (n^{-3})\) via (3.18).
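The effect of such a Taylor-series bias correction can be seen in a deliberately simple toy case (assumed purely for illustration; it is not the paper's estimator): estimating \(T(\mu ) = \mu ^2\) by the plug-in \(\bar{x}^2\). Since \(\mathbb E [\bar{x}^2] = \mu ^2 + \sigma ^2 / n\), subtracting the estimate \(s^2/n\) removes the \(n^{-1}\) bias term in a single \(O(n)\) pass, in contrast to the \(\sim n^2\) cost of the jackknife. (For this quadratic \(T\) the correction happens to be exactly unbiased; for general smooth \(T\) it leaves a bias of order \(n^{-2}\).)

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 1.0, 2.0, 50, 20000

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)            # unbiased variance estimate

naive = xbar**2                       # bias = sigma^2/n = 0.08 here
corrected = xbar**2 - s2 / n          # Taylor-corrected estimator

bias_naive = naive.mean() - mu**2     # Monte Carlo bias, approx 0.08
bias_corrected = corrected.mean() - mu**2  # approx 0
```

Over 20000 replications the naive bias concentrates near \(\sigma ^2/n = 0.08\), while the corrected estimator's bias is indistinguishable from zero at Monte Carlo resolution.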

Theorem 9.1

For \(x, y, \ldots \in S_R\), set \((i_1, \ldots , i_r | x, y, \ldots ) = \partial _x \partial _y \cdots (\delta _{i_1} \cdots \delta _{i_r})\), where \(\partial _x = \partial / \partial x\) and \((\cdot )_0 = (\cdot )\) at \(S_R = \theta _R\). Then \(\{ (h | x_1, \ldots , x_r)_0 \} = \delta _{h, i_1, \ldots , i_r} (\theta _R )\) for \(r \le 4\) are given by

$$\begin{aligned} \left( h | x \right) _0 = -I \left( x = S_h \right) \end{aligned}$$
(9.1)

and

$$\begin{aligned} \left( h | x_1, \ldots , x_s \right) _0&= -\sum ^{s-1}_{r=1} \sum ^s_{x_1, \ldots , x_s} I \left( x_s = S_{h, i_1, \ldots , i_r} \right) \left. \left( i_1, \ldots , i_r \right| x_1, \ldots , x_{s-1} \right) _0 \nonumber \\&-\sum ^s_{r=2} \left( \mathbb E \left[ S_{h, i_1, \ldots , i_r} \right] \right) \left. \left( i_1, \ldots , i_r \right| x_1, \ldots , x_s \right) _0 \end{aligned}$$
(9.2)

for \(s \ge 2\), where \(I (A) = 1\) if \(A\) is true, \(I (A) = 0\) if \(A\) is false, and

$$\begin{aligned} \displaystyle \left. \left( i_1, \ldots , i_r \right| x_1, \ldots , x_s \right) _0 = \left\{ \begin{array}{l@{\quad }l} 0, &{} \text{ for } s < r, \\ \displaystyle \sum _{x_1, \ldots , x_r}^{r!} \left. \left( i_1 \right| x_1 \right) _0 \cdots \left. \left( i_r \right| x_r \right) _0, \displaystyle &{} \text{ for } s=r. \end{array} \right. \qquad \end{aligned}$$
(9.3)

Also, in an obvious extension of the notation of (1.6),

$$\begin{aligned}&\left. \left( i_1, i_2 \right| x_1, x_2, x_3 \right) _0 = \sum ^6 \left. \left( i_1 \right| x_1, x_2 \right) _0 \left. \left( i_2 \right| x_3 \right) _0\!, \\&\left. \left( i_1, i_2 \right| x_1, \ldots , x_4 \right) _0 = \sum ^8 \left. \left( i_1 \right| x_1, x_2, x_3 \right) _0 \left. \left( i_2 \right| x_4 \right) _0 + \sum ^6 \left. \left( i_1 \right| x_1, x_2 \right) _0 \left. \left( i_2 \right| x_3, x_4 \right) _0\!,\\&\left. \left( i_1, i_2, i_3 \right| x_1, \ldots , x_4 \right) _0 = \sum ^{12} \left. \left( i_1 \right| x_1, x_2 \right) _0 \left. \left( i_2 \right| x_3 \right) _0 \left. \left( i_3 \right| x_4 \right) _0\!. \end{aligned}$$

Similarly, we can obtain \((i_1, \ldots , i_r | x_1, \ldots , x_s)_0\) from their values for \(r=1\).

Proof

Differentiating (3.7) gives

$$\begin{aligned} (h|x) = -\sum _{r=0} \left\{ I \left( x = S_{h, i_1, \ldots , i_r} \right) \left( i_1, \ldots , i_r \right) + S_{h, i_1, \ldots , i_r} \left. \left( i_1, \ldots , i_r \right| x \right) \right\} \!, \end{aligned}$$

where \((i_1, \ldots , i_r) = 1\) for \(r=0\), and

$$\begin{aligned} \left. \left( h \right| x_1, \ldots , x_s \right)&= -\sum _{r=1} \Bigg \{ \sum ^s_{x_1, \ldots , x_s} I \left( x_s = S_{h, i_1, \ldots , i_r} \right) \left. \left( i_1, \ldots , i_r \right| x_1, \ldots , x_{s-1} \right) \\&\qquad \qquad + S_{h, i_1, \ldots , i_r} \left. \left( i_1, \ldots , i_r \right| x_1, \ldots , x_s \right) \Bigg \} \end{aligned}$$

for \(s \ge 2\), where

$$\begin{aligned} \sum ^s_{x_1, \ldots , x_s} a_{x_1, \ldots , x_{s-1}} b_{x_s} = \sum ^s_{j=1} a_{(x)_j} b_{x_j}, \end{aligned}$$

where \((x)_j = x_1, \ldots , x_s\) with \(x_j\) deleted. So, (9.1) and (9.2) follow. Note that (9.3) follows since \(\delta (\theta _R) = 0\). \(\square \)

Appendix B

Theorem 10.1

Suppose \(Y_1, \ldots , Y_n\) are i.i.d. in \(\mathbb{R }^r\) with mean \(\mu \) and finite covariance \(V\).

  1. (I)

    Let \(\{ a_{N, n} \} \subset \mathbb{R }^r\) satisfy

    $$\begin{aligned} \left( \max _N a'_{N, n} V a_{N, n} \right) / \sigma _n^2 \longrightarrow 0 \end{aligned}$$
    (10.1)

    as \(n \rightarrow \infty \), where \(\sigma _n^2 = \sum ^n_{N=1} a'_{N, n} V a_{N, n}\). Then \(\sum ^n_{N = 1} a'_{N, n} Y_N\) is asymptotically normal with mean \(\lim _{n \rightarrow \infty } \sum _{N = 1}^n a'_{N, n} \mu \) and variance \(\lim _{n \rightarrow \infty } \sigma ^2_n\).

  2. (II)

    Let \(\{ A_{N, n} \} \subset \mathbb{R }^{s \times r}\) satisfy

    $$\begin{aligned} \left( \max _N \text{ trace } A_{N, n} V A'_{N, n} \right) / \lambda _n \longrightarrow 0 \end{aligned}$$
    (10.2)

    as \(n \rightarrow \infty \), where \(\lambda _n\) is the minimum eigenvalue of \(C_n = \sum ^n_{N = 1} A_{N, n} V A'_{N, n}\). Then \(\sum ^n_{N = 1} A_{N, n} Y_N\) is asymptotically normal with mean \(\lim _{n \rightarrow \infty } \sum ^n_{N = 1} A_{N, n} \mu \) and covariance \(\lim _{n \rightarrow \infty } C_n\).

Suppose that \(V\) is positive-definite. Then (10.1) holds if

$$\begin{aligned} \max _N \left| a_{N, n} \right| ^2 / \sum ^n_{N = 1} \left| a_{N, n} \right| ^2 \longrightarrow 0, \end{aligned}$$
(10.3)

and (10.2) holds if

$$\begin{aligned} \left( \max _N \text{ trace } A_{N, n} A'_{N, n} \right) \Big / \left( \text{minimum eigenvalue of } \sum A_{N, n} A'_{N, n} \right) \longrightarrow 0. \end{aligned}$$
(10.4)

Proof

Suppose the minimum eigenvalue of \(V\) is positive and (10.3) holds. Then the proof of (I) follows that given on page 153 of Hajek and Sidak (1967) for the case \(r=1\). The result in (II) follows under (10.4) by the Cramér–Wold device. That (I) and (II) hold under (10.1) and (10.2) follows by writing \(Y_N = V^{1/2} X_N\), where \(\{ X_N \}\) are i.i.d. with covariance \(I_r\). \(\square \)
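Condition (10.3) is easy to check numerically. The sketch below (an illustration assumed here, not part of the paper) evaluates the ratio \(\max _N |a_{N,n}|^2 / \sum _N |a_{N,n}|^2\) for two weight sequences, and then takes a Monte Carlo look at the normal limit in (I) with \(r = 1\) and equal weights:

```python
import numpy as np

def max_ratio(a):
    # left side of condition (10.3): max_N |a_N|^2 / sum_N |a_N|^2
    a = np.asarray(a, dtype=float)
    return (a**2).max() / (a**2).sum()

# Equal weights satisfy (10.3): the ratio is 1/n -> 0.
flat = [max_ratio(np.ones(n)) for n in (10**2, 10**4, 10**6)]
# a_N = 1/N fails: sum 1/N^2 -> pi^2/6, so the ratio -> 6/pi^2 > 0.
decay = [max_ratio(1.0 / np.arange(1, n + 1)) for n in (10**2, 10**4, 10**6)]

# Normal limit for the standardized equal-weight sum of centred
# exponential summands (each term has skewness 2).
rng = np.random.default_rng(2)
n, reps = 1000, 4000
Y = rng.exponential(size=(reps, n)) - 1.0       # mean 0, variance 1
S = np.sqrt(n) * Y.mean(axis=1)                 # standardized weighted sum
skew = ((S - S.mean())**3).mean() / S.std()**3  # approx 2/sqrt(n), near 0
```

The equal-weight ratio shrinks like \(1/n\), the \(a_N = 1/N\) ratio stabilises near \(6/\pi ^2 \approx 0.61\), and the skewness of the standardized sum is already close to the Gaussian value 0 despite the strongly skewed summands.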

About this article

Cite this article

Withers, C.S., Nadarajah, S. Asymptotic properties of \(M\)-estimators in linear and nonlinear multivariate regression models. Metrika 77, 647–673 (2014). https://doi.org/10.1007/s00184-013-0458-4
