Simple powerful robust tests based on sign depth

Up to now, powerful outlier robust tests for linear models have been based on M-estimators and are quite complicated. On the other hand, the simple robust classical sign test usually has very poor power against certain alternatives. We present a generalization of the sign test which is similarly easy to comprehend but much more powerful. It is based on K-sign depth, shortly denoted by K-depth. These so-called K-depth tests are motivated by simplicial regression depth, but are not restricted to regression problems. They can be applied as soon as the true model leads to independent residuals with median equal to zero. Moreover, general hypotheses on the unknown parameter vector can be tested. While the 2-depth test, i.e. the K-depth test for $K = 2$, is equivalent to the classical sign test, K-depth tests with $K \ge 3$ turn out to be much more powerful in many applications. A drawback of the K-depth test is its fairly high computational effort when implemented naively. However, we show how this inherent computational complexity can be reduced.
In order to see why K-depth tests with $K \ge 3$ are more powerful than the classical sign test, we discuss the asymptotic behavior of their test statistic for residual vectors with only few sign changes, which is in particular the case for some alternatives that the classical sign test cannot reject. In contrast, we also consider residual vectors with alternating signs, representing models that fit the data very well. Finally, we demonstrate the good power of the K-depth tests for some examples including high-dimensional multiple regression.


Introduction
Outlier robust tests for linear models are mainly given by Wald-type tests, likelihood ratio tests, and score-type tests based on M-estimators and related estimators as originally proposed by Schrader and Hettmansperger (1980), Markatou et al. (1991), Silvapulle (1992), and Heritier and Ronchetti (1994). See also Hampel et al. (2011), Chapter 7, Huber and Ronchetti (2009), Chapter 13, and Maronna et al. (2019), Chapter 5. M-estimators have the disadvantage that they depend on score functions which must be specified. Moreover, they are not scale invariant, so that the scale must be estimated simultaneously, as is done by the MM-estimators proposed by Yohai (1987). These MM-estimators are defined iteratively using the S-estimators for scale introduced by Rousseeuw and Yohai (1984). The robust tests given by lmRob in the R package robust and by lmrob in robustbase are based on MM-estimators with special score functions, for which efficient computations are based on approaches of Stahel (2011, 2017). These tests are very powerful but complicated to compute since optimal regression and scale estimates are determined by an adaptive procedure.
We propose here powerful outlier robust tests which are much simpler since they are based only on signs of residuals. They can be used as soon as residuals $R_n(\theta)$, $n = 1, \ldots, N$, of a parametric model given by a parameter $\theta \in \Theta \subset \mathbb{R}^p$, $p \in \mathbb{N}$, can be defined which satisfy
$$P_\theta(R_n(\theta) > 0) = \tfrac{1}{2} = P_\theta(R_n(\theta) < 0). \tag{1}$$
Such residuals appear in linear or nonlinear regression models with realized regressors $x_n$ where observations are of the form $Y_n = g(x_n, \theta) + E_n$ and the error variable $E_n$ has a continuous distribution with median equal to zero. Then the residuals are given by $R_n(\theta) = Y_n - g(x_n, \theta)$. Generalized linear models are further examples of residuals satisfying (1) if the link function can be expressed by the median of the observations $Y_n$, i.e. if $\mathrm{med}(Y_n) = g(x_n, \theta)$; see, e.g., Leckey et al. (2020) for a load-sharing model which also leads to residuals satisfying (1). More examples are given by stochastic processes with i.i.d. increments $E_n$ such as AR(p) processes given by $Y_n = g(Y_{n-1}, \ldots, Y_{n-p}, \theta) + E_n$. Realizations of $R_1(\theta), \ldots, R_N(\theta)$ are denoted by $r_1(\theta), \ldots, r_N(\theta)$. The proposed tests are called K-depth tests and are based on the so-called K-sign depth, shortly denoted by K-depth. The K-depth of a parameter θ in a set of realized observations $y_1, \ldots, y_N$ is the relative number of K-tuples $\{n_1, \ldots, n_K\} \subset \{1, \ldots, N\}$ with alternating signs of the residuals $r_{n_1}(\theta), \ldots, r_{n_K}(\theta)$. A hypothesis $H_0: \theta \in \Theta_0$ is rejected if the K-depth of all $\theta \in \Theta_0$ is too small. The only hyperparameter which must be chosen is K. A good choice is often a value close to the dimension p of the parameter θ, but other choices are possible.
For $K = 2$ and hypotheses of the form $H_0: \theta = \theta_0$, the K-depth test is the classical sign test, which counts the number of positive (or negative) residuals among $r_1(\theta_0), \ldots, r_N(\theta_0)$ and rejects the null hypothesis if the number of positive signs is too small or too large. In particular, it does not reject the null hypothesis if half of the residuals are positive and half of them are negative. However, this can also happen for alternatives with parameters far away from $\theta_0$, see Fig. 1. Therefore this simple sign test has poor power against many alternatives. In contrast, the proposed K-depth test with $K \ge 3$ is much more powerful, as we show in this paper.
The K-depth has its origin in simplicial regression depth. Simplicial regression depth is a modification of the regression depth introduced by Rousseeuw and Hubert (1999) to generalize the depth notion to regression. Originally, the halfspace depth of Tukey (1975) was used to obtain a generalization of the median for multivariate data. Liu (1988, 1990) extended this to simplicial depth. Simplicial depth can be expressed by counting the number of all (p + 1)-tuples of the p-dimensional data set with positive halfspace depth. Replacing halfspace depth by regression depth leads to simplicial regression depth. For the calculation of simplicial regression depth, Rousseeuw and Hubert (1999) and Müller (2005) noted that the regression depth of a p-dimensional parameter vector within p + 1 observations is greater than zero if and only if the residuals have alternating signs. Sufficient conditions for this equivalence and a proof of this property are given by . This led to the idea of defining the depth of a parameter θ directly via alternating signs of residuals in K-tuples.
It should be noted here that the depth of a parameter in observations coming from a parametric model has been treated only by few authors, such as Mizera and Müller (2004), Müller (2005), Denecke and Müller (2014), Paindaveine and Van Bever (2018), and Wang (2019). Most depth notions concern the depth of data points in multivariate data sets, such as those of Zuo and Serfling (2000), Mosler (2002), and Agostinelli and Romanazzi (2011). Any simplicial depth notion has the advantage that it is a U-statistic, so that its asymptotic distribution can be derived by methods for U-statistics. For Liu's simplicial depth for multivariate data, this was used in Liu (1990), Dümbgen (1992), and Arcones and Gine (1993). However, simplicial regression depth is often a degenerate U-statistic, so that more effort is necessary to derive the asymptotic distribution, see Müller (2005), Wellmann et al. (2009), and Wellmann and Müller (2010). The advantage of the K-depth is that its distribution does not depend on the model and can be easily calculated for small sample sizes because only all $2^N$ combinations of signs must be considered. Moreover, its asymptotic distribution was derived in  for K = 3 and in Malcherczyk et al. (2021) for general $K \ge 3$.
The derivation of the asymptotic distribution leads to an asymptotically equivalent variant of the K-depth which can be calculated in linear time O(N), while a naive implementation has a complexity of $O(N^K)$, where N is the sample size. Studying especially the behavior of the K-depth in the situation of few sign changes in the data leads to another implementation in this paper. This implementation is based on blocks of equal signs and is exact, of complexity O(N), and much faster than the implementation based on the asymptotic form. This allows the application in multiple regression with high dimension, where K should grow with the dimension. In multiple regression, an ordering of the residuals is needed. For this, we use new results of Horn and Müller (2020), see also Horn (2021), concerning optimal ordering of the multivariate explanatory variables.
In Sect. 2, we introduce the K-depth and the K-depth tests, discuss a relationship to the runs test, and show how the computational complexity can be reduced by a block implementation. Basic properties of the K-depth are derived in Sect. 3. These concern a strong law of large numbers for the K-depth, the behavior at alternating signs of residuals, and the behavior when only few sign changes occur. In particular, it is shown for all $K \ge 3$ that the expected value of the K-depth and its maximal value have the same limit as the number of observations tends to infinity. It is also shown that residuals with few sign changes have a K-depth that is strictly less than this limit, which explains why the K-depth test has high power at alternatives that tend to have few sign changes. A comparison between the K-depth tests for different values of K is given in Sect. 4. First, for K = 2, the equivalence of the K-depth test and the classical sign test is derived formally. Afterwards, the p-values of K-depth tests with K = 3, 4, 5, 6 are compared in some worst-case scenarios with few sign changes taken from Sect. 3. Sect. 5 demonstrates the good power of the K-depth tests via simulations for quadratic regression and for multiple regression with high dimensions. Finally, a discussion of the results and an outlook are given in Sect. 6. More details of the proofs and the block implementation as well as further simulation results are given in the supplementary material.
Notation. Throughout the article, $r_1(\theta), \ldots, r_N(\theta)$ denote realizations of the residuals $R_1(\theta), \ldots, R_N(\theta)$. If the choice of the parameter θ is clear, we also use the abbreviations $r_n := r_n(\theta)$ and $R_n := R_n(\theta)$ for $n = 1, \ldots, N$. The sign of a real number x is denoted by $\psi(x) = \mathbb{1}\{x > 0\} - \mathbb{1}\{x < 0\}$, where $\mathbb{1}\{\cdot\}$ denotes the indicator function. In some asymptotic calculations we make use of the O-notation: For real-valued sequences $(a_n)_{n \ge 1}$ and $(b_n)_{n \ge 1}$, we write $a_n = O(b_n)$ if there are a constant $C > 0$ and an integer $n_0$ with $|a_n| \le C|b_n|$ for all $n \ge n_0$. Furthermore, $a_n = \Theta(b_n)$ denotes that both $a_n = O(b_n)$ and $b_n = O(a_n)$.

K-depth tests and reduction of their computational complexity
In this section, we introduce the K-depth of a vector and how to use the K-depth notion as a test statistic. We also briefly discuss the issue of a fairly high computational complexity when working with K-depth tests. This issue can be resolved by using alternative representations of the original definition of the K-depth. The results are based on the following general assumption on the statistical models with unknown parameter $\theta \in \Theta \subset \mathbb{R}^p$, $p \in \mathbb{N}$:

(2) The residuals $R_1(\theta), \ldots, R_N(\theta)$ of N observations in $\mathbb{R}$ are independent and satisfy (1) if θ is the true parameter.

K-depth and K-depth tests
The K-sign depth or shortly K-depth $d_K(r_1, \ldots, r_N)$ of $r_1, \ldots, r_N$ is the relative number of K-element subsets with alternating signs, i.e. for $K \ge 2$,
$$d_K(r_1, \ldots, r_N) := \binom{N}{K}^{-1} \sum_{1 \le n_1 < \ldots < n_K \le N} \prod_{k=1}^{K-1} \mathbb{1}\{r_{n_k}\, r_{n_{k+1}} < 0\}. \tag{3}$$

Remark 1 Note that the definition of the K-sign depth depends on the chosen order of the residuals, and therefore this choice is a crucial aspect. If $x_n \in \mathbb{R}^q$ for $q > 1$, then various multivariate orderings can be used. Not all of them provide powerful tests. Among the most promising approaches are orderings according to a shortest Hamiltonian path through the regressors $x_1, \ldots, x_N$. Fortunately, less computationally intensive approximations of such a path (such as the nearest neighbor approach) seem to perform similarly well. A detailed discussion of these and other orderings can be found in Horn and Müller (2020). See also Sect. 5 for an example. Moreover, note that under some conditions given by , the K-depth is equivalent to simplicial regression depth if K = p + 1 and $\theta \in \mathbb{R}^p$. Hence, an appropriate choice of K is a natural number close to p + 1. However, in contrast to simplicial regression depth, other choices than K = p + 1 are possible as well.
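To make the definition concrete, the following Python sketch computes the K-depth by brute-force enumeration of all K-element subsets, i.e. with the $\Theta(N^K)$ effort discussed in Sect. 2.2. It is a minimal illustration, not the authors' C++ implementation; the function names `sign` and `k_depth_naive` are hypothetical, and a residual equal to zero is treated as breaking the alternation (under (1) this happens with probability zero).

```python
from itertools import combinations
from math import comb


def sign(x: float) -> int:
    """psi(x) = 1{x > 0} - 1{x < 0}."""
    return (x > 0) - (x < 0)


def k_depth_naive(residuals, K: int) -> float:
    """K-depth by enumerating all K-element subsets, cf. (3); Theta(N^K) time."""
    N = len(residuals)
    if not 2 <= K <= N:
        raise ValueError("need 2 <= K <= N")
    signs = [sign(r) for r in residuals]
    hits = 0
    for idx in combinations(range(N), K):
        # neighbours within the subset must have strictly opposite signs;
        # a zero residual therefore never takes part in an alternating subset
        if all(signs[a] * signs[b] < 0 for a, b in zip(idx, idx[1:])):
            hits += 1
    return hits / comb(N, K)


if __name__ == "__main__":
    print(k_depth_naive([0.4, -1.2, 0.3, -0.7], 3))  # 0.5: 2 of 4 triples alternate
    print(k_depth_naive([0.4, 1.2, -0.3, -0.7], 3))  # 0.0: only one sign change
```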
In order to obtain a non-degenerate limit distribution, the K-depth test is based on the following test statistic:
$$T_K(\theta) := N\left(d_K(R_1(\theta), \ldots, R_N(\theta)) - \left(\tfrac{1}{2}\right)^{K-1}\right). \tag{4}$$
A test based on (4) requires the α-quantiles of the distribution of the test statistic. If N is small, the finite-sample distribution for any K can easily be simulated, since computing the K-depth via Formula (3) with an underlying C++ algorithm is fairly fast for small N. For larger N, see Subsection 2.2.
With the quantiles at hand, the K-depth test, $K \ge 2$, is defined as in Müller (2005): A hypothesis of the form $H_0: \theta \in \Theta_0$ shall be rejected if the K-depth $d_K(r_1(\theta), \ldots, r_N(\theta))$ of θ, or equivalently $T_K(\theta)$, is too small for all $\theta \in \Theta_0$. Hence, if $q_\alpha$ is the α-quantile of the distribution of $T_K(\theta)$ under θ, then the K-depth test for $H_0: \theta \in \Theta_0$ rejects $H_0$ if
$$\sup_{\theta \in \Theta_0} T_K(\theta) < q_\alpha.$$
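For a simple hypothesis $H_0: \theta = \theta_0$ and small N, the α-quantile $q_\alpha$ can even be computed exactly: under (2) with nonzero residuals, all $2^N$ sign vectors are equally likely, so the null distribution of $T_K$ follows from full enumeration. The Python sketch below does this; the function names are hypothetical, and the quantile convention (smallest t with $P(T_K \le t) \ge \alpha$) is an assumption which makes the rule "reject if $T_K < q_\alpha$" conservative for this discrete distribution.

```python
from itertools import combinations, product
from math import ceil, comb


def k_depth_signs(signs, K):
    """K-depth of a +/-1 sign vector by brute force, cf. (3)."""
    N = len(signs)
    hits = sum(
        all(signs[a] != signs[b] for a, b in zip(idx, idx[1:]))
        for idx in combinations(range(N), K)
    )
    return hits / comb(N, K)


def t_stat(signs, K):
    """T_K = N * (d_K - (1/2)^(K-1)), cf. (4)."""
    return len(signs) * (k_depth_signs(signs, K) - 0.5 ** (K - 1))


def exact_null_quantile(N, K, alpha):
    """Smallest t with P(T_K <= t) >= alpha under H0 (all 2^N sign vectors equally likely)."""
    values = sorted(t_stat(s, K) for s in product((-1, 1), repeat=N))
    return values[max(ceil(alpha * len(values)) - 1, 0)]


if __name__ == "__main__":
    q = exact_null_quantile(10, 3, 0.05)
    obs = [1] * 5 + [-1] * 5      # one sign change: d_3 = 0, so T_3 = -2.5
    print(t_stat(obs, 3) < q)     # True: H0 is rejected at level 5 %
```

For N = 8 and K = 3 the same computation gives $q_{0.05} = -2$, which equals the smallest possible value of $T_3$ (16 of the 256 sign vectors have 3-depth zero), so no rejection at the 5 % level is possible there; this illustrates why a minimal sample size is needed before a rejection at level α can occur.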

Remark 2
The K-depth test can also be used in a two-sided version, which rejects $H_0: \theta \in \Theta_0$ if, for all $\theta \in \Theta_0$, either $T_K(\theta) < q_{\alpha_1}$ or $T_K(\theta) > q_{1-\alpha_2}$, with $\alpha_1 + \alpha_2 = \alpha$, for example $\alpha_1 = \alpha_2 = \alpha/2$. This test also rejects $H_0$ if too many sign changes occur in the residual vector, which is an indicator of negatively correlated residuals, for example in time series. While the one-sided version is mostly focused on detecting deviations from 0 in the median and can detect only strong positive correlation in the residuals, the two-sided version is the preferable choice when testing simultaneously whether the residuals are independent and have median zero. However, since our applications mainly concern tests on the median rather than on independence, this two-sided version will not be used subsequently. Nevertheless, note that a simplified version of the K-depth leads to a test which can be considered as a generalization of the runs test of Wald and Wolfowitz (1940) for testing the hypothesis of independent residuals, see e.g. Gibbons and Chakraborti (2003), pp. 78-86. This simplified K-depth uses only subsequent residuals and can be defined as in  for $K \ge 2$:
$$d_K^S(r_1, \ldots, r_N) := \frac{1}{N-K+1} \sum_{n=1}^{N-K+1} \prod_{k=0}^{K-2} \mathbb{1}\{r_{n+k}\, r_{n+k+1} < 0\}.$$
If K = 2, then this simplified K-depth is determined by the number of sign changes and thus by the number of runs.  used the simplified versions because they are faster to compute and their asymptotic behavior is easy to derive. However, since the simplified K-depth only considers N − K + 1 subsets instead of $\binom{N}{K}$, tests based on it are usually less powerful than tests based on the full K-depth, in particular if the independence of the residuals is ensured, see  and Falkenau (2016).
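The simplified K-depth only inspects windows of K consecutive residuals, which makes it a linear-time quantity. The following Python sketch assumes the normalization by the number N − K + 1 of windows (our reading of the text, not a verified reproduction of the cited definition); the function names are hypothetical. For K = 2 it recovers the relative number of sign changes, and hence the number of runs.

```python
def sign(x):
    """psi(x) = 1{x > 0} - 1{x < 0}."""
    return (x > 0) - (x < 0)


def simplified_k_depth(residuals, K):
    """Relative number of the N - K + 1 windows of K consecutive residuals
    whose signs alternate (assumed normalization, see lead-in)."""
    N = len(residuals)
    if not 2 <= K <= N:
        raise ValueError("need 2 <= K <= N")
    s = [sign(r) for r in residuals]
    windows = N - K + 1
    hits = sum(
        all(s[n + k] * s[n + k + 1] < 0 for k in range(K - 1))
        for n in range(windows)
    )
    return hits / windows


def number_of_runs(residuals):
    """Number of runs (maximal groups of equal signs) = sign changes + 1."""
    s = [sign(r) for r in residuals]
    return 1 + sum(a != b for a, b in zip(s, s[1:]))


if __name__ == "__main__":
    r = [0.5, 1.1, -0.2, -0.8, 0.3, -0.1]
    print(simplified_k_depth(r, 2), number_of_runs(r))  # 0.6 4
    print(simplified_k_depth(r, 3))                     # 0.25
```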

Runtime and block-implementation
A major drawback of the K-depth test is its slow runtime when using an algorithm based on the definition (3). This definition requires the consideration of all increasing K-tuples in {1, …, N}, hence leading to an algorithm with runtime $\Theta(N^K)$. Such an algorithm is clearly impractical in applications with fairly large sample sizes. Fortunately, the derivation of a limit theorem for the test statistic $T_K(\theta)$ leads to an asymptotically equivalent form of (4) which can be computed in linear time for all $K \ge 3$. More precisely, under the true parameter θ where $\Psi_K$ is a functional given in Malcherczyk et al. (2021), $o_P(1)$ is a random variable converging to zero in probability, and Such a limit theorem is given in  for K = 3. A generalization to all $K \ge 3$ as well as the resulting efficient algorithm can be found in Malcherczyk et al. (2021). We will not go into detail on how this algorithm with runtime Θ(N) works, since it requires a major part of the computation necessary to obtain the limit theorem, which is beyond the scope of this paper. Instead, we discuss a different approach which, when implemented carefully, has an even faster runtime than the algorithm from Malcherczyk et al. (2021). Moreover, the algorithm discussed below always yields the exact K-depth rather than an asymptotic approximation.
This section first provides the general idea of the algorithm, which immediately results in an efficient procedure for residual vectors with only few sign changes. At the end of the section, a more careful implementation of this approach is sketched that leads to an efficient (linear-time) algorithm to compute the exact K-depth. We refer to this approach as block-implementation. Aside from speeding up the implementation based on (3), this approach will be useful to derive some of the properties presented in Sect. 3.
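Block-implementation. Let $r := (r_1, \ldots, r_N)$ be a vector of residuals and let $\psi(x)$ denote the sign of a real number x, i.e. $\psi(x) := \mathbb{1}\{x > 0\} - \mathbb{1}\{x < 0\}$. The vector r is decomposed into blocks by letting a new block start at index j if and only if $r_{j-1}$ and $r_j$ have different signs. More formally, we define the number of blocks as $B(r) := 1 + |\{n \in \{2, \ldots, N\} : \psi(r_n) \neq \psi(r_{n-1})\}|$, i.e. one more than the number of sign changes, and their starting positions $s_1(r), \ldots, s_{B(r)}(r)$ via $s_1(r) := 1$ and
$$s_{j+1}(r) := \min\{n > s_j(r) : \psi(r_n) \neq \psi(r_{n-1})\}, \quad j = 1, \ldots, B(r) - 1.$$
For convenience, we define $s_{B(r)+1}(r) := N + 1$. The block sizes are defined as $q_j(r) := s_{j+1}(r) - s_j(r)$, $j = 1, \ldots, B(r)$.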

We say that the nth residual $r_n$ belongs to block j if and only if $s_j(r) \le n < s_{j+1}(r)$. The sign of block j is defined as the sign of the first (and thus any) element $r_{s_j(r)}$ belonging to that block. Blocks $j_1 < \ldots < j_k$ are called alternating if and only if the signs of the blocks are alternating, i.e. the signs of blocks $j_i$ and $j_{i+1}$ are different for all $i = 1, \ldots, k-1$. Note that two blocks $j_1$ and $j_2$ have different signs if and only if $j_1$ is even and $j_2$ is odd or vice versa. In particular, the blocks $j_1 < \ldots < j_k$ are alternating if and only if $j_{i+1} - j_i$ is odd for all $i = 1, \ldots, k-1$.
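The block decomposition is a single left-to-right pass over the signs. The Python sketch below returns the starting positions $s_j$ (1-based, as in the text), the block sizes $q_j = s_{j+1} - s_j$ with $s_{B+1} = N + 1$, and the block signs, and it checks the odd-gap criterion for alternating blocks. The function names and the residual vector are made up for illustration, and zero residuals are rejected since the block definition assumes nonzero signs.

```python
def sign(x):
    """psi(x) = 1{x > 0} - 1{x < 0}."""
    return (x > 0) - (x < 0)


def block_decomposition(residuals):
    """Return 1-based starting positions s_1..s_B, block sizes q_1..q_B and block signs."""
    signs = [sign(r) for r in residuals]
    if not signs or 0 in signs:
        raise ValueError("need a non-empty vector without zero residuals")
    N = len(signs)
    starts = [1]                       # s_1 = 1
    for n in range(2, N + 1):          # 1-based index n
        if signs[n - 1] != signs[n - 2]:
            starts.append(n)           # r_{n-1} and r_n differ in sign
    bounds = starts + [N + 1]          # s_{B+1} = N + 1
    sizes = [bounds[j + 1] - bounds[j] for j in range(len(starts))]
    block_signs = [signs[s - 1] for s in starts]
    return starts, sizes, block_signs


def blocks_alternate(js):
    """Blocks j_1 < ... < j_k alternate iff every gap j_{i+1} - j_i is odd."""
    return all((b - a) % 2 == 1 for a, b in zip(js, js[1:]))


if __name__ == "__main__":
    r = [0.3, 1.2, -0.5, 2.0, 0.7, 1.1, -0.4, -0.9]
    print(block_decomposition(r))  # ([1, 3, 4, 7], [2, 1, 3, 2], [1, -1, 1, -1])
```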
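Example 2 Consider the block decomposition of the vector r from Example 1. In this decomposition, blocks 1, 3, 5 have positive signs and blocks 2, 4 have negative signs. Hence, if A denotes the set of alternating triples of blocks, then
$$A = \{(1,2,3), (1,2,5), (1,4,5), (2,3,4), (3,4,5)\}.$$
Since a triple $(r_i, r_j, r_k)$, $i < j < k$, of entries from r is alternating if and only if its entries belong to an alternating triple of blocks, we may count the number of triples in r with alternating signs by counting the corresponding combinations of elements from alternating blocks, i.e. in our example with r of length N = 8,
$$d_3(r) = \binom{8}{3}^{-1} \sum_{(i_1, i_2, i_3) \in A} q_{i_1}(r)\, q_{i_2}(r)\, q_{i_3}(r).$$
More generally, we have the following alternative representation of (3):

Lemma 1 Let $\mathcal{O} := 2\mathbb{N}_0 + 1$ denote the set of all odd positive integers and let
$$A_{K,B} := \{(i_1, \ldots, i_K) \in \{1, \ldots, B\}^K : i_{j+1} - i_j \in \mathcal{O} \text{ for } j = 1, \ldots, K-1\}.$$
Let $q_1(r), \ldots, q_{B(r)}(r)$ be the block sizes of a vector $r = (r_1, \ldots, r_N)$. Then
$$d_K(r_1, \ldots, r_N) = \binom{N}{K}^{-1} \sum_{(i_1, \ldots, i_K) \in A_{K,B(r)}} \prod_{j=1}^{K} q_{i_j}(r).$$

Remark 3 Note that the size of $A_{K,B}$ is $\Theta(B^K)$. Also note that the effort to compute the block sizes $q_1(r), \ldots, q_{B(r)}(r)$ of a vector $r = (r_1, \ldots, r_N)$ is Θ(N). Hence, a naive algorithm based on the expression in Lemma 1 has computational complexity $\Theta(N + B(r)^K)$, where $B(r)$ is the number of blocks in r. With some additional effort, the computational costs can even be reduced to Θ(N + B) by properly storing all relevant terms during the computation. For simplicity, we only discuss K = 3 here; more details on general K can be found in the supplementary material and in Malcherczyk (2022). Factoring out the length $q_{i_2}$ of the second block in the representation from Lemma 1 yields
$$d_3(r_1, \ldots, r_N) = \binom{N}{3}^{-1} \sum_{i_2=2}^{B-1} q_{i_2} \Bigg(\sum_{\substack{1 \le i_1 < i_2 \\ i_2 - i_1 \in \mathcal{O}}} q_{i_1}\Bigg) \Bigg(\sum_{\substack{i_2 < i_3 \le B \\ i_3 - i_2 \in \mathcal{O}}} q_{i_3}\Bigg). \tag{8}$$
This representation can be computed in linear time by deriving the values of the inner sums in advance. To this end, let
$$F(i_2) := \sum_{\substack{1 \le i_1 < i_2 \\ i_2 - i_1 \in \mathcal{O}}} q_{i_1}, \qquad B(i_2) := \sum_{\substack{i_2 < i_3 \le B \\ i_3 - i_2 \in \mathcal{O}}} q_{i_3}.$$
Note that all values $(F(i_2), B(i_2))$, $i_2 = 2, \ldots, B-1$, can be computed with a total complexity of Θ(B) similarly to the cumulative sum of a vector of length B, e.g. via the recursions $F(i) = q_{i-1} + F(i-2)$ and $B(i) = q_{i+1} + B(i+2)$ with $F(i) := 0$ for $i \le 1$ and $B(i) := 0$ for $i \ge B$. With these values stored, (8) can be computed in linear time since the product of the inner sums equals $F(i_2) \cdot B(i_2)$, which can now be computed in constant time.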
For K ≥ 4, a similar approach leads to a representation with More details can be found in the supplementary material.
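The block representation can be cross-checked against the definition. In the Python sketch below, `alternating_count_lemma1` sums the products of block sizes over all block-index tuples with odd gaps (the $\Theta(B^K)$ variant), while `alternating_count_k3_linear` accumulates, for K = 3, the sum of $q_{i_2} F(i_2) B(i_2)$ with the prefix-type sums F and B built by the recursions $F(i) = q_{i-1} + F(i-2)$ and $B(i) = q_{i+1} + B(i+2)$. All function names are hypothetical, indices are 0-based, and the functions return alternating-subset counts, i.e. the K-depth times $\binom{N}{K}$.

```python
from itertools import combinations
from math import comb


def alternating_count_naive(signs, K):
    """Number of K-subsets with alternating signs, directly from definition (3)."""
    return sum(
        all(signs[a] != signs[b] for a, b in zip(idx, idx[1:]))
        for idx in combinations(range(len(signs)), K)
    )


def block_sizes(signs):
    """Block sizes q_1, ..., q_B of a +/-1 sign vector."""
    sizes = [1]
    for prev, cur in zip(signs, signs[1:]):
        if cur == prev:
            sizes[-1] += 1
        else:
            sizes.append(1)
    return sizes


def alternating_count_lemma1(q, K):
    """Sum over block tuples with odd gaps of q_{i_1} * ... * q_{i_K}; Theta(B^K)."""
    total = 0
    for idx in combinations(range(len(q)), K):
        if all((b - a) % 2 == 1 for a, b in zip(idx, idx[1:])):
            prod = 1
            for i in idx:
                prod *= q[i]
            total += prod
    return total


def alternating_count_k3_linear(q):
    """K = 3: sum_i q_i * F(i) * B(i) in Theta(B) time (0-based indices)."""
    B = len(q)
    F = [0] * B   # F[i]: sum of q[j] over j < i with i - j odd
    Bk = [0] * B  # Bk[i]: sum of q[j] over j > i with j - i odd
    for i in range(1, B):
        F[i] = q[i - 1] + (F[i - 2] if i >= 2 else 0)
    for i in range(B - 2, -1, -1):
        Bk[i] = q[i + 1] + (Bk[i + 2] if i + 2 < B else 0)
    return sum(q[i] * F[i] * Bk[i] for i in range(B))


if __name__ == "__main__":
    s = [1, 1, -1, 1, 1, 1, -1, -1]
    q = block_sizes(s)                                   # [2, 1, 3, 2]
    print(alternating_count_naive(s, 3), alternating_count_k3_linear(q))  # 12 12
    print(alternating_count_k3_linear(q) / comb(len(s), 3))  # 3-depth = 12/56
```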
Remark 4 As a simulation study in the dissertation of Malcherczyk (2022) reveals, the efficient block implementation sketched in Remark 3 is faster than the asymptotic variant from Malcherczyk et al. (2021), even for residuals under the null hypothesis that have a large number of blocks. More details can be found in Malcherczyk (2022), Chapter 5.3.

Basic properties of the K-depth
This section contains some of the basic properties of the K-depth. In particular, we discuss the typical behavior in terms of a law of large numbers in Sect. 3.1. Sections 3.2 and 3.3 contain extremal cases where the test statistic is close to its maximal or minimal value, respectively.

Law of large numbers
Let $R_1 := R_1(\theta), \ldots, R_N := R_N(\theta)$ be independent random variables satisfying (1). Then the expectation of the K-depth is given by
$$E_\theta\big(d_K(R_1, \ldots, R_N)\big) = \left(\tfrac{1}{2}\right)^{K-1},$$
since each K-element subset in (3) shows one of the two alternating sign patterns with probability $2 \cdot (1/2)^K$. A convergence of the K-depth towards this expectation can be shown by rewriting the summands in (3) using the identity in the next lemma. In order to avoid triple indices, Studying the variance of the expression (10) reveals that it converges to zero as N → ∞. Hence Lemma 2 leads to a law of large numbers for the K-depth:

Theorem 1 If (2) holds, then $d_K(R_1(\theta), \ldots, R_N(\theta))$ converges to $(1/2)^{K-1}$ $P_\theta$-almost surely as N → ∞.

Proof (Sketch) Set $R_n = R_n(\theta)$. The assertion follows from by using Chebyshev's inequality and the Borel-Cantelli Lemma. The bound on the variance can be deduced from the representation given in Lemma 2 by taking into account that $\psi(R_1), \ldots, \psi(R_N)$ are i.i.d. and uniformly distributed on {−1, 1} and therefore
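The expectation $(1/2)^{K-1}$ can be verified exactly for small N, because under (2) all $2^N$ sign vectors are equally likely; for larger N, a Monte Carlo average illustrates the concentration behind the law of large numbers. The sketch below is only a numerical illustration with hypothetical function names and an arbitrary seed; it does not reproduce the variance bound used in the proof.

```python
import random
from fractions import Fraction
from itertools import combinations, product
from math import comb


def alternating_count(signs, K):
    """Number of K-subsets with alternating signs, cf. (3)."""
    return sum(
        all(signs[a] != signs[b] for a, b in zip(idx, idx[1:]))
        for idx in combinations(range(len(signs)), K)
    )


def exact_mean_depth(N, K):
    """E d_K over all 2^N equally likely sign vectors, as an exact fraction."""
    total = sum(alternating_count(s, K) for s in product((-1, 1), repeat=N))
    return Fraction(total, 2 ** N * comb(N, K))


def simulated_depths(N, K, reps, seed=0):
    """Monte Carlo draws of d_K for i.i.d. signs uniform on {-1, 1}."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        s = [rng.choice((-1, 1)) for _ in range(N)]
        out.append(alternating_count(s, K) / comb(N, K))
    return out


if __name__ == "__main__":
    print(exact_mean_depth(6, 3))               # 1/4 = (1/2)^(K-1) for K = 3
    d = simulated_depths(30, 3, 200, seed=1)
    print(sum(d) / len(d))                       # close to 0.25
```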

K-depth for alternating signs
In this section we study the behavior of the K-depth of residuals with alternating signs, i.e. of residuals $r_1, \ldots, r_N$ with $\psi(r_n) = -\psi(r_{n+1})$ for $n = 1, \ldots, N-1$.
Alternating signs indicate a good fit, and the K-depth attains its maximum value in this situation. Therefore it is of interest what exactly this maximum value is. It is given by the following theorem. As usual, we use the convention $\binom{n}{k} = 0$ for $n < k$.
Theorem 2 Suppose $r_1, \ldots, r_N$ have alternating signs. Then, for $2 \le K \le N$,
$$d_K(r_1, \ldots, r_N) = \binom{N}{K}^{-1} \left[\binom{\lfloor (N+K)/2 \rfloor}{K} + \binom{\lfloor (N+K-1)/2 \rfloor}{K}\right].$$
Proof (Sketch) Let $r_1, \ldots, r_N$ be residuals with alternating signs. Let $|A_{K,N}|$ be the size of the set $A_{K,N}$ from Lemma 1. Then $d_K(r_1, \ldots, r_N) = |A_{K,N}| / \binom{N}{K}$. It therefore only remains to count the number of $1 \le i_1 < \ldots < i_K \le N$ for which $i_{j+1} - i_j$ is odd for all $j = 1, \ldots, K-1$. The supplementary material contains a combinatorial deduction of this number that is based on rewriting $i_{j+1} - i_j = 2a_j + 1$, $a_j \in \mathbb{N}_0$.
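For an alternating vector of length N, the K-depth reduces to counting index sets $1 \le i_1 < \ldots < i_K \le N$ with all gaps odd. The Python sketch below compares a brute-force count with the binomial expression $\binom{\lfloor (N+K)/2 \rfloor}{K} + \binom{\lfloor (N+K-1)/2 \rfloor}{K}$ and shows the resulting depth approaching $(1/2)^{K-1}$ from above. Function names are hypothetical; `math.comb(n, k)` returns 0 for k > n, which matches the convention $\binom{n}{k} = 0$ for n < k.

```python
from itertools import combinations
from math import comb


def alternating_subsets_bruteforce(N, K):
    """Count 1 <= i_1 < ... < i_K <= N with every gap i_{j+1} - i_j odd."""
    return sum(
        all((b - a) % 2 == 1 for a, b in zip(idx, idx[1:]))
        for idx in combinations(range(1, N + 1), K)
    )


def alternating_subsets_closed_form(N, K):
    """Binomial expression for the same count (math.comb gives 0 if K > n)."""
    return comb((N + K) // 2, K) + comb((N + K - 1) // 2, K)


def depth_alternating(N, K):
    """K-depth of an alternating residual vector of length N."""
    return alternating_subsets_closed_form(N, K) / comb(N, K)


if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, depth_alternating(N, 3))  # decreases towards 0.25
```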
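Note that Theorem 2 can also be used to determine the size of the index set $A_{K,B}$ in the block-implementation:

Corollary 1 Let $B, K \ge 2$ be integers and let $A_{K,B}$ be as in Lemma 1. Then
$$|A_{K,B}| = \binom{\lfloor (B+K)/2 \rfloor}{K} + \binom{\lfloor (B+K-1)/2 \rfloor}{K}.$$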
Theorem 2 implies that the K-depth of residuals with alternating signs converges to the expected value $(1/2)^{K-1}$ as N → ∞. In conjunction with Corollary 1, we may extend this property to the following more general class of alternating vectors: Proof (Sketch) A combination of Lemma 1 and Corollary 1 yields an explicit expression for the K-sign depth of a vector with blocks of equal size. The assertion follows at once for odd B + K and after rearranging this expression for even B + K.
An asymptotic analysis of the K-depth based on Lemma 3 reveals that the K-depth test statistic of residuals that alternate in blocks of size M converges to its maximal value:

Theorem 3 Let M be a fixed positive integer. If the residuals $r_1, \ldots, r_N$ are alternating in blocks of size M, then
$$\lim_{N \to \infty} N\left(d_K(r_1, \ldots, r_N) - \left(\tfrac{1}{2}\right)^{K-1}\right) = \frac{K(K-1)}{2^K}.$$
Proof (Sketch) The assertion follows from the explicit formula given in Lemma 3 by approximating the falling factorials up to their second-order term using $(x + a)^{\underline{J}} = x^J + J\left(a - \tfrac{J-1}{2}\right) x^{J-1} + O(x^{J-2})$ as $x \to \infty$, for $x = B/2$, $a = (K-2)/2$, $J = K-1$ and $x = B/2$, $a = (K-1)/2$, $J = K$ and $x = N$, $a = 0$, $J = K$, respectively.
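For B alternating blocks of equal size M (so N = MB), every alternating tuple of block indices contributes $M^K$ alternating subsets, so the K-depth equals $M^K |A_{K,B}| / \binom{N}{K}$ with $|A_{K,B}| = \binom{\lfloor (B+K)/2 \rfloor}{K} + \binom{\lfloor (B+K-1)/2 \rfloor}{K}$. The Python sketch below evaluates $T_K = N(d_K - (1/2)^{K-1})$ exactly with fractions and shows it approaching $K(K-1)/2^K$ (e.g. 0.75 for K = 3) for several fixed M. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def n_alternating_block_tuples(B, K):
    """Number of alternating K-tuples among B alternating blocks."""
    return comb((B + K) // 2, K) + comb((B + K - 1) // 2, K)


def t_stat_equal_blocks(M, B, K):
    """Exact T_K = N (d_K - (1/2)^(K-1)) for B alternating blocks of size M.

    Each alternating block tuple yields M^K alternating subsets,
    hence d_K = M^K * |A_{K,B}| / C(N, K) with N = M * B.
    """
    N = M * B
    d = Fraction(M ** K * n_alternating_block_tuples(B, K), comb(N, K))
    return N * (d - Fraction(1, 2 ** (K - 1)))


if __name__ == "__main__":
    K = 3
    for M in (1, 2, 5):
        for B in (10, 100, 1000, 10000):
            print(M, B, float(t_stat_equal_blocks(M, B, K)))
    print("limit:", K * (K - 1) / 2 ** K)  # 0.75
```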

Remark 5
(a) Theorem 3 yields that the maximal value of the test statistic (i.e. the value for residuals with alternating signs) is asymptotically $K(K-1)/2^K$. Since the minimal K-depth is zero, the minimal value of the test statistic is $-N/2^{K-1}$, which diverges as N → ∞. Hence the (asymptotic) distribution of the test statistic $T_K(\theta)$ is bounded from above but unbounded from below. In particular, its distribution is not symmetric. (b) Since the test statistic converges to its maximal value if the residuals are alternating in blocks of size $M \ge 1$, the (one-sided) K-depth test will not reject the model when such residuals are observed and N is sufficiently large. This can often be desirable in practice, where alternating residuals indicate a good fit and a systematic alternation (in blocks of fixed size) can be caused by some vibration behavior which is difficult to filter out. (c) If the independence of the residuals is questionable and of additional interest, then alternating residuals indicate dependence. In such situations, the two-sided K-depth test as proposed in Remark 2 can be used. Since alternating residuals yield the maximal possible value, the two-sided test will always reject the model when such residuals are observed and N is sufficiently large.

Behavior in situations of few sign changes
Residual vectors with only few sign changes usually indicate a bad choice of the model parameter, see, e.g., Fig. 1 for so-called nonfits in a quadratic regression model. A nonfit is defined as in Rousseeuw and Hubert (1999): Definition 2 A parameter θ is called a nonfit if there exists another parameter $\tilde{\theta}$ such that $|r_n(\tilde{\theta})| < |r_n(\theta)|$ for all $n = 1, \ldots, N$.
The 2-depth test can struggle to reject such bad choices since this test, as we will formally show in Sect. 4.1, is equivalent to the classical sign test. In particular, it does not reject the model if nearly half of the residuals are positive, regardless of how many sign changes the residuals have. K-depth tests with $K \ge 3$ are much more powerful in this regard since they immediately reject models that lead to few sign changes. More precisely, the following lemma is easy to show for residual vectors $r = (r_1, \ldots, r_N)$ where the number $B(r)$ of blocks (see Sect. 2.2) is small:

Lemma 4 If $B(r) < K$, i.e. if r has at most $K - 2$ sign changes, then $d_K(r_1, \ldots, r_N) = 0$.

Note that a K-depth of zero is the smallest possible value of the K-depth. Hence this will always lead to a rejection of the null hypothesis by the K-depth test if the sample size is large enough for a rejection at level α to be possible. Usually a nonfit of a p-dimensional parameter is expressed by at most p − 1 sign changes. Hence a K-depth test with K = p + 1 will protect against bad power at nonfits, see also . However, choices K < p + 1 can also lead to a good power of the K-depth test at alternatives for which the expected depth of $(1/2)^{K-1}$ is not reached. More precisely, since all α-quantiles of the asymptotic distribution of the K-depth test statistic $T_K(\theta)$ are fixed values greater than −∞, we have the following property for growing sample size N: The strict inequality (11) is in particular satisfied if the relative number of either the positive or the negative residuals tends to 1. This is often the case when the region of explanatory variables grows to infinity as N tends to infinity.

Fig. 1 … $\mathcal{N}(0, 1.5^2)$. The solid lines correspond to parameters that yield alternatives with either one or two sign changes: $\theta^{(1)} = (120, -24, 1)$ yielding $g(x, \theta^{(1)}) = 120 - 24x + x^2$ on the left-hand side and $\theta^{(2)} = (3, 6, -0.5)$ yielding $g(x, \theta^{(2)}) = 3 + 6x - 0.5x^2$ on the right-hand side.
This was used in  to show the consistency of a test based on simplicial depth for explosive AR(1) regression.
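The contrast between the 2-depth and K-depth tests at few sign changes can be checked directly: a residual vector has K-depth zero exactly when it has fewer than K blocks, because an alternating K-subset needs one element from each of K consecutive blocks. The Python sketch below verifies this equivalence over all sign vectors of length 8 and shows a nonfit-like pattern (ten positive residuals followed by ten negative ones) for which the sign test statistic is 0 while the 3-depth vanishes. Function names are hypothetical.

```python
from itertools import combinations, product
from math import comb, sqrt


def k_depth(signs, K):
    """K-depth of a +/-1 sign vector by brute force, cf. (3)."""
    hits = sum(
        all(signs[a] != signs[b] for a, b in zip(idx, idx[1:]))
        for idx in combinations(range(len(signs)), K)
    )
    return hits / comb(len(signs), K)


def n_blocks(signs):
    """Number of blocks = number of sign changes + 1."""
    return 1 + sum(a != b for a, b in zip(signs, signs[1:]))


def sign_test_statistic(signs):
    """(2 N_+ - N) / sqrt(N) for a vector of nonzero signs."""
    N = len(signs)
    n_plus = sum(s > 0 for s in signs)
    return (2 * n_plus - N) / sqrt(N)


if __name__ == "__main__":
    s = [1] * 10 + [-1] * 10          # one sign change, half of the signs positive
    print(sign_test_statistic(s))     # 0.0: the sign test cannot reject
    print(k_depth(s, 2), k_depth(s, 3))  # 2-depth = 100/190, 3-depth = 0.0
```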
Assuming a bounded, fixed support for the explanatory variables, the relative number of positive/negative residuals usually does not tend to one for alternatives, e.g. in polynomial regression. However, one at least expects only few sign changes then; see Fig. 1 for examples with only one or two sign changes. We therefore end the section with a discussion of the K-depth of residual vectors where the number of blocks/sign changes is bounded.
For the remainder of the section, we will use the alternative representation of the K-depth based on the block-implementation (see Sect. 2.2). Recall that the K-depth of residuals $r_1, \ldots, r_N$ with B blocks and block sizes $q_1, \ldots, q_B$ is given by
$$d_{K,N,B}(q_1, \ldots, q_B) := \binom{N}{K}^{-1} \sum_{(i_1, \ldots, i_K) \in A_{K,B}} \prod_{j=1}^{K} q_{i_j}.$$
Although $q_1, \ldots, q_B$ are integers in practice, it will be more convenient in the subsequent analysis to let $q_1, \ldots, q_B$ be positive real numbers. In order to see that the K-depth test always rejects the null hypothesis if B is sufficiently small, we need to consider the input $q_1, \ldots, q_B$ with maximal K-depth. While it is arguably quite intuitive to assume that this maximum is attained at $q_j = N/B$ for all $j = 1, \ldots, B$, a formal proof to determine the maximum is challenging. We therefore state the following conjecture, which we only checked for some particular choices of K and B and could prove completely only for K = 3. The proof is based on an optimization via Lagrange multipliers which, in particular, requires showing the uniqueness of its critical point. However, transforming the system of equations to deduce the uniqueness becomes very complicated for larger K; see the supplementary material for the proof and the main problem for the case $K \ge 4$:

Then the following holds: (a) If K + B is even then
The necessity of a case distinction between K + B even/odd might be a bit surprising at first. But in fact it is not hard to check that the function $d_{K,N,B}$ has the following property:

Lemma 5 Let $K \ge 2$ and $B \ge K$. If K + B is odd then
$$d_{K,N,B}(q_1, \ldots, q_B) = d_{K,N,B-1}(q_1 + q_B, q_2, \ldots, q_{B-1}).$$
The key observation to prove the lemma is that the summands of $d_{K,N,B}(q_1, \ldots, q_B)$ where $i_1 = 1$ can be merged with those where $i_K = B$, resulting in a rearranged sum equal to $d_{K,N,B-1}(q_1 + q_B, q_2, \ldots, q_{B-1})$ as claimed.
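The merging identity can be checked numerically. Under our reading of Lemma 5, $d_{K,N,B}(q_1, \ldots, q_B) = d_{K,N,B-1}(q_1 + q_B, q_2, \ldots, q_{B-1})$ whenever K + B is odd. The Python sketch below evaluates the block representation with exact fractions for all block sizes in {1, 2, 3} and several odd K + B; it also shows that the identity generally fails when K + B is even. The function name `d_blocks` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def d_blocks(q, N, K):
    """Sum over block tuples with odd gaps of the block-size products, divided by C(N, K)."""
    total = 0
    for idx in combinations(range(len(q)), K):
        if all((b - a) % 2 == 1 for a, b in zip(idx, idx[1:])):
            p = 1
            for i in idx:
                p *= q[i]
            total += p
    return Fraction(total, comb(N, K))


def merged(q):
    """(q_1 + q_B, q_2, ..., q_{B-1})."""
    return (q[0] + q[-1],) + tuple(q[1:-1])


if __name__ == "__main__":
    for K, B in [(2, 3), (3, 4), (2, 5), (4, 5), (3, 6)]:      # K + B odd
        ok = all(
            d_blocks(q, sum(q), K) == d_blocks(merged(q), sum(q), K)
            for q in product(range(1, 4), repeat=B)
        )
        print(K, B, ok)                                          # True
    # K + B even: the identity does not hold in general
    print(d_blocks((1, 1, 1, 1), 4, 2), d_blocks(merged((1, 1, 1, 1)), 4, 2))  # 2/3 1/2
```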
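Hence we may assume w.l.o.g. that K + B is even and use Lemma 5 to cover the odd case. Before stating the general result, we consider the special cases B = K and B = K + 1. In these cases, Conjecture 1 is easy to verify since, by definition,
$$d_{K,N,K}(q_1, \ldots, q_K) = \binom{N}{K}^{-1} \prod_{k=1}^{K} q_k, \tag{12}$$
because $(1, \ldots, K)$ is the only element of $A_{K,K}$, and, by Lemma 5, $d_{K,N,K+1}(q_1, \ldots, q_{K+1}) = d_{K,N,K}(q_1 + q_{K+1}, q_2, \ldots, q_K)$. In particular, we have the following theorem for the maximal K-depth among all valid block sizes $q_1, \ldots, q_B$. The set of these valid block sizes is denoted by
$$Q_{N,B} := \{(q_1, \ldots, q_B) \in \mathbb{N}^B : q_1 + \ldots + q_B = N\}. \tag{13}$$
Theorem 4 Let $K \ge 2$, $B \in \{K, K+1\}$ and let $Q_{N,B}$ be as above. Then
$$\lim_{N \to \infty} \max_{(q_1, \ldots, q_B) \in Q_{N,B}} d_{K,N,B}(q_1, \ldots, q_B) = \frac{K!}{K^K} \le \left(\tfrac{1}{2}\right)^{K-1}, \tag{14}$$
where the inequality in (14) is strict for $K \ge 3$.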
Proof (Sketch) For B = K, one needs to compute the global maximum of the function given in (12) under the side conditions $q_1, \ldots, q_K \in \mathbb{N}$ and $\sum_{k=1}^{K} q_k = N$. When disregarding the integer condition, this can easily be done, e.g., by using Lagrange multipliers. This reveals a unique global maximum at $q_k = N/K$ for all $k = 1, \ldots, K$, which coincides with the integer maximum whenever $N/K \in \mathbb{N}$. The case B = K + 1 follows from the case B = K and Lemma 5.
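For B = K, only the block tuple (1, …, K) alternates, so the K-depth is the product of the block sizes divided by $\binom{N}{K}$. The Python sketch below maximizes this over all integer compositions of N into K positive parts, confirms that equal block sizes N/K attain the maximum when K divides N, and compares the limit $K!/K^K$ of $(N/K)^K/\binom{N}{K}$ with the expected depth $(1/2)^{K-1}$. Function names are hypothetical; `math.prod` requires Python 3.8 or later.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial, prod


def compositions(N, B):
    """All (q_1, ..., q_B) of positive integers with q_1 + ... + q_B = N."""
    for cuts in combinations(range(1, N), B - 1):
        bounds = (0,) + cuts + (N,)
        yield tuple(bounds[j + 1] - bounds[j] for j in range(B))


def max_depth_B_equals_K(N, K):
    """Maximum over integer block sizes of q_1 * ... * q_K / C(N, K)."""
    best = max(prod(q) for q in compositions(N, K))
    return Fraction(best, comb(N, K))


if __name__ == "__main__":
    print(max_depth_B_equals_K(12, 3))          # 16/55 = 4^3 / C(12, 3)
    for K in range(2, 7):
        limit = Fraction(factorial(K), K ** K)  # limit of (N/K)^K / C(N, K)
        print(K, float(limit), float(Fraction(1, 2 ** (K - 1))))
```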
For the general case $B \ge K + 2$, we will only consider the input $q_1 = \ldots = q_B = N/B$, since this is assumed to yield the maximal depth according to Conjecture 1 if K + B is even. Lemma 3 yields the following result on the asymptotic K-depth.

Theorem 5 Let K ≥ 2 and B ≥ K be fixed. If K + B is even then
The inequality in (15) is strict for K ≥ 3.
Proof (Sketch) The equality follows from Lemma 3 since $N^{\underline{K}} / N^K \to 1$ as $N \to \infty$.
For the upper bound, let g(x) = ( Then the bound follows since g has a unique global maximum at x = K /2.

Remark 6
If K + B is odd then Lemma 5 and Theorem 5 yield for all β ∈ (0, 1) with a strict inequality for $K \ge 3$. Moreover, if we assume that Conjecture 1 is true, then (15) and (16) imply for any fixed number B of blocks (13). Moreover, the inequality above is strict for $K \ge 3$. Hence, $H_0: \theta \in \Theta_0$ is rejected at an alternative for sufficiently large sample sizes N if the number of blocks in $(r_1(\theta), \ldots, r_N(\theta))$ is uniformly bounded for all $\theta \in \Theta_0$ as N → ∞.

Comparison of K-depth tests for different K
A proper choice of K is crucial to obtain a K-depth test with high power. This section contains some basic observations for the cases K ≤ 6, in particular in terms of power when only few sign changes are observed. A more thorough comparison in applications follows in Sect. 5. As we will see in Sect. 4.1, the 2-depth test is usually a bad choice since it is equivalent to the classical sign test. This test struggles to reject the null hypothesis at alternatives that lead to nearly equal numbers of positive and negative residuals. K-depth tests with K ≥ 3 can correctly identify and reject such alternatives as long as the number of sign changes in the residual vector is fairly low. A discussion of the p-values of the K-depth tests, K = 3, . . . , 6, for several sample sizes can be found in Sect. 4.2.

Equivalence of the 2-depth test and the classical sign test
The test statistic of the classical sign test is given by
$$T_{\mathrm{sign}}(\theta) = \frac{N^+(\theta) - N/2}{\sqrt{N/4}}, \qquad N^+(\theta) = \sum_{n=1}^{N} \mathbb{1}\{R_n(\theta) > 0\},$$
where $N^+(\theta)$ denotes the number of residuals with positive signs in the residual vector $(R_1(\theta), \dots, R_N(\theta))$. Assuming (1), this test statistic converges in distribution to the standard normal distribution. Hence, the classical sign test (in its asymptotic version) rejects at θ if $|T_{\mathrm{sign}}(\theta)| > u_{1-\alpha/2}$, where $u_\alpha$ denotes the α-quantile of the standard normal distribution. Equivalently, the classical sign test rejects if $T_{\mathrm{sign}}(\theta)^2 > \chi^2_{1,1-\alpha}$, where $\chi^2_{1,\alpha}$ is the α-quantile of the $\chi^2_1$ distribution. Note that $T_{\mathrm{sign}}(\theta)^2$ is minimized if $N^+(\theta) = N/2$. Hence, the test will not reject the null hypothesis if half of the residuals are positive.
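A minimal sketch of the asymptotic two-sided sign test, assuming the standardization $T_{\mathrm{sign}} = (N^+ - N/2)/\sqrt{N/4}$ and residuals that are almost surely nonzero. It uses $P(|Z| \ge |t|) = \operatorname{erfc}(|t|/\sqrt{2})$ for $Z \sim \mathcal{N}(0, 1)$; the function name is hypothetical. For two blocks of 50 positive and 50 negative residuals, the statistic is exactly 0 and the p-value is 1, which illustrates why such alternatives are never rejected.

```python
import math


def sign_test(residuals):
    """Asymptotic two-sided sign test at a hypothesised parameter value.

    Assumes T_sign = (N_plus - N/2) / sqrt(N/4), which is approximately N(0, 1)
    under H0, and residuals that are (almost surely) nonzero.
    Returns the statistic and the two-sided p-value P(|Z| >= |T_sign|).
    """
    n = len(residuals)
    n_plus = sum(1 for r in residuals if r > 0)  # N^+: number of positive residuals
    t_sign = (n_plus - n / 2) / math.sqrt(n / 4)
    # For Z ~ N(0, 1): P(|Z| >= t) = erfc(t / sqrt(2)).
    p_value = math.erfc(abs(t_sign) / math.sqrt(2))
    return t_sign, p_value


# Two blocks of equal size: half of the residuals are positive, so T_sign = 0.
print(sign_test([1.0] * 50 + [-1.0] * 50))  # (0.0, 1.0)
# An unbalanced residual vector with 80 positive and 20 negative signs.
print(sign_test([1.0] * 80 + [-1.0] * 20))
```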
To see the relationship to the 2-depth test, note that a pair of residuals has alternating signs if and only if one of them is positive and the other one is negative. Since there are $N^+(\theta)$ positive and $N - N^+(\theta)$ negative residuals (assuming $R_n(\theta) \neq 0$ $P_\theta$-almost surely for all $n = 1, \dots, N$), the 2-depth satisfies, $P_\theta$-almost surely,
$$d_2\big(R_1(\theta), \dots, R_N(\theta)\big) = \frac{N^+(\theta)\,\big(N - N^+(\theta)\big)}{\binom{N}{2}}.$$
The 2-depth can be transformed into $T_{\mathrm{sign}}(\theta)$ by using the identity
$$x(N - x) = \frac{N^2}{4} - \left(x - \frac{N}{2}\right)^2$$
for $x = N^+(\theta)$. A straightforward calculation based on this identity reveals that, for K = 2, the test statistic (4) is a strictly decreasing affine function of $T_{\mathrm{sign}}(\theta)^2$ $P_\theta$-almost surely.
Hence the 2-depth test and the classical sign test are equivalent.
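The straightforward calculation can be checked with exact rational arithmetic. This sketch assumes that the test statistic (4) has the form $T_K = N\,(d_K - (1/2)^{K-1})$, that $d_2 = N^+(N - N^+)/\binom{N}{2}$, and that $T_{\mathrm{sign}} = (N^+ - N/2)/\sqrt{N/4}$. Under these assumptions, the identity works out to $T_2 = \frac{N}{2(N-1)}\,\big(1 - T_{\mathrm{sign}}^2\big)$, which the code verifies for all $2 \le N < 60$ and all values of $N^+$. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def t2_from_counts(n: int, n_plus: int) -> Fraction:
    """Assumed form T_2 = N * (d_2 - 1/2) with d_2 = N+ (N - N+) / binom(N, 2)."""
    d2 = Fraction(n_plus * (n - n_plus), comb(n, 2))
    return n * (d2 - Fraction(1, 2))


def t_sign_squared(n: int, n_plus: int) -> Fraction:
    """T_sign^2 = (N+ - N/2)^2 / (N/4) = (2 N+ - N)^2 / N."""
    return Fraction((2 * n_plus - n) ** 2, n)


def t2_via_sign(n: int, n_plus: int) -> Fraction:
    """Right-hand side of the identity: N / (2 (N - 1)) * (1 - T_sign^2)."""
    return Fraction(n, 2 * (n - 1)) * (1 - t_sign_squared(n, n_plus))


# Exhaustive check over all sample sizes 2 <= N < 60 and all counts N+.
for n in range(2, 60):
    for n_plus in range(n + 1):
        assert t2_from_counts(n, n_plus) == t2_via_sign(n, n_plus)
print("identity holds")
```

Because the factor $N/(2(N-1))$ is positive, small values of $T_2$ correspond exactly to large values of $T_{\mathrm{sign}}^2$, so rejecting for small $T_2$ is the same as rejecting for large $T_{\mathrm{sign}}^2$.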

Comparison of K-depth tests for K ≥ 3
As we have seen in Sect. 3.3, K-depth tests with K ≥ 3 are capable of rejecting nonfits that lead to a small number of sign changes, at least as long as the sample size N is sufficiently large. We now take a closer look at their performance for small sample sizes up to N = 160. Recall that, according to Conjecture 1, we assume that the maximal K-depth of a residual vector $r = (r_1, \dots, r_N)$ with B blocks is given by $\eta_{K,N,B}$. Figure 2 contains the p-values when observing a value of $\eta_{K,N,B}$ for B = 3, 4, 5, 6 blocks, i.e., 2, 3, 4, 5 sign changes, respectively. The p-values are plotted for sample sizes N between 10 and 160 and K = 3, 4, 5, 6. Recall that if a residual vector has B blocks, i.e., B − 1 sign changes, then its K-depth is zero for every K > B, since no K-tuple with alternating signs exists. Hence, K-depth tests with K > B automatically reject the null hypothesis as soon as the sample size is large enough to make a rejection possible for the test. Figure 2 therefore only contains K-depth tests with K ≤ 4 for situations with two sign changes, to highlight that the p-value of the 4-depth test indeed becomes 0 if N is sufficiently large. The same applies to the 5-depth test when three sign changes occur. The other two plots (four and five sign changes) do not contain the corresponding 6- and 7-depth tests since their p-values behave similarly.
All four subfigures of Fig. 2 indicate that the p-values of all considered K-depth tests decrease to zero with growing sample size. They decrease more slowly for K = 3, 4 than for K = 5, 6, but even the p-value of the 3-depth test reaches 0.1 for sample sizes greater than N = 150. It is remarkable that the p-values of the K-depth tests with K = B − 1 and K = B are always very similar for all B − 1 = 3, 4, 5 sign changes we considered. However, this does not hold for B − 1 = 2, since the 2-depth test is the classical sign test, which always has a p-value of 1 in the case of two blocks of equal size.
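A Monte Carlo version of such a p-value can be sketched as follows. The sketch assumes that, under $H_0$, the residual signs are i.i.d. and positive with probability 1/2, that small depths speak against $H_0$ (so the p-value is a lower tail probability), and it uses equal block sizes as a stand-in for the configuration attaining $\eta_{K,N,B}$. The counting routine is an $O(NK)$ dynamic program of our own; all names are hypothetical, and the numbers it produces are simulation estimates, not the values shown in Fig. 2.

```python
import random


def alternating_count(signs, k):
    """Number of index tuples n_1 < ... < n_K whose signs (0/1) alternate."""
    ends = [[0, 0] for _ in range(k)]  # ends[j][s]: alternating tuples of length j+1 ending in sign s
    for s in signs:
        other = 1 - s
        for j in range(k - 1, 0, -1):
            ends[j][s] += ends[j - 1][other]
        ends[0][s] += 1
    return ends[k - 1][0] + ends[k - 1][1]


def equal_blocks(n, b):
    """0/1 sign vector of length n with b blocks of (nearly) equal size."""
    return [1 - (i * b // n) % 2 for i in range(n)]


def mc_p_value(signs, k, reps=2000, seed=1):
    """Monte Carlo estimate of P_H0(d_K <= observed d_K) with i.i.d. fair signs.

    Integer counts are compared directly, since binom(N, K) is the same for
    the observed and the simulated vectors.
    """
    n = len(signs)
    observed = alternating_count(signs, k)
    rng = random.Random(seed)
    hits = sum(
        alternating_count([rng.getrandbits(1) for _ in range(n)], k) <= observed
        for _ in range(reps)
    )
    return hits / reps


for n in (30, 60, 150):
    print(n, mc_p_value(equal_blocks(n, 3), 3))
```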

Applications
The high power of 3-depth tests in the case of two unknown parameters was already shown for explosive AR(1) models, namely in  for linear AR(1) models given by $Y_n = \theta_0 + \theta_1 Y_{n-1} + E_n$ and in  for nonlinear AR(1) models given by $Y_n = Y_{n-1} + \theta_1 Y_{n-1}^{\theta_2} + E_n$, see also Falkenau (2016). In particular, these results showed for normally distributed errors $E_n$ that 3-depth tests possess power similar to that of classical tests based on least squares.
Further results for the quadratic regression model, a nonlinear AR(1) model and an explosive AR(2) model, each with three unknown parameters, can be found in the supplementary material. These examples show that there is not much difference in the power of the 3-depth test, the 4-depth test, and the classical F- and t-test, respectively, if the sample size is large enough, i.e., close to 100. Relevant differences only occur if the sample size is small. See, for example, Fig. 3 for testing $H_0\colon \theta = (1, 0, 1)$ in a quadratic regression model given by $Y_n = \theta_0 + \theta_1 x_n + \theta_2 x_n^2 + E_n$ with $\theta = (\theta_0, \theta_1, \theta_2)$. This example concerns normally distributed errors, but the results are very similar for Cauchy distributed errors. The only exception is the F-test, which loses much power if the errors have a Cauchy distribution. See the supplementary material for the results with Cauchy distributed errors and for other alternatives.
Additionally, we demonstrate here the good power of K-depth tests with K = 21 and K = 38 for a high-dimensional multiple regression model given by $Y_n = \sum_{d=1}^{D} \theta_d x_{nd} + E_n$ with D ∈ {10, 20, 40, 80} and N = 100. The regressors are ordered by computing a shortest path through the multidimensional data. This is done here by the Shortest Hamiltonian Path (SHP), see for example Applegate et al. (2006). Horn and Müller (2020) show that this ordering is superior to other ordering strategies. Computing the SHP is an NP-hard problem. In particular, every known exact algorithm has exponential worst-case time complexity in the number of data points. However, empirically the runtimes are quite small for moderate numbers of observations, see Horn and Müller (2020) or Horn (2021).
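The computations in this section use the Concorde solver via the R package TSP. Purely as an illustration of what such an ordering does, the sketch below computes an exact shortest Hamiltonian path (open path, free end points) by enumerating all permutations, which is feasible only for a handful of points and mirrors the exponential worst case mentioned above. As a swapped-in cheap alternative that is not the method used here, it also computes a greedy nearest-neighbour path, which is generally not optimal. The residuals would then be evaluated in the returned order. All names are hypothetical.

```python
import itertools
import math
import random


def path_length(points, order):
    """Total Euclidean length of the open path visiting points in the given order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def exact_shortest_path(points):
    """Exact shortest Hamiltonian path (free end points) by full enumeration.

    Exponential in the number of points, so only usable for a handful of them.
    """
    best = min(itertools.permutations(range(len(points))),
               key=lambda order: path_length(points, order))
    return list(best)


def nearest_neighbour_path(points, start=0):
    """Greedy heuristic: always move to the closest unvisited point (not exact)."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


rng = random.Random(0)
regressors = [tuple(rng.random() for _ in range(3)) for _ in range(7)]  # 7 points in R^3
exact = exact_shortest_path(regressors)
greedy = nearest_neighbour_path(regressors)
print(exact, path_length(regressors, exact))
print(greedy, path_length(regressors, greedy))
# Residuals would then be taken in this order, e.g. [residuals[i] for i in exact].
```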
The tested hypothesis is $H_0\colon \theta_d = 0$ for all $d = 1, \dots, D$ vs. $H_1\colon \exists\, d \in \{1, \dots, D\}\colon \theta_d \neq 0$. The 21-depth test and the 38-depth test are compared with the classical F-test and the sign test as well as a robust Wald test and a robust score test. For the Wald test, estimators of the parameters and the covariance matrix of an MM-regression obtained by the function lmRob() from the R-package robust are used. For the robust score test, a self-implemented R-function is used based on a high-dimensional version of the procedure from Khan and Yunus (2014). The scores are computed by the R-function psi.weight() using the setting ips = 4 from the package robust, and the scale factor is estimated by lmRob.S() from the package robustbase. The performance of the tests is measured in three different situations: firstly with normally distributed errors $E_n$, secondly with double exponentially distributed errors, and thirdly with Cauchy distributed errors. Because of the high dimensionality, the complete power functions cannot be shown, only some aspects of them. Here, we consider the aspect $\lambda(\theta) = \theta_1 \in [-1, 1]$, where all other components of θ are set to zero. The power was simulated with 1000 repetitions for the K-depth tests, the F-test and the sign test, and with 500 repetitions for the robust Wald test and the robust score test, at 101 or 201 equidistant points within [−1, 1] or [−2, 2], respectively. Because of the symmetry of the model in θ, the power functions look the same for all aspects $\lambda(\theta) = \theta_d$, $d = 1, \dots, D$. Similar results are obtained if other alternatives such as $\theta_1 = \dots = \theta_D = \gamma$, $\gamma \in [-1, 1]$, are considered, see the supplementary material. Figure 4 shows extracts of the simulated power functions for the considered aspect. Firstly, this figure shows that the K-depth test performs better for higher K, e.g., K = 38 performs better than K = 21 in Fig. 4 or K = 5 in the supplementary material.
In general, K should be at least of the same magnitude as D for the K-depth test to achieve good results. This can be seen nicely in Fig. 4: the power of the K-depth test for K = 21 is satisfactory for D = 10 and D = 20, whereas for D = 40 and D = 80 it is worse than in the case K = 38. Secondly, the robust Wald test performs well for D = 10 and D = 20, but for D = 40 the level is not maintained, i.e., the power values are much larger than α = 0.05 at $H_0$, and for D = 80 the robust Wald test cannot be carried out at all. Indeed, it is not the Wald test itself which causes the problems, but the underlying MM-estimation. In our simulations, the R-function lmRob() always threw an error when trying to calculate the estimator for D = 80 dimensions and N = 100 data points, stating that a matrix could not be inverted internally because of numerical singularity. Similar problems appear when using lmrob() from the R-package robustbase instead. In contrast, the K-depth tests and the robust score test remain computable in such high dimensions, although the power of the K-depth test is not very good because K is much smaller than D. The robust score test has very small power if $\theta_1$ is close to zero, but its power function increases more steeply for larger deviations.
Furthermore, Fig. 4 shows that, as expected, the F-test performs best for normally distributed errors. For Cauchy distributed errors, however, the K-depth test is better than the F-test. The classical sign test performs poorly regardless of the dimension D; its power is always about 0.05. Moreover, the cases D = 10 and K = 21 or D = 20 and K = 38 show that the K-depth test can keep up with the robust Wald test, the robust score test and the F-test (for normally distributed errors) when K is sufficiently large in comparison to D. Unfortunately, the parameter K of the K-depth test cannot be chosen arbitrarily large for fixed N, since otherwise the test is unable to reject at all because the α-quantile can then coincide with the minimal value of the test statistic. Some benchmarks for how large K can be chosen for given N are given in Malcherczyk (2022, Chapter 6.3.2). For larger sample sizes, the power of the K-depth tests increases significantly for all considered dimensions D, but it still does not reach the power of the robust Wald test, which is then computable. See the supplementary material for N = 500.
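Why a too-large K can prevent any rejection can be made concrete. A residual vector has K-depth zero exactly when it has fewer than K blocks, i.e., at most K − 2 sign changes. If the residual signs under $H_0$ are i.i.d. and positive with probability 1/2, the N − 1 sign-change indicators are i.i.d. Bernoulli(1/2), so $P_{H_0}(d_K = 0) = P(\mathrm{Bin}(N-1, 1/2) \le K - 2)$. Under the assumption of a non-randomized test that rejects for small depths, the smallest possible rejection region is $\{d_K = 0\}$, so the test can reject at all only if this probability is at most α. The sketch below finds the largest such K; it is a simple criterion under these assumptions, not the benchmark procedure of Malcherczyk (2022). Function names are hypothetical.

```python
from math import comb


def prob_zero_depth(n, k):
    """P_H0(d_K = 0) = P(Bin(N - 1, 1/2) <= K - 2) for i.i.d. fair residual signs."""
    return sum(comb(n - 1, j) for j in range(k - 1)) / 2 ** (n - 1)


def largest_rejectable_k(n, alpha=0.05):
    """Largest K whose smallest rejection region {d_K = 0} still has level alpha."""
    largest = None
    for k in range(2, n + 1):
        if prob_zero_depth(n, k) <= alpha:
            largest = k
        else:
            break  # the probability is increasing in K, so larger K fail as well
    return largest


print(largest_rejectable_k(100))
```

This only rules out values of K for which rejection is impossible; a K close to this bound can still have poor power.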
The results in this section were computed with the help of the R-package GSignTest (Horn 2020). For computing the SHP, the package TSP (Hahsler and Hornik 2019) and the "Concorde"-solver (Applegate et al. 2004) were used. Graphics were made with the help of the packages rgl (Adler and Murdoch 2020) and ggplot2 (Wickham 2016).
Supplementary material. A file with full proofs, more details on the block implementation and further simulation results can be found at the following link: https://doi.org/10.1007/s00362-022-01337-5.

Discussion and outlook
K-sign depth can be used to define simple robust tests, which we refer to as K-depth tests. While the parameter choice K = 2 essentially leads to the classical sign test and thus has several limitations in rejecting alternatives, K-depth tests for K ≥ 3 are fairly powerful. They are not as powerful as the complicated robust Wald tests based on MM-estimators, but they can outperform classical approaches such as the F-test, in particular in the presence of outliers. The K-depth tests are not well suited for small sample sizes and for models where the number of sign changes in the residual vector is likely to exceed K − 1 at alternatives. However, the K-depth tests perform very well in our examples once the sample size is sufficiently large.
The K-depth test can also be used when there is no inherent order in the data, for example in multiple regression. In this case, ordering the regressors according to a Shortest Hamiltonian Path leads to very good power of the test for rather low dimensions. In higher dimensions, the parameter K should be of the same magnitude as the number of dimensions. When this is not possible, the power of the K-depth test decreases. However, in contrast to the robust Wald test based on MM-estimators, it still works without any errors caused by numerical issues.
To reduce the runtime of $\Theta(N^K)$ incurred by evaluating the definition of the K-depth directly, a faster block implementation is presented, which leads to an algorithm with linear runtime. A linear runtime for an asymptotically equivalent form can also be obtained from the asymptotic distribution of the K-depth for K ≥ 3, see Malcherczyk et al. (2021).
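For illustration, the sketch below contrasts the definition-based $\Theta(N^K)$ enumeration with a simple dynamic program that runs in $O(NK)$ time, i.e., linearly in N for fixed K. The dynamic program is a swapped-in technique of our own and not the block implementation of this paper. Both functions count index tuples $n_1 < \dots < n_K$ whose residual signs alternate, assuming no residual is exactly zero and signs are encoded as 0/1, and the depth is normalized by $\binom{N}{K}$ (an assumption about the normalization). All names are hypothetical.

```python
import itertools
import random
from math import comb


def count_naive(signs, k):
    """Definition-based count: check all binom(N, K) index tuples."""
    return sum(
        all(signs[idx[i]] != signs[idx[i + 1]] for i in range(k - 1))
        for idx in itertools.combinations(range(len(signs)), k)
    )


def count_dp(signs, k):
    """O(N K) count of alternating K-tuples via alternating subsequences."""
    ends = [[0, 0] for _ in range(k)]  # ends[j][s]: alternating tuples of length j+1 ending in sign s
    for s in signs:
        other = 1 - s
        for j in range(k - 1, 0, -1):
            ends[j][s] += ends[j - 1][other]  # extend tuples ending in the opposite sign
        ends[0][s] += 1                       # the current residual alone starts a new tuple
    return ends[k - 1][0] + ends[k - 1][1]


def k_depth(signs, k, counter=count_dp):
    """K-depth as the fraction of all binom(N, K) tuples with alternating signs."""
    return counter(signs, k) / comb(len(signs), k)


# Cross-check both counters on random sign vectors.
rng = random.Random(42)
for _ in range(200):
    n, k = rng.randint(5, 12), rng.randint(2, 5)
    signs = [rng.getrandbits(1) for _ in range(n)]
    assert count_naive(signs, k) == count_dp(signs, k)
print(k_depth([1, 0, 1, 0, 1, 0], 3))
```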
Although the simulation study in this article only deals with one-point hypotheses, the K-sign depth can also be used to test general hypotheses of the form $H_0\colon \theta \in \Theta_0$. In this case, the maximal value of the test statistic over $\Theta_0$ must be computed. However, more research is necessary to find an efficient algorithm for this maximum.
Moreover, this paper mainly focuses on the one-sided version of the K-depth test to detect shifts in the medians of the residuals. A two-sided version of the K-depth test can also detect dependence structures within the residuals and may be useful for stationary AR models and other stationary processes. Again, further research is necessary to compare the two-sided K-depth test with other approaches when testing simultaneously whether residuals are independent and have medians equal to zero.
An extension of the presented approach to multivariate observations might be possible. In particular, multivariate sign changes based on the multivariate spatial sign of Möttönen and Oja (1995) could be used, as in Paindaveine (2009), for counting the K-tuples with K − 1 sign changes. This would lead to a multivariate K-sign depth. However, it is not clear how to transfer the concept of blocks as used in this paper.