Specification testing in semi-parametric transformation models

In transformation regression models, the response is transformed before fitting a regression model to covariates and transformed response. We assume such a model where the errors are independent of the covariates and the regression function is modeled nonparametrically. We suggest a test for goodness-of-fit of a parametric transformation class based on a distance between a nonparametric transformation estimator and the parametric class. We present asymptotic theory under the null hypothesis of validity of the semi-parametric model and under local alternatives. A bootstrap algorithm is suggested in order to apply the test. We also consider relevant hypotheses to distinguish between large and small distances of the parametric transformation class to the ‘true’ transformation.


Introduction
It is very common in applications to transform data before investigating the functional dependence between variables by means of regression models. The aim of the transformation is to obtain a simpler model, e.g. with a specific structure of the regression function, or a homoscedastic instead of a heteroscedastic model. Typically, flexible parametric classes of transformations are considered from which a suitable one is selected data-dependently. A classical example is the class of Box-Cox power transformations (see Box and Cox (1964)). For purely parametric transformation models, see Carroll and Ruppert (1988) and references therein. Powell (1991) and Mu and He (2007) consider transformation quantile regression models. Nonparametric estimation of the transformation in the context of parametric regression models has been considered by Horowitz (1996) and Chen (2002), among others. Horowitz (2009) reviews estimation in transformation models with parametric regression in the cases where either the transformation or the error distribution or both are modeled nonparametrically. Linton et al. (2008) suggest a profile likelihood estimator for a parametric class of transformations, while the error distribution is estimated nonparametrically and the regression function semi-parametrically. Heuchenne et al. (2015) suggest an estimator of the error distribution in the same model. Neumeyer et al. (2016) consider profile likelihood estimation in heteroscedastic semi-parametric transformation regression models, i.e. the mean and variance function are modeled nonparametrically, while the transformation function is chosen from a parametric class. A completely nonparametric (homoscedastic) model is considered by Chiappori et al. (2015). Lewbel et al. (2015) provide a test for the validity of such a model. The approach of Chiappori et al. (2015) is modified and corrected by Colling and Van Keilegom (2019). The version of the nonparametric transformation estimator considered in the latter paper is then applied by Colling and Van Keilegom (2020) to suggest a new estimator of the transformation parameter if it is assumed that the transformation belongs to a parametric class.
In general, asymptotic theory for nonparametric transformation estimators is sophisticated, and parametric transformation estimators show much better performance if the parametric model is true. A parametric transformation will thus lead to better estimates of the regression function. Moreover, parametric transformations are easier to interpret and allow for subsequent inference in the transformation model. For the latter purpose, note that for transformation models with parametric transformation, lack-of-fit tests for the regression function as well as tests for significance of covariate components have been suggested by Colling and Van Keilegom (2016), Colling and Van Keilegom (2017), Allison et al. (2018) and Kloodt and Neumeyer (2020). Those tests cannot straightforwardly be generalized to nonparametric transformation models because known estimators in that model do not allow for uniform rates of convergence over the whole real line, see Chiappori et al. (2015) and Colling and Van Keilegom (2019).
However, before applying a transformation model with parametric transformation, it would be appropriate to test the goodness-of-fit of the parametric transformation class. In the context of parametric quantile regression, Mu and He (2007) suggest such a goodness-of-fit test. In the context of nonparametric mean regression, Neumeyer et al. (2016) develop a goodness-of-fit test for the parametric transformation class based on an empirical independence process of pairs of residuals and covariates. The latter approach was later modified by applying empirical characteristic functions. In a linear regression model with transformation of the response, Szydłowski (2020) suggests a goodness-of-fit test for the parametric transformation class that is based on a distance between the nonparametric transformation estimator considered by Chen (2002) and the parametric class. We will follow a similar approach but consider a nonparametric regression model. The aim of the transformations we consider is to induce independence between errors and covariates. The null hypothesis is that the unknown transformation belongs to a parametric class. Note that when applied to the special case of a class of transformations whose only element is the identity, our test indicates whether a classical homoscedastic regression model (without transformation) is appropriate or whether the response should first be transformed. Our test statistic is based on a minimum distance between a nonparametric transformation and the parametric transformations. We present the asymptotic distribution of the test statistic under the null hypothesis of a parametric transformation and under local alternatives of n^{−1/2}-rate. Under the null hypothesis, the limit distribution is that of a degenerate U-statistic. With a flexible parametric class, applying an appropriate transformation can reduce the dependence enormously, even if the 'true' transformation does not belong to the class. Thus, for the first time in the context of transformation goodness-of-fit tests, we consider testing for so-called precise or relevant hypotheses. Here, the null hypothesis is that the distance between the true transformation and the parametric class is large. If this hypothesis is rejected, then the model with the parametric transformation fits well enough to be considered for further inference. Under the new null hypothesis, the test statistic is asymptotically normally distributed. The term "precise hypotheses" refers to Berger and Delampady (1987). Dette et al. (2020) considered precise hypotheses in the context of comparing mean functions of functional time series. Note that the idea of precise hypotheses is related to that of equivalence tests, which originate from the field of pharmacokinetics (see Lakens (2017)). Throughout, we assume that the nonparametric transformation estimator fulfills an asymptotic linear expansion. It is then shown that the estimator considered by Colling and Van Keilegom (2019) fulfills this expansion and thus can be used for evaluating the test statistic.
The remainder of the paper is organized as follows. In Sect. 2, we present the model and the test statistic. Asymptotic distributions under the null hypothesis of a parametric transformation class and under local alternatives are presented in Sect. 3, which also contains a consistency result and asymptotic results under relevant hypotheses. Section 4 presents a bootstrap algorithm and a simulation study. Section 1 of the supplementary material contains assumptions for bootstrap results, while Section 2 there treats a specific nonparametric transformation estimator and shows that it fulfills the required conditions. The proofs of the main results are given in Section 3 and a rigorous treatment of bootstrap asymptotics is given in Section 4 of the supplement.

The model and test statistic
Assume we have observed (X_i, Y_i), i = 1, . . . , n, independent with the same distribution as (X, Y), which fulfil the transformation regression model

h(Y) = g(X) + ε,    (1)

where E[ε] = 0 holds and ε is independent of the covariate X, which is R^{d_X}-valued, while Y is univariate. The regression function g will be modelled nonparametrically. The transformation h : R → R is strictly increasing. Throughout, we assume that, given the joint distribution of (X, Y) and some identification conditions, there exists a unique transformation h such that this model is fulfilled. It then follows that the other model components are identified via g(x) = E[h(Y) | X = x] and ε = h(Y) − g(X). See Chiappori et al. (2015) for conditions under which the identifiability of h holds.
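For concreteness, the data-generating mechanism of model (1) can be sketched in R as follows; the regression function and error law are the ones used later in the simulation study (Sect. 4), while the transformation h is an arbitrary illustrative choice, not one from the paper.

```r
## Sketch: generate data from model (1), h(Y) = g(X) + eps with eps independent of X.
## g, X and eps follow the simulation study in Sect. 4; h is purely illustrative.
set.seed(1)
n    <- 100
g    <- function(x) 4 * x - 1                 # regression function from Sect. 4
h    <- function(y) sign(y) * abs(y)^(1/3)    # some strictly increasing h (illustrative)
hinv <- function(s) sign(s) * abs(s)^3        # its inverse
X    <- runif(n)                              # X ~ U([0,1])
eps  <- rnorm(n)                              # eps ~ N(0,1), independent of X
Y    <- hinv(g(X) + eps)                      # observed response: h(Y) = g(X) + eps
```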
In particular, conditions are required to fix location and scale, and we will assume throughout that

h(0) = 0 and h(1) = 1.    (2)

Now let {Λ_θ : θ ∈ Θ} be a class of strictly increasing parametric transformations. Our purpose is to test whether a semi-parametric transformation model holds, i.e.

Λ_{θ_0}(Y) = g̃(X) + ε̃    (3)
for some parameter θ_0 ∈ Θ, where ε̃ and X are independent. Due to the assumed uniqueness of the transformation h, one obtains h = h_0 under validity of the semi-parametric model, where h_0 = (Λ_{θ_0} − Λ_{θ_0}(0))/(Λ_{θ_0}(1) − Λ_{θ_0}(0)). Thus, we can write the null hypothesis as

H_0: h ∈ {(Λ_θ − Λ_θ(0))/(Λ_θ(1) − Λ_θ(0)) : θ ∈ Θ},    (4)

which thanks to (2) can be formulated equivalently as: Λ_θ = c_1 h + c_2 for some θ ∈ Θ, c_1 ∈ R_+, c_2 ∈ R. Our test statistics will be based on the following L2-distance

d(Λ_θ, h) := min_{c_1 ∈ R_+, c_2 ∈ R} ∫ (Λ_θ(y) − c_1 h(y) − c_2)² w(y) dy,    (5)

where w is a positive weight function with compact support Y_w. Its empirical counterpart is

d_n(Λ_θ, ĥ) := min_{c_1 ∈ C_1, c_2 ∈ C_2} ∫ (Λ_θ(y) − c_1 ĥ(y) − c_2)² w(y) dy,

where ĥ denotes a nonparametric estimator of the true transformation h as discussed below, and C_1 ⊂ R_+, C_2 ⊂ R are compact sets. Assumption (A6) assures that the sets are large enough to contain the true values. Let γ := (c_1, c_2, θ) and Υ := C_1 × C_2 × Θ. The test statistic is defined as

T_n := n min_{θ ∈ Θ} d_n(Λ_θ, ĥ) = n min_{γ ∈ Υ} ∫ (Λ_θ(y) − c_1 ĥ(y) − c_2)² w(y) dy,    (6)

and the null hypothesis should be rejected for large values of the test statistic. If the null hypothesis holds, the minimizing parameters c_1, c_2 in Eq. (5) can be written as c_1 = Λ_{θ_0}(1) − Λ_{θ_0}(0) and c_2 = Λ_{θ_0}(0). Hence, an alternative test statistic

T̃_n := n min_{θ ∈ Θ} ∫ (Λ_θ(y) − (Λ_θ(1) − Λ_θ(0)) ĥ(y) − Λ_θ(0))² w(y) dy    (7)

can be considered as well.
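The following R sketch evaluates (6); the grid-based trapezoidal integration and the grid search over θ are simplifications of the paper's implementation (which uses R's integrate, see Sect. 4), and Lambda, hhat_y, w_y are user-supplied placeholders.

```r
## Sketch of the test statistic T_n in (6): n times the minimal weighted
## L2-distance between Lambda_theta and c1 * hhat + c2 over gamma = (c1, c2, theta).
Tn_stat <- function(n, y, hhat_y, w_y, Lambda, theta_grid,
                    C1 = c(0.01, 100), C2 = c(-100, 100)) {
  dy <- diff(y)
  crit <- function(par, theta) {              # inner L2 criterion for fixed theta
    f <- (Lambda(y, theta) - par[1] * hhat_y - par[2])^2 * w_y
    sum(0.5 * (f[-1] + f[-length(f)]) * dy)   # trapezoidal rule on the grid
  }
  prof <- vapply(theta_grid, function(th)     # profile out (c1, c2) for each theta
    optim(c(1, 0), crit, theta = th, method = "L-BFGS-B",
          lower = c(C1[1], C2[1]), upper = c(C1[2], C2[2]))$value,
    numeric(1))
  n * min(prof)                               # reject H_0 for large values
}
```

The variant T̃_n from (7) is obtained by skipping the inner optimization and fixing c_1 = Λ_θ(1) − Λ_θ(0) and c_2 = Λ_θ(0).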
We will derive the asymptotic distributions under the null hypothesis and local and fixed alternatives in Section 3 and suggest a bootstrap version of the tests in Section 4.
Remark 2.1 Colling and Van Keilegom (2020) consider estimators θ̂ and θ̃ for the parametric transformation (assuming H_0) corresponding to the minimizations in T_n and T̃_n, respectively. They observe that θ̂ outperforms θ̃ in simulations.
Nonparametric estimation of the transformation h has been considered by Chiappori et al. (2015) and Colling and Van Keilegom (2019). For our main asymptotic results, we need that ĥ admits a linear expansion, not only under the null hypothesis, but also under the fixed and local alternatives defined in the next section. The linear expansion should have the form

ĥ(y) − h(y) = (1/n) Σ_{i=1}^n ψ(Z_i, y) + o_P(n^{−1/2}),  uniformly in y ∈ Y_w.    (8)

Here, ψ needs to fulfil condition (A8) in Section 3, and we use the definitions

U_i := F_Y(Y_i),  Z_i := (X_i, U_i)  (i = 1, . . . , n),    (9)

where F_Y denotes the distribution function of Y and is assumed to be strictly increasing on the support of Y. To ensure that T is well-defined, the values 0 and 1 are w.l.o.g. assumed to belong to the support of Y, but can be replaced by arbitrary values a < b ∈ R (in the support of Y). The expansion (8) could also be formulated with a linear term n^{−1} Σ_{i=1}^n ψ̃(X_i, Y_i, y). In Section 2 of the supplement, we reproduce the definition of the estimator ĥ that was suggested by Colling and Van Keilegom (2019) as a modification of the estimator by Chiappori et al. (2015). We give regularity assumptions under which the desired expansion holds; see Lemma 1. Other nonparametric estimators of the transformation that fulfil the expansion could be applied as well.
Due to the identifiability conditions (2), one obtains c_2 = Λ_{θ_0}(0) + n^{−1/2} r(0) and c_1 = Λ_{θ_0}(1) − Λ_{θ_0}(0) + n^{−1/2}(r(1) − r(0)). Assumption (A5) yields boundedness of r, so that we can rewrite the local alternative as

H_{1,n}: h = (Λ_{θ_0} + n^{−1/2} r − c_2)/c_1,  where  h_0 := (Λ_{θ_0} − Λ_{θ_0}(0))/(Λ_{θ_0}(1) − Λ_{θ_0}(0)).    (10)

Note that the null hypothesis H_0 is included in the local alternative H_{1,n} by considering r ≡ 0, which gives h = h_0. We assume the following data generating model under the local alternative H_{1,n}. Let the regression function g, the errors ε_i and the covariates X_i be independent of n and define Y_i := h^{−1}(g(X_i) + ε_i), which under local alternatives depends on n through the transformation h. Throughout we use the notation

S_i := h(Y_i) = g(X_i) + ε_i  (i = 1, . . . , n).    (11)

Further, recall the definition of U_i in (9). Note that the distribution of U_i does not depend on n, even under local alternatives, because U_i = F_Y(Y_i) = F_S(S_i) and the distribution of S_i does not depend on n; in particular, F_Y(0) = P(S_i ≤ 0) by (2), and similarly F_Y(1) = P(S_i ≤ 1).
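As a quick check of the displayed constants (under our reading that the local alternative perturbs the parametric side to Λ_{θ_0} + n^{−1/2} r before standardization), evaluating at the anchor points of (2) gives:

```latex
% From c_1 h(y) + c_2 = \Lambda_{\theta_0}(y) + n^{-1/2} r(y) at y = 0 and y = 1,
% using h(0) = 0 and h(1) = 1 from (2):
c_2 = \Lambda_{\theta_0}(0) + n^{-1/2} r(0), \qquad
c_1 = \Lambda_{\theta_0}(1) - \Lambda_{\theta_0}(0) + n^{-1/2}\bigl(r(1) - r(0)\bigr).
```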
To formulate our main result, we need some more notation. Recall ψ from (8), Z_i from (9) and S_i from (11), and let P_Z and F_Z denote the law and the distribution function, respectively, of Z_i. The quantities marked with an "f", referring to the "fixed" parameters c_1 = Λ_θ(1) − Λ_θ(0) and c_2 = Λ_θ(0), will be used to describe the asymptotic behaviour of the test statistic T̃_n. With these notations, the assumptions for the asymptotic results can be formulated. To this end, let Y denote the support of Y (which depends on n under local alternatives). Further, F_S denotes the distribution function of S_1 as in (11). The following assumptions are used.
(A1) The sets C_1, C_2 and Θ are compact.
(A2) The weight function w is continuous with a compact support Y_w ⊂ Y.
(A3) The map (y, θ) ↦ Λ_θ(y) is twice continuously differentiable on Y_w with respect to θ, and the (partial) derivatives are continuous in (y, θ) ∈ Y_w × Θ.
(A4) There exists a unique strictly increasing and continuous transformation h such that model (1) holds with X independent of ε.
(A5) The function h_0 defined in (10) is strictly increasing and continuously differentiable, and r is continuous.

When considering a fixed alternative H_1 or the relevant hypothesis H̄_0 below (see (24)), (A6) and (A8) are replaced by the following assumptions (A6') and (A8') (assumption (A8') is only relevant for H̄_0). Note that h is then a fixed function, not depending on n.
Remark 3.1 Assumptions concerning compactness of the parameter spaces, differentiability of the model components and uniqueness of the minimizer γ_0 are standard in the context of goodness-of-fit tests. Moreover, it can be shown that the definitions of Γ_0 and Γ_{0,f} in (A7) coincide with those in Eqs. (14) and (15), respectively. Assumption (A8) controls the asymptotic behaviour of ĥ − h and thus the rate of local alternatives that can be detected. The Donsker and boundedness conditions are needed to obtain uniform convergence rates of ĥ − h and of some negligible remainders in the proof. Assumption (A8') is the counterpart of Assumption (A8) for the precise hypotheses considered in (24).
Then, under the local alternative H_{1,n}, T_n converges in distribution to a limit that consists of the limit of a degenerate U-statistic plus a normally distributed term W_0 and a constant summand (see the discussion below). In particular, under H_0 (i.e. for r ≡ 0), T_n and T̃_n converge in distribution to limits denoted by T and T̃, respectively. The proof is given in Section 3 of the supplementary material. Asymptotic level-α tests should reject H_0 if T_n or T̃_n exceeds the (1 − α)-quantile of the distribution of T or T̃, respectively. As the distributions of T and T̃ depend in a complicated way on unknown quantities, we will propose a bootstrap procedure in Section 4. Although most results hold similarly for T_n and T̃_n, for ease of presentation, we will mainly focus on results for T_n in the remainder.
1. The kernel of the operator K defined in Theorem 3.2 can be expressed in terms of the function ζ and ψ from (8). Thus, the operator K is positive semi-definite.
2. The appearance of W_0 under the local alternative results from asymptotic theory for degenerate U-statistics. Related phenomena occur in the case of quadratic forms. Similar to the proof of Theorem 3.1, consider z_0 := z_n + c n^{−1/2}, where n^{1/2} z_n converges to a centred normally distributed random variable, say z, and where c = 0 under H_0. Moreover, consider the quadratic form z_0^T A_n z_0, where A_n is a positive definite matrix and n^{−1} A_n converges to a matrix A. Then, under H_0, z_0^T A_n z_0 = z_n^T A_n z_n converges to z^T A z, which has a χ² distribution. However, under H_{1,n}, we have

z_0^T A_n z_0 = z_n^T A_n z_n + 2 n^{−1/2} c^T A_n z_n + n^{−1} c^T A_n c,

where the first term on the right-hand side is as before, the second term converges to 2 c^T A z, which is normally distributed and corresponds to W_0 in our context, and the last term converges to the constant c^T A c, corresponding to the constant summand in the limit in Theorem 3.1. Note that the limit of z_0^T A_n z_0 cannot be negative due to the positive definiteness of A_n.
Next, we consider fixed alternatives, i.e. transformations h that do not belong to the parametric class:

H_1: h ∉ {(Λ_θ − Λ_θ(0))/(Λ_θ(1) − Λ_θ(0)) : θ ∈ Θ}.

Under H_1, the test statistic T_n converges to infinity, so that the test is consistent. The proof is given in Section 3 of the supplement.
The transformation model with a parametric transformation class might be useful in applications even if the model does not hold exactly. With a good choice of θ, applying the transformation Λ_θ can reduce the dependence between covariates and errors enormously. Estimating an appropriate θ is much easier than estimating the transformation h nonparametrically. Consequently, one might prefer the semi-parametric transformation model over a completely nonparametric one. It is then of interest how far away we are from the true model. Therefore, in the following, we consider testing the precise hypotheses (relevant hypotheses)

H̄_0: min_{θ ∈ Θ} d(Λ_θ, h) ≥ η  versus  H̄_1: min_{θ ∈ Θ} d(Λ_θ, h) < η    (24)

for some η > 0. If a suitable test rejects H̄_0 for some small η (fixed beforehand by the experimenter), the model is considered "good enough" to work with, even if it does not hold exactly.
To test these hypotheses, we will use the same test statistic as before, but standardized differently. Assume H̄_0; then h is a transformation which does not belong to the parametric class, i.e. the former fixed alternative H_1 holds. If R_+ and R are replaced by C_1 and C_2 in the definition of d in (5), one has min_{θ ∈ Θ} d(Λ_θ, h) = min_{γ ∈ Υ} ∫ (Λ_θ(y) − c_1 h(y) − c_2)² w(y) dy, and n^{1/2}(T_n/n − min_{θ ∈ Θ} d(Λ_θ, h)) is asymptotically centred normally distributed with some variance σ². The proof is given in Section 3 of the supplementary material. It is conjectured that a similar result can be derived for T̃_n, although the corresponding Hessian matrix might become more complex.
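Under this asymptotic normality, the resulting decision rule can be sketched as follows; sigma2hat stands for the variance estimator of Section 3 (whose construction is not reproduced here), and the one-sided form matches the rejection rule T_n < nη used in the simulations below.

```r
## Sketch of the test for the relevant hypotheses (24): reject
## H0bar: min_theta d(Lambda_theta, h) >= eta  when the standardized
## statistic falls below the alpha-quantile of N(0,1).
relevant_test <- function(Tn, n, eta, sigma2hat, alpha = 0.05) {
  stat <- sqrt(n) * (Tn / n - eta) / sqrt(sigma2hat)  # asymptotically N(0,1) at the boundary
  list(statistic = stat,
       reject    = (stat < qnorm(alpha)))             # one-sided: small Tn favours H1bar
}
```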

Remark 3.6
Note that not rejecting the null hypothesis H_0 does not mean that the null hypothesis is valid. Consequently, alternative approaches, such as increasing the level in order to accept more transformation functions instead of testing the precise hypotheses in (24), do not necessarily provide evidence in favour of applying a transformation model.

A bootstrap version and simulations
Although Theorem 3.2 shows how the test statistic behaves asymptotically under H_0, it is hard to extract any information about how to choose appropriate critical values for a test that rejects H_0 for large values of T_n. The main reasons are that, first, for any given function ζ the eigenvalues of the operator defined in Theorem 3.2 are unknown; second, this function itself is unknown and has to be estimated as well; and third, even ψ (which would be needed to estimate ζ) is mostly unknown and rather complex (see, e.g., Section 2 of the supplement). Therefore, approximating the α-quantile, say q_α, of the distribution of T in Theorem 3.2 directly is difficult, and instead we suggest a smooth bootstrap algorithm to approximate q_α.

Algorithm 4.1 Let (Y_1, X_1), . . . , (Y_n, X_n) denote the observed data, define θ̂ as the minimizing transformation parameter in (6) (see Remark 2.1), and let ĝ be a consistent estimator of g_{θ_0}, where θ_0 is defined as in (A6) under the null hypothesis and as in (A6') under the alternative. Let κ and ℓ be smooth Lebesgue densities on R^{d_X} and R, respectively, where ℓ is strictly positive, κ has bounded support and κ(0) > 0. Let (a_n)_n and (b_n)_n be positive sequences with a_n → 0, b_n → 0, n a_n → ∞, n b_n^{d_X} → ∞. Denote by m ∈ N the sample size of the bootstrap sample.
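The resampling scheme can be sketched as follows (one bootstrap iteration for d_X = 1); the residual definition, the uniform/normal smoothing kernels and the construction of the bootstrap transformation h* from θ̂ and the standardization (2) follow our reading of the text and of the simulation section, not a verbatim reproduction of the algorithm.

```r
## One bootstrap iteration: smooth resampling of covariates and residuals,
## responses generated under the fitted parametric transformation h*.
smooth_bootstrap <- function(X, resid, ghat, Lambda, Lambda_inv, theta_hat,
                             m = 100, a_n = 0.1, b_n = 0.1) {
  n <- length(X)
  ## covariates: kernel-smoothed empirical distribution of X (kappa = U([-1,1]) density)
  Xstar <- X[sample.int(n, m, replace = TRUE)] + b_n * runif(m, -1, 1)
  ## errors: kernel-smoothed residual distribution (ell = standard normal density)
  estar <- resid[sample.int(n, m, replace = TRUE)] + a_n * rnorm(m)
  ## bootstrap transformation h* built from theta_hat, standardized as in (2)
  l0 <- Lambda(0, theta_hat); l1 <- Lambda(1, theta_hat)
  hstar_inv <- function(s) Lambda_inv(l0 + s * (l1 - l0), theta_hat)
  Sstar <- ghat(Xstar) + estar
  data.frame(X = Xstar, Y = hstar_inv(Sstar))   # bootstrap sample of size m
}
```

Repeating this B times, recomputing ĥ* and the bootstrap statistic T*_{n,m} on each sample, and taking the empirical (1 − α)-quantile of the T*_{n,m} yields q*_α.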

Remark 4.2
1. The reason for resampling the bootstrap data X*_j, j = 1, . . . , m, nonparametrically is the need to mimic the original transformation estimator and its asymptotic behaviour with the bootstrap estimator conditional on the data. To proceed in the proof as in Colling and Van Keilegom (2019), it is therefore necessary to smooth the distribution of X*. The properties n b_n^{d_X} → ∞ and κ(0) > 0 ensure that, conditional on the original data (Y_1, X_1), . . . , (Y_n, X_n), the support of X* contains that of v (from assumption (B7) in Section 2 of the supplement) with probability converging to one. Thus, v can be used for calculating ĥ* as well.
2. To proceed as in Algorithm 4.1, it may be necessary to modify h* so that S*_j = ĝ(X*_j) + ε*_j belongs to the domain of (h*)^{−1} for all j = 1, . . . , m. As long as these modifications do not have any influence on h*(y) for y ∈ Y_w, the influence on ĥ* and T*_{n,m} should be asymptotically negligible (which can be proven for the estimator of Colling and Van Keilegom (2019)).
The bootstrap algorithm should fulfil two properties. On the one hand, under the null hypothesis, the algorithm has to provide, conditionally on the original data, consistent estimates of the quantiles of T_n, or rather of its asymptotic distribution from Theorem 3.2. On the other hand, to yield consistency under H_1, the bootstrap quantiles have to stabilize, or at least converge to infinity at a rate slower than that of T_n. To formalize this, let (Ω, A, P) denote the underlying probability space. Assume that (Ω, A) can be written as Ω = Ω_1 × Ω_2 and A = A_1 ⊗ A_2 for some measurable spaces (Ω_1, A_1) and (Ω_2, A_2). Further, assume that P is characterized as the product of a probability measure P_1 on (Ω_1, A_1) and a Markov kernel P_2^1, that is, P = P_1 ⊗ P_2^1. While randomness with respect to the original data is modelled by P_1, randomness with respect to the bootstrap data, conditional on the original data, is modelled by P_2^1. With these notations, the assumptions (A8*) and (A9*) from Section 1 of the supplementary material can be formulated.

Theorem 4.3
Let q*_α denote the bootstrap quantile from Algorithm 4.1. Then, under the assumptions of Section 1 of the supplementary material, q*_α converges in probability to the corresponding quantile q_α of the asymptotic distribution of T_n, in the sense that P(|q*_α − q_α| > δ) = o(1) for all δ > 0. Hence, P(T_n > q*_α) = α + o(1) under the null hypothesis. The proof is given in the supplement. Since only θ̂ is used to generate the bootstrap observations in Algorithm 4.1, it is conjectured that Theorem 4.3 can be generalized to the use of T̃_n from (7) in Algorithm 4.1.

Simulations
Throughout this section, g(X) = 4X − 1, X ∼ U([0, 1]) and ε ∼ N(0, 1) are chosen. Moreover, we test the null hypothesis that h belongs to the class of Yeo and Johnson (2000) transformations

Λ_θ(y) = ((y + 1)^θ − 1)/θ for y ≥ 0, θ ≠ 0;  log(y + 1) for y ≥ 0, θ = 0;  −((1 − y)^{2−θ} − 1)/(2 − θ) for y < 0, θ ≠ 2;  −log(1 − y) for y < 0, θ = 2,

with θ ∈ Θ = [0, 2]. Under the alternative, we choose transformations h with an inverse given by the convex combination

h^{−1}(y) = (1 − c) Λ_{θ_0}^{−1}(y) + c r(y)    (27)

for some θ_0 ∈ [0, 2], some strictly increasing function r and some c ∈ [0, 1]. In general, it is not clear whether a growing factor c leads to a growing distance (5). Indeed, the opposite might be the case if r is in some sense close to the class of transformation functions considered under the null hypothesis. Simulations were conducted for r_1(Y) = 5Φ(Y), r_2(Y) = exp(Y) and r_3(Y) = Y³, where Φ denotes the cumulative distribution function of the standard normal distribution, and for c = 0, 0.2, 0.4, 0.6, 0.8, 1. The prefactor in the definition of r_1 is introduced because the values of Φ are rather small compared to those of Λ_θ; without this factor, Λ_{θ_0} would (except for c = 1) dominate the "alternative part" r of the transformation function even when using the convex combination in (27). Note that r_2 and Λ_0 differ only with respect to standardization. Therefore, if h is defined via (27) with r = r_2, the resulting function is, for c = 1, close to the null hypothesis case. For calculating the test statistic, the weight function w was set equal to one. The nonparametric estimator of h was calculated as in Colling and Van Keilegom (2019) (see Section 2 of the supplement for details) with the Epanechnikov kernel K(y) = (3/4)(1 − y²) I_{[−1,1]}(y) and normal reference rule bandwidths (see, for example, Silverman (1986)) based on σ̂²_u and σ̂²_x, which are estimators of the variances of U = T(Y) and X, respectively. The number N_x of evaluation points for the nonparametric estimator of h was set equal to 100 (see Section 2 of the supplement for details). The integral in (S3) was computed by applying the function integrate implemented in R. In each simulation run, n = 100 independent and identically distributed random pairs (Y_1, X_1), . . . , (Y_n, X_n) were generated as described before, and the bootstrap quantiles were calculated from 250 bootstrap samples, each based on m = 100 bootstrap observations (Y*_1, X*_1), . . . , (Y*_m, X*_m), as in Algorithm 4.1, using for κ the U([−1, 1])-density, for ℓ the standard normal density, and a_n = b_n = 0.1. To obtain more precise estimators of the rejection probabilities under the null hypothesis, 800 simulation runs were performed for each choice of θ_0 under the null hypothesis, whereas in the remaining alternative cases 200 runs were conducted. Among other things, the nonparametric estimation of h, the integration in (S3), the optimization with respect to θ and the number of bootstrap repetitions cause the simulations to be quite computationally demanding. Hence, an interface to C++ as well as parallelization were used to conduct the simulations.
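For reference, the following R sketch implements the Yeo-Johnson family and an alternative transformation generated via (27); the convex-combination form of the inverse and the standardization of the null part via (2) follow our reconstruction above.

```r
## Yeo-Johnson transformation Lambda_theta and its inverse.
yeo_johnson <- function(y, theta) {
  pos <- if (abs(theta) > 1e-12) ((y + 1)^theta - 1) / theta else log(y + 1)
  neg <- if (abs(theta - 2) > 1e-12)
    -((1 - y)^(2 - theta) - 1) / (2 - theta) else -log(1 - y)
  ifelse(y >= 0, pos, neg)
}
yeo_johnson_inv <- function(u, theta) {
  pos <- if (abs(theta) > 1e-12) (theta * u + 1)^(1 / theta) - 1 else exp(u) - 1
  neg <- if (abs(theta - 2) > 1e-12)
    1 - (1 - (2 - theta) * u)^(1 / (2 - theta)) else 1 - exp(-u)
  ifelse(u >= 0, pos, neg)
}

## Alternative transformation via (27): its inverse is a convex combination of
## the inverse of the standardized null transformation and r (our reading).
hinv_alt <- function(s, theta0, r, c) {
  l0 <- yeo_johnson(0, theta0); l1 <- yeo_johnson(1, theta0)
  h0_inv <- function(t) yeo_johnson_inv(l0 + t * (l1 - l0), theta0)
  (1 - c) * h0_inv(s) + c * r(s)
}

## Example: r_1 from the text with a moderate mixing weight c = 0.4.
r1 <- function(y) 5 * pnorm(y)
curve(hinv_alt(x, theta0 = 0.5, r = r1, c = 0.4), from = -2, to = 2)
```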
The main results of the simulation study are presented in Table 1, which lists the rejection probabilities for the settings with h = (Λ_{θ_0}(·) − Λ_{θ_0}(0))/(Λ_{θ_0}(1) − Λ_{θ_0}(0)) under the null hypothesis, and with h as in (27) under the alternative, for r ∈ {r_1, r_2, r_3}, c ∈ {0, 0.2, 0.4, 0.6, 0.8, 1} and θ_0 ∈ {0, 0.5, 1, 2}. The significance level was set equal to 0.05 and 0.10. Note that the test holds the level or is even a bit conservative. Under the alternatives, the rejection probabilities differ not only between different choices of r, but also between different transformation parameters θ_0 inserted in (27). While the test shows high power for some alternatives, there are also cases where the rejection probabilities are extremely small. There are certain reasons that explain these observations: first, the class of Yeo-Johnson transformations seems to be quite general, and second, the testing approach itself is rather flexible due to the minimization with respect to γ. A look at the definition of the test statistic in (6) shows that it attains small values if the true transformation function can be approximated by a linear transformation of Λ_θ̂ for some appropriate θ̂ ∈ [0, 2]. In the following, this issue will be explored further by analysing some graphics. All three figures that occur in the following have the same structure and consist of four panels. The upper left panel shows the true transformation function, with inverse given by (27); the function (Λ_{θ_0}(·) − Λ_{θ_0}(0))/(Λ_{θ_0}(1) − Λ_{θ_0}(0)), which represents the part of h corresponding to the null hypothesis, is plotted against the true transformation function in the last panel.
In the lower left panel, one can see whether the true transformation function can be approximated by a linear transform of Λ_θ̂ for some θ̂ ∈ [0, 2], which, as pointed out before, is an indicator for rejecting or not rejecting the null hypothesis. As already mentioned, the rejection probabilities differ not only between different deviation functions r, but also within these settings. For example, when considering r = r_1 with different values of θ_0, Figures 1 and 2 explain why the rejection probabilities differ that much. While for θ_0 = 0.5 the transformation function can be approximated quite well by transforming Λ_{1.06} linearly, the best approximation for θ_0 = 2 is given by Λ_{1.94} and seems to be relatively bad. The best approximation for c = 1 is attained for θ around 1.4. In contrast to that, considering θ_0 = 2 and r = r_3 results in a completely different picture. As can be seen in Fig. 3, even for c = 0.2 the resulting h differs so much from the null hypothesis that it cannot be linearly transformed into a Yeo-Johnson transformation (see the lower left panel). Consequently, the rejection probabilities are rather high. A way to overcome the problem of low rejection probabilities can consist in applying the modified test statistic T̃_n from (7). Although Colling and Van Keilegom (2020) showed that the estimator θ̂ seems to outperform θ̃ from Remark 2.1 in simulations, fixing c_1, c_2 beforehand might, due to the reduced flexibility of the minimization procedure, lead to higher rejection probabilities when using T̃_n instead of T_n. Table 2 contains rejection probabilities which are based on the bootstrap version of T̃_n; the same simulation settings and procedures as before have been used. Indeed, some of the rejection probabilities have increased compared to Table 1. For example, the rejection probabilities for r = r_1, θ_0 = 0.5 and c = 0.6 amount to 0.115 and 0.17 instead of 0.0035 and 0.05 in Table 1. Nevertheless, this cannot be generalized, since the rejection probabilities when using T̃_n are sometimes below those for T_n, e.g. for θ_0 = 0 and r = r_1 or θ_0 = 2 and r = r_2.
Under some alternatives, the rejection probabilities are even smaller than the level. This behaviour indicates that, from the presented test's perspective, these models seem to fulfil the null hypothesis more convincingly than the null hypothesis models themselves. The reason is shown in Fig. 4 for the setting θ_0 = 1, c = 0.4 and r = r_1, which displays the relationship between the nonparametric estimator of the transformation function and the true transformation function. While the diagonal line represents the identity, the nonparametric estimator seems to flatten the edges of the transformation function. In contrast to this, using r = r_1 in (27) steepens the edges, so that both effects neutralize each other. Similar effects cause low rejection probabilities for r = r_2, although the reasoning is slightly more sophisticated and is also associated with the boundedness of the parameter space Θ = [0, 2]. One possible solution could consist in adjusting the weight function w such that the boundary of the support of Y no longer belongs to the support of w. In Table 3, the rejection probabilities for such a modified weighting approach are presented; there, the weight function was chosen such that the smallest five percent and the largest five percent of the observations are omitted, to avoid the flattening effect of the nonparametric estimation. Indeed, the resulting rejection probabilities under the alternatives increase and lie above those under the null hypotheses.
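A minimal sketch of such a trimmed weight function (the five-percent cutoffs follow the description above; the use of empirical quantiles is our interpretation):

```r
## Weight function vanishing outside the central 90% of the observed responses,
## so that the boundary regions (where hhat flattens) are excluded.
make_weight <- function(Y) {
  q <- quantile(Y, probs = c(0.05, 0.95))
  function(y) as.numeric(y >= q[1] & y <= q[2])
}
w <- make_weight(Y)   # Y from the data-generating sketch above
```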
Finally, simulations for the precise hypotheses in (24) were conducted. For the sake of brevity, only rejection probabilities resulting from the self-normalized test statistic are presented, since this approach seems to outperform by far the one based on the estimator σ̂² from Section 3 in the simulated settings. Since only a fraction of the data is used to calculate V_n, the sample size was increased to n = 500. The settings and techniques remain the same as before. The probability measure ν was set to ν = (1/10) δ_{0.6} + (2/10) δ_{0.7} + (3/10) δ_{0.8} + (4/10) δ_{0.9} in order to put a higher weight on those parts of V_n where more data points are used. Furthermore, the threshold was chosen to be η = 0.02, which roughly corresponds to plugging the logit-type function r_4(y) := 5 exp(y)/(1 + exp(y)) and c = 1 into Eq. (27) and calculating min_{θ ∈ Θ} d(Λ_θ, h). Hence, we expect the test to reject the null hypothesis H̄_0 if T_n < nη = 10 holds.
A detailed analysis would go beyond the scope of this manuscript, so only some rejection probabilities are given in Table 4. Moreover, the mean values of the test statistic T_n are listed, in order to link the rejection probabilities to the distance between the expected value of the test statistic and the threshold η = 0.02 (i.e. nη = 10 on the scale of T_n). First, the smaller the value of T_n, the more likely the test seems to reject the null hypothesis H̄_0. Further, the test holds the level, but is slightly conservative. Alternatives seem to be detected for mean values of T_n around or below eight. Nevertheless, the power of the test is quite high in scenarios with small expected values of the test statistic, which often correspond to transformation functions that are close to the parametric class. For θ_0 = 0.5 and θ_0 = 1 the rejection probabilities are in these cases above 0.90 and sometimes even close to one. Although the influence of simulation parameters such as the sample size n or the probability measure ν has not been examined, the results indicate that using the self-normalized test statistic can be a good way to test the precise hypotheses H̄_0 and H̄_1.