Centre-free kurtosis orderings for asymmetric distributions

The concept of kurtosis is used to describe and compare theoretical and empirical distributions in a multitude of applications. In this connection, it is commonly applied to asymmetric distributions. However, there is no rigorous mathematical foundation establishing what is meant by kurtosis of an asymmetric distribution and what is required to measure it properly. All corresponding proposals in the literature centre the comparison with respect to kurtosis around some measure of central location. Since this either disregards critical amounts of information or is too restrictive, we instead revisit a canonical approach that has barely received any attention in the literature. It reveals the non-transitivity of kurtosis orderings due to an intrinsic entanglement of kurtosis and skewness as the underlying problem. This is circumvented by restricting attention to sets of distributions with equal skewness, on which the proposed kurtosis ordering is shown to be transitive. Moreover, we introduce a functional that preserves this order for arbitrary asymmetric distributions. As an application, we examine the families of Weibull and sinh-arsinh distributions and show that the latter family exhibits a skewness-invariant kurtosis behaviour.


Introduction
There has been much discussion in the literature concerning the question of what kurtosis describes exactly. In particular, a number of articles have been published both advocating its equivalent to $R_{FG}$ being concave up to the median of $F$ and convex from there onward. While this switch necessarily takes place at the median in a symmetric setting, it is very limiting and not expedient to require this in the general case. A more flexible generalization of the concave-convex order $\le_s$ is proposed in Section 3.
For the second order $\le_S$, the spread function of a distribution function $F$ is defined by considered in the literature except by Hosking (1989), who shows that his kurtosis measure based on L-moments preserves $\le_3$ for symmetric distributions. This disregard may partly be due to Oja (1981) criticizing the order for not being transitive. If the order $\le_3$ were lacking transitivity on symmetric distributions, this would indeed be a serious downside compared to $\le_s$. However, in Section 2, it is proved that $\le_3$ is transitive on this set. For the set of all distributions, we argue that transitivity cannot be expected, since skewness or asymmetry interferes with the quantification of kurtosis. This intrinsic entanglement was already mentioned by MacGillivray & Balanda (1988) and Balanda & MacGillivray (1990), and proposals for skewness-invariant kurtosis measures were made by Blest (2003) and Jones et al. (2011). In Sections 2 and 3, this entanglement is shown to be related to the transitivity of kurtosis orders.

Basics
We begin by defining convex functions of order $k \in \mathbb{N}_0$ and the induced stochastic orders.
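Definition 1. Let $I \subseteq \mathbb{R}$ be an open interval and let $\varphi \colon I \to \mathbb{R}$ be a function. For $k \in \mathbb{N}$ and $x_0, \ldots, x_k \in I$ with $x_0 < \ldots < x_k$, the zeroth and $k$-th divided difference, respectively, of $\varphi$ at $x_0, \ldots, x_k$ is defined by
\[ [x_0; \varphi] = \varphi(x_0), \qquad [x_0, \ldots, x_k; \varphi] = \frac{[x_1, \ldots, x_k; \varphi] - [x_0, \ldots, x_{k-1}; \varphi]}{x_k - x_0}. \]
$\varphi$ is said to be convex of order $k$ or $k$-convex on $I$, if
\[ [x_0, \ldots, x_k; \varphi] \ge 0 \tag{1} \]
holds for all $x_0, \ldots, x_k \in I$ with $x_0 < \ldots < x_k$. Moreover, $\varphi$ is said to be strictly convex of order $k$ on $I$, if inequality (1) is strict.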
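The recursive definition of divided differences lends itself to a direct numerical check. The following is a minimal sketch, not part of the original text: the function names `divided_difference` and `looks_k_convex` are hypothetical, and testing only consecutive grid points gives a necessary condition for $k$-convexity on the grid rather than a proof.

```python
from typing import Callable, Sequence


def divided_difference(phi: Callable[[float], float], xs: Sequence[float]) -> float:
    """k-th divided difference of phi at the k+1 strictly increasing points xs."""
    if len(xs) == 1:
        # Zeroth divided difference: the function value itself.
        return phi(xs[0])
    upper = divided_difference(phi, xs[1:])
    lower = divided_difference(phi, xs[:-1])
    return (upper - lower) / (xs[-1] - xs[0])


def looks_k_convex(phi: Callable[[float], float], grid: Sequence[float], k: int) -> bool:
    """Check that all k-th divided differences at consecutive grid points are non-negative.

    This is only a necessary condition on a finite grid (hypothetical helper).
    """
    return all(
        divided_difference(phi, grid[i:i + k + 1]) >= 0.0
        for i in range(len(grid) - k)
    )


grid = [i / 4 for i in range(-8, 9)]
cube_is_3_convex = looks_k_convex(lambda x: x ** 3, grid, 3)
negated_cube_is_3_convex = looks_k_convex(lambda x: -(x ** 3), grid, 3)
```

For the third monomial, every third divided difference equals the leading coefficient, so the check passes for $x^3$ and fails for $-x^3$.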
The $k$-convexity of functions can also be defined via the non-negativity of determinants of $(k+1) \times (k+1)$-matrices (see Oja, 1981, p. 155). It is easy to see that both approaches are equivalent. Throughout this work, we assume the following.
Assumption 2. All (univariate) distribution functions have interval support and are three times differentiable. The interior of the support of a distribution function $F$ is denoted by $D_F$, and $f = F'$ is assumed to be strictly positive on $D_F$. The set of all such distribution functions is denoted by $\mathcal{P}$.
Oja (1981, p. 156) defined a family of stochastic orders in the following way.
Definition 3. Let $k \in \mathbb{N}_0$ and $F, G \in \mathcal{P}$. Then, $F \le_k G$ is said to hold, if the function
Here, $G^{-1}$ denotes both the inverse function and the quantile function of $G$, which coincide given our regularity conditions. Note that $\le_0$ coincides with the usual stochastic order $\le_{st}$, the most basic location order. Similarly, $\le_1$ coincides with the basic dispersion order $\le_{disp}$ and $\le_2$ coincides with the basic skewness order $\le_c$ by van Zwet (1964). For more details, see Oja (1981). Since the $k$-convexity of a $k$ times differentiable function $\varphi$ is equivalent to $\varphi^{(k)} \ge 0$, one obtains the following corollary (see, e.g., Oja, 1981).

Lack of transitivity and its implications
We now focus our attention on the order $\le_3$ as a canonical choice for a basic kurtosis order. In a rare mention in the literature, Oja (1981, p. 168) states without proof that $\le_3$ is not transitive. This is confirmed by the following example.
Example 5. Define by three infinitely often differentiable distribution functions on the unit interval. Note that $D_F = D_G = D_H = (0, 1)$. Since both $R_{FG}$ and $R_{GH}$ are shifted versions of the third monomial with restricted domains, both $F \le_3 G$ and $G \le_3 H$ hold. Straightforward calculations yield and thereby contradicts $F \le_3 H$.
The orders $\le_0$, $\le_1$ and $\le_2$ can all be equivalently characterized by families of measures of location, dispersion and skewness, respectively. Because all such measures are mappings from a set of probability distributions to the real numbers, their values are compared using the transitive relation $\le$. Since this is not compatible with the non-transitivity of the kurtosis order $\le_3$, we obtain the following negative result.
Corollary 6. There does not exist a family $\{\kappa_\iota :$
Note that stochastic orders are usually not strongly connected, which means that $F \not\le_3 H$ does not imply $H \le_3 F$. However, for the distributions in Example 5, it can be shown that $H \le_3 F$ does indeed hold. This is a more disturbing result than the mere non-transitivity of $\le_3$: if one additionally requires that $\kappa$ preserves the strict version of $\le_3$, i.e. that $F <_3 G$ implies $\kappa(F) < \kappa(G)$, it can be shown that there exists no mapping $\kappa \colon \mathcal{P} \to \mathbb{R}$ that preserves the order $\le_3$.
Remark 7. It should be emphasized that missing transitivity can also be found in more familiar areas. The best known example is probably the location order defined by $X \le_p Y$ if the relative effect $p = P(X < Y) + P(X = Y)/2$ is greater than or equal to $1/2$. It is well known that this order is not transitive, as exemplified by non-transitive dice (Gardner, 1970). Still, the empirical counterpart to $p$ is the key quantity of important nonparametric tests like the Wilcoxon-Mann-Whitney, Fligner-Policello and Brunner-Munzel test (Divine et al., 2018).
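The non-transitivity of $\le_p$ can be verified by exact enumeration. The sketch below uses one common textbook set of three six-sided dice, chosen here purely for illustration (it is an assumption that this set resembles the one discussed by Gardner, 1970); the helper names `relative_effect` and `precedes` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def relative_effect(x_faces, y_faces):
    """Exact p = P(X < Y) + P(X = Y)/2 for independent uniform draws from the face lists."""
    total = Fraction(0)
    for x, y in product(x_faces, y_faces):
        if x < y:
            total += 1
        elif x == y:
            total += Fraction(1, 2)
    return total / (len(x_faces) * len(y_faces))


def precedes(x_faces, y_faces):
    """X <=_p Y holds if the relative effect is at least 1/2."""
    return relative_effect(x_faces, y_faces) >= Fraction(1, 2)


# Illustrative dice (assumed, not taken from the paper).
A = (2, 2, 4, 4, 9, 9)
B = (1, 1, 6, 6, 8, 8)
C = (3, 3, 5, 5, 7, 7)

# B <=_p A and A <=_p C hold, yet B <=_p C fails.
b_before_a = precedes(B, A)
a_before_c = precedes(A, C)
b_before_c = precedes(B, C)
```

Working with `Fraction` avoids rounding issues at the threshold $1/2$, which matters because the order is defined by a non-strict inequality.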
An example of a non-transitive dispersion ordering is the dangerousness order: given random variables $X, Y$ on the positive half line, $X$ is said to be less dangerous than Here, the situation somewhat differs from the foregoing example, since the dangerousness order has a transitive closure, the convex order (Müller, 1996).
The observation preceding Remark 7 suggests that kurtosis measures in the classical order-based sense, used by Oja (1981) among others, do not exist. In the literature, $\le_s$ is usually chosen as the kurtosis order to be preserved by a kurtosis measure. Because of the limitations of $\le_s$, this can only be used to validate kurtosis measures for symmetric distributions, which is unsatisfactory for the reasons mentioned in Section 1. The question of how to exploit the much more general applicability of the order $\le_3$ in spite of its non-transitivity can be answered in two ways.
The first possibility is to move away from the classical idea of measures of kurtosis and instead consider functionals that quantify the difference in kurtosis between two given distributions. For example, consider the quantile-based mapping for $0 < \alpha < \eta < 1/2$, which is listed as a kurtosis measure by Ruppert (1987), Balanda & MacGillivray (1988) and Jones et al. (2011) among others, since it preserves the order $\le_s$. Similarly constructed quantile-based mappings using lower-order differences are measures of location, dispersion and skewness and can even be used to characterize the orders $\le_0$, $\le_1$ and $\le_2$ in the sense of Corollary 6. By customizing the evaluation points to a second distribution, one arrives at the functional where . This functional preserves the kurtosis order $\le_3$ even for asymmetric distributions, as the following result shows.
Proposition 8. Let $F, G \in \mathcal{P}$. Then $F \le_3 G$ implies $\kappa^{\alpha}_{QF}(F, G) \ge 0$.
Proof. According to Definitions 1 and 3, $F \le_3 G$ is equivalent to The subsequent division by the $\alpha$-interquantile range of $G$ is done to obtain a scale-invariant functional.
Remark 9. Again, the situation is similar for the Wilcoxon-Mann-Whitney location order $\le_p$ in Remark 7. The usual unbiased estimator for the relative effect is a U-statistic involving both samples, and there cannot exist a measure depending on one sample, like the mean or median, which is consistent with $\le_p$ in general.

Transitivity sets
The second possibility is to restrict the comparison of kurtosis to suitable subsets of distributions, e.g. the subset of symmetric distributions. In the following, we analyze the transitivity sets of the order $\le_3$. As a starting point, all pairs of distributions that are ordered with respect to $\le_3$ are divided into two mutually exclusive categories. For that, let $F, G \in \mathcal{P}$ satisfy $F \le_3 G$, implying that the function $R''_{FG}$ is increasing. Now, $F$ and $G$ are either skewness-comparable with respect to $\le_2$, i.e., $F \le_2 G$ or $G \le_2 F$ holds, or they are not. In the latter case, $R_{FG}$ has an inflection point at a The inflection point at $t_{FG}$ is, in general, not unique since $R_{FG}$ can be linear on a given non-degenerate interval. However, any inflection point of $R_{FG}$ can be uniquely identified by the value $p_{FG} = F(t_{FG}) \in (0, 1)$. Furthermore, note that $F \le_2 G$ and $G \le_2 F$ can be viewed as limiting cases with $t_{FG} = \inf D_F$ or $t_{FG} = \sup D_F$, yielding $p_{FG} = 0$ or $p_{FG} = 1$, respectively. So in order to obtain the most general setting, we allow As stated before, any pair $F, G$ satisfying $F \le_3 G$ has at least one inflection value. Requiring $R'''_{FG}(t) > 0$ for all $t \in D_F$ is sufficient for the inflection value $p_{FG}$ to be unique.
With this in mind, we analyze more closely why $\le_3$ is not transitive. Let $F, G$ and holds for all $t \in D_F$. Note that $R_{FG}$ and $R_{GH}$ are increasing as a composition of two increasing functions. Hence, the first two summands on the right side of equation (4) are non-negative and By assumption, the sets $\Pi_{FG}$ and $\Pi_{GH}$ are both non-empty. If the intersection of these two sets is also non-empty, i.e., if there exists a ) coincide for all $p \in (0, 1)$ since they are both non-positive for $p < p_0$ and both non-negative for $p > p_0$. Otherwise, if the intersection of $\Pi_{FG}$ and $\Pi_{GH}$ is empty, choose a representative from each set such that their difference is minimal. Assuming without restriction that $p_{FG} < p_{GH}$, where We summarize our results thus far in the following proposition.
Proposition 11. Let $p_0 \in [0, 1]$ and let $\mathcal{F}_0$ be a set of cdf's such that any pair $F, G \in \mathcal{F}_0$ with $F \le_3 G$ has $p_0$ as an inflection value. Then, the order $\le_3$ is transitive on $\mathcal{F}_0$.
We now study the structure of the sets mentioned in Proposition 11 or suitable subsets thereof. First, we assume that $F$ and $G$ with $F \le_3 G$ have an inflection value $p_{FG} \in (0, 1)$.
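The mechanism behind Proposition 11 can be traced with the chain rule. The following is a sketch under the assumption that $R_{FG} = G^{-1} \circ F$, so that $R_{FH} = R_{GH} \circ R_{FG}$; the arrangement of terms is chosen for illustration and need not match the numbering of the displayed equations in the original.

```latex
\begin{align*}
R_{FH}'(t)   &= R_{GH}'(R_{FG}(t))\, R_{FG}'(t), \\
R_{FH}''(t)  &= R_{GH}''(R_{FG}(t))\, R_{FG}'(t)^2 + R_{GH}'(R_{FG}(t))\, R_{FG}''(t), \\
R_{FH}'''(t) &= R_{GH}'''(R_{FG}(t))\, R_{FG}'(t)^3 + R_{GH}'(R_{FG}(t))\, R_{FG}'''(t)
               + 3\, R_{GH}''(R_{FG}(t))\, R_{FG}'(t)\, R_{FG}''(t).
\end{align*}
```

If $F \le_3 G$ and $G \le_3 H$, the first two summands of $R_{FH}'''$ are non-negative because $R_{FG}$ and $R_{GH}$ are increasing. The sign of the third summand is that of the product $R_{GH}''(R_{FG}(t))\, R_{FG}''(t)$. When both pairs share an inflection value $p_0$, both factors change sign at $t = F^{-1}(p_0)$, so the product is non-negative throughout and $F \le_3 H$ follows.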

The fact that p
Denoting by $f$ and $g$ the derivatives of $F$ and $G$, respectively, we get for all $t \in D_F$. Hence, $p_{FG}$ is an inflection value of $F$ and $G$, if and only if Thus, any pair that is ordered with respect to $\le_3$ out of a given set of cdf's has the same inflection value $p_0 \in (0, 1)$, if and only if coincides for all cdf's $F$ in the set. The following result is obtained by combining this observation with Proposition 11.
Proposition 12. Let $p_0 \in (0, 1)$ and let $\mathcal{F}_0$ be a set of cdf's such that $\gamma_D^{p_0}(F)$ coincides for all $F \in \mathcal{F}_0$. Then, all pairs $F, G \in \mathcal{F}_0$ with $F \le_3 G$ have $p_0$ as an inflection value.
If $p_{FG} \in \{0, 1\}$ is the sole inflection value of $F$ and $G$ with $F \le_3 G$, (5) is not valid because the densities $f$ and $g$ are not uniquely defined at the edges of their respective supports. Thus, no easily verifiable sufficient condition for inflection points as in Proposition 12 can be obtained in this case. In summary, defining the set
\[ T^{t}_{D,p} = \{ F \in \mathcal{P} : \gamma_D^{p}(F) = t \} \]
for all $p \in (0, 1)$ and all $t \in \mathbb{R}$ gives the following result.
Theorem 13. For any $t \in \mathbb{R}$ and any $p \in (0, 1)$, the kurtosis order $\le_3$ is transitive on the set $T^{t}_{D,p}$.
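The density-based characterization of inflection values can be recovered by differentiating $R_{FG}$ directly. The computation below is a sketch assuming $R_{FG} = G^{-1} \circ F$; the resulting candidate for the functional $\gamma_D^{p}$, including its sign convention, is an assumption and may differ from the original display.

```latex
\begin{align*}
R_{FG}'(t)  &= \frac{f(t)}{g(R_{FG}(t))}, \\
R_{FG}''(t) &= \frac{f'(t)}{g(R_{FG}(t))} - \frac{f(t)^2\, g'(R_{FG}(t))}{g(R_{FG}(t))^3}
             = \frac{f(t)^2}{g(R_{FG}(t))}
               \left( \frac{f'(t)}{f(t)^2} - \frac{g'(R_{FG}(t))}{g(R_{FG}(t))^2} \right).
\end{align*}
```

Evaluating at $t = F^{-1}(p)$, where $R_{FG}(t) = G^{-1}(p)$, shows that $R_{FG}''$ vanishes there exactly if $f'(F^{-1}(p)) / f(F^{-1}(p))^2$ and $g'(G^{-1}(p)) / g(G^{-1}(p))^2$ agree. A functional of the form $F \mapsto -f'(F^{-1}(p)) / f(F^{-1}(p))^2$ is therefore a natural candidate for $\gamma_D^{p}$: it is invariant under $x \mapsto ax + b$ with $a > 0$, and with this sign $R_{FG}'' \ge 0$ at the $p$-quantile corresponds to the candidate being no larger for $F$ than for $G$.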
As mentioned in Section 1, a number of authors have identified an intrinsic entanglement between skewness and kurtosis. By considering the mapping $\gamma_D^{p}$ more closely, this observation is confirmed and refined by Theorem 13. Recall that the critical property of a skewness measure $\gamma$ : 5) and (6) into inequalities yields that $\gamma_D^{p}$ preserves $\le_2$ for all $p \in (0, 1)$ and thus measures skewness. In fact, $\gamma_D^{p}(F) \le \gamma_D^{p}(G)$ for all $p \in (0, 1)$ is equivalent to $F \le_2 G$, so these measures characterize the order $\le_2$ in a way that is not possible for $\le_3$ according to Corollary 6. However, for $p \ne 1/2$, $\gamma_D^{p}$ measures skewness in an asymmetric or non-central way because the additional requirement $\gamma_D^{p}(-X) = -\gamma_D^{p}(X)$ (see, e.g., Groeneveld & Meeden, 1984, p. 393) is not satisfied.
The fact that $\le_3$ is transitive if a suitable skewness measure is constant suggests that the non-transitivity of $\le_3$ on the set of all cdf's arises because pairs of cdf's with differing degrees of skewness lack comparability with respect to kurtosis. As opposed to location and dispersion, a distribution cannot be standardized with respect to skewness by an arithmetic operation like addition for location and scalar multiplication for dispersion. Thus, in order to obtain a transitive kurtosis order without interference caused by skewness, attention has to be restricted to sets of constant skewness. Note that, for all $p \in (0, 1)$, the sets $T^{t}_{D,p}$, $t \in \mathbb{R}$, constitute a partition of the set $\mathcal{P}$ of distributions. For each partition, $p$ is also the inflection value of every kurtosis-comparable pair of distributions from the same transitivity set of the partition. Thus, each $F \in \mathcal{P}$ lies within a subset of $\mathcal{P}$ on which $\le_3$ is transitive. In light of these observations, one could adapt the classical order-based approach to define measures of location, dispersion and skewness to kurtosis. Instead of requiring a mapping $\kappa \colon \mathcal{P} \to \mathbb{R}$ to generally preserve the order $\le_3$, one could require the restriction of $\kappa$ to the transitivity set $T^{t}_{D,1/2}$ to preserve $\le_3$ for all $t \in \mathbb{R}$.
These observations raise the question whether there exist other skewness measures that induce transitivity sets analogous to Theorem 13. To that end, note that a simple sufficient condition for the term $\gamma_D^{p_0}(F)$ to coincide is to require $f'(F^{-1}(p_0)) = 0$ for all cdf's $F$ in the given set. Hence, for each $p_0 \in (0, 1)$, $\le_3$ is transitive on the set of all cdf's whose density has a stationary point at the $p_0$-quantile. One well-known point at which this commonly occurs is the mode of a distribution. For the following considerations, we assume that all distributions are unimodal and denote the mode of $F$ by $M_F$. If the mode lies in the interior of the support, the assumptions on $F$ directly yield $f'(M_F) = 0$. It follows that, for any $p \in (0, 1)$, $\gamma_D^{p}(F) = 0$ holds for all cdf's $F$ in the set where $\tilde{p} = 1 - 2p$. In combination with Propositions 11 and 12, this observation yields the following result.
Theorem 14. For any $\tilde{p} \in (-1, 1)$, the kurtosis order $\le_3$ is transitive on the set $T^{\tilde{p}}_{\mathrm{Mode}}$.
For any $\tilde{p} \in (-1, 1)$ and any pair of cdf's $F, G \in T^{\tilde{p}}_{\mathrm{Mode}}$ with $F \le_3 G$, the corresponding inflection value is given by $p = (\tilde{p} + 1)/2$. Arnold & Groeneveld (1995, p. 35) showed that $\gamma_{\mathrm{Mode}}(F) = 1 - 2F(M_F)$, $F \in \mathcal{P}$, is a measure of skewness, which entails that it preserves the skewness order $\le_2$. Thus, the transitivity of $\le_3$ on the sets $T^{\tilde{p}}_{\mathrm{Mode}}$ has a similar interpretation to before: for $\le_3$ to be transitive, the skewness of the involved distributions needs to be constant in some sense.
For distributions with modes at the boundaries of their supports, the above transitivity property does not hold, i.e., $\le_3$ is not transitive on $T^{-1}_{\mathrm{Mode}}$ and $T^{1}_{\mathrm{Mode}}$ in general. The crucial result in Proposition 12 does not hold in these cases. Counterexamples can be constructed using Weibull distributions, applying the results given in Section 4 below. Thus, the sets $T^{\tilde{p}}_{\mathrm{Mode}}$, $\tilde{p} \in (-1, 1)$, do not provide a partition of the set of all (sufficiently regular) probability distributions on the real numbers.
The notion of a mode can be generalized without losing the transitivity of $\le_3$ on the corresponding sets $T^{\tilde{p}}_{\mathrm{Mode}}$. Specifically, Theorem 14 still holds if $f$ only attains a local maximum at $M_F$, no longer assuming $F$ to be unimodal. However, Arnold & Groeneveld (1995) only proved $\gamma_{\mathrm{Mode}}$ to be a skewness measure under the assumption of unimodality.
The relationships between the transitivity sets found in this section and their connection to the set of all symmetric distributions are summarized in the following remark.
Remark 15. Let $\tilde{p} \in (-1, 1)$ and let $F \in T^{\tilde{p}}_{\mathrm{Mode}}$ be unimodal. It follows that $M_F = F^{-1}(p)$, where $p = (\tilde{p} + 1)/2 \in (0, 1)$. Since $M_F$ lies within the interior of the support of $F$, we obtain $f'(F^{-1}(p)) = 0$ and therefore $\gamma_D^{p}(F) = 0$. Thus, the inclusion $T^{\tilde{p}}_{\mathrm{Mode}} \subseteq T^{0}_{D,p}$ holds for all $p \in (0, 1)$ with $\tilde{p} = 2p - 1$. In particular, $T^{0}_{\mathrm{Mode}} \subseteq T^{0}_{D,1/2}$. Now, let $F \in \mathcal{P}$ be symmetric, denoted by $F \in \mathcal{S}$. Since both $\gamma_D^{p}$ and $\gamma_{\mathrm{Mode}}$ are invariant under transformations of the form $x \mapsto ax + b$ for $a > 0$ and $b \in \mathbb{R}$, we can assume without restriction that the symmetry centre of $F$ is $0$. Because this implies $\gamma$ If, additionally, $F$ is assumed to be unimodal, $M_F = 0$ and $\gamma_{\mathrm{Mode}}(F) = 0$ follows. Thus, in this case, $\mathcal{S} \subseteq T^{0}_{\mathrm{Mode}} \subseteq T^{0}_{D,1/2}$ holds.
Since $\le_3$ is transitive on $T^{0}_{D,1/2}$, it is also transitive on the set of all symmetric cdf's. Oja (1981), virtually the only work which mentions the order $\le_3$, dismissed it due to its non-transitivity, and instead focused on the previously mentioned concave-convex order $\le_s$. However, Oja restricted his considerations concerning kurtosis to symmetric distributions, and therefore also proved the transitivity of $\le_s$ only on this class. Since $\le_3$ is also transitive on symmetric distributions, Oja's argument is not convincing.

Equivalence with respect to ≤ 3
Two distributions F, G ∈ P are said to be equivalent with respect to ≤ 3 , denoted by , it follows that Hence, we have the following result.
Proposition 16. $F =_3 G$ holds, if and only if $R_{FG}$ satisfies the differential inequality
The fact that $F =_3 G$ is not equivalent to $R'''_{FG} \equiv 0$ is notable as it systematically differs from what can be observed with the orders $\le_0$, $\le_1$ and $\le_2$ of location, dispersion and skewness. Equivalence with respect to any of these orders occurs if and only if the corresponding derivative of $R_{FG}$ is constantly zero. Thus, for an $a > 0$ and a $b \in \mathbb{R}$. Heuristically, equivalence with respect to dispersion means that $G$ is a shifted or relocated version of $F$ and equivalence with respect to skewness means that $G$ is a shifted and rescaled version of $F$, allowing for changes in location and dispersion. This suggests that the functions satisfying the differential inequality (7) can change the location, the dispersion and the skewness of a distribution while being kurtosis-invariant. However, the fact that this family of functions is not as simple as the family of all affine linear transformations suggests that there exists no simple operation to standardize distributions with respect to skewness.
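A two-sided condition of this kind can be derived from the derivatives of an inverse function. The following is a sketch, assuming $R = R_{FG} = G^{-1} \circ F$ with $R' > 0$ and $S = R_{GF} = R^{-1}$; whether the resulting inequality matches (7) term by term is an assumption.

```latex
\begin{align*}
S'(R(t)) &= \frac{1}{R'(t)}, \qquad
S''(R(t)) = -\frac{R''(t)}{R'(t)^3}, \qquad
S'''(R(t)) = \frac{3\,R''(t)^2 - R'(t)\,R'''(t)}{R'(t)^5}, \\
F =_3 G \quad &\Longleftrightarrow \quad
0 \le R'(t)\,R'''(t) \le 3\,R''(t)^2 \quad \text{for all } t \in D_F .
\end{align*}
```

The left inequality encodes $F \le_3 G$ (that is, $R''' \ge 0$) and the right one encodes $G \le_3 F$ (that is, $S''' \ge 0$). For a monomial $R(t) = t^p$ with $0 < p < 1$, the condition reduces to $p \le 1/2$, and any polynomial $R$ of degree at most $2$ satisfies it trivially since $R''' \equiv 0$.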
In the following example, Proposition 16 is applied to monomials.
Example 17. Let $R_{FG}(t) = t^p$, $0 < t < 1$, for some $p > 0$. This arises, for example, for

Note that $R'''_{FG} \equiv 0$ and therefore also being a polynomial of degree $\le 1$, the fact that $R_{FG}$ is a polynomial of degree $\le 2$ is only a sufficient, but not a necessary condition for $F =_3 G$.

Concave-convex kurtosis orders
In the literature, there exist two major proposals for generalizing the concave-convex order $\le_s$ to asymmetric distributions, denoted by $\le_a$ and $\le_S$ (see MacGillivray & Balanda, 1988 and Balanda & MacGillivray, 1990). The order $\le_S$ is not considered further in the present work because it disregards a critical amount of information, as expanded upon in Section 1. The critical drawback of the order $\le_a$ can best be explained using the notion of the inflection value from Definition 10. Just like in our considerations in Section 2.3, $F \le_a G$ requires that the function $R_{FG}$ has one change from negative to positive curvature, whose location can be identified by an inflection value $p_{FG} \in (0, 1)$. While $p_{FG} = 1/2$ necessarily holds if $F$ and $G$ are symmetric, there is no reason to assume it to be a prerequisite for two asymmetric distributions to be ordered with respect to kurtosis. Thus, whereas the generally applicable order $\le_3$ is stronger than $\le_s$ in a symmetric setting, the same cannot be said about the generalized version $\le_a$ of $\le_s$ in a general setting. In the following, we propose an alternative generalization of $\le_s$ that is not a priori restricted to a specific inflection value.
Definition 18. $F$ is said to be less kurtotic in the concave-convex sense than $G$, denoted by The fact that $F \le_3 G$ implies $F \le_{gs} G$ for all $F, G \in \mathcal{P}$ is a direct consequence of Theorem 20 below. The essential difference between the two orders is that the first requires that a function (in this case $R''_{FG}$) is increasing whereas the second requires that the same function changes values from negative to positive at some point. This principle has also been used in the literature to obtain weakenings of other orders from the family $\le_k$, $k \in \mathbb{N}_0$. As an example, we can consider the visually more striking characteristic of dispersion based on the order $\le_1$. Instead of assuming that $\Delta_{FG}$ increases, which is equivalent to $F \le_1 G$, we can require that the values of $\Delta_{FG}$ switch from negative to positive at some point. A similar dispersion order has been proposed by Oja (1981, p. 158). He writes The sole difference to the order introduced before is the threshold, which changes from zero to the difference of the expectations. Unlike zero, the difference of the expectations is guaranteed to be taken as a value of $\Delta_{FG}$ at some point. This can be seen by considering the centred versions of $F$ and $G$. If, for example, the locations of $F$ and $G$ differ substantially, using the threshold zero is obviously not reasonable.
This line of argument can also be applied to the order $\le_{gs}$ and the function $R''_{FG}$. For general distribution functions $F$ and $G$, there is no reason to assume that $R''_{FG}$ takes the value zero at some point. Thus, Definition 18 needs to be modified. However, because $F$ and $G$ can only be standardized with respect to location and dispersion and not with respect to skewness, we cannot use the same technique as for $\le_1^{*}$ to obtain an alternative threshold. Therefore, the following definition uses a variable threshold.
Definition 19. Let $t_0 \in \mathbb{R}$. Then, $F$ is said to be less kurtotic than $G$ in the concave-convex sense with threshold $t_0$, denoted by ∞). Note that the orders $\le_{gs}^{0}$ and $\le_{gs}$ coincide. While the order $\le_{gs}^{t_0}$ is formally defined for all )) are said to be reasonable. The only exception is the case that the set of reasonable thresholds is empty, which is equivalent to $R''_{FG}$ being constant. In this case, the sole value of $R''_{FG}$ is the only candidate for a reasonable threshold.
The relationship between $\le_3$ and the family $\le_{gs}^{t_0}$, $t_0 \in \mathbb{R}$, given in the following theorem, underpins the idea that the latter consists of natural weakenings of $\le_3$.
Proof. The implication from left to right holds by construction. For the reverse implication, let $t_1 \in D_F$. If $t_1$ lies within an interval on which by assumption. The assertion follows since $t_1$ was arbitrary.
In Theorem 20, the set int The following result states that the proposed extension of the concave-convex order $\le_s$ to asymmetric distributions is not transitive in general, implying that it is not superior to $\le_3$ in this respect.
Proposition 21. For all $t_0 \in \mathbb{R}$, the kurtosis order $\le_{gs}^{t_0}$ is not transitive in general.
Proof. A counterexample can be obtained for all $t_0 \in \mathbb{R}$ by reusing Example 5 with a rescaled version of $H^{-1}$. For that, let $c > 0$ and This implies that the functions $R_{GH}$ and $R_{FH}$ as well as all of their derivatives are multiplied by the factor $c$. So, in addition to $F \le_3 G$, $R'''_{GH}(t) = 6c \ge 0$ holds for all $t \in [0, 1]$, and, thus, $G \le_3 H$. By Theorem 20, $F \le_{gs}^{t_0} G$ and $G \le_{gs}^{t_0} H$ hold for all $t_0 \in \mathbb{R}$. In contrast, we have It follows that, for any $t_0 > 0$, there exists $c > 0$ such that $R''_{FH}$ first takes values smaller than $t_0$, then larger, and finally smaller again. For any $t_0 < 0$, there exists $c > 0$ such that $R''_{FH}$ first takes values larger than $t_0$, then smaller and finally larger again. For $t_0 = 0$, we obtain For symmetric cdf's $F$ and $G$, $R_{FG}$ always has an inflection point at $F^{-1}(1/2)$. Thus, $\le_{gs}$ is equivalent to $\le_s$ on $\mathcal{S}$ and therefore also transitive on $\mathcal{S}$ (see Oja, 1981, p. 165). The situation is different for $\le_{gs}^{t_0}$, $t_0 \ne 0$, because the critical switch from
Remark 22. The specific order $\le_{gs}^{0}$ (or, equivalently, $\le_{gs}$) can be altered slightly to become transitive on the more general sets $T^{\tilde{p}}_{\mathrm{Mode}}$, $\tilde{p} \in (-1, 1)$, and $T^{t}_{D,p}$, $t \in \mathbb{R}$, $p \in (0, 1)$. For two cdf's $F$ and $G$, we say that $F <_{gss} G$ holds if there exists a $p$ Note that $<_{gss}$ is not equivalent to $<_{gs}$ since the latter is defined by as usual for strict versions of orders. To see that $<_{gss}$ is transitive, let $p \in (0, 1)$ and $F, G, H \in T^{t}_{D,p}$ with $F <_{gss} G$ and $G <_{gss} H$. By the line of reasoning used to prove Proposition 12 and Theorem 13, Since, by definition of $<_{gss}$, there exists at most one $t \in D_F$ and one $s \in D_G$ such that $R''_{FG}(t) = 0$ and $R''_{GH}(s) = 0$, $t = F^{-1}(p)$ and $s = G^{-1}(p)$ follows. Considering (3) for $t = F^{-1}(p)$ along with the fact that $R_{GH}$ is increasing, this yields $R''_{FH}(F^{-1}(q)) < 0$ for $q < p$ and $R''_{FH}(F^{-1}(q)) > 0$ for $q > p$. Overall, $F <_{gss} H$ follows. The transitivity of $<_{gss}$ on the sets $T^{\tilde{p}}_{\mathrm{Mode}}$, $\tilde{p} \in (-1, 1)$, now follows from $T^{\tilde{p}}_{\mathrm{Mode}} \subseteq T^{0}_{D,p}$, where $p = (\tilde{p} + 1)/2$.
It is not possible to show the transitivity of the order $\le_{gs}$ on the given sets in the same way as for $<_{gss}$, since, assuming $F \le_{gs} G$, $R''_{FG}(F^{-1}(p)) = 0$ for any $p \in (0, 1)$ is not sufficient to infer that $p$ is an inflection value. Because the concavity and the convexity of $R_{FG}$ on either side of the actual inflection value is not assumed to be strict, the function could be convex on both sides of $F^{-1}(p)$ or concave on both sides.

Weibull distribution
As an example of a well-known family of distributions with varying degrees of skewness, we consider Weibull distributions. Without restriction, we set the scale parameter to 1, and denote the distribution with shape parameter $k$ by $\mathcal{W}(k)$. Let $X \sim \mathcal{W}(k)$, $Y \sim \mathcal{W}(\ell)$ for $0 < k < \ell$. For $t > 0$, we have $R_{FG}(t) = t^{k/\ell}$. It follows directly from Example 17 that $F \le_3 G$ holds for all $k < \ell$, whereas $F =_3 G$ holds for $2k \le \ell$. Thus, if the two parameters differ by less than a factor two, the distribution with the higher parameter value is strictly more kurtotic. If the two parameters differ at least by a factor two, the two distributions are equivalent with respect to the order $\le_3$. Considering that a large difference between the two parameter values is also associated with a large difference in skewness, this may best be interpreted as follows.
If the difference in skewness between two Weibull distributions is too large, they cannot be unambiguously ordered with respect to kurtosis.
This rather unintuitive behaviour allows us to construct another counterexample for the transitivity of $\le_3$ since, e.g., $\mathcal{W}(k) \le_3 \mathcal{W}(1.5k) =_3 \mathcal{W}(0.7k) \not\ge_3 \mathcal{W}(k)$ holds for all $k > 0$. Furthermore, it is easy to show that $\le_{gs}^{t_0}$ coincides with $\le_3$ on the family of Weibull distributions for all reasonable thresholds $t_0$. Thus, the given counterexample also applies to $\le_{gs}^{t_0}$.
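The Weibull counterexample can be checked mechanically, since $R_{FG}(t) = t^{k/\ell}$ has third derivative $p(p-1)(p-2)\,t^{p-3}$ with $p = k/\ell$, whose sign on $(0, \infty)$ is that of $p(p-1)(p-2)$. The sketch below relies on the criterion that $F \le_3 G$ is equivalent to $R'''_{FG} \ge 0$; the function names are hypothetical.

```python
def cubic_coefficient(p: float) -> float:
    """Coefficient p(p-1)(p-2) of t**(p-3) in the third derivative of t**p."""
    return p * (p - 1.0) * (p - 2.0)


def weibull_leq3(k: float, l: float) -> bool:
    """True if W(k) <=_3 W(l), judged via R_FG(t) = t**(k/l) on t > 0.

    Floating-point ratios very close to 1 or 2 sit on the boundary of the
    criterion and may be misclassified; exact arithmetic would avoid this.
    """
    return cubic_coefficient(k / l) >= 0.0


def weibull_equiv3(k: float, l: float) -> bool:
    """True if W(k) and W(l) are equivalent with respect to <=_3."""
    return weibull_leq3(k, l) and weibull_leq3(l, k)


k = 1.0
# W(k) <=_3 W(1.5k), W(1.5k) =_3 W(0.7k), but W(k) <=_3 W(0.7k) fails.
chain = (
    weibull_leq3(k, 1.5 * k),
    weibull_equiv3(1.5 * k, 0.7 * k),
    weibull_leq3(k, 0.7 * k),
)
```

The first two relations hold while the third fails, which is exactly the failure of transitivity described in the text.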

Sinh-arsinh distribution
The family of sinh-arsinh distributions was introduced by Jones & Pewsey (2009). It depends on four parameters, which are associated with location, dispersion, skewness and tailweight. Here, we consider a simplified two-parameter family by fixing the location and dispersion parameters to zero and one, respectively. A random variable $X$ is said to be sinh-arsinh-distributed with skewness parameter $\nu \in \mathbb{R}$ and tailweight $\tau > 0$, denoted by $X \sim \mathcal{SAS}(\nu, \tau)$, if the random variable $S_{\nu,\tau}(X) = \sinh(\tau \operatorname{arsinh}(X) - \nu)$ is standard normal. Skewness to the right increases with increasing $\nu$ and tailweight decreases with increasing $\tau$. More specifically, (see Jones & Pewsey, 2009, pp. 763, 765, 766). One can directly infer the corresponding distribution function $F = \Phi \circ S_{\nu,\tau}$ and quantile function $F^{-1} = S_{\nu,\tau}^{-1} \circ \Phi^{-1}$.
There exist numerous other distribution families with four parameters that are associated with location, dispersion, skewness and tailweight or kurtosis. Examples include the skew-t distribution (Azzalini, 1985; Azzalini & Capitanio, 2003) and Tukey's g-and-h or g-and-k distributions (Tukey, 1977; Hoaglin, 2006; Haynes et al., 1997). However, these families do not have similarly explicit representations of both their distribution and quantile functions. Furthermore, while the skew-t distributions do include the standard normal distribution, it only appears as a limiting case and not as a standard case as for the sinh-arsinh distributions. Finally, the sinh-arsinh transformation can also be applied to (symmetric) distributions other than the standard normal. For example, Rosco et al. (2011) Hence, the ordering of $F$ and $G$ in terms of kurtosis only depends upon two parameters instead of four. The following result gives conditions for the ordering of sinh-arsinh distributions with respect to the kurtosis orders $\le_3$ and $\le_{gs}$.
The key characteristics of $R''_{FG}$ are summarized in Table 1. The proof of Theorem 23 can be found in the appendix.
Since the usual order of the real numbers used in the equivalent conditions in Theorem 23 is transitive, the following result is directly implied.
Heuristically, Theorem 23 implies that, within the family of sinh-arsinh distributions, comparisons in terms of kurtosis are skewness-invariant. This is due to the fact that equivalent characterizations for both major kurtosis orders are independent of both $\nu_F$ and $\nu_G$, which are skewness parameters by construction and also in the sense of $\le_2$ for $\tau_F = \tau_G$ (see Jones & Pewsey, 2009, p. 763). Moreover, the characterizations in Theorem 23 not only stay the same for equally skewed asymmetric distributions, but also for pairs of distributions with arbitrarily large differences in skewness. Also note that these results can be generalized to families of sinh-arsinh distributions that arise from symmetric base distributions other than the normal since the functions $R_{FG}$ only depend on the transformations and not on the specific base distribution. The skewness-invariance of the sinh-arsinh distribution in terms of kurtosis was noted by Jones et al. (2011, pp. 91-92). Specifically, they showed that quantile-based kurtosis measures that are constructed from symmetric differences of the form $F^{-1}(1-\alpha) - F^{-1}(\alpha)$, $\alpha \in (0, 1/2)$, are invariant under changes of the skewness parameter $\nu$. Theorem 23 generalizes this skewness-invariance from a specific family of kurtosis measures to the underlying kurtosis orders.
also exists a $t_\ell < t_u$ such that $R''_{FG}(t_\ell) > t_0$. This contradicts $F \le_{gs}^{t_0} G$. If $t_0 < 0$, $R''_{FG}(s_\ell) > t_0$ follows for $s_\ell$ small enough and, by assumption, there also exists an $s_u > s_\ell$ such that $R''_{FG}(s_u) < t_0$, thus also contradicting $F \le_{gs}^{t_0} G$. We now prove the implication $\tau$ holding for all $t \in \mathbb{R}$.
Because of $C_{\nu,\tau}(t) \ge 0$ and $(\tau^2 + 2)t^2 + \tau^2 - 1 \ge 6t^2 + 3 > 0$, the left hand side of inequality (11) is positive for all $t$. Hence, substituting both sides of the inequality with their squares gives a sufficient condition. We obtain The second summand on the left hand side is obviously non-negative. It is now sufficient to show that all coefficients of the polynomial, with which $C^2_{\nu,\tau}(t)$ is multiplied, are non-negative. For the constant $(\tau^2 - 1)^2$, this is obvious. The coefficient of $t^2$ is equal to $2\tau^4 - 7\tau^2 - 4 = (\tau^2 - 4)(2\tau^2 + 1)$, which is non-negative since $\tau \ge 2$ was assumed. The same is true for the coefficient of $t^4$, which equals $\tau^4 - 5\tau^2 + 4 = (\tau^2 - 4)(\tau^2 - 1)$. This concludes the proof of the chain (8) of implications.
In the symmetric case of $\nu = 0$, a number of special cases stand out, which are also singled out in Table 1. First, for $\tau = 1$, $R''_{FG} \equiv 0$ obviously holds since $F = G$ and, therefore, $R_{FG}$ is the identity function (see lower central panel of Figure 1). Then, for $\tau = 2$, the rather simple form $R_{FG}(t) = 2t\sqrt{t^2 + 1}$ is obtained, yielding the second derivative $R''_{FG}(t) = (4t^3 + 6t)(t^2 + 1)^{-3/2}$, which converges to $4$ as $t \to \infty$ and to $-4$ as $t \to -\infty$ (see lower central panel of Figure 2). Finally, for $\tau = 3$, the RIDF is given by $R_{FG}(t) = 4t^3 + 3t$, which leads to the linear second derivative $R''_{FG}(t) = 24t$ (see central panel of Figure 3).
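The special cases above can be checked numerically. The sketch below assumes that the function in question has the form $R_{FG}(t) = \sinh(\tau \operatorname{arsinh}(t) - \nu)$, which is consistent with the closed forms stated for $\tau = 1, 2, 3$ and $\nu = 0$ but is not taken verbatim from the text; the names `sas_ridf` and `second_derivative` are hypothetical.

```python
import math


def sas_ridf(t: float, tau: float, nu: float = 0.0) -> float:
    """Assumed form of R_FG for sinh-arsinh distributions: sinh(tau * arsinh(t) - nu)."""
    return math.sinh(tau * math.asinh(t) - nu)


def second_derivative(func, t: float, h: float = 1e-4) -> float:
    """Central finite-difference approximation of the second derivative of func at t."""
    return (func(t + h) - 2.0 * func(t) + func(t - h)) / (h * h)


def closed_form_tau2(t: float) -> float:
    """R_FG for tau = 2 and nu = 0 as stated in the text."""
    return 2.0 * t * math.sqrt(t * t + 1.0)


def closed_form_tau3(t: float) -> float:
    """R_FG for tau = 3 and nu = 0 as stated in the text."""
    return 4.0 * t ** 3 + 3.0 * t


points = (-2.0, -0.5, 0.0, 0.3, 1.0, 2.5)
max_gap_tau2 = max(abs(sas_ridf(t, 2.0) - closed_form_tau2(t)) for t in points)
max_gap_tau3 = max(abs(sas_ridf(t, 3.0) - closed_form_tau3(t)) for t in points)
```

Applying `second_derivative` to `sas_ridf` with `tau=3.0` reproduces the linear form $24t$ up to finite-difference error, and with `tau=2.0` it agrees with $(4t^3 + 6t)(t^2 + 1)^{-3/2}$.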

Table 1: Behaviour of the function $R''_{FG}$ and kurtosis orders for distribution functions $F$ and $G$