1 Introduction

There has been much discussion in the literature concerning the question of what kurtosis describes exactly. In particular, a number of articles have been published both advocating its interpretation as ‘peakedness’ of a distribution and opposing it. See Crack (2022, pp. 72–79) and Westfall (2014) for examples of either position and Fiori and Zenga (2009) for a more neutral historical review. Balanda and MacGillivray (1988, p. 116) provide a critical review of the literature concerning kurtosis and, based on that, aptly describe an increase in kurtosis as ‘the location- and scale-free movement of probability mass from the shoulders of a distribution into its center and tails’. This heuristic, as is usually the case for kurtosis, is applied solely to unimodal symmetric distributions.

As for other distributional characteristics, the concept of kurtosis is usually rooted in a stochastic order. Among the first authors to introduce this order-based approach for location and dispersion were Bickel and Lehmann (1975; 1976); it was later generalized by Oja (1981), among others. In particular, they required any measure \(\nu \) of a specific characteristic of a distribution to preserve a corresponding order \(\preccurlyeq \), i.e. that \(F \preccurlyeq G\) implies \(\nu (F) \le \nu (G)\) for all sufficiently regular distribution functions F, G. The necessity of underpinning measures in this way is, e.g., demonstrated in Eberl and Klar (2021). The most popular choice in the literature for this fundamental stochastic order in the case of kurtosis was introduced by van Zwet (1964) and is denoted by \(\le _s\). Two distributions F, G are said to be ordered with respect to kurtosis in the sense of \(F \le _s G\), if the function \(x \mapsto R_{FG}(x) = G^{-1}(F(x))\) is convex for \(x \ge F^{-1}(1/2)\). Again, this fundamental order is only meaningful if F and G are symmetric.

Although not well-founded, the application of kurtosis and, more specifically, measures of kurtosis to asymmetric distributions is commonplace: when proposing new families of continuous distributions or methods for generating such families, shape parameters are often related to skewness and kurtosis. Examples are Goerg (2011), Alzaatreh et al. (2013) and Fischer and Herrmann (2013). Only occasionally are authors more cautious: Jones and Pewsey (2009) refer to the second shape parameter of their sinh-arcsinh distribution as kurtosis only in the symmetric case; otherwise, they speak of tailweight, well aware of the underlying subtleties.

When modeling stock market volatility, Gabaix et al. (2006) consider distributions with large values of moment-based skewness and kurtosis and conclude “The use of [moment] kurtosis should be banished from use with fat-tailed distributions.” Asmussen (2022) studies higher order cumulants for a selection of financial models from the literature. His motivation “comes from numerous statements in the financial literature in the spirit that S [skewness] accounts for asymmetry and K [kurtosis] for a sharper mode and heavier tails than for the Black–Scholes model.”

In particular in applied work, the notion of kurtosis is routinely used for skewed distributions, and sample skewness and kurtosis are frequently documented in the literature. Examples are Bai and Ng (2005) and Kim and White (2004) in the context of modeling financial returns; Szczygielski et al. (2020) and López-Martín et al. (2022) for modeling cryptocurrencies; Martins (1965) and Cooper (2020) for environmental data; Eling (2012) and Sherrick et al. (2004) in the context of insurance risk.

All major approaches in the literature to define a fundamental kurtosis order for asymmetric distributions have the same critical drawback. Namely, they artificially centre the comparison of two distributions with respect to kurtosis around some measure of location, usually the median. Examples include the anti-skewness order \(\le _a\) by MacGillivray and Balanda (1988) and the order \(\le _S\) by Balanda and MacGillivray (1990), which is based on the so-called spread function. The order \(\le _a\) basically imposes the same requirement as \(\le _s\) since \(F \le _a G\) is equivalent to \(R_{FG}\) being concave up to the median of F and convex from there onward. While this switch necessarily takes place at the median in a symmetric setting, it is very limiting and not expedient to require this in the general case. A more flexible generalization of the concave-convex order \(\le _s\) is proposed in Sect. 3.

For the second order \(\le _S\), the spread function of a distribution function F is defined by

$$\begin{aligned} S_F(\alpha ) = F^{-1} \big (\tfrac{1}{2}+\alpha \big ) - F^{-1}\big (\tfrac{1}{2}-\alpha \big ), \quad \alpha \in \big [0, \tfrac{1}{2} \big ), \end{aligned}$$

which can be interpreted as one half of a symmetrized quantile function of F. Heuristically, the distribution is again artificially centred around the median by folding it around the median and averaging out the two overlaying halves of the distribution. If the resulting half of a distribution is then mirrored at the median, a symmetric distribution is obtained, which can be ordered with respect to kurtosis using \(\le _s\). This methodology is equivalent to defining the symmetrized kurtosis order \(\le _S\) by

$$\begin{aligned} F \le _S G \quad \Leftrightarrow \quad S_G \circ S_F^{-1} \text { is convex}. \end{aligned}$$

This definition of a kurtosis order is fairly easy to use and theoretically applicable to all univariate distributions. It does, however, have significant downsides, especially if it is intended to be used as a foundational order that establishes what is meant by the notion of kurtosis. This was in part noted by Balanda and MacGillivray (1990, p. 29) themselves. First, a significant amount of information is lost in just combining the two ‘sides’ (with respect to the median) of the distribution. The order \(\le _S\) theoretically allows arbitrarily large deviations from the desired concavity or convexity on one side, if they are compensated by the other side. This kind of behaviour is not desirable for a basic order. In a financial context, for instance, negative and positive values of the distribution, i.e. losses and gains, have to be interpreted differently, and relevant information about the shape of the distribution is lost by forcing symmetry. The second downside becomes apparent if we consider skewed distributions on the positive half line. In this case, the support ends close to the median on one side and is infinite on the other side, and the symmetrized version is not representative of the original distribution.
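The information loss caused by symmetrization can be illustrated with a minimal Python sketch of the spread function. The helper names (`spread`, `exp_quantile`, `logistic_quantile`) and the two example distributions are illustrative assumptions and are not taken from the cited works. The standard exponential distribution, which is skewed and supported on the positive half line, and the symmetric logistic distribution with scale 1/2 have the same spread function \(\alpha \mapsto \log ((1+2\alpha )/(1-2\alpha ))\), so the order \(\le _S\) cannot distinguish them.

```python
import math

def spread(quantile, alpha):
    """Spread function S_F(alpha) = F^{-1}(1/2 + alpha) - F^{-1}(1/2 - alpha)."""
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must lie in [0, 1/2)")
    return quantile(0.5 + alpha) - quantile(0.5 - alpha)

def exp_quantile(p):
    """Quantile function of the standard exponential distribution."""
    return -math.log(1.0 - p)

def logistic_quantile(p, scale=0.5):
    """Quantile function of a centred logistic distribution (the scale 1/2 is an assumption)."""
    return scale * math.log(p / (1.0 - p))

# Both columns agree up to rounding, although only one distribution is symmetric.
for alpha in (0.1, 0.25, 0.4):
    print(alpha, spread(exp_quantile, alpha), spread(logistic_quantile, alpha))
```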

Further proposals of kurtosis orders for asymmetric distributions were discussed by Oja (1981), Balanda and MacGillivray (1988), Arnold and Groeneveld (1992), and Fiori (2008), but they exhibit similar drawbacks to \(\le _a\) and \(\le _S\).

Oja (1981, p. 168) also briefly mentions the kurtosis order \(\le _3\), where \(F \le _3 G\) is said to hold if \(R_{FG}\) is convex of order three. If both F and G are three times differentiable, this is equivalent to \(R_{FG}''' \ge 0\). This definition naturally arises from basic orders of location, dispersion and skewness that are based upon the function \(\Delta _{FG}(x) = R_{FG}(x) - x\). These orders, denoted \(\le _0\), \(\le _1\) and \(\le _2\) by Oja (1981), hold if \(\Delta _{FG}\) is non-negative, increasing and convex in the usual sense, respectively. For continuous distributions, \(\le _0\) coincides with the usual stochastic order. Under appropriate differentiability assumptions, the definition of all three orders can be unified by stating that \(F \le _k G~(k = 0, 1, 2)\) holds if \(\Delta _{FG}^{(k)} \ge 0\). Since the concepts of location, dispersion, skewness and kurtosis are hierarchically connected, as can be seen from the classical measures, the first, second, third and fourth standardized moment, \(\le _3\) seems to be the canonical basic kurtosis order. In particular, \(\le _3\) is naturally applicable to asymmetric distributions. In spite of these observations, the order \(\le _3\) is otherwise not considered in the literature except by Hosking (1989), who shows that his kurtosis measure based on L-moments preserves \(\le _3 \) for symmetric distributions. This disregard may partly be due to Oja (1981) criticizing the order for not being transitive.

If the order \(\le _3\) was lacking transitivity on symmetric distributions, this would indeed be a serious downside compared to \(\le _s\). However, in Sect. 2, it is proved that \(\le _3\) is transitive on this set. For the set of all distributions, we argue that transitivity cannot be expected, since skewness or asymmetry interferes with the quantification of kurtosis. This intrinsic entanglement was already mentioned by MacGillivray and Balanda (1988) and Balanda and MacGillivray (1990), and proposals for skewness-invariant kurtosis measures were made by Blest (2003) and Jones et al. (2011). In Sects. 2 and 3, this entanglement is shown to be related to the transitivity of kurtosis orders.

2 The kurtosis order \(\le _3\) and its transitivity properties

2.1 Basics

We begin by defining convex functions of order \(k \in \mathbb {N}_0\) and the induced stochastic orders.

Definition 1

Let \(I \subseteq \mathbb {R}\) be an open interval and let \(\varphi : I \rightarrow \mathbb {R}\) be a function. For \(k \in \mathbb {N}\) and \(x_0, \ldots , x_k \in I\) with \(x_0< \cdots < x_k\), the zeroth and k-th divided difference, respectively, of \(\varphi \) at \(x_0, \ldots , x_k\) are defined by

$$\begin{aligned} {[}x_0 \mid \varphi ]&= \varphi (x_0),\\ {[}x_0, \ldots , x_k \mid \varphi ]&= \frac{\left[ x_1, \ldots , x_k \mid \varphi \right] - \left[ x_0, \ldots , x_{k-1} \mid \varphi \right] }{x_k - x_0}. \end{aligned}$$

\(\varphi \) is said to be convex of order k or k-convex on I, if

$$\begin{aligned} {[}x_0, \ldots , x_k \mid \varphi ] \ge 0 \end{aligned}$$
(1)

holds for all \(x_0, \ldots , x_k \in I\) with \(x_0< \ldots < x_k\). Moreover, \(\varphi \) is said to be strictly convex of order k on I, if inequality (1) is strict.
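The recursion in Definition 1 translates directly into code. The following Python sketch is only an illustration: the function names are hypothetical, and the grid check inspects finitely many tuples of consecutive nodes, so it can refute k-convexity but never prove it.

```python
def divided_difference(phi, xs):
    """Divided difference [x_0, ..., x_k | phi] for strictly increasing nodes xs."""
    if len(xs) == 1:
        return phi(xs[0])                      # zeroth divided difference
    upper = divided_difference(phi, xs[1:])    # [x_1, ..., x_k | phi]
    lower = divided_difference(phi, xs[:-1])   # [x_0, ..., x_{k-1} | phi]
    return (upper - lower) / (xs[-1] - xs[0])

def violates_k_convexity(phi, k, grid, tol=1e-9):
    """Return True if inequality (1) fails for some k+1 consecutive grid points."""
    return any(
        divided_difference(phi, grid[i:i + k + 1]) < -tol
        for i in range(len(grid) - k)
    )

grid = [i / 20 for i in range(21)]
# The third divided difference of a cubic equals its leading coefficient.
print(divided_difference(lambda t: t**3, [0.0, 0.1, 0.5, 0.9]))
print(violates_k_convexity(lambda t: t**3, 3, grid))
print(violates_k_convexity(lambda t: -t**3, 3, grid))
```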

The k-convexity of functions can also be defined via the non-negativity of determinants of \((k+1)\times (k+1)\)-matrices (see Oja 1981, p. 155). It is easy to see that both approaches are equivalent. Throughout this work, we assume the following.

Assumption 2

All (univariate) distribution functions have interval support and are three times differentiable. The interior of the support of a distribution function F is denoted by \(D_F\) and \(f=F'\) is assumed to be strictly positive on \(D_F\). The set of all such distribution functions is denoted by \(\mathscr {P}\).

Oja (1981, p. 156) defined a family of stochastic orders in the following way.

Definition 3

Let \(k \in \mathbb {N}_0\) and \(F, G \in \mathscr {P}\). Then, \(F \le _k G\) is said to hold, if the function

$$\begin{aligned} \Delta _{FG}: D_F \rightarrow \mathbb {R}, \quad x \mapsto R_{FG}(x) - x = G^{-1}(F(x)) - x \end{aligned}$$

is convex of order k.

Here, \(G^{-1}\) denotes both the inverse function and the quantile function of G, which coincide given our regularity conditions. Note that \(\le _0\) coincides with the usual stochastic order \(\le _{st}\), the most basic location order. Similarly, \(\le _1\) coincides with the basic dispersion order \(\le _{disp}\) and \(\le _2\) coincides with the basic skewness order \(\le _c\) by van Zwet (1964). For more details, see Oja (1981). Since the k-convexity of a k times differentiable function \(\varphi \) is equivalent to \(\varphi ^{(k)} \ge 0\), one obtains the following corollary (see, e.g., Oja 1981).

Corollary 4

Let \(k \in \mathbb {N}_0\) and \(F, G \in \mathscr {P}\). Then, \(F \le _k G\) is equivalent to \(\Delta _{FG}^{(k)} \ge 0\). If \(k \ge 2\), \(F \le _k G\) is also equivalent to \(R_{FG}^{(k)} \ge 0\).

2.2 Lack of transitivity and its implications

We now focus our attention on the order \(\le _3\) as a canonical choice for a basic kurtosis order. Whereas Definition 1 suggests that it is possible to relax the differentiability requirements in Assumption 2, it is important to note that we are concerned only with continuous distributions. Even the skewness order \(\le _2\) is virtually meaningless for discrete distributions (Eberl and Klar 2019), and the same holds for \(\le _3\).

In a rare mention in the literature, Oja (1981, p. 168) states without proof that \(\le _3\) is not transitive. This is confirmed by the following example.

Example 5

Define by

$$\begin{aligned} F: [0, 1] \rightarrow [0, 1],&\quad t \mapsto t^3,\\ G: [0, 1] \rightarrow [0, 1],&\quad t \mapsto t,\\ H: [0, 1] \rightarrow [0, 1],&\quad t \mapsto 1 - (1-t)^{1/3} \end{aligned}$$

three infinitely often differentiable distribution functions on the unit interval. Note that \(D_F = D_G = D_H = (0, 1)\). Since both \(R_{FG}\) and \(R_{GH}\) are shifted versions of the third monomial with restricted domains, both \(F \le _3 G\) and \(G \le _3 H\) hold. Straightforward calculations yield

$$\begin{aligned} R_{FH}'''(t)&= 18(28t^6-20t^3+1), \end{aligned}$$

which implies

$$\begin{aligned} R_{FH}'''(t) < 0 \text { for } t \in \left( \left( \tfrac{5-3\surd {2}}{14}\right) ^{1/3}, \left( \tfrac{5+3\surd {2}}{14}\right) ^{1/3} \right) \approx (0.378, 0.871) \subseteq [0, 1], \end{aligned}$$

and thereby contradicts \(F \le _3 H\).
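The computation in Example 5 can be cross-checked numerically. The sketch below, with hypothetical function names and an arbitrarily chosen step size, compares the stated closed form of \(R_{FH}'''\) with a central finite-difference approximation and evaluates the endpoints of the interval on which the third derivative is negative.

```python
import math

def r_fh(t):
    """R_FH(t) = H^{-1}(F(t)) = 1 - (1 - t^3)^3 for F and H from Example 5."""
    return 1.0 - (1.0 - t**3) ** 3

def r_fh_third(t):
    """Closed form of the third derivative stated in Example 5."""
    return 18.0 * (28.0 * t**6 - 20.0 * t**3 + 1.0)

def third_derivative_fd(phi, t, h=1e-3):
    """Central finite-difference approximation of phi'''(t); h is an assumed step size."""
    return (phi(t + 2 * h) - 2 * phi(t + h) + 2 * phi(t - h) - phi(t - 2 * h)) / (2 * h**3)

# Roots of 28 u^2 - 20 u + 1 with u = t^3 mark the sign changes of R_FH'''.
lower = ((5 - 3 * math.sqrt(2)) / 14) ** (1 / 3)
upper = ((5 + 3 * math.sqrt(2)) / 14) ** (1 / 3)

for t in (0.2, 0.6, 0.9):
    print(t, r_fh_third(t), third_derivative_fd(r_fh, t))
print(lower, upper)
```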

The orders \(\le _0\), \(\le _1\) and \(\le _2\) can all be equivalently characterized by families of measures of location, dispersion and skewness, respectively. Because all such measures are mappings from a set of probability distributions to the real numbers, their values are compared using the transitive relation \(\le \). Since this is not compatible with the non-transitivity of the kurtosis order \(\le _3\), we obtain the following negative result.

Corollary 6

There does not exist a family \(\{\kappa _{\iota }: \mathscr {P}\rightarrow \mathbb {R}\,\mid \, \iota \in I\}\) of mappings such that

$$\begin{aligned} \kappa _{\iota }(F) \le \kappa _{\iota }(G) \quad \forall \iota \in I \end{aligned}$$

is equivalent to \(F\le _3 G\).

Note that stochastic orders are usually not strongly connected, which means that \(F \not \le _3 H\) does not imply \(H \le _3 F\). However, for the distributions in Example 5, it can be shown that \(H \le _3 F\) does indeed hold. This is a more disturbing result than the mere non-transitivity of \(\le _3\): if one additionally requires that \(\kappa \) preserves the strict version of \(\le _3\), i.e. that \(F <_3 G\) implies \(\kappa (F) < \kappa (G)\), it can be shown that there exists no mapping \(\kappa : \mathscr {P}\rightarrow \mathbb {R}\) that preserves the order \(\le _3\).
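The claim \(H \le _3 F\) can be checked directly, for instance via the criterion \(G \le _3 F \Leftrightarrow R_{FG}''' \le 3 (R_{FG}'')^2 / R_{FG}'\) derived in Sect. 2.4, applied to the pair F, H. The following worked computation is a sketch of one possible verification; it uses the abbreviation \(u = t^3 \in (0, 1)\).

$$\begin{aligned} R_{FH}(t)&= 1-(1-t^3)^3 = 3t^3 - 3t^6 + t^9,\\ R_{FH}'(t)&= 9t^2(1-t^3)^2,\\ R_{FH}''(t)&= 18t(1-t^3)(1-4t^3),\\ 3 \frac{(R_{FH}''(t))^2}{R_{FH}'(t)} - R_{FH}'''(t)&= 108(1-4u)^2 - 18(28u^2-20u+1) = 18(68u^2-28u+5). \end{aligned}$$

The quadratic \(68u^2-28u+5\) has discriminant \(28^2 - 4 \cdot 68 \cdot 5 = -576 < 0\) and is therefore strictly positive. Hence \(R_{FH}''' < 3 (R_{FH}'')^2 / R_{FH}'\) on (0, 1), which yields \(H \le _3 F\).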

Remark 7

It should be emphasized that a lack of transitivity can also be found in more familiar areas. The best known example is probably the location order defined by \(X \le _p Y\) if the relative effect \(p=P(X<Y)+P(X=Y)/2\) is greater than or equal to 1/2. It is well known that this order is not transitive, as exemplified by non-transitive dice (Gardner 1970). Still, the empirical counterpart to p is the key quantity of important nonparametric tests like the Wilcoxon–Mann–Whitney, Fligner–Policello and Brunner–Munzel tests (Divine et al. 2018).

An example of a non-transitive dispersion ordering is the dangerousness order: given random variables X, Y on the positive half line, X is said to be less dangerous than Y if there is some \(c\ge 0\) with \(F\le G\) on [0, c), \(F\ge G\) on \([c,\infty )\) and \(E(X)\le E(Y)\). Here, the situation somewhat differs from the foregoing example, since the dangerousness order has a transitive closure, the convex order (Müller 1996).
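The non-transitivity of \(\le _p\) is easy to reproduce. The following Python sketch uses one standard set of three non-transitive dice; the face values are an illustrative assumption and are not necessarily those discussed by Gardner (1970). Exact fractions are used so that the comparison with 1/2 is free of rounding.

```python
from fractions import Fraction
from itertools import product

def relative_effect(x_faces, y_faces):
    """p = P(X < Y) + P(X = Y) / 2 for independent fair dice X and Y."""
    total = Fraction(0)
    for x, y in product(x_faces, y_faces):
        if x < y:
            total += 1
        elif x == y:
            total += Fraction(1, 2)
    return total / (len(x_faces) * len(y_faces))

# Hypothetical dice, each with three equally likely faces.
A = (2, 4, 9)
B = (1, 6, 8)
C = (3, 5, 7)

print(relative_effect(C, B))  # at least 1/2, so C <=_p B
print(relative_effect(B, A))  # at least 1/2, so B <=_p A
print(relative_effect(C, A))  # below 1/2, so C <=_p A fails
```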

The observation preceding Remark 7 suggests that kurtosis measures in the classical order-based sense, used by Oja (1981) among others, do not exist. In the literature, \(\le _s\) is usually chosen as the kurtosis order to be preserved by a kurtosis measure. Because of the limitations of \(\le _s\), this can only be used to validate kurtosis measures for symmetric distributions, which is unsatisfactory for the reasons mentioned in Sect. 1. The question of how to exploit the much more general applicability of the order \(\le _3\) in spite of its non-transitivity can be answered in two ways.

The first possibility is to move away from the classical idea of measures of kurtosis and instead consider functionals that quantify the difference in kurtosis between two given distributions. For example, consider the quantile-based mapping

$$\begin{aligned} \kappa _Q^{\alpha ,\eta }: \mathscr {P}\rightarrow \mathbb {R}, \quad F \mapsto \frac{F^{-1}(1-\alpha )-3F^{-1}(1-\eta )+3F^{-1}(\eta )-F^{-1}(\alpha )}{F^{-1}(1-\eta )-F^{-1}(\eta )} \end{aligned}$$

for \(0< \alpha< \eta < 1/2\), which is listed as a kurtosis measure by Ruppert (1987), Balanda and MacGillivray (1988) and Jones et al. (2011) among others, since it preserves the order \(\le _s\). Similarly constructed quantile-based mappings using lower-order differences are measures of location, dispersion and skewness and can even be used to characterize the orders \(\le _0\), \(\le _1\) and \(\le _2\) in the sense of Corollary 6. By customizing the evaluation points to a second distribution, one arrives at the functional

$$\begin{aligned} \kappa _{QF}^\alpha (F,G) = \frac{G^{-1}(1-\alpha ) - 3G^{-1}\left( \eta _F(1-\alpha )\right) + 3G^{-1}\left( \eta _F(\alpha )\right) - G^{-1}(\alpha )}{G^{-1}(1-\alpha ) - G^{-1}(\alpha )} \end{aligned}$$

\((0<\alpha <1/2)\), where

$$\begin{aligned} \eta _F(q) = F\left( \tfrac{2}{3}F^{-1}(q)+\tfrac{1}{3}F^{-1}(1-q)\right) . \end{aligned}$$

This functional preserves the kurtosis order \(\le _3\) even for asymmetric distributions, as the following result shows.

Proposition 8

Let \(F, G \in \mathscr {P}\). Then \(F \le _3 G\) implies \(\kappa _{QF}^\alpha (F, G) \ge 0\).

Proof

According to Definitions 1 and 3, \(F \le _3 G\) is equivalent to

$$\begin{aligned} \frac{ \frac{G^{-1}(p_3)-G^{-1}(p_2)}{F^{-1}(p_3)-F^{-1}(p_2)} -\frac{G^{-1}(p_2)-G^{-1}(p_1)}{F^{-1}(p_2)-F^{-1}(p_1)} }{F^{-1}(p_3)-F^{-1}(p_1)} -\frac{ \frac{G^{-1}(p_2)-G^{-1}(p_1)}{F^{-1}(p_2)-F^{-1}(p_1)} -\frac{G^{-1}(p_1)-G^{-1}(p_0)}{F^{-1}(p_1)-F^{-1}(p_0)} }{F^{-1}(p_2)-F^{-1}(p_0)}&\ge 0 \end{aligned}$$
(2)

for all \(0< p_0< p_1< p_2< p_3 < 1\). By choosing \(p_0 = \alpha \), \(p_1 = \eta _F(\alpha )\), \(p_2 = \eta _F(1-\alpha )\) and \(p_3 = 1 - \alpha \), (2) boils down to

$$\begin{aligned} G^{-1}(1-\alpha )-3G^{-1}(\eta _F(1-\alpha ))+3G^{-1}(\eta _F(\alpha ))-G^{-1}(\alpha )\ge 0. \end{aligned}$$

The subsequent division by the \(\alpha \)-interquantile range of G is done to obtain a scale-invariant functional. \(\square \)
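Both quantile-based quantities of this subsection are straightforward to evaluate once quantile functions are available. The following Python sketch is a minimal illustration under stated assumptions: the function names are hypothetical, the normal and Laplace distributions are arbitrary examples for \(\kappa _Q^{\alpha ,\eta }\), and the pair \(F(t) = t^3\), G uniform on (0, 1) from Example 5 serves as an ordered pair \(F \le _3 G\) for \(\kappa _{QF}^\alpha \).

```python
import math
from statistics import NormalDist

def kappa_q(quantile, alpha, eta):
    """Quantile-based measure kappa_Q^{alpha, eta}(F) for a quantile function F^{-1}."""
    if not 0.0 < alpha < eta < 0.5:
        raise ValueError("need 0 < alpha < eta < 1/2")
    outer = quantile(1 - alpha) - quantile(alpha)
    inner = quantile(1 - eta) - quantile(eta)
    return (outer - 3 * inner) / inner

def eta_f(cdf_f, quantile_f, q):
    """eta_F(q) = F(2/3 F^{-1}(q) + 1/3 F^{-1}(1 - q))."""
    return cdf_f(2 / 3 * quantile_f(q) + 1 / 3 * quantile_f(1 - q))

def kappa_qf(cdf_f, quantile_f, quantile_g, alpha):
    """Two-distribution functional kappa_QF^alpha(F, G)."""
    p1 = eta_f(cdf_f, quantile_f, alpha)
    p2 = eta_f(cdf_f, quantile_f, 1 - alpha)
    numerator = (quantile_g(1 - alpha) - 3 * quantile_g(p2)
                 + 3 * quantile_g(p1) - quantile_g(alpha))
    return numerator / (quantile_g(1 - alpha) - quantile_g(alpha))

def laplace_quantile(p):
    """Quantile function of the standard Laplace distribution."""
    return math.log(2 * p) if p < 0.5 else -math.log(2 * (1 - p))

normal_quantile = NormalDist().inv_cdf

# Single-distribution measure: normal versus Laplace (illustrative choice).
print(kappa_q(normal_quantile, 0.05, 0.25), kappa_q(laplace_quantile, 0.05, 0.25))

# Two-distribution functional for F(t) = t^3 and G uniform on (0, 1).
print(kappa_qf(lambda t: t**3, lambda q: q ** (1 / 3), lambda p: p, 0.1))
```

For the pair from Example 5, the numerator is a third finite difference of the cubic \(t \mapsto t^3\) at equidistant nodes with spacing \(h = ((1-\alpha )^{1/3} - \alpha ^{1/3})/3\), so the functional equals \(6h^3/(1-2\alpha ) > 0\), in line with Proposition 8.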

Remark 9

Again, the situation is similar for the Wilcoxon–Mann–Whitney location order \(\le _p\) in Remark 7. The usual unbiased estimator for the relative effect is a U-statistic involving both samples, and there cannot exist a measure depending on one sample like the mean or median, which is consistent with \(\le _p\) in general.

2.3 Transitivity sets

The second possibility is to restrict the comparison of kurtosis to suitable subsets of distributions, e.g. the subset of symmetric distributions. In the following, we analyze the transitivity sets of the order \(\le _3\). As a starting point, all pairs of distributions that are ordered with respect to \(\le _3\) are divided into two mutually exclusive categories. For that, let \(F, G \in \mathscr {P}\) satisfy \(F \le _3 G\), implying that the function \(R_{FG}''\) is increasing. Now, F and G are either skewness-comparable with respect to \(\le _2\), i.e., \(F \le _2 G\) or \(G \le _2 F\) holds, or they are not. In the latter case, \(R_{FG}\) has an inflection point at a \(t_{FG} \in D_F = {\text {int}}({\text {supp}}(F))\) with \(R_{FG}''(t) \le 0\) for \(t \le t_{FG}\) and \(R_{FG}''(t) \ge 0\) for \(t \ge t_{FG}\). More specifically, there exist values \(t_\ell , t_u \in D_F\) with \(t_\ell< t_{FG} < t_u\) such that \(R_{FG}''(t_\ell ) < 0\) and \(R_{FG}''(t_u) > 0\). The inflection point at \(t_{FG}\) is, in general, not unique since \(R_{FG}\) can be linear on a given non-degenerate interval. However, any inflection point of \(R_{FG}\) can be uniquely identified by the value \(p_{FG} = F(t_{FG}) \in (0, 1)\). Furthermore, note that \(F \le _2 G\) and \(G \le _2 F\) can be viewed as limiting cases with \(t_{FG} = \inf D_F\) or \(t_{FG} = \sup D_F\), yielding \(p_{FG} = 0\) or \(p_{FG} = 1\), respectively. So in order to obtain the most general setting, we allow \(t_{FG} \in \overline{D_F} = {\text {supp}}(F)\).

Definition 10

Let F and G be two cdf’s satisfying \(F \le _3 G\). A value \(p_{FG} \in [0, 1]\) is said to be an inflection value of F and G, if \(R_{FG}''(t) \le 0\) for all \(t \le F^{-1}(p_{FG})\) and \(R_{FG}''(t) \ge 0\) for all \(t \ge F^{-1}(p_{FG})\). The set of all inflection values of F and G is denoted by \(\Pi _{FG}\).

As stated before, any pair F, G satisfying \(F \le _3 G\) has at least one inflection value. Requiring \(R_{FG}'''(t) > 0\) for all \(t \in D_F\) is sufficient for the inflection value \(p_{FG}\) to be unique. With this in mind, we analyze more closely why \(\le _3\) is not transitive. Let F, G and H satisfy \(F \le _3 G\) and \(G \le _3 H\). Then,

$$\begin{aligned} R_{FH}(t) =&\; H^{-1}(F(t)) = H^{-1}(G(G^{-1}(F(t)))) = R_{GH}(R_{FG}(t)),\nonumber \\ R_{FH}''(t) =&\; R_{GH}''(R_{FG}(t)) \cdot (R_{FG}'(t))^2 + R_{GH}'(R_{FG}(t)) \cdot R_{FG}''(t), \end{aligned}$$
(3)
$$\begin{aligned} R_{FH}'''(t) =&\; R_{GH}'''(R_{FG}(t)) \cdot (R_{FG}'(t))^3 + R_{GH}'(R_{FG}(t)) \cdot R_{FG}'''(t)\nonumber \\&\quad + 3 R_{GH}''(R_{FG}(t)) \cdot R_{FG}'(t) \cdot R_{FG}''(t) \end{aligned}$$
(4)

holds for all \(t \in D_F\). Note that \(R_{FG}\) and \(R_{GH}\) are increasing as a composition of two increasing functions. Hence, the first two summands on the right side of Eq. (4) are non-negative and

$$\begin{aligned} R_{GH}''(G^{-1}(p)) \cdot R_{FG}''(F^{-1}(p)) \ge 0 \quad \text {for all } p \in (0, 1) \end{aligned}$$

is a sufficient condition for \(F \le _3 H\). By assumption, the sets \(\Pi _{FG}\) and \(\Pi _{GH}\) are both non-empty. If the intersection of these two sets is also non-empty, i.e., if there exists a \(p_0 \in [0, 1]\) such that \(p_0 \in \Pi _{FG}\) and \(p_0 \in \Pi _{GH}\), the signs of \(R_{FG}''(F^{-1}(p))\) and \(R_{GH}''(G^{-1}(p))\) coincide for all \(p \in (0, 1)\) since they are both non-positive for \(p < p_0\) and both non-negative for \(p > p_0\). Otherwise, if the intersection of \(\Pi _{FG}\) and \(\Pi _{GH}\) is empty, choose a representative from each set such that their difference is minimal. Assuming without restriction that \(p_{FG} < p_{GH}\), where \(p_{FG} \in \Pi _{FG}\) and \(p_{GH} \in \Pi _{GH}\), it follows that

$$\begin{aligned} R_{GH}''(G^{-1}(p)) \cdot R_{FG}''(F^{-1}(p)) < 0 \quad \text {for all } p \in (p_{FG}, p_{GH}). \end{aligned}$$

We summarize our results thus far in the following proposition.

Proposition 11

Let \(p_0 \in [0, 1]\) and let \(\mathscr {F}_0\) be a set of cdf’s such that any pair \(F, G \in \mathscr {F}_0\) with \(F \le _3 G\) has \(p_0\) as an inflection value. Then, the order \(\le _3\) is transitive on \({\mathscr {F}}_0\).
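For the distributions of Example 5, the mechanism behind the failure of transitivity can be made explicit. The following short worked check uses \(R_{FG}(t) = t^3\), \(R_{GH}(s) = 1-(1-s)^3\), \(F^{-1}(p) = p^{1/3}\) and \(G^{-1}(p) = p\):

$$\begin{aligned} R_{FG}''(t)&= 6t \ge 0, \qquad \Pi _{FG} = \{0\},\\ R_{GH}''(s)&= -6(1-s) \le 0, \qquad \Pi _{GH} = \{1\},\\ R_{GH}''(G^{-1}(p)) \cdot R_{FG}''(F^{-1}(p))&= -36\,(1-p)\,p^{1/3} < 0 \quad \text {for all } p \in (0, 1). \end{aligned}$$

The two sets of inflection values are disjoint, so Proposition 11 is not applicable, and the mixed term in Eq. (4) is negative on all of (0, 1). This is consistent with \(F \le _3 H\) failing in Example 5.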

We now study the structure of the sets mentioned in Proposition 11 or suitable subsets thereof. First, we assume that F and G with \(F \le _3 G\) have an inflection value \(p_{FG} \in (0, 1)\). The fact that \(p_{FG} = F(t_{FG}) \in (0, 1)\) is an inflection value of the pair F, G is equivalent to \(R_{FG}''(t_{FG}) = 0\). Denoting by f and g the derivatives of F and G, respectively, we get

$$\begin{aligned} R_{FG}''(t)&= \frac{f'(t) \cdot (g(R_{FG}(t)))^2 - f^2(t) \cdot g'(R_{FG}(t))}{(g(R_{FG}(t)))^3} \end{aligned}$$
(5)

for all \(t \in D_F\). Hence, \(p_{FG}\) is an inflection value of F and G, if and only if

$$\begin{aligned} \frac{f'(F^{-1}(p_{FG}))}{(f(F^{-1}(p_{FG})))^2}&= \frac{g'(G^{-1}(p_{FG}))}{(g(G^{-1}(p_{FG})))^2}. \end{aligned}$$
(6)

Thus, any pair that is ordered with respect to \(\le _3\) out of a given set of cdf’s has the same inflection value \(p_0 \in (0, 1)\), if and only if

$$\begin{aligned} \gamma _D^{p_0}(F) = \frac{f'(F^{-1}(p_0))}{(f(F^{-1}(p_0)))^2} \end{aligned}$$

coincides for all cdf’s F in the set. The following result is obtained by combining this observation with Proposition 11.

Proposition 12

Let \(p_0 \in (0, 1)\) and let \(\mathscr {F}_0\) be a set of cdf’s such that \(\gamma _D^{p_0}(F)\) coincides for all \(F \in \mathscr {F}_0\). Then, all pairs \(F, G \in \mathscr {F}_0\) with \(F \le _3 G\) have \(p_0\) as an inflection value.
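The quantity \(\gamma _D^{p}\) only requires the density, its derivative and the quantile function. The following Python sketch evaluates it for three textbook distributions; the function names and the choice of distributions are illustrative assumptions. The standard normal and logistic distributions share the value 0 at \(p = 1/2\), whereas the standard exponential distribution yields \(-1/(1-p)\).

```python
import math
from statistics import NormalDist

def gamma_d(density, density_prime, quantile, p):
    """gamma_D^p(F) = f'(F^{-1}(p)) / f(F^{-1}(p))^2 for p in (0, 1)."""
    x = quantile(p)
    return density_prime(x) / density(x) ** 2

std_normal = NormalDist()

def normal_value(p):
    # For the standard normal density, f'(x) = -x f(x).
    return gamma_d(std_normal.pdf, lambda x: -x * std_normal.pdf(x),
                   std_normal.inv_cdf, p)

def logistic_value(p):
    density = lambda x: math.exp(-x) / (1 + math.exp(-x)) ** 2
    # For the logistic density, f'(x) = f(x) (1 - 2 F(x)) with F(x) = 1 / (1 + exp(-x)).
    density_prime = lambda x: density(x) * (1 - 2 / (1 + math.exp(-x)))
    return gamma_d(density, density_prime, lambda q: math.log(q / (1 - q)), p)

def exponential_value(p):
    return gamma_d(lambda x: math.exp(-x), lambda x: -math.exp(-x),
                   lambda q: -math.log(1 - q), p)

for p in (0.3, 0.5, 0.7):
    print(p, normal_value(p), logistic_value(p), exponential_value(p))
```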

If \(p_{FG} \in \{0, 1\}\) is the sole inflection value of F and G with \(F \le _3 G\), (5) is not valid because the densities f and g are not uniquely defined at the edges of their respective supports. Thus, no easily verifiable sufficient condition for inflection points as in Proposition 12 can be obtained in this case. In summary, defining the set

$$\begin{aligned} {\mathscr {T}}_{D, p}^t = \{ F \in \mathscr {P}: \gamma _D^p(F) = t \} \end{aligned}$$

for all \(p \in (0, 1)\) and all \(t \in \mathbb {R}\) gives the following result.

Theorem 13

For any \(t \in \mathbb {R}\) and any \(p \in (0, 1)\), the kurtosis order \(\le _3\) is transitive on the set \({\mathscr {T}}_{D, p}^t\).

As mentioned in Sect. 1, a number of authors have identified an intrinsic entanglement between skewness and kurtosis. By considering the mapping \(\gamma _D^p\) more closely, this observation is confirmed and refined by Theorem 13. Recall that the critical property of a skewness measure \(\gamma : \mathscr {P}\rightarrow \mathbb {R}\) is that it preserves the skewness order \(\le _2\), i.e. that \(F \le _2 G\) implies \(\gamma (F) \le \gamma (G)\) for all \(F, G \in \mathscr {P}\). Since \(F \le _2 G\) is equivalent to \(R_{FG}'' \ge 0\), changing Eqs. (5) and (6) into inequalities yields that \(F \le _2 G\) implies \(\gamma _D^p(F) \ge \gamma _D^p(G)\) for all \(p \in (0, 1)\). Hence, \(-\gamma _D^p\) preserves \(\le _2\) and thus measures skewness. In fact, \(\gamma _D^p(F) \ge \gamma _D^p(G)\) for all \(p \in (0, 1)\) is equivalent to \(F \le _2 G\), so these measures characterize the order \(\le _2\) in a way that is not possible for \(\le _3\) according to Corollary 6. However, for \(p \ne 1/2\), these measures quantify skewness in an asymmetric or non-central way because the additional requirement \(\gamma _D^p(-X) = -\gamma _D^p(X)\) (see, e.g., Groeneveld and Meeden 1984, p. 393) is not satisfied.

The fact that \(\le _3\) is transitive, if a suitable skewness measure is constant, suggests that the non-transitivity of \(\le _3\) on the set of all cdf’s is because pairs of cdf’s with differing degrees of skewness lack comparability with respect to kurtosis. As opposed to location and dispersion, a distribution cannot be standardized with respect to skewness by an arithmetic operation like addition for location and scalar multiplication for dispersion. Thus, in order to obtain a transitive kurtosis order without interference caused by skewness, attention has to be restricted to sets of constant skewness. Note that, for all \(p \in (0, 1)\), the sets \({\mathscr {T}}_{D, p}^t, t \in \mathbb {R},\) constitute a partition of the set \(\mathscr {P}\) of distributions. For each partition, p is also the inflection value of every kurtosis comparable pair of distributions from the same transitivity set of the partition. Thus, each \(F \in \mathscr {P}\) lies within a subset of \(\mathscr {P}\) on which \(\le _3\) is transitive. In the light of these observations, one could adapt the classical order-based approach to define measures of location, dispersion and skewness to kurtosis. Instead of requiring a mapping \(\kappa : \mathscr {P}\rightarrow \mathbb {R}\) to generally preserve the order \(\le _3\), one could require the restriction of \(\kappa \) to the transitivity set \({\mathscr {T}}_{D,1/2}^t\) to preserve \(\le _3\) for all \(t \in \mathbb {R}\).

These observations raise the question whether there exist other skewness measures that induce transitivity sets analogous to Theorem 13. To that end, note that a simple sufficient condition for the term \(\gamma _D^{p_0}(F)\) to coincide is to require \(f'(F^{-1}(p_0)) = 0\) for all cdf’s F in the given set. Hence, for each \(p_0 \in (0, 1)\), \(\le _3\) is transitive on the set of all cdf’s, the density of which has a stationary point at the \(p_0\)-quantile. One well known point, at which this commonly occurs, is the mode of a distribution. For the following considerations, we assume that all distributions are unimodal and denote the mode of F by \(M_F\). If the mode lies in the interior of the support, the assumptions on F directly yield \(f'(M_F) = 0\). It follows that, for any \(p \in (0, 1)\), \(\gamma _D^p(F) = 0\) holds for all cdf’s F in the set

$$\begin{aligned} {{\mathscr {T}}}_{Mode}^{\tilde{p}} = \{F: M_F = F^{-1}(p)\} = \{F: 1 - 2 F(M_F) = \tilde{p}\}, \end{aligned}$$

where \(\tilde{p} = 1 - 2p\). In combination with Propositions 11 and 12, this observation yields the following result.

Theorem 14

For any \(\tilde{p} \in (-1, 1)\), the kurtosis order \(\le _3\) is transitive on the set \({\mathscr {T}}_{Mode}^{\tilde{p}}\).

For any \(\tilde{p} \in (-1, 1)\) and any pair of cdf’s \(F, G \in {\mathscr {T}}_{Mode}^{\tilde{p}}\) with \(F \le _3 G\), the corresponding inflection value is given by \(p = (1-\tilde{p})/2\). Arnold and Groeneveld (1995, p. 35) showed that \(\gamma _{Mode}(F) = 1 - 2 F(M_F), F \in \mathscr {P},\) is a measure of skewness, which entails that it preserves the skewness order \(\le _2\). Thus, the transitivity of \(\le _3\) on the sets \({\mathscr {T}}_{Mode}^{\tilde{p}}\) has a similar interpretation to before: for \(\le _3\) to be transitive, the skewness of the involved distributions needs to be constant in some sense.
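To see which set \({\mathscr {T}}_{Mode}^{\tilde{p}}\) a given distribution belongs to, it suffices to evaluate the cdf at the mode. The following Python sketch does this for unit-scale Weibull distributions with shape parameter larger than one, for which the mode lies in the interior of the support; the function names and the chosen shape values are illustrative assumptions.

```python
import math

def weibull_cdf(x, shape):
    """Cdf of a Weibull distribution with unit scale."""
    return 1.0 - math.exp(-(x ** shape))

def weibull_mode(shape):
    """Interior mode of a unit-scale Weibull distribution (requires shape > 1)."""
    if shape <= 1:
        raise ValueError("for shape <= 1 the mode lies at the boundary of the support")
    return ((shape - 1) / shape) ** (1 / shape)

def gamma_mode(cdf, mode):
    """Mode-based skewness measure gamma_Mode(F) = 1 - 2 F(M_F)."""
    return 1.0 - 2.0 * cdf(mode)

for shape in (1.5, 2.0, 3.6):
    mode = weibull_mode(shape)
    p = weibull_cdf(mode, shape)  # F(M_F), the candidate inflection value
    p_tilde = gamma_mode(lambda x: weibull_cdf(x, shape), mode)
    print(shape, p, p_tilde)
```

In this family, \(\gamma _{Mode}\) equals \(2\exp (-(k-1)/k) - 1\) for shape parameter k, which is strictly decreasing in k. Different shape parameters therefore lead to different sets \({\mathscr {T}}_{Mode}^{\tilde{p}}\).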

For distributions with modes at the boundaries of their supports, the above transitivity property does not hold, i.e., \(\le _3\) is not transitive on \({\mathscr {T}}_{Mode}^{-1}\) and \({\mathscr {T}}_{Mode}^1\) in general. The crucial result in Proposition 12 does not hold in these cases. Counterexamples can be constructed using Weibull distributions, applying the results given in Sect. 4 below. Thus, the sets \({\mathscr {T}}_{Mode}^{\tilde{p}}, \tilde{p} \in (-1, 1),\) do not provide a partition of the set of all (sufficiently regular) probability distributions on the real numbers.

The notion of a mode can be generalized without losing the transitivity of \(\le _3\) on the corresponding sets \({\mathscr {T}}_{Mode}^{\tilde{p}}\). Specifically, Theorem 14 still holds if f only attains a local maximum at \(M_F\), no longer assuming F to be unimodal. However, Arnold and Groeneveld (1995) only proved \(\gamma _{Mode}\) to be a skewness measure under the assumption of unimodality.

The relationships between the transitivity sets found in this section and their connection to the set of all symmetric distributions are summarized in the following remark.

Remark 15

Let \(\tilde{p} \in (-1, 1)\) and let \(F \in {\mathscr {T}}_{Mode}^{\tilde{p}}\) be unimodal. It follows that \(M_F = F^{-1}(p)\), where \(p = (1-\tilde{p})/2 \in (0, 1)\). Since \(M_F\) lies within the interior of the support of F, we obtain \(f'(F^{-1}(p)) = 0\) and therefore \(\gamma _D^{p}(F) = 0\). Thus, the inclusion \({\mathscr {T}}_{Mode}^{\tilde{p}} \subseteq {\mathscr {T}}_{D, p}^0\) holds for all \(p \in (0, 1)\) with \(\tilde{p} = 1-2p\). In particular, \({\mathscr {T}}_{Mode}^{0} \subseteq {\mathscr {T}}_{D, 1/2}^0\).

Now, let \(F \in \mathscr {P}\) be symmetric, denoted by \(F \in \mathscr {S}\). Since both \(\gamma _D^p\) and \(\gamma _{Mode}\) are invariant under transformations of the form \(x \mapsto a x + b\) for \(a > 0\) and \(b \in \mathbb {R}\), we can assume without restriction that the symmetry centre of F is 0. Because this implies \(\gamma _D^{1/2}(F) = f'(0)/(f(0))^2 = 0\), we obtain the inclusion \(\mathscr {S}\subseteq {\mathscr {T}}_{D, 1/2}^0\). If, additionally, F is assumed to be unimodal, \(M_F = 0\) and \(\gamma _{Mode}(F) = 0\) follows. Thus, in this case, \(\mathscr {S} \subseteq {\mathscr {T}}_{Mode}^0 \subseteq {\mathscr {T}}_{D,1/2}^0\) holds.

Since \(\le _3\) is transitive on \({\mathscr {T}}_{D, 1/2}^0\), it is also transitive on the set of all symmetric cdf’s. Oja (1981), virtually the only work that mentions the order \(\le _3\), dismissed it due to its non-transitivity, and instead focused on the previously mentioned concave–convex order \(\le _s\). However, Oja restricted his considerations concerning kurtosis to symmetric distributions, and therefore also proved the transitivity of \(\le _s\) only on this class. Since \(\le _3\) is also transitive on symmetric distributions, Oja’s argument is not convincing.

2.4 Equivalence with respect to \(\le _3\)

Two distributions \(F, G \in \mathscr {P}\) are said to be equivalent with respect to \(\le _3\), denoted by \(F =_3 G\), if both \(F \le _3 G\) and \(G \le _3 F\) hold. This is equivalent to \(R_{FG}''' \ge 0\) and \(R_{GF}''' \ge 0\). Using \(R_{GF}=R_{FG}^{-1}\) to rewrite the third derivative of \(R_{GF}\) as

$$\begin{aligned} R_{GF}'''(t) = \frac{3 (R_{FG}''(R_{GF}(t)))^2 - R_{FG}'''(R_{GF}(t)) R_{FG}'(R_{GF}(t))}{(R_{FG}'(R_{GF}(t)))^5}, \end{aligned}$$

it follows that

$$\begin{aligned} G \le _3 F \Leftrightarrow R_{FG}'''(t) \le 3 \frac{(R_{FG}''(t))^2}{R_{FG}'(t)} \quad \forall t \in D_F. \end{aligned}$$

Hence, we have the following result.

Proposition 16

\(F =_3 G\) holds, if and only if \(R_{FG}\) satisfies the differential inequality

$$\begin{aligned} 0 \le \varphi '''(t) \le 3 \frac{(\varphi ''(t))^2}{\varphi '(t)} \quad \forall t \in D_F. \end{aligned} \tag{7}$$

The fact that \(F =_3 G\) is not equivalent to \(R_{FG}''' \equiv 0\) is notable as it systematically differs from what can be observed with the orders \(\le _0\), \(\le _1\) and \(\le _2\) of location, dispersion and skewness. Equivalence with respect to any of these orders occurs if and only if the corresponding derivative of \(R_{FG}\) is constantly zero. Thus, \(F =_0 G\) is equivalent to \(F = G\), \(F =_1 G\) is equivalent to \(F(\cdot ) = G(\cdot + b)\) for a \(b \in \mathbb {R}\), and \(F =_2 G\) is equivalent to \(F(\cdot ) = G(a \cdot + b)\) for an \(a > 0\) and a \(b \in \mathbb {R}\). Heuristically, equivalence with respect to dispersion means that G is a shifted or relocated version of F and equivalence with respect to skewness means that G is a shifted and rescaled version of F, allowing for changes in location and dispersion. This suggests that the functions satisfying the differential inequality (7) can change the location, the dispersion and the skewness of a distribution while being kurtosis-invariant. However, the fact that this family of functions is not as simple as the family of all affine linear transformations suggests that there exists no simple operation to standardize distributions with respect to skewness.

In the following example, Proposition 16 is applied to monomials.

Example 17

Let \(R_{FG}(t)=t^p, 0<t<1,\) for some \(p>0\). This arises, for example, for \(F(t)=t, G(t)=t^{1/p}\), for \(F(t)=t^p, G(t)=t\), or, with support \(t>0\), for Weibull distributions (see Sect. 4). For \(p \notin \{1, 2\}\), \(F \le _3 G\) is equivalent to

$$\begin{aligned} 0 \le R_{FG}'''(t) = p(p-1)(p-2)t^{p-3} \ \forall t \quad \Leftrightarrow \quad p \notin (1, 2). \end{aligned}$$

Since \(R_{FG}''' \equiv 0\) for \(p \in \{1, 2\}\), \(F \le _3 G\) is equivalent to \(p \notin (1, 2)\). Conversely, for \(p \notin \{1, 2\}\), \(G \le _3 F\) is equivalent to

$$\begin{aligned} R_{FG}'''(t) \le 3 \frac{(R_{FG}''(t))^2}{R_{FG}'(t)} = 3p(p-1)^2 t^{p-3} \ \forall t \quad \Leftrightarrow \quad p \notin (\tfrac{1}{2}, 1). \end{aligned}$$

Since the inequality is obviously satisfied for \(p \in \{1, 2\}\), \(G \le _3 F\) is equivalent to \(p \notin (\frac{1}{2}, 1)\). Overall, \(F =_3 G\) is satisfied, if and only if

$$\begin{aligned} p \in (0,1/2] \cup \{1\} \cup [2,\infty ). \end{aligned}$$

In particular, \(F(t)=t, t \in (0, 1),\) and \(G(t)=t^2, t \in (0, 1),\) are equivalent with respect to \(\le _3\).

Note that \(R_{FG}''' \equiv 0\) and therefore also \(F =_3 G\) holds if \(R_{FG}\) is any polynomial of degree \(\le 2\). While \(F =_2 G\) is equivalent to \(R_{FG}\) being a polynomial of degree \(\le 1\), the fact that \(R_{FG}\) is a polynomial of degree \(\le 2\) is only a sufficient, but not a necessary condition for \(F =_3 G\).
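The case distinction in Example 17 reduces to sign checks on polynomials in p, which can be sketched in a few lines of Python. The function names below are illustrative and not part of the original notation; the common positive factor \(t^{p-3}\) is divided out, so only the coefficients are compared.

```python
def le3_monomial(p: float) -> bool:
    """F <=_3 G for R_FG(t) = t**p on (0, 1): requires p(p-1)(p-2) >= 0."""
    if p <= 0:
        raise ValueError("p must be positive")
    return p * (p - 1) * (p - 2) >= 0


def ge3_monomial(p: float) -> bool:
    """G <=_3 F for R_FG(t) = t**p: requires p(p-1)(p-2) <= 3p(p-1)**2."""
    if p <= 0:
        raise ValueError("p must be positive")
    return p * (p - 1) * (p - 2) <= 3 * p * (p - 1) ** 2


def equivalent_monomial(p: float) -> bool:
    """F =_3 G holds if both one-sided relations hold."""
    return le3_monomial(p) and ge3_monomial(p)
```

For instance, `equivalent_monomial(0.5)` corresponds to the pair \(F(t)=t\), \(G(t)=t^2\) mentioned above, whereas `p = 0.75` yields \(F \le _3 G\) without the reverse relation.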

3 Concave–convex kurtosis orders

In the literature, there exist two major proposals for generalizing the concave–convex order \(\le _s\) to asymmetric distributions, denoted by \(\le _a\) and \(\le _S\) (see MacGillivray and Balanda 1988; Balanda and MacGillivray 1990). The order \(\le _S\) is not considered further in the present work because it disregards a critical amount of information, as expanded upon in Sect. 1. The critical drawback of the order \(\le _a\) can best be explained using the notion of the inflection value from Definition 10. Just as in our considerations in Sect. 2.3, \(F \le _a G\) requires that the function \(R_{FG}\) has one change from negative to positive curvature, whose location can be identified by an inflection value \(p_{FG} \in (0, 1)\). While \(p_{FG} = 1/2\) necessarily holds if F and G are symmetric, there is no reason to assume it to be a prerequisite for two asymmetric distributions to be ordered with respect to kurtosis. Thus, whereas the generally applicable order \(\le _3\) is stronger than \(\le _s\) in a symmetric setting, the same cannot be said about the generalized version \(\le _a\) of \(\le _s\) in a general setting. In the following, we propose an alternative generalization of \(\le _s\) that is not a priori restricted to a specific inflection value.

Definition 18

F is said to be less kurtotic in the concave–convex sense than G, denoted by \(F \le _{gs} G\), if there exists a \(p_{FG} \in [0, 1]\) such that \(R_{FG}\) is concave on \(D_F \cap (- \infty , F^{-1}(p_{FG}))\) and convex on \(D_F \cap (F^{-1}(p_{FG}), \infty )\).

The fact that \(F \le _3 G\) implies \(F \le _{gs} G\) for all \(F, G \in \mathscr {P}\) is a direct consequence of Theorem 20 below. The essential difference between the two orders is that the first requires that a function (in this case \(R_{FG}''\)) is increasing whereas the second requires that the same function changes values from negative to positive at some point. This principle has also been used in the literature to obtain weakenings of other orders from the family \(\le _k, k \in \mathbb {N}_0\). As an example, we can consider the visually more striking characteristic of dispersion based on the order \(\le _1\). Instead of assuming that \(\Delta _{FG}\) increases, which is equivalent to \(F \le _1 G\), we can require that the values of \(\Delta _{FG}\) switch from negative to positive at some point. A similar dispersion order has been proposed by Oja (1981, p. 158). He writes \(F \le _1^* G\) if there exists \(x_0 \in D_F\) such that \(\Delta _{FG}(x) \le E(Y) - E(X)\) for \(x \le x_0\) and \(\Delta _{FG}(x) \ge E(Y) - E(X)\) for \(x \ge x_0\). The sole difference to the order introduced before is the threshold, which changes from zero to the difference of the expectations. Unlike zero, the difference of the expectations is guaranteed to be taken as a value of \(\Delta _{FG}\) at some point. This can be seen by considering the centred versions of F and G. If, for example, the locations of F and G differ substantially, using the threshold zero is obviously not reasonable.

This line of argument can also be applied to the order \(\le _{gs}\) and the function \(R_{FG}''\). For general distribution functions F and G, there is no reason to assume that \(R_{FG}''\) takes the value zero at some point. Thus, Definition 18 needs to be modified. However, because F and G can only be standardized with respect to location and dispersion and not with respect to skewness, we cannot use the same technique as for \(\le _1^*\) to obtain an alternative threshold. Therefore, the following definition uses a variable threshold.

Definition 19

Let \(t_0 \in \mathbb {R}\). Then, F is said to be less kurtotic than G in the concave–convex sense with threshold \(t_0\), denoted by \(F \le _{gs}^{t_0} G\), if there exists a \(p_{FG}^{t_0} \in [0, 1]\) such that \(R_{FG}''(t) \le t_0\) holds for all \(t \in D_F \cap (- \infty , F^{-1}(p_{FG}^{t_0}))\) and \(R_{FG}''(t) \ge t_0\) holds for all \(t \in D_F \cap (F^{-1}(p_{FG}^{t_0}), \infty )\).

Note that the orders \(\le _{gs}^{0}\) and \(\le _{gs}\) coincide. While the order \(\le _{gs}^{t_0}\) is formally defined for all \(t_0 \in \mathbb {R}\), it is only meaningful if \(t_0 \in \textrm{int}(R_{FG}''(D_F))\). Otherwise, it is obvious that either \(R_{FG}''(t) \le t_0\) or \(R_{FG}''(t) \ge t_0\) holds for all \(t \in D_F\). Hence, all thresholds \(t_0 \in \textrm{int}(R_{FG}''(D_F))\) are said to be reasonable. The only exception is the case that the set of reasonable thresholds is empty, which is equivalent to \(R_{FG}''\) being constant. In this case, the sole value of \(R_{FG}''\) is the only candidate for a reasonable threshold.

The relationship between \(\le _3\) and the family \(\le _{gs}^{t_0}, t_0 \in \mathbb {R},\) given in the following theorem, underpins the idea that the latter consists of natural weakenings of \(\le _3\).

Theorem 20

Let \(F, G \in \mathscr {P}\). Then, \(F \le _3 G\) is equivalent to \(F \le _{gs}^{t_0} G\) for all \(t_0 \in \textrm{int}(R_{FG}''(D_F))\).

Proof

The implication from left to right holds by construction. For the reverse implication, let \(t_1 \in D_F\). If \(t_1\) lies within an interval on which \(R_{FG}''\) is constant, \(R_{FG}'''(t_1) = 0\) follows. Otherwise, it follows that \(t_0 = R_{FG}''(t_1) \in \textrm{int}(R_{FG}''(D_F))\). Now

$$\begin{aligned} R_{FG}'''(t_1) = \lim _{\varepsilon \searrow 0} \frac{R_{FG}''(t_1 + \varepsilon ) - R_{FG}''(t_1 - \varepsilon )}{2 \varepsilon } \ge 0 \end{aligned}$$

holds because of \(R_{FG}''(t_1 + \varepsilon ) \ge t_0\) and \(R_{FG}''(t_1 - \varepsilon ) \le t_0\) by assumption. The assertion follows since \(t_1\) was arbitrary. \(\square \)

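In Theorem 20, the set \(\textrm{int}(R_{FG}''(D_F))\) can be replaced by \(\mathbb {R}\): for every unreasonable threshold \(t_0 \notin \textrm{int}(R_{FG}''(D_F))\), either \(R_{FG}''(t) \le t_0\) holds for all \(t \in D_F\) or \(R_{FG}''(t) \ge t_0\) holds for all \(t \in D_F\) by construction, so that \(F \le _{gs}^{t_0} G\) is satisfied with \(p_{FG}^{t_0} = 1\) or \(p_{FG}^{t_0} = 0\), respectively.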

The following result states that the proposed extension of the concave–convex order \(\le _s\) to asymmetric distributions is not transitive in general, implying that it is not superior to \(\le _3\) in this respect.

Proposition 21

For all \(t_0 \in \mathbb {R}\), the kurtosis order \(\le _{gs}^{t_0}\) is not transitive in general.

Proof

A counterexample can be obtained for all \(t_0 \in \mathbb {R}\) by reusing Example 5 with a rescaled version of \(H^{-1}\). For that, let \(c>0\) and

$$\begin{aligned} H: [0, c] \rightarrow [0, 1], \quad t \mapsto 1 - \root 3 \of {\frac{c-t}{c}}. \end{aligned}$$

This implies that the functions \(R_{GH}\) and \(R_{FH}\) as well as all of their derivatives are multiplied by the factor c. So, additionally to \(F \le _3 G\), \(R_{GH}'''(t) = 6c \ge 0\) holds for all \(t \in [0, 1]\), and, thus, \(G \le _3 H\). By Theorem 20, \(F \le _{gs}^{t_0} G\) and \(G \le _{gs}^{t_0} H\) hold for all \(t_0 \in \mathbb {R}\). In contrast, we have

$$\begin{aligned} R_{FH}''(t) = 18c (4t^7-5t^4+t) {\left\{ \begin{array}{ll} < 0 \quad &{}\text { for } t \in (2^{-\frac{2}{3}}, 1),\\ = 0 \quad &{}\text { for } t \in \{0, 2^{-\frac{2}{3}}, 1\},\\ > 0 \quad &{}\text { for } t \in (0, 2^{-\frac{2}{3}}). \end{array}\right. } \end{aligned}$$

It follows that, for any \(t_0 > 0\), there exists \(c > 0\) such that \(R_{FH}''\) first takes values smaller than \(t_0\), then larger, and finally smaller again. For any \(t_0 < 0\), there exists \(c > 0\) such that \(R_{FH}''\) first takes values larger than \(t_0\), then smaller and finally larger again. For \(t_0 = 0\), we obtain \(R_{FH}''(t) \ge 0\) for \(t \le 2^{-2/3}\) and \(R_{FH}''(t) \le 0\) for \(t \ge 2^{-2/3}\). All three cases pose a contradiction to \(F \le _{gs}^{t_0} H\). \(\square \)
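The sign pattern of \(R_{FH}''\) used in this proof can be checked numerically. The following Python sketch evaluates the polynomial on a grid and tests whether the values lie below a threshold up to some point and above it afterwards, as the concave–convex order with threshold \(t_0\) would require. The helper names, the grid and the choice \(c = 1\) are assumptions made for illustration only; a grid check gives evidence, not a proof.

```python
def r_fh_second(t: float, c: float = 1.0) -> float:
    """R_FH''(t) = 18c(4t^7 - 5t^4 + t) from the proof of Proposition 21."""
    return 18.0 * c * (4.0 * t**7 - 5.0 * t**4 + t)


def satisfies_gs(values, t0: float) -> bool:
    """True if the values are <= t0 up to some split point and >= t0 afterwards.

    Equivalently: once a value strictly above t0 has been seen,
    no later value may lie strictly below t0.
    """
    seen_above = False
    for v in values:
        if v > t0:
            seen_above = True
        elif v < t0 and seen_above:
            return False
    return True


# Grid on the interior of [0, 1] (illustrative resolution).
grid = [i / 1000 for i in range(1, 1000)]
values = [r_fh_second(t, c=1.0) for t in grid]
```

With these values, `satisfies_gs(values, t0)` returns `False` for the thresholds 1, 0 and -1, matching the three cases distinguished in the proof.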

For symmetric cdf’s F and G, \(R_{FG}\) always has an inflection point at \(F^{-1}(1/2)\). Thus, \(\le _{gs}\) is equivalent to \(\le _s\) on \(\mathscr {S}\) and therefore also transitive on \(\mathscr {S}\) (see Oja 1981, p. 165). The situation is different for \(\le _{gs}^{t_0}, t_0 \ne 0\) because the critical switch from \(R_{FG}''(t) \le t_0\) to \(R_{FG}''(t) \ge t_0\) cannot occur at \(F^{-1}(1/2)\) due to the point symmetry of \(R_{FG}\).

Remark 22

The specific order \(\le _{gs}^0\) (or, equivalently, \(\le _{gs}\)) can be altered slightly to become transitive on the more general sets \({\mathscr {T}}_{Mode}^{\tilde{p}}, \tilde{p} \in (-1, 1),\) and \({\mathscr {T}}_{D, p}^t, t \in \mathbb {R}, p \in (0, 1)\). For two cdf’s F and G, we say that \(F <_{gss} G\) holds if there exists a \(p_{FG} \in [0, 1]\) such that \(R_{FG}''\) is strictly negative on \(D_F \cap (- \infty , F^{-1}(p_{FG}))\), and strictly positive on \(D_F \cap (F^{-1}(p_{FG}), \infty )\). Note that \(<_{gss}\) is not equivalent to \(<_{gs}\) since the latter is defined by

$$\begin{aligned} F <_{gs} G \Leftrightarrow F \le _{gs} G \text { and } F \ne _{gs} G \Leftrightarrow F \le _{gs} G \text { and } G \not \le _{gs} F, \end{aligned}$$

as usual for strict versions of orders. To see that \(<_{gss}\) is transitive, let \(p \in (0, 1)\) and \(F, G, H \in {\mathscr {T}}_{D, p}^{t}\) with \(F <_{gss} G\) and \(G <_{gss} H\). By the line of reasoning used to prove Proposition 12 and Theorem 13, \(R_{FG}''(F^{-1}(p)) = 0 = R_{GH}''(G^{-1}(p))\) then holds. Since, by definition of \(<_{gss}\), there exists at most one \(t \in D_F\) and one \(s \in D_G\) such that \(R_{FG}''(t) = 0\) and \(R_{GH}''(s) = 0\), \(t = F^{-1}(p)\) and \(s = G^{-1}(p)\) follow. Considering (3) for \(t = F^{-1}(p)\) along with the fact that \(R_{GH}\) is increasing, this yields \(R_{FH}''(F^{-1}(q)) < 0\) for \(q < p\) and \(R_{FH}''(F^{-1}(q)) > 0\) for \(q > p\). Overall, \(F <_{gss} H\) follows. The transitivity of \(<_{gss}\) on the sets \({\mathscr {T}}_{Mode}^{\tilde{p}}, \tilde{p} \in (-1, 1),\) now follows from \({\mathscr {T}}_{Mode}^{\tilde{p}} \subseteq {\mathscr {T}}_{D, p}^0\), where \(p = (\tilde{p}+1)/2\).

It is not possible to show the transitivity of the order \(\le _{gs}\) on the given sets in the same way as for \(<_{gss}\), since, assuming \(F \le _{gs} G\), \(R_{FG}''(F^{-1}(p)) = 0\) for any \(p \in (0,1)\) is not sufficient to infer that p is an inflection value. Because the concavity and the convexity of \(R_{FG}\) on either side of the actual inflection value is not assumed to be strict, the function could be convex on both sides of \(F^{-1}(p)\) or concave on both sides.

4 Application to specific distributions

4.1 Weibull distribution

As an example of a well-known family of distributions with varying degrees of skewness, we consider Weibull distributions. Without restriction, we set the scale parameter to 1, and denote the distribution with shape parameter k by \({\text {W}}(k)\). Let \(X \sim {\text {W}}(k), Y \sim {\text {W}}(\ell )\) for \(0<k<\ell \). For \(t > 0\), we have \(R_{FG}(t) = t^{k/\ell }\). It follows directly from Example 17 that \(F \le _3 G\) holds for all \(k<\ell \), whereas \(F =_3 G\) holds for \(2k\le \ell \). Thus, if the two parameters differ by less than a factor of two, the distribution with the higher parameter value is strictly more kurtotic. If the two parameters differ by at least a factor of two, the two distributions are equivalent with respect to the order \(\le _3\). Considering that a large difference between the two parameter values is also associated with a large difference in skewness, this may best be interpreted as follows. If the difference in skewness between two Weibull distributions is too large, they cannot be unambiguously ordered with respect to kurtosis.

This rather unintuitive behaviour allows us to construct another counterexample for the transitivity of \(\le _3\) since, e.g., \({\text {W}}(k) \le _3 {\text {W}}(1.5k) =_3 {\text {W}}(0.7k) \not \ge _3 {\text {W}}(k)\) holds for all \(k > 0\). Furthermore, it is easy to show that \(\le _{gs}^{t_0}\) coincides with \(\le _3\) on the family of Weibull distributions for all reasonable thresholds \(t_0\). Thus, the given counterexample also applies to \(\le _{gs}^{t_0}\).
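The Weibull comparisons above can be reproduced with a short Python sketch that applies the conditions of Example 17 to \(p = k/\ell \). The function `compare_weibull` and its return symbols are illustrative names, not notation from the text; the cdf and quantile function assume unit scale, as in this section.

```python
import math


def weibull_cdf(t: float, k: float) -> float:
    """Cdf of W(k) with unit scale, for t > 0."""
    return 1.0 - math.exp(-(t ** k))


def weibull_quantile(u: float, k: float) -> float:
    """Quantile function of W(k) with unit scale, for 0 < u < 1."""
    return (-math.log(1.0 - u)) ** (1.0 / k)


def compare_weibull(k: float, ell: float) -> str:
    """Relation of F = W(k) to G = W(ell) with respect to <=_3.

    Uses R_FG(t) = t**p with p = k/ell and the two conditions of Example 17.
    Returns "=" (equivalent), "<" (F strictly less kurtotic) or ">" (G strictly less kurtotic).
    """
    p = k / ell
    f_le_g = p * (p - 1) * (p - 2) >= 0
    g_le_f = p * (p - 1) * (p - 2) <= 3 * p * (p - 1) ** 2
    if f_le_g and g_le_f:
        return "="
    return "<" if f_le_g else ">"
```

For \(k = 1\), the chain in the counterexample corresponds to `compare_weibull(1.0, 1.5)`, `compare_weibull(1.5, 0.7)` and `compare_weibull(0.7, 1.0)`; the last call shows that W(0.7) is strictly less kurtotic than W(1), which is what breaks transitivity.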

4.2 Sinh-arcsinh distribution

The family of sinh-arcsinh distributions was introduced by Jones and Pewsey (2009). It is dependent upon four parameters, which are associated with location, dispersion, skewness and tailweight. Here, we consider a simplified two-parameter family by fixing the location and dispersion parameters to zero and one, respectively. A random variable X is said to be sinh-arcsinh-distributed with skewness parameter \(\nu \in \mathbb {R}\) and tailweight \(\tau > 0\), denoted by \(X \sim {\text {SAS}}(\nu , \tau )\), if the random variable

$$\begin{aligned} Z = S_{\nu , \tau }(X) = \sinh (\tau \cdot {\text {arsinh}}(X) - \nu ) \end{aligned}$$

is standard normal. Skewness to the right increases with increasing \(\nu \) and tailweight decreases with increasing \(\tau \). More specifically, \(F \le _2 G\) if \(\nu _F \le \nu _G, \tau _F = \tau _G\) and \(F \le _{gs} G\) if \(\nu _F = \nu _G = 0, \tau _F \ge \tau _G\) (see Jones and Pewsey 2009, pp. 763, 765, 766). One can directly infer the corresponding distribution function \(F = \Phi \circ S_{\nu , \tau }\) and quantile function \(F^{-1} = S_{\nu , \tau }^{-1} \circ \Phi ^{-1} = S_{-\nu /\tau , 1/\tau } \circ \Phi ^{-1}\) of X.
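The explicit distribution and quantile functions translate directly into code. The following Python sketch is a minimal illustration under the two-parameter setting of this section (location zero, dispersion one); the function names are assumptions, and the standard normal cdf and quantile function are taken from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal base distribution


def sas_transform(x: float, nu: float, tau: float) -> float:
    """S_{nu,tau}(x) = sinh(tau * arsinh(x) - nu)."""
    return math.sinh(tau * math.asinh(x) - nu)


def sas_cdf(x: float, nu: float, tau: float) -> float:
    """F = Phi o S_{nu,tau}."""
    return _STD_NORMAL.cdf(sas_transform(x, nu, tau))


def sas_quantile(u: float, nu: float, tau: float) -> float:
    """F^{-1} = S_{-nu/tau, 1/tau} o Phi^{-1}, for 0 < u < 1."""
    return sas_transform(_STD_NORMAL.inv_cdf(u), -nu / tau, 1.0 / tau)
```

A round trip such as `sas_quantile(sas_cdf(0.8, 0.4, 1.3), 0.4, 1.3)` should return a value close to 0.8, up to floating-point error, which confirms that the inverse transformation has the stated parameters.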

There exist numerous other distribution families with four parameters that are associated with location, dispersion, skewness and tailweight or kurtosis. Examples include the skew-t distribution (Azzalini 1985; Azzalini and Capitanio 2003) and Tukey’s g-and-h or g-and-k distributions (Tukey 1977; Hoaglin 2006; Haynes et al. 1997). However, these families do not have similarly explicit representations of both their distribution and quantile functions. Furthermore, while the skew-t distributions do include the standard normal distribution, it only appears as a limiting case and not as a standard case as for the sinh-arcsinh distributions. Finally, the sinh-arcsinh transformation can also be applied to (symmetric) base distributions other than the standard normal. For example, Rosco et al. (2011) applied it to Student’s t-distribution.

Let \(X \sim {\text {SAS}}(\nu _F, \tau _F)\) and \(Y \sim {\text {SAS}}(\nu _G, \tau _G)\) with distribution functions F and G. It follows that \(R_{FG}(t) = S_{\tilde{\nu }, \tilde{\tau }}(t)\), where \(\tilde{\nu } = (\nu _F - \nu _G)/\tau _G\) and \(\tilde{\tau } = \tau _F/\tau _G\). Note that the fulfilment of \(F \le _{gs}^{t_0} G\) and \(F \le _3 G\) is solely dependent on \(R_{FG}\). Hence, the ordering of F and G in terms of kurtosis only depends upon two parameters instead of four. The following result gives conditions for the ordering of sinh-arcsinh distributions with respect to the kurtosis orders \(\le _3\) and \(\le _{gs}\).

Theorem 23

Let \(F \ne G\) and \(t_0 \in \textrm{int}(R_{FG}''(D_F))\). Then, \(F \le _3 G\) holds if and only if \(\tau _F \ge 2 \tau _G\). Likewise, for \(t_0\ne 0\), \(F \le _{gs}^{t_0} G\) is equivalent to \(\tau _F \ge 2 \tau _G\). Furthermore, \(F \le _{gs}^0G\) if and only if \(\tau _F > \tau _G\).

The key characteristics of \(R_{FG}''\) are summarized in Table 1. The proof of Theorem 23 can be found in the supplementary material document.
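As a numerical sanity check on the condition \(\tau _F \ge 2 \tau _G\) (not a substitute for the proof), one can evaluate the third derivative of \(R_{FG}(t) = \sinh (\tilde{\tau } \, {\text {arsinh}}(t) - \tilde{\nu })\) on a grid. The closed form used in the Python sketch below was obtained by differentiating this expression by hand and is therefore an assumption to be verified, as are the function names and the grid.

```python
import math


def r_fg(t: float, nu_t: float, tau_t: float) -> float:
    """R_FG(t) = sinh(tau_t * arsinh(t) - nu_t), with
    tau_t = tau_F / tau_G and nu_t = (nu_F - nu_G) / tau_G."""
    return math.sinh(tau_t * math.asinh(t) - nu_t)


def r_fg_third(t: float, nu_t: float, tau_t: float) -> float:
    """Hand-derived third derivative of r_fg with respect to t."""
    w = 1.0 + t * t
    u = tau_t * math.asinh(t) - nu_t
    bracket = (
        math.cosh(u) * ((tau_t**2 - 1.0) * w + 3.0 * t * t)
        - 3.0 * tau_t * t * math.sqrt(w) * math.sinh(u)
    )
    return tau_t * bracket / w**2.5


def le3_on_grid(nu_t: float, tau_t: float,
                lo: float = -5.0, hi: float = 5.0, n: int = 2001) -> bool:
    """Grid check of R_FG''' >= 0 on [lo, hi]; evidence only, not a proof."""
    step = (hi - lo) / (n - 1)
    return all(r_fg_third(lo + i * step, nu_t, tau_t) >= 0.0 for i in range(n))
```

On this grid, `le3_on_grid(0.7, 2.0)` and `le3_on_grid(-1.2, 3.0)` return `True`, whereas `le3_on_grid(0.0, 1.5)` returns `False`, in line with the threshold of two for the ratio of the tailweight parameters.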

Since the usual order of the real numbers used in the equivalent conditions in Theorem 23 is transitive, the following result is directly implied.

Corollary 24

Let \(t_0 \in \textrm{int}(R_{FG}''(D_F))\). Then, the orders \(\le _3\) and \(\le _{gs}^{t_0}\) are transitive on the set \(\{F \in \mathscr {P}: \exists \nu \in \mathbb {R}, \tau > 0: F = {\text {SAS}}(\nu , \tau )\}\).

Table 1 Behaviour of the function \(R_{FG}''\) and kurtosis orders for distribution functions F and G of \(X \sim {\text {SAS}}(\nu _F, \tau _F)\) and \(Y \sim {\text {SAS}}(\nu _G, \tau _G)\)

Heuristically, Theorem 23 implies that, within the family of sinh-arcsinh distributions, comparisons in terms of kurtosis are skewness-invariant. This is due to the fact that the equivalent characterizations for both major kurtosis orders are independent of both \(\nu _F\) and \(\nu _G\), which are skewness parameters by construction and also in the sense of \(\le _2\) for \(\tau _F = \tau _G\) (see Jones and Pewsey 2009, p. 763). Moreover, the characterizations in Theorem 23 stay the same not only for equally skewed asymmetric distributions, but also for pairs of distributions with arbitrarily large differences in skewness. Also note that these results can be generalized to families of sinh-arcsinh distributions that arise from symmetric base distributions other than the normal since the functions \(R_{FG}\) only depend on the transformations and not on the specific base distribution. The skewness-invariance of the sinh-arcsinh distribution in terms of kurtosis was noted by Jones et al. (2011, pp. 91–92). Specifically, they showed that quantile-based kurtosis measures that are constructed from symmetric differences of the form \(F^{-1}(1-\alpha ) - F^{-1}(\alpha ), \alpha \in (0, 1/2),\) are invariant under changes of the skewness parameter \(\nu \). Theorem 23 generalizes this skewness-invariance from a specific family of kurtosis measures to the underlying kurtosis orders.