Abstract
Available empirical evidence suggests that skewness preference plays an important role in understanding asset pricing and gambling. This paper establishes a skewness-comparability condition on probability distributions that is necessary and sufficient for any decision-maker's preferences over the distributions to depend on their means, variances, and third moments only. Under the condition, an Expected Utility maximizer's preferences for a larger mean, a smaller variance, and a larger third moment are shown to parallel, respectively, his preferences for a first-degree stochastic dominant improvement, a mean-preserving contraction, and a downside risk decrease and are characterized in terms of the von Neumann-Morgenstern utility function in exactly the same way. By showing that all Bernoulli distributions are mutually skewness comparable, we further show that in the wide range of economic models where these distributions are used individuals’ decisions under risk can be understood as trade-offs between mean, variance, and skewness. Our results on skewness-inducing transformations of random variables can also be applied to analyze the effects of progressive tax reforms on the incentive to make risky investments.
Introduction
Do individual decision-makers, other things being equal, prefer a more positively skewed distribution? There is a substantial and growing body of empirical evidence suggesting that they do. Building on the earlier seminal contributions of Arditti (1967) and Kraus and Litzenberger (1976), Harvey and Siddique (2000),^{Footnote 1} for example, show in an asset pricing model that systematic skewness is economically important and commands a substantial premium. Studying the data from horse race betting and from state lotteries (in the U.S.), respectively, Golec and Tamarkin (1998) and Garrett and Sobel (1999) find evidence supporting the contention that gamblers are not necessarily risk lovers but skewness lovers.
So far, however, skewness preference has no firm choice-theoretic foundation. Skewness has been treated as synonymous with the (unstandardized) third central moment, but it is well known that preference for a larger third moment is in general not consistent with Expected Utility (EU) maximisation unless the utility function is cubic. As a result, in studies of skewness preference to date, either a cubic utility function is assumed^{Footnote 2} or a cubic Taylor approximation of the EU is taken (i.e., the utility function is approximated by a Taylor series truncated to three terms before taking expectations). The limitations of these approaches are obvious. A truncated Taylor series, for instance, can be a reasonable approximation only for small risks.^{Footnote 3} Menezes et al. (1980) come closest to establishing a formal linkage between skewness preference and EU maximisation by showing that a distribution having more “downside risk” implies, but is not implied by, its (unstandardized) third moment being smaller, and that downside risk aversion is characterized by a von Neumann-Morgenstern (VNM) utility function with a positive third derivative.
In the statistics literature, Van Zwet (1964) defines a distribution F to be more positively skewed than G if R(x)≡F^{−1}(G(x)) is convex, and it has become widely accepted that a good skewness measure should preserve the skewness ordering so defined (see, for example, Oja (1981) and Arnold and Groeneveld (1995)). Oja (1981) proposes a condition in terms of the number of crossings of two standardized distribution functions that relaxes Van Zwet's (1964) skewness-comparability condition. The preferences of EU-maximizing decision-makers over skewness-comparable distributions as defined by these authors, on the other hand, have not been explored and characterized.
This paper establishes a skewness-comparability condition on probability distributions that is necessary and sufficient for any decision-maker's preferences over the distributions to depend on their means, variances, and third moments only. Under the condition, an EU maximizer's preferences for a larger mean, a smaller variance, and a larger third moment are shown to parallel, respectively, his preferences for a first-degree stochastic dominant (FSD) improvement, a mean-preserving contraction (MPC), and a downside risk decrease and are characterized in terms of the VNM utility function in exactly the same way. The condition generalizes not just the skewness-comparability conditions proposed by Van Zwet (1964) and Oja (1981) but also the condition for two distributions to be comparable in terms of downside risk defined by Menezes et al. (1980). Furthermore, distributions satisfying the “location-scale” or “linear class” condition of Meyer (1987) and Sinn (1983), which they show to be sufficient for the consistency between the mean-variance analysis and EU maximisation, are shown to be skewness-comparable distributions with identical standardized third moments. By showing that all Bernoulli distributions are mutually skewness comparable, we further show that in the wide range of economic models where these distributions are used individuals’ decisions under risk can be understood as trade-offs between mean, variance, and skewness. Our basic characterizations also immediately imply that a concave transformation of a random variable reduces the skewness of the distribution and hence, other things being equal, the attractiveness of the distribution to a skewness-preferring decision-maker. An application of this general regularity addresses the issue of whether a progressive tax reform reduces the incentive to take risks.
The rest of the paper is organized as follows. The section “Skewness comparability and EU maximisation” sets out the basic definitions and main results on skewness comparability. The section “Skewness of the Bernoulli distributions” establishes the skewness comparability of the widely used Bernoulli distributions and examines its implications. The section “Comparison with the existing approach and implications for gambling and tax reforms” concludes with discussions of the comparison with the existing approach to modelling skewness preference, the implications for the decision to gamble, and the effects of progressive tax reforms on risk taking.
Skewness comparability and EU maximisation
Preliminaries and stochastic dominance
Throughout the paper, (cumulative) distribution functions, denoted by F(x), G(x), etc., have the supports of their densities contained in [a, b]. We denote the mean, the variance, and the standardized and the unstandardized third central moments of a distribution F(x) by μ_{ F }, σ_{ F }^{2}, m_{ F }^{3}, and m̂_{ F }^{3}, respectively. That is,
μ_{ F }≡∫_{ a }^{b}x dF(x), σ_{ F }^{2}≡∫_{ a }^{b}(x−μ_{ F })^{2}dF(x), m̂_{ F }^{3}≡∫_{ a }^{b}(x−μ_{ F })^{3}dF(x), and m_{ F }^{3}≡m̂_{ F }^{3}/σ_{ F }^{3}.
For reasons that will become clear, when the abbreviated term “the third moment” is used in what follows, it refers exclusively to the standardized third central moment, never the unstandardized one. VNM utility functions are denoted by u,v, etc.
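As a minimal illustration of the notation above, the following Python sketch computes the four quantities for a discrete distribution. The helper name `moments` and the example lottery are illustrative assumptions, not part of the paper. The sketch only shows that rescaling the outcomes leaves the standardized third moment unchanged while the unstandardized one scales with the cube of the factor.

```python
def moments(outcomes, probs):
    """Mean, variance, standardized and unstandardized third central moments
    of a discrete distribution (hypothetical helper for illustration)."""
    mu = sum(p * x for x, p in zip(outcomes, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(outcomes, probs))
    m3_hat = sum(p * (x - mu) ** 3 for x, p in zip(outcomes, probs))  # unstandardized
    m3 = m3_hat / var ** 1.5                                          # standardized
    return mu, var, m3, m3_hat

# Illustrative lottery: 0 with probability 0.9, 10 with probability 0.1.
base = moments([0.0, 10.0], [0.9, 0.1])
# The same lottery with every outcome doubled.
scaled = moments([0.0, 20.0], [0.9, 0.1])

print("base:  ", base)
print("scaled:", scaled)
```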
For a distribution function F(x), define F^{(1)}(x)=F(x) and
F^{(n)}(x)≡∫_{ a }^{x}F^{(n−1)}(y)dy for n⩾2.
The standard notion of nth-degree stochastic dominance is defined as follows:
Definition 1

The change from F(x) to G(x) is an nth-degree stochastic dominant improvement (deterioration) if [G^{(n)}(x)−F^{(n)}(x)]⩽(⩾)0 for all x∈[a, b], where the inequality is strict for some subinterval(s), and [G^{(k)}(b)−F^{(k)}(b)]⩽(⩾)0 for k=2, …, n−1.
We will henceforth use [F(x) → G(x)] as a shorthand for the change of distributions from F(x) to G(x). It is well known that ∫_{ a }^{b}u(y)d[G(y)−F(y)]>0 for all u(x) such that u′(x)>0 for all x if and only if [F(x) → G(x)] is an FSD improvement. The related notions of a mean-preserving spread (contraction) and a downside risk increase (decrease) can be defined as special cases of stochastic dominant deterioration (improvement).
Definition 2

(i) A second-degree stochastic dominant deterioration (improvement) is a mean-preserving spread (contraction) [MPS (MPC)] if [G^{(2)}(b)−F^{(2)}(b)]=0.
(ii) A third-degree stochastic dominant deterioration (improvement) is a downside risk increase (decrease) if [G^{(2)}(b)−F^{(2)}(b)]=0 and [G^{(3)}(b)−F^{(3)}(b)]=0.
The definitions of an MPS and a downside risk increase here are, of course, equivalent to those in Rothschild and Stiglitz (1970) and Menezes et al. (1980), respectively. Menezes et al. (1980) show that ∫_{ a }^{b}u(y)d[G(y)−F(y)]<0 for all u(x) such that u′′′(x)>0 for all x if and only if [F(x) → G(x)] is a downside risk increase. They further show the following:
Lemma 1

(Menezes et al.) [G(x) → F(x)] being a downside risk increase implies, but is not implied by, m_{ F }^{3}<m_{ G }^{3}.
The better known result of Rothschild and Stiglitz (1970) on the other hand establishes that ∫_{ a }^{b}u(y)d[G(y)−F(y)]<0 for all u(x) such that u′′(x)<0 for all x if and only if [F(x) → G(x)] is an MPS.
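Definitions 1 and 2 can be checked numerically for discrete distributions. The sketch below is an illustration under stated assumptions: it uses the fact that, for a discrete distribution, repeated integration of the distribution function gives F^{(n)}(x)=Σ_i p_i max(x−x_i, 0)^{n−1}/(n−1)! for n⩾2, and it tests the conditions only on a finite grid, so it is a plausibility check rather than a proof. The function names and the two example distributions (both with mean 0 and variance 1) are hypothetical.

```python
from math import factorial

def integrated_cdf(outcomes, probs, x, n):
    """F^(n)(x) for a discrete distribution.

    n = 1 is the distribution function itself; for n >= 2 repeated integration
    gives sum_i p_i * max(x - x_i, 0)^(n-1) / (n-1)!.
    """
    if n == 1:
        return sum(p for xi, p in zip(outcomes, probs) if xi <= x)
    total = sum(p * max(x - xi, 0.0) ** (n - 1) for xi, p in zip(outcomes, probs))
    return total / factorial(n - 1)

def is_downside_risk_increase(F, G, a, b, steps=400, tol=1e-9):
    """Grid check of Definitions 1 and 2(ii) for the change F -> G.

    F and G are (outcomes, probs) pairs supported in [a, b].
    """
    def diff(x, n):
        return integrated_cdf(*G, x, n) - integrated_cdf(*F, x, n)

    grid = [a + (b - a) * k / steps for k in range(steps + 1)]
    third = [diff(x, 3) for x in grid]
    return (all(d >= -tol for d in third)   # G^(3) - F^(3) >= 0 on the grid
            and any(d > tol for d in third)  # strictly positive somewhere
            and abs(diff(b, 2)) <= tol       # G^(2)(b) = F^(2)(b): equal means
            and abs(diff(b, 3)) <= tol)      # G^(3)(b) = F^(3)(b): equal variances

# Two standardized two-point distributions (illustrative values only).
F = ([-0.5, 2.0], [0.8, 0.2])   # long right tail
G = ([-2.0, 0.5], [0.2, 0.8])   # long left tail

forward = is_downside_risk_increase(F, G, -2.0, 2.0)
backward = is_downside_risk_increase(G, F, -2.0, 2.0)
print(forward, backward)
```

In this example moving from the right-tailed F to the left-tailed G is flagged as a downside risk increase, while the reverse change is not.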
Skewness comparability
Van Zwet (1964, p. 9) argues that since intuitively “convex transformation of a random variable effects a contraction of the lower part of the scale of measurement and an extension of the upper part”, a distribution F can be defined to be more skewed to the right than G if R(x)≡F^{−1}(G(x)) is convex.^{Footnote 4} It has since become widely accepted that a good skewness measure should preserve the skewness ordering so defined (see, for example, Oja (1981) and Arnold and Groeneveld (1995)). Following Oja (1981), we define strong skewness comparability as follows:
Definition 3

(i) Distributions F and G are strongly skewness comparable if F^{−1}(G(x)) is convex or concave.
(ii) F is more skewed to the right than G in the sense of Van Zwet if F^{−1}(G(x)) is convex.
The condition for strong skewness comparability is, however, too strong and may not be strictly satisfied in many typical cases where one distribution is considered more skewed than another, such as distributions F and G and their respective density functions f and g illustrated in Figures 1 and 2. We observe in Figures 1 and 2 that if two distributions have the same mean and, loosely speaking, the same “spread” as depicted, then one distribution being more skewed to the right typically implies that the two distribution functions cross twice. However, in cases such as those depicted, F^{−1} may or may not be an exact convex transformation of G^{−1} as is required by Van Zwet's definition. Noting that if F(x) is the distribution function for a random variable x̃, then F(σ_{ F }x+μ_{ F }) is the distribution for the standardized random variable (x̃−μ_{ F })/σ_{ F }, we state Oja's (1981) weaker comparability condition as follows:
Definition 4

(i) Distributions F and G are skewness comparable in the sense of Oja if G(σ_{ G }x+μ_{ G }) crosses F(σ_{ F }x+μ_{ F }) exactly twice or F(σ_{ F }x+μ_{ F })=G(σ_{ G }x+μ_{ G }).
(ii) F is more skewed to the right than G in the sense of Oja if G(σ_{ G }x+μ_{ G }) crosses F(σ_{ F }x+μ_{ F }) exactly twice, first from above.
The following lemma confirms that this is a weaker notion of skewness comparability and relates it to the concept of increasing downside risk of Menezes et al. (1980).^{Footnote 5}
Lemma 2

(i) If F and G are strongly skewness comparable, then they are skewness comparable in the sense of Oja.
(ii) If F is more skewed to the right than G in the sense of Oja, then [F(σ_{ F }x+μ_{ F }) → G(σ_{ G }x+μ_{ G })] is a downside risk increase.
In view of Lemma 2, we define our notion of “generalized skewness comparability” based on the notion of a downside risk increase and show that this is a necessary and sufficient condition for preferences over two distributions to be determined by their means, variances, and third moments alone. For expositional ease, we henceforth simply use “skewness comparability” to mean “generalized skewness comparability”.
Definition 5

(i) Distributions F and G are (generalized) skewness comparable if [F(σ_{ F }x+μ_{ F }) → G(σ_{ G }x+μ_{ G })] is a downside risk increase or a downside risk decrease or F(σ_{ F }x+μ_{ F })=G(σ_{ G }x+μ_{ G }).
(ii) F is more skewed to the right than G if [F(σ_{ F }x+μ_{ F }) → G(σ_{ G }x+μ_{ G })] is a downside risk increase.^{Footnote 6}
Theorem 1

μ_{ F }=μ_{ G }, σ_{ F }^{2}=σ_{ G }^{2}, and m_{ F }^{3}=m_{ G }^{3} imply F(x)=G(x) if and only if F and G are skewness comparable.
The result clearly shows that any decision-maker's preferences over skewness-comparable distributions are determined by the first three moments of the distributions. We, however, restrict our attention to EU theory because it remains the only widely used decision model known to be consistent with downside risk aversion.^{Footnote 7} The result implies, in particular, that for skewness-comparable changes in a distribution F, U(μ_{ F },σ_{ F }^{2},m_{ F }^{3})≡∫_{ a }^{b}u(x)dF(x) is a well-defined function from R × R^{+} × R to R. We next show that for skewness-comparable distributions, an EU maximizer's preferences for a larger mean, a smaller variance, and a larger third moment parallel, respectively, his preferences for an FSD improvement, an MPC, and a downside risk decrease and are characterized in terms of the VNM utility function in exactly the same way.
Theorem 2

(i) Supposing μ_{ F }=μ_{ G } and σ_{ F }^{2}=σ_{ G }^{2}, then m_{ F }^{3}>m_{ G }^{3} implies ∫_{ a }^{b}u(x)dF(x)>∫_{ a }^{b}u(x)dG(x) for any two skewness-comparable distributions F and G if and only if u′′′(x)>0 for all x.
(ii) Supposing μ_{ F }=μ_{ G } and m_{ F }^{3}=m_{ G }^{3}, then σ_{ F }^{2}>σ_{ G }^{2} implies ∫_{ a }^{b}u(x)dF(x)<∫_{ a }^{b}u(x)dG(x) for any two skewness-comparable distributions F and G if and only if u′′(x)<0 for all x.
(iii) Supposing σ_{ F }^{2}=σ_{ G }^{2} and m_{ F }^{3}=m_{ G }^{3}, then μ_{ F }>μ_{ G } implies ∫_{ a }^{b}u(x)dF(x)>∫_{ a }^{b}u(x)dG(x) for any two skewness-comparable distributions F and G if and only if u′(x)>0 for all x.
Or equivalently,
Theorem 2a

U(μ_{ F },σ_{ F }^{2},m_{ F }^{3})=∫_{ a }^{b}u(x)dF(x) is increasing in μ_{ F } and m_{ F }^{3} and decreasing in σ_{ F }^{2} for skewness-comparable changes in any distribution F if and only if u′(x)>0, u′′(x)<0, and u′′′(x)>0 for all x.
With standard results in Rothschild and Stiglitz (1970) and Menezes et al. (1980), the result is implied by the following lemma, which may be of independent interest.
Lemma 3

Suppose F and G are (generalized) skewness comparable. Then
(i) m_{ F }^{3}>m_{ G }^{3} if and only if F is more skewed to the right than G.
(ii) Assuming μ_{ F }=μ_{ G } and m_{ F }^{3}=m_{ G }^{3}, σ_{ F }^{2}>σ_{ G }^{2} if and only if [G(x) → F(x)] is an MPS.
(iii) Assuming σ_{ F }^{2}=σ_{ G }^{2} and m_{ F }^{3}=m_{ G }^{3}, μ_{ F }>μ_{ G } if and only if [G(x) → F(x)] is an FSD improvement.
For any two skewness-comparable distributions F and G, we can thus have a simple and useful decomposition of the difference in EU as follows.
∫_{ a }^{b}u(x)d[G(x)−F(x)]=∫_{ a }^{b}u(x)d[G(x)−F_{2}(x)]+∫_{ a }^{b}u(x)d[F_{2}(x)−F_{1}(x)]+∫_{ a }^{b}u(x)d[F_{1}(x)−F(x)], (1)
where F_{1}(x)≡F(x+μ_{ F }−μ_{ G }) and F_{2}(x)≡F(σ_{ F }(x−μ_{ G })/σ_{ G }+μ_{ F }). Hence F_{1} and F differ only by their means, F_{2} and F_{1} have the same mean and the same third moment, and [F_{2}(x) → G(x)] is a downside risk increase or decrease or G(x)=F_{2}(x). As will be shown, such a simple decomposition, which depicts trade-offs between mean, variance (or risk), and skewness, is useful in understanding individuals’ choice among skewness-comparable distributions.
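The decomposition of the EU difference into a mean effect, a variance effect, and a skewness effect can be illustrated numerically. The sketch below is a hedged example, not the paper's own computation. It assumes that F_{1} is F shifted to the mean of G and that F_{2} is F rescaled to the mean and variance of G (the reading of F_{2} adopted here is an assumption), and it uses a hypothetical utility function u(x)=1−exp(−x/10), which has u′>0, u′′<0, and u′′′>0. The two-point distributions are invented for illustration.

```python
import math

def mean_sd(outcomes, probs):
    """Mean and standard deviation of a discrete distribution."""
    mu = sum(p * x for x, p in zip(outcomes, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(outcomes, probs))
    return mu, math.sqrt(var)

def expected_utility(u, outcomes, probs):
    return sum(p * u(x) for x, p in zip(outcomes, probs))

def decompose(u, F, G):
    """Split EU(G) - EU(F) into mean, variance, and skewness effects."""
    (xs, ps), (ys, qs) = F, G
    mu_f, sd_f = mean_sd(xs, ps)
    mu_g, sd_g = mean_sd(ys, qs)
    f1 = [x - mu_f + mu_g for x in xs]                    # F shifted to G's mean
    f2 = [(sd_g / sd_f) * (x - mu_f) + mu_g for x in xs]  # F with G's mean and variance
    eu_f = expected_utility(u, xs, ps)
    eu_f1 = expected_utility(u, f1, ps)
    eu_f2 = expected_utility(u, f2, ps)
    eu_g = expected_utility(u, ys, qs)
    return {
        "mean": eu_f1 - eu_f,
        "variance": eu_f2 - eu_f1,
        "skewness": eu_g - eu_f2,
        "total": eu_g - eu_f,
    }

# Hypothetical utility with u' > 0, u'' < 0, u''' > 0.
u = lambda x: 1.0 - math.exp(-x / 10.0)

F = ([4.0, 10.0], [0.5, 0.5])   # mean 7, standard deviation 3, symmetric
G = ([6.0, 16.0], [0.8, 0.2])   # mean 8, standard deviation 4, right-skewed

effects = decompose(u, F, G)
print(effects)
```

In this example the higher mean and the greater right skewness of G raise expected utility, while its larger variance lowers it; the three effects sum to the total difference by construction.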
Sinn (1983) and Meyer (1987) define two distributions F and G to be in the “linear class” or the “location-scale” model if F(x)=G(βx+α) with β>0 and show that EU maximizers’ preferences over distributions in this model are determined by the means and variances of the distributions only, that is, mean-variance decision models are consistent with EU maximisation. Meyer (1987) further shows that in many important economic models, including Sandmo's (1971) model of competitive firms facing random output price and Tobin's (1958) theory of liquidity preference with a single risky and riskless asset, comparative statics analysis can be reformulated as choice among distributions in this class. Clearly, if F and G are in the “location-scale” model, F(σ_{ F }x+μ_{ F })=G(σ_{ G }x+μ_{ G }). That is, distributions in the “location-scale” model are skewness-comparable distributions with identical third moments.
Skewness of the Bernoulli distributions
Skewness comparability of the Bernoulli distributions
Their simple parametric structure notwithstanding, the Bernoulli distributions are applicable to a wide range of economic problems and are used in a wide range of economic models. We show that the answer to the question of skewness comparability for this simple but important family of distributions is very clear-cut. Let [(y, p)(z, 1−p)] denote a Bernoulli distribution that gives y with probability p and z with probability (1−p).
Proposition 1

For i=1, 2, let F_{ i }(x), μ_{ i } and σ_{ i } be the cumulative distribution function, the mean and the standard deviation of [(y_{ i }, p_{ i })(z_{ i },1−p_{ i })], respectively, and y_{ i }<z_{ i }. Then
(i) p_{1}<p_{2} if and only if F_{2}(x) is more skewed to the right than F_{1}(x).
(ii) p_{1}=p_{2} if and only if F_{1}(σ_{1}x+μ_{1})=F_{2}(σ_{2}x+μ_{2}).
The result clearly shows that not only are all Bernoulli distributions skewness comparable but also their degrees of skewness are determined by the parameter p alone.^{Footnote 8} This gives a novel perspective on individuals’ decisions in the wide range of economic models where the choices available are assumed to be Bernoulli distributions: These decisions can be understood as trade-offs between mean, variance, and skewness. We will illustrate in what follows the usefulness of this perspective in understanding individuals’ betting behaviour and self-protection decisions. The same approach can potentially yield interesting insights in such important models as those of auctions and tournaments, among others. Moreover, the result also shows that if two Bernoulli distributions share the same value for the parameter p, they are not just skewness comparable but also in the “location-scale model” or “linear class” and hence are consistent with mean-variance preferences.
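The claim that p alone determines the skewness of a Bernoulli distribution can be checked against the standardized third moment. The sketch below is an illustration, not part of the paper's proof: it computes the standardized third moment of [(y, p)(z, 1−p)] with y<z directly from the definition and compares it with the standard closed form (2p−1)/√(p(1−p)), which is not stated in the text and is used here as a known property of two-point distributions. The function names and test values are hypothetical.

```python
import math

def bernoulli_skewness(y, z, p):
    """Standardized third moment of [(y, p)(z, 1-p)], from the definition."""
    mu = p * y + (1 - p) * z
    var = p * (y - mu) ** 2 + (1 - p) * (z - mu) ** 2
    m3_hat = p * (y - mu) ** 3 + (1 - p) * (z - mu) ** 3
    return m3_hat / var ** 1.5

def closed_form(p):
    """(2p - 1) / sqrt(p(1 - p)); applies when y < z and y has probability p."""
    return (2 * p - 1) / math.sqrt(p * (1 - p))

# Very different outcome pairs, same p: the skewness should coincide.
same_p = [bernoulli_skewness(y, z, 0.9) for (y, z) in [(0.0, 1.0), (-5.0, 40.0), (100.0, 101.0)]]

# Fixed outcomes, increasing p: the skewness should increase.
by_p = [bernoulli_skewness(0.0, 10.0, p) for p in (0.1, 0.3, 0.5, 0.7, 0.9)]

print(same_p, closed_form(0.9))
print(by_p)
```

A larger probability on the lower outcome y corresponds to a larger standardized third moment, in line with part (i) of Proposition 1.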
Empirical evidence for Gamblers’ skewness preference
The result that any pair of Bernoulli distributions are skewness comparable indicates that the empirical findings of Golec and Tamarkin (1998) and Garrett and Sobel (1999) do represent evidence for gamblers’ skewness preference as is defined and characterized in this paper. Specifically, a bet on horse h considered by Golec and Tamarkin (1998) (and Ali (1977)) takes the form [(0, 1−p_{ h })(X_{ h },p_{ h })], where X_{ h } denotes the return of a winning bet on horse h and a losing bet returns zero to the bettor, and assuming bettors have an identical utility function u( ), their EU from betting on horse h is
p_{ h }u(X_{ h })+(1−p_{ h })u(0).
Assuming that u(0)=0 and u(X_{ H })=1, where H represents the highest-odds horse, and that the amount bet on each horse is such that bettors are indifferent between bets on any horse h, for any h, we have
p_{ h }u(X_{ h })=p_{ H }u(X_{ H })=p_{ H },
which gives p_{ H }/p_{ h }=u(X_{ h }). Racetrack data are then used to estimate the utility function assumed to take the cubic form u(X_{ h })=a+b_{1}X_{ h }+b_{2}X_{ h }^{2}+b_{3}X_{ h }^{3}. The estimated coefficients b_{1} and b_{3} are positive and b_{2} negative, all of which are highly significant. The estimated utility function is thus concave for low values of X_{ h } and convex for high values. Bettors are therefore not globally risk loving as suggested by earlier studies such as Ali (1977). More importantly, a utility function with u′′′( )>0 estimated using a data set of skewness-comparable distributions with different degrees of skewness does indicate (global) skewness preference as defined in this paper.^{Footnote 9} Using data from U.S. state lotteries, Garrett and Sobel (1999) follow the exact same methodology by assuming that lottery players completely disregard the prizes of a lottery other than the top prize (i.e., winning anything other than the top prize of a lottery gives zero utility) and hence a choice among state lotteries is effectively a choice among Bernoulli distributions. They obtain identical results in terms of the characteristics of the utility function. That is, to the extent that lottery players do play only to win the top prize, the state lottery data also support global skewness preference.
Self-protection
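The shape implied by a cubic utility with b_{1}>0, b_{2}<0, and b_{3}>0 can be made concrete with a short sketch. The coefficients and probabilities below are hypothetical values chosen only for illustration; they are not the estimates reported by Golec and Tamarkin (1998). The sketch uses the indifference relation u(X_{ h })=p_{ H }/p_{ h } from the text and the fact that u′′(X)=2b_{2}+6b_{3}X changes sign at X=−b_{2}/(3b_{3}).

```python
def implied_utility(p_h, p_H):
    """Utility of a winning bet on horse h implied by bettor indifference,
    under the normalization u(0) = 0 and u(X_H) = 1."""
    return p_H / p_h

def second_derivative(x, b2, b3):
    """u''(x) for the cubic u(x) = a + b1*x + b2*x**2 + b3*x**3."""
    return 2 * b2 + 6 * b3 * x

def third_derivative(b3):
    """u'''(x) for the cubic, constant in x."""
    return 6 * b3

# Hypothetical coefficients with the signs reported in the text.
b1, b2, b3 = 1.0, -0.05, 0.001
inflection = -b2 / (3 * b3)   # return at which u'' changes sign

print("inflection point:", inflection)
print("u'' below / above:", second_derivative(10.0, b2, b3), second_derivative(30.0, b2, b3))
print("u''':", third_derivative(b3))
print("implied utility of a favourite:", implied_utility(p_h=0.4, p_H=0.01))
```

Below the inflection point the function is concave (risk aversion) and above it convex, while u′′′ is positive throughout, which is the pattern the text interprets as global skewness preference.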
Ehrlich and Becker (1972) define self-protection to be the expenditure on reducing the probability of suffering a loss and highlight its conceptual distinction from self-insurance, which is the expenditure on reducing the severity of loss.^{Footnote 10} Denoting the initial wealth by w and the probability of suffering a loss l by p, self-protection is the expenditure on reducing the probability p of the Bernoulli distribution [(w−l, p)(w, 1−p)]. Proposition 1 shows that a reduction in p implies a reduction in (positive) skewness and an EU maximizer's preferences regarding self-protection are completely determined by its effects on the mean, variance, and third moment. Letting F and G denote, respectively, the distributions before and after a reduction in p by ɛ, we can explicitly decompose the effect of self-protection as in (1):
∫_{ a }^{b}u(x)d[G(x)−F(x)]=∫_{ a }^{b}u(x)d[G(x)−F_{2}(x)]+∫_{ a }^{b}u(x)d[F_{2}(x)−F_{1}(x)]+∫_{ a }^{b}u(x)d[F_{1}(x)−F(x)],
where F_{1}(x)≡F(x+μ_{ F }−μ_{ G }) and F_{2}(x)≡F(σ_{ F }(x−μ_{ G })/σ_{ G }+μ_{ F }). This gives a novel and definitive characterization of all the relevant factors determining the choice of self-protection and brings together, and offers straightforward interpretations to, results from recent attempts to relate self-protection to skewness preference (i.e., the third derivative of a VNM utility function).^{Footnote 11} For example, if the individual pays the fair price ɛl for the reduction in p, then clearly ∫_{ a }^{b}u(x)d[F_{1}(x)−F(x)]=0. It follows that he is willing to pay more than the fair price for the reduction in loss probability if ∫_{ a }^{b}u(x)d[G(x)−F_{2}(x)]+∫_{ a }^{b}u(x)d[F_{2}(x)−F_{1}(x)]>0. More specifically, the change in variance caused by a reduction in p by ɛ is
σ_{ G }^{2}−σ_{ F }^{2}=[(p−ɛ)(1−p+ɛ)−p(1−p)]l^{2}=ɛ(2p−1−ɛ)l^{2}.
If p>1/2, a risk-averse skewness-preferring individual will not be willing to pay the fair price for a small reduction in p, that is, for ɛ⩽(2p−1), since such a reduction does not lower the variance while it reduces the skewness. On the other hand, if p⩽1/2, self-protection reduces both the skewness and variance and consequently ∫_{ a }^{b}u(x)d[G(x)−F_{2}(x)]<0 and ∫_{ a }^{b}u(x)d[F_{2}(x)−F_{1}(x)]>0. Whether a risk-averse skewness-preferring individual is willing to pay more than the fair price for self-protection depends on the strength of his skewness preference relative to his risk aversion, which, as shown in Chiu (2005a), is what the prudence measure, −u′′′(x)/u′′(x), can be interpreted as measuring.^{Footnote 12} The simple decomposition in (1) thus not only offers much more straightforward interpretations for the results in Chiu (2000, 2005b) and Eeckhoudt and Gollier (2005), but also suggests that the problem of self-protection can be analyzed without using the first-order approach, which entails assuming the second-order condition and its implied restrictions on the relationship between the self-protection expenditure and the reduction in the loss probability.^{Footnote 13}
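The role of the threshold p=1/2 can be seen from the variance of the Bernoulli distribution [(w−l, p)(w, 1−p)], which equals p(1−p)l^{2}. The sketch below, with hypothetical function names and parameter values, compares the direct variance difference with the expression ɛ(2p−1−ɛ)l^{2} obtained by expanding it, and shows the sign change on either side of p=1/2.

```python
def bernoulli_variance(p, loss):
    """Variance of [(w - loss, p)(w, 1 - p)]; it does not depend on w."""
    return p * (1 - p) * loss ** 2

def variance_change(p, eps, loss):
    """Variance after reducing p by eps minus variance before,
    written as eps * (2p - 1 - eps) * loss**2."""
    return eps * (2 * p - 1 - eps) * loss ** 2

loss, eps = 100.0, 0.1

for p in (0.7, 0.3):
    direct = bernoulli_variance(p - eps, loss) - bernoulli_variance(p, loss)
    print(p, direct, variance_change(p, eps, loss))

high_p_change = variance_change(0.7, eps, loss)   # p > 1/2 and eps <= 2p - 1
low_p_change = variance_change(0.3, eps, loss)    # p <= 1/2
```

With p=0.7 a small reduction in the loss probability raises the variance, whereas with p=0.3 it lowers the variance, matching the two cases discussed in the text.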
Comparison with the existing approach and implications for gambling and tax reforms
The existing approach to skewness preference
The theoretical justification for considering skewness preference has so far been a Taylor series approximation of the EU. Specifically, letting F be the distribution function for random variable x̃,
Eu(x̃)≈u(μ_{ F })+(1/2)u′′(μ_{ F })σ_{ F }^{2}+(1/6)u′′′(μ_{ F })m̂_{ F }^{3}. (3)
Clearly if we have a cubic utility function u(x)=c_{0}+c_{1}x+c_{2}x^{2}+c_{3}x^{3}, the cubic expansion will be precise and the EU given F can be explicitly calculated as
Eu(x̃)=c_{0}+c_{1}μ_{ F }+c_{2}(μ_{ F }^{2}+σ_{ F }^{2})+c_{3}(μ_{ F }^{3}+3μ_{ F }σ_{ F }^{2}+m̂_{ F }^{3}). (4)
That is, if either the Taylor series represents a good approximation or the utility is cubic, u′′′(x)>0 appears to imply a preference for the unstandardized third central moment m̂_{ F }^{3}.^{Footnote 14} On the one hand, our results in the previous section can be seen as confirming that the EU given a distribution can be written as a function of its mean, variance, and unstandardized third moment for mutually skewness-comparable distributions: since we have shown that a function U(μ_{ F },σ_{ F }^{2},m_{ F }^{3})=∫_{ a }^{b}u(x)dF(x) is well defined for skewness-comparable changes in F, for such changes we can define
Û(μ_{ F },σ_{ F }^{2},m̂_{ F }^{3})≡U(μ_{ F },σ_{ F }^{2},m̂_{ F }^{3}/σ_{ F }^{3}). (5)
On the other hand, what (3), (4), and (5) all say is that, assuming u′′′(x)>0, a larger m̂_{ F }^{3} implies a larger Eu(x̃) if μ_{ F } and σ_{ F }^{2} are held constant. For two distributions F and G with σ_{ F }^{2}>σ_{ G }^{2}, in particular, m̂_{ F }^{3}>m̂_{ G }^{3} does not imply either that F is more skewed than G or that skewness plays any role in determining their comparative desirability to an individual. This seems to be an insight well hidden in using the traditional approach, as is exemplified by Tsiang's (1972, p. 363) attempt to explain the Borch (1969) paradox by invoking skewness preference.^{Footnote 15} Other pitfalls in using the traditional approach can also be seen in Markowitz's (1952a) conjecture on skewness preference and the decision to gamble discussed in what follows.
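For the cubic utility case discussed above, the EU depends on the distribution only through its mean, variance, and unstandardized third central moment. The sketch below verifies this numerically using the identities E[x^{2}]=μ^{2}+σ^{2} and E[x^{3}]=μ^{3}+3μσ^{2}+m̂^{3}; the coefficients and the example distribution are hypothetical, and the function names are not from the paper.

```python
def first_three_moments(outcomes, probs):
    """Mean, variance, and unstandardized third central moment."""
    mu = sum(p * x for x, p in zip(outcomes, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(outcomes, probs))
    m3_hat = sum(p * (x - mu) ** 3 for x, p in zip(outcomes, probs))
    return mu, var, m3_hat

def eu_direct(c, outcomes, probs):
    """E[u(x)] for u(x) = c0 + c1*x + c2*x**2 + c3*x**3, by direct summation."""
    c0, c1, c2, c3 = c
    return sum(p * (c0 + c1 * x + c2 * x ** 2 + c3 * x ** 3)
               for x, p in zip(outcomes, probs))

def eu_from_moments(c, mu, var, m3_hat):
    """The same expectation written in terms of the three moments."""
    c0, c1, c2, c3 = c
    return c0 + c1 * mu + c2 * (mu ** 2 + var) + c3 * (mu ** 3 + 3 * mu * var + m3_hat)

# Hypothetical cubic coefficients and a three-point distribution.
coeffs = (0.0, 1.0, -0.05, 0.001)
outcomes, probs = [1.0, 4.0, 12.0], [0.5, 0.3, 0.2]

direct = eu_direct(coeffs, outcomes, probs)
via_moments = eu_from_moments(coeffs, *first_three_moments(outcomes, probs))
print(direct, via_moments)
```

The two numbers agree up to rounding, which is all the sketch is meant to show; it says nothing about whether a cubic utility is a reasonable model of preferences.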
Skewness preference and the decision to gamble
Skewness preference has been associated with gambling since long before the work of Golec and Tamarkin (1998) and Garrett and Sobel (1999). Markowitz (1952a) suggests that “the third moment of the probability distribution of returns from the portfolio may be connected with a propensity to gamble” and that if individuals’ utility of a probability distribution is a function of the third moment as well as the mean and variance of the distribution, then some fair bets would be accepted.^{Footnote 16} So is it possible for an individual with a third-moment utility function who is averse to larger variances, as is usually assumed in the context of mean-variance analysis, to accept an independent fair gamble given a sufficiently strong skewness preference? The Taylor approximation in (3) gives the impression that this is possible. To examine the possibility, suppose an individual with initial wealth distribution F is contemplating taking fair gambles that increase the skewness of F in the sense defined in this paper and are independent of F. Then
U(μ_{ F },σ_{ F }^{2},m_{ F }^{3})≡∫_{ a }^{b}u(x)dF(x)
is well defined. Consider first the case where U(μ_{ F },σ_{ F }^{2},m_{ F }^{3}) is decreasing in σ_{ F }^{2} and increasing in m_{ F }^{3} for all distributions F. Theorem 2a clearly indicates that U(μ_{ F },σ_{ F }^{2},m_{ F }^{3}) being decreasing in σ_{ F }^{2} for all distributions F is equivalent to risk aversion (i.e., u′′(x)<0 for all x) and since accepting a fair gamble independent of his initial wealth induces an MPS, by the classic result of Rothschild and Stiglitz (1970), it always reduces his EU given his risk aversion whatever the strength of his skewness preference. Alternatively, assume that for all distributions F, Û(μ_{ F },σ_{ F }^{2},m̂_{ F }^{3}) is decreasing in σ_{ F }^{2} and increasing in m̂_{ F }^{3}, that is, (assuming differentiability) Û_{2}(μ_{ F },σ_{ F }^{2},m̂_{ F }^{3})<0 and Û_{3}(μ_{ F },σ_{ F }^{2},m̂_{ F }^{3})>0. Then since U(μ_{ F },σ_{ F }^{2},m_{ F }^{3})=Û(μ_{ F },σ_{ F }^{2},σ_{ F }^{3}m_{ F }^{3}), simple differentiation shows that
U_{2}(μ_{ F },σ_{ F }^{2},m_{ F }^{3})=Û_{2}(μ_{ F },σ_{ F }^{2},m̂_{ F }^{3})+(3/2)σ_{ F }m_{ F }^{3}Û_{3}(μ_{ F },σ_{ F }^{2},m̂_{ F }^{3}).
Clearly, Û_{2}(μ_{ F },σ_{ F }^{2},m̂_{ F }^{3})<0 for all distributions F implies U_{2}(μ_{ F },σ_{ F }^{2},m_{ F }^{3})<0 for all distributions F because if U_{2}(μ_{ F },σ_{ F }^{2},m_{ F }^{3})⩾0 for a negatively skewed or symmetrical distribution F, that is, m_{ F }^{3}⩽0, then (given Û_{3}(μ_{ F },σ_{ F }^{2},m̂_{ F }^{3})>0) Û_{2}(μ_{ F },σ_{ F }^{2},m̂_{ F }^{3})⩾0. In other words, with a third-moment utility function, whether the utility is defined on the standardized or unstandardized third moment, aversion to larger variances implies risk aversion and precludes taking fair gambles whatever the strength of the skewness preference.
The incentive effects of tax reforms
In considering the implications of his pioneering analysis of skewness preference, Tsiang (1972, p. 370) suggests that
the effect of income tax on risktaking should be examined not only with respect to its impacts on the mean and variance of investment returns after tax, but also with respect to its impacts on the skewness of net returns. A progressive income tax ... could certainly have a greater adverse effect on the willingness to take risk than a proportional tax with perfect loss offset that leave the mean and variance after tax at the same levels.
Does a progressive tax necessarily reduce the skewness of the net returns of a risky investment and hence have a greater adverse effect on the willingness to take risk than a proportional income tax? More generally, since the 1980s, there has been a broad international trend towards the flattening of personal income tax structures. Does such a reform increase the skewness of the after-tax income distribution and as a result, other things being equal, enhance the incentive to make risky investments? Our basic results on skewness preference can be applied to give definitive answers to these questions, under a particular definition of a “more progressive tax” as follows.^{Footnote 17}
Definition 6

A tax schedule t_{1}(x) is more residual-concave than another t_{2}(x) if r_{1}(r_{2}^{−1}(τ)) is concave where for i=1, 2, r_{ i }(x)≡x−t_{ i }(x) is the residual income function under tax schedule t_{ i }(x).
That is, a tax schedule t_{1}(x) is more progressive than another t_{2}(x) in the sense of residual concavity if the residual income function [x−t_{1}(x)] is a concave transformation of [x−t_{2}(x)]. Under this definition, any graduated-rate tax is more residual-concave than any proportional tax and a tax schedule becoming less residual-concave more generally defines a particular kind of flattening of the tax schedule. For example, flattening a graduated-rate tax by reducing the top marginal tax rate or by abolishing the income band where the highest marginal tax rate applies leads to a less residual-concave tax schedule.^{Footnote 18} We next show that in most relevant cases in practice, a more residual-concave tax schedule is a more progressive one as is usually defined in the literature on income inequality measurement (see, e.g., Lambert (2001)).
Proposition 2

Suppose r_{1}(r_{2}^{−1}(0))⩾0. Then a tax schedule t_{1}(x) has more residual progression than t_{2}(x), that is, [x−t_{1}(x)]/[x−t_{2}(x)] is non-increasing for all x, if t_{1}(x) is more residual-concave than t_{2}(x).
The condition r_{1}(r_{2}^{−1}(0))⩾0, which is equivalent to [x−t_{2}(x)]=0 implying [x−t_{1}(x)]⩾0, is clearly satisfied if we only consider tax schedules involving no lump-sum elements, that is, t_{ i }(0)=0, in which case r_{1}(r_{2}^{−1}(0))=0. Typical real-world tax schedules with a personal allowance, that is, an amount subtracted from pre-tax income in arriving at taxable income, are clearly in this category.
Given Definition 6, Lemmas 2 and 3 immediately imply the following.
Proposition 3

For a given pre-tax income distribution, let F and G denote the after-tax income distributions under tax schedules t_{1}(x) and t_{2}(x), respectively. If t_{1}(x) is more residual-concave than t_{2}(x), then G is more skewed to the right than F and m_{ G }^{3}>m_{ F }^{3}.
For an interpretation of the result, suppose an investor's initial income is non-random and F and G represent the after-tax prospective income distributions given a risky investment under tax schedules t_{1}(x) and t_{2}(x), respectively. The result implies that if t_{1}(x) is more residual-concave than t_{2}(x), we can decompose the effect on the EU of the change of tax schedules from t_{1} to t_{2} as in (1):
∫_{ a }^{b}u(x)d[G(x)−F(x)]=∫_{ a }^{b}u(x)d[G(x)−F_{2}(x)]+∫_{ a }^{b}u(x)d[F_{2}(x)−F_{1}(x)]+∫_{ a }^{b}u(x)d[F_{1}(x)−F(x)],
where F_{1}(x)≡F(x+μ_{ F }−μ_{ G }) and F_{2}(x)≡F(σ_{ F }(x−μ_{ G })/σ_{ G }+μ_{ F }). That is, not only does a tax flattening in the form of the change from t_{1} to t_{2} unequivocally increase the skewness of the prospective income distribution but how it affects the attractiveness of the investment is completely determined by its effect on the mean, variance, and third moment of the after-tax distribution. Furthermore, assuming skewness preference, such a tax reform increases the attractiveness of the investment compared with a “skewness-neutral” tax reform that achieves the same effects on the mean and the variance of the after-tax income distribution. More specifically, noting the relationship between F_{2}(x) and F(x), if a tax schedule t_{3}(x) is such that x−t_{3}(x)=σ_{ G }[x−t_{1}(x)−μ_{ F }]/σ_{ F }+μ_{ G }, then t_{3}(x) clearly induces an after-tax income distribution equal to F_{2}(x) (which has the same mean and variance as G(x)) and a tax reform from t_{1}(x) to t_{2}(x) clearly makes the investment more attractive compared with the reform from t_{1}(x) to t_{3}(x). Since any graduated-rate (i.e., convex) tax schedule is more residual-concave than a proportional tax as remarked earlier, a corollary of this is a formal validation of Tsiang's conjecture if a progressive tax is understood to be a graduated-rate tax: Any graduated-rate tax has a greater adverse effect on the attractiveness of a risky investment than a proportional tax with perfect loss offset that leaves the mean and variance after tax at the same levels.^{Footnote 19}
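The direction of the skewness effect in Proposition 3 can be illustrated with a small numerical example. The sketch below applies a hypothetical two-band graduated-rate schedule (20% up to 60, 50% above) and a hypothetical 30% proportional schedule to the same symmetric three-point pre-tax income distribution and compares the standardized third moments of after-tax income. All rates, thresholds, and incomes are invented for illustration, and the proportional rate is not calibrated to equalize the after-tax mean and variance, so the sketch speaks only to skewness.

```python
def skewness(values, probs):
    """Standardized third central moment of a discrete distribution."""
    mu = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - mu) ** 2 for v, p in zip(values, probs))
    m3_hat = sum(p * (v - mu) ** 3 for v, p in zip(values, probs))
    return m3_hat / var ** 1.5

def residual_graduated(x, threshold=60.0, low=0.2, high=0.5):
    """After-tax income under a two-band graduated-rate schedule (concave in x)."""
    tax = low * min(x, threshold) + high * max(x - threshold, 0.0)
    return x - tax

def residual_proportional(x, rate=0.3):
    """After-tax income under a proportional tax (linear in x)."""
    return x - rate * x

# Symmetric pre-tax income distribution (illustrative values only).
incomes = [20.0, 60.0, 100.0]
probs = [1 / 3, 1 / 3, 1 / 3]

skew_pre = skewness(incomes, probs)
skew_grad = skewness([residual_graduated(x) for x in incomes], probs)
skew_prop = skewness([residual_proportional(x) for x in incomes], probs)
print(skew_pre, skew_grad, skew_prop)
```

The proportional tax leaves the standardized third moment at its pre-tax value, while the graduated-rate schedule, whose residual income function is concave, makes the after-tax distribution negatively skewed.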
Notes
1 See also the references therein for a sample of other related empirical work.
2 Hanoch and Levy (1970) is an early example of using the cubic utility function in portfolio choice theory.
3 Other pitfalls of these approaches are discussed in the text.
4 Letting F and G be the distribution functions for random variables x̃ and ỹ, respectively, F^{−1}(G(x)) being convex is equivalent to x̃ (or F^{−1}( )) being a convex transformation of ỹ (or G^{−1}( )).
Proofs of all formal results not immediate from existing results are given in the Appendix.
Generalized skewness comparability is more general than Oja's skewness comparability not only in the sense that the former is implied by but does not imply the latter but also in that the relation of “more generalized skewed” is transitive while that of “more skewed in the sense of Oja” is not. I am grateful to a referee for pointing this out. It should also be noted that the concept of skewness comparability is distinct from that of third-degree stochastic dominance. Simple examples can be constructed to show that two distributions being skewness comparable neither implies nor is implied by one of the distributions third-degree stochastically dominating the other. This is reinforced by the observation that many of the useful properties of skewness comparability discussed in the sequel are not shared by third-degree stochastic dominance.
Chateauneuf et al. (2002) show that in the widely used Rank-Dependent Expected Utility theory, which generalizes EU theory, downside risk aversion implies EU maximisation.
This immediately shows that Tsiang's (1972) well-cited attempt to explain the Borch (1969) paradox is misguided. A more detailed discussion of this is given in the next section.
That is, since individuals’ preferences over these distributions are determined by their means, variances, and degrees of skewness alone, if individuals were averse or indifferent to skewness, the estimates of b_{3} should have been negative or close to zero. Cain and David (2004) point out that for the class of Bernoulli distributions of the form [(0,1−p_{ h })(X_{ h }, p_{ h })] (with one of the possible outcomes fixed at 0), the mean and the variance of a distribution determine its unstandardized and standardized third moments. It is therefore not sensible to claim, as did Golec and Tamarkin (1998), that bettors “trade off negative expected return and variance for positive skewness”. Nevertheless, Proposition 1 implies that Bernoulli distributions of this form do have different degrees of skewness as determined by the value of p_{ h }. That is, the data sets used in these empirical studies consist of distributions with different means, variances, and degrees of skewness; it is only that an implicit restriction on their relationship leaves just 2 degrees of freedom. A utility function with u′′′( )>0 estimated using such data sets still does represent evidence for skewness preference as defined in this paper.
In particular, unlike self-insurance, self-protection may be attractive to both risk averters and risk lovers, and market insurance and self-protection can be complements. Examples of self-protection include crime prevention measures such as the purchase of burglar alarms, paying a higher price for a safer car, healthier food, or a house in a less crime-prone area, and the purchase of fire prevention equipment such as smoke detectors. The problem of self-protection is also embedded in the usual moral hazard models and in models of environmental protection.
Until recently, the literature on self-protection focused primarily on the effect of risk aversion. Briys and Schlesinger (1990) first suggest a link between self-protection and downside risk aversion. Chiu (2000) shows that a risk-averse individual is willing to pay more than the fair price for self-protection if the initial loss probability p is below a threshold, which is less than 1/2 if and only if u′′′>0 and is lower if −u′′′/u′′ is larger. Eeckhoudt and Gollier (2005) obtain results suggesting that the spending on self-protection is less if u′′′>0 than if u′′′<0. Chiu (2005b) shows that if marginal changes in self-protection expenditure are mean preserving, a larger −u′′′/u′′ implies a lower spending on self-protection. The precise role played by the change in variance in self-protection decisions has never been recognized.
Chiu (2005a) shows that the prudence measure can be interpreted as measuring the strength of an individual's downside risk aversion relative to his own risk aversion. Since under our definition of skewness comparability, a downside risk increase is a “pure” decrease in skewness where the two distributions have the same mean and variance, the prudence measure can equivalently be said to measure the strength of skewness preference relative to risk aversion.
For example, it may be the case that the initial p is larger than 1/2 and the cost of reducing it by a small amount ɛ is larger than ɛl, and yet for a larger reduction in p (through the purchase of more expensive devices), that is, for ɛ large, the total cost is less than ɛl. Then the decomposition in (1) clearly indicates that it is not optimal to choose a small reduction in p but it may be optimal to choose a large reduction. Such possibilities are ruled out by the first-order approach, which requires that the cost of self-protection be a continuous and differentiable function of the reduction in loss probability; such a function is usually further assumed to be convex to guarantee the second-order condition.
This perhaps explains why the unstandardized third moment has been treated synonymously with skewness in the economics and finance literature though as is pointed out in Arditti's (1967, p. 20) pioneering analysis of skewness preference, the term skewness is usually saved for the standardized third moment in the statistics literature.
Numerous authors have published comments on Tsiang's (1972) paper but none seemed aware of this particular flaw in his argument. See the June 1974 issue of the American Economic Review. Since the two Bernoulli distributions constructed in Borch's (1969) celebrated example share a common probability parameter value, Proposition 1 in the last section indicates that they are consistent with mean-variance preferences. Any attempt to explain the Borch paradox by invoking skewness preference is thus clearly misguided.
In addition, Markowitz (1952b) points out that an individual with the utility function Friedman and Savage (1948) use to explain simultaneous gambling and insurance tends to prefer positively skewed distributions and cites as evidence of positive skewness preference the experimental regularity uncovered by Mosteller and Nogee (1951) that gamblers play more conservatively when losing and more liberally when winning.
I received valuable advice from Peter Lambert on the presentation of concepts and results related to tax progression.
This can be best illustrated considering a tax schedule t_{2}(x), its residual income function and a concave function T(τ) as follows:
Let a tax schedule t_{1}(x) be such that x−t_{1}(x)=T(x−t_{2}(x)). Then the change from t_{1}(x) to t_{2}(x) is equivalent to reducing the top marginal tax rate if r̂<x̄ and to abolishing the top rate income band if r̂>x̄. In the United Kingdom, for example, the top marginal tax rate was cut in 1979 and the top rate income band was abolished in 1988.
A completely analogous interpretation can be developed in terms of the impacts of tax reforms on income inequality and on a Social Welfare function or an inequality index, which exhibits “downside inequality aversion” or “transfer sensitivity”. A useful and novel role of the third moment in the analysis of income inequality is also implied. The details are however left to readers well versed in the related literature.
References
Ali, M.M. (1977) ‘Probability and utility estimates for racetrack bettors’, Journal of Political Economy 85 (August): 807–815.
Arditti, F.D. (1967) ‘Risk and the required return on equity’, Journal of Finance 22 (1): 19–36.
Arnold, B.C. and Groeneveld, R.A. (1995) ‘Measuring skewness with respect to the mode’, American Statistician 49 (1): 34–38.
Borch, K. (1969) ‘A note on uncertainty and indifference curves’, Review of Economic Studies 36: 1–4.
Briys, E. and Schlesinger, H. (1990) ‘Risk aversion and the propensities for self-insurance and self-protection’, Southern Economic Journal 57: 458–467.
Cain, M. and David, P. (2004) ‘Utility and the skewness of return in gambling’, Geneva Papers on Risk and Insurance Theory 29: 145–163.
Chateauneuf, A., Gajdos, T. and Wilthien, P.H. (2002) ‘The principle of strong diminishing transfer’, Journal of Economic Theory 103: 311–332.
Chiu, W.H. (2000) ‘On the propensity to self-protect’, Journal of Risk and Insurance 67: 555–578.
Chiu, W.H. (2005a) ‘Skewness preference, risk aversion, and the precedence relations on stochastic changes’, Management Science 51 (12): 1816–1828.
Chiu, W.H. (2005b) ‘Degree of downside risk aversion and self-protection’, Insurance: Mathematics and Economics 36 (1): 93–101.
Eeckhoudt, L. and Gollier, C. (2005) ‘The impact of prudence on optimal prevention’, Economic Theory 26: 989–994.
Ehrlich, I. and Becker, G. (1972) ‘Market insurance, self-insurance and self-protection’, Journal of Political Economy 80: 623–648.
Friedman, M. and Savage, L.J. (1948) ‘The utility analysis of choices involving risk’, Journal of Political Economy 56: 279–304.
Garrett, T.A. and Sobel, R.S. (1999) ‘Gamblers favor skewness, not risk: Further evidence from United States’ lottery games’, Economics Letters 63: 85–90.
Golec, J. and Tamarkin, M. (1998) ‘Bettors love skewness, not risk, at the horse track’, Journal of Political Economy 106: 205–225.
Hanoch, G. and Levy, H. (1970) ‘Efficient portfolio selection with quadratic and cubic utility’, Journal of Business 43 (2): 181–189.
Harvey, C.R. and Siddique, A. (2000) ‘Conditional skewness in asset pricing tests’, Journal of Finance 55 (3): 1263–1295.
Kraus, A. and Litzenberger, R. (1976) ‘Skewness preference and the valuation of risk assets’, Journal of Finance 31 (4): 1085–1100.
Lambert, P.J. (2001) The Distribution and Redistribution of Income, Manchester: Manchester University Press.
Markowitz, H. (1952a) ‘Portfolio selection’, Journal of Finance 7: 77–91.
Markowitz, H. (1952b) ‘The utility of wealth’, Journal of Political Economy 60: 151–158.
Menezes, C. and Wang, X.H. (2005) ‘Increasing outer risk’, Journal of Mathematical Economics 41 (7): 875–886.
Menezes, C., Geiss, C. and Tressler, J. (1980) ‘Increasing downside risk’, American Economic Review 70 (5): 921–932.
Meyer, J. (1987) ‘Two-moment decision models and expected utility maximisation’, American Economic Review 77 (3): 421–430.
Mosteller, F. and Nogee, P. (1951) ‘An experimental measurement of utility’, Journal of Political Economy 59: 371–404.
Oja, H. (1981) ‘On location, scale, skewness and kurtosis of univariate distributions’, Scandinavian Journal of Statistics 8: 154–168.
Rothschild, M. and Stiglitz, J. (1970) ‘Increasing risk I: A definition’, Journal of Economic Theory 2: 225–243.
Sandmo, A. (1971) ‘On the theory of the competitive firm under price uncertainty’, American Economic Review 61: 65–73.
Sinn, H.W. (1983) Economic Decisions under Uncertainty, Amsterdam: North-Holland Publishing Company.
Tobin, J. (1958) ‘Liquidity preference as behaviour towards risk’, Review of Economic Studies 25: 65–86.
Tsiang, S.C. (1972) ‘Rationale for mean-standard deviation analysis, skewness preference, and demand for money’, American Economic Review 62 (3): 354–371.
Van Zwet, W.R. (1964) Convex Transformations of Random Variables, Amsterdam: Mathematical Centre Tracts 7, Mathematisch Centrum.
Acknowledgements
I received valuable comments on earlier drafts of this paper from Carmen Menezes, Peter Lambert, and Roger Hartley and from seminar participants at the University of Manchester and the World Risk and Insurance Economics Congress in Salt Lake City. I alone am responsible for all remaining errors and weaknesses.
Appendix
Proof of Lemma 2

(i) Let F̂(x)=F(σ_{ F }x+μ_{ F }) and Ĝ(x)=G(σ_{ G }x+μ_{ G }). Then F^{−1}(G(x)) is convex if and only if F̂^{−1}(Ĝ(x)) is convex, which in turn implies and is implied by [F̂^{−1}(Ĝ(x))−x] being convex. F̂ can thus cross Ĝ at most twice. But μ_{ F̂ }=μ_{ Ĝ } and σ_{ F̂ }=σ_{ Ĝ } imply that F̂ cannot cross Ĝ fewer than twice.
(ii) Let F̂(x)=F(σ_{ F }x+μ_{ F }) and Ĝ(x)=G(σ_{ G }x+μ_{ G }). μ_{ F̂ }=μ_{ Ĝ } and σ_{ F̂ }=σ_{ Ĝ } are equivalent to ∫_{ a }^{b}[Ĝ(y)−F̂(y)]dy=0 and ∫_{ a }^{b}∫_{ a }^{y}[Ĝ(s)−F̂(s)]dsdy=0 (see Menezes et al. (1980) for a proof). Ĝ crossing F̂ twice, first from above, implies that ∫_{ a }^{x}Ĝ(y)dy can cross ∫_{ a }^{x}F̂(y)dy at most once from above, which, together with ∫_{ a }^{b}∫_{ a }^{y}[Ĝ(s)−F̂(s)]dsdy=0, implies that ∫_{ a }^{x}F̂(y)dy crosses ∫_{ a }^{x}Ĝ(y)dy once from above and ∫_{ a }^{x}∫_{ a }^{y}[Ĝ(s)−F̂(s)]dsdy⩾0 for all x. □
Proof of Theorem 1

If F and G are skewness comparable, then, by Lemma 1, μ_{ F }=μ_{ G }, σ_{ F }^{2}=σ_{ G }^{2}, and m_{ F }^{3}=m_{ G }^{3} clearly imply F(x)=G(x). (If μ_{ F }=μ_{ G }, σ_{ F }^{2}=σ_{ G }^{2}, and F(x)≠G(x), then m_{ F }^{3}≠m_{ G }^{3}.)
For the converse, we are to show that if F and G are not skewness comparable, then it is possible that (μ_{ F },σ_{ F }^{2},m_{ F }^{3})=(μ_{ G },σ_{ G }^{2},m_{ G }^{3}) and F(x)≠G(x). Let F and G be such that μ_{ F }=μ_{ G }=μ, σ_{ F }^{2}=σ_{ G }^{2}=σ^{2}, and ∫_{ a }^{x}∫_{ a }^{y}[G(z)−F(z)]dzdy>0 for x◯ and . ^{20} Since μ_{ F }=μ_{ G } implies ∫_{ a }^{b}[G(y)−F(y)]dy=0 and σ_{ F }^{2}=σ_{ G }^{2} together with μ_{ F }=μ_{ G } implies ∫_{ a }^{b}∫_{ a }^{y}[G(s)−F(s)]dsdy=0, repeated integration by parts gives
That is, (μ_{ F },σ_{ F }^{2},m_{ F }^{3})=(μ_{ G },σ_{ G }^{2},m_{ G }^{3}) does not imply F=G if F and G are not skewness comparable.□
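The necessity direction can be made concrete with a small numerical example. The sketch below uses two hypothetical four-point uniform distributions (the supports are chosen for illustration and do not come from the paper) that share the same mean, variance, and third central moment and yet are different distributions. By Theorem 1, such a pair cannot be skewness comparable.

```python
from fractions import Fraction


def central_moments(support):
    """Mean, variance, and third central moment of a uniform distribution on `support`."""
    n = len(support)
    mean = Fraction(sum(support), n)
    variance = sum((x - mean) ** 2 for x in support) / n
    third = sum((x - mean) ** 3 for x in support) / n
    return mean, variance, third


# Hypothetical supports; each point carries probability 1/4.
support_F = [0, 4, 7, 11]
support_G = [1, 2, 9, 10]

moments_F = central_moments(support_F)
moments_G = central_moments(support_G)

print("F:", moments_F)
print("G:", moments_G)
print("same first three moments:", moments_F == moments_G)
print("same distribution:", sorted(support_F) == sorted(support_G))
```

Exact rational arithmetic is used so that the equality of moments is not an artefact of floating-point rounding.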
Proof of Lemma 3

(i) By Lemma 1, F being more skewed to the right than G implies m_{ F }>m_{ G }. Conversely, if F is not more skewed to the right than G, by skewness comparability, either [F(σ_{ F }x+μ_{ F })−G(σ_{ G }x+μ_{ G })] is a downside risk increase or F(σ_{ F }x+μ_{ F })=G(σ_{ G }x+μ_{ G }), which implies m_{ F }^{3}⩽m_{ G }^{3}.
(ii) μ_{ F }=μ_{ G }≡μ, m_{ F }^{3}=m_{ G }^{3}, and skewness comparability imply that F(σ_{ F }x+μ)=G(σ_{ G }x+μ) or equivalently F(x)=G(σ_{ G }/σ_{ F }(x−μ)+μ). Furthermore, since σ_{ G }/σ_{ F }(x−μ)+μ−x=(σ_{ G }/σ_{ F }−1)(x−μ),
σ_{ F }>σ_{ G } implies that, for x<μ, σ_{ G }/σ_{ F }(x−μ)+μ>x and hence F(x)=G(σ_{ G }/σ_{ F }(x−μ)+μ)⩾G(x), and similarly, for x>μ, F(x)⩽G(x). That is, [G(x) → F(x)] is a simple MPS. Conversely, if σ_{ F }⩽σ_{ G }, then by analogous reasoning [G(x) → F(x)] is an MPC or F(x)=G(x) and hence not an MPS.
(iii) σ_{ F }=σ_{ G }, m_{ F }^{3}=m_{ G }^{3} and skewness comparability imply that F(x)≡G(x+μ_{ G }−μ_{ F }). μ_{ F }>μ_{ G } clearly then implies that [G(x) → F(x)] is an FSD improvement. Conversely, if μ_{ F }⩽μ_{ G }, either [G(x) → F(x)] is an FSD deterioration or F(x)=G(x) and hence [G(x) → F(x)] is not an FSD improvement.□
Proof of Proposition 1

First we know
Second, being Bernoulli distributions, F_{1}(σ_{1}x+μ_{1}) and F_{2}(σ_{2}x+μ_{2}) can cross at most twice, but (7) precludes the case where they cross once because if two distributions with the same mean cross once, then one is an MPS from the other and has a larger variance. That is, to show p_{1}<p_{2} implies that F_{2}(x) is more skewed to the right than F_{1}(x), we only need to show p_{1}<p_{2} implies that F_{1}(σ_{1}x+μ_{1}) crosses F_{2}(σ_{2}x+μ_{2}) first from above, that is, (y_{1}−μ_{1})/σ_{1}<(y_{2}−μ_{2})/σ_{2}. To see that, suppose p_{1}<p_{2}. Then since (6) is equivalent to
if (y_{1}−μ_{1})/σ_{1}⩾(y_{2}−μ_{2})/σ_{2}, then
which implies that F_{1}(σ_{1}x+μ_{1}) single-crosses F_{2}(σ_{2}x+μ_{2}) and contradicts (7). Hence, p_{1}<p_{2} implies that (y_{1}−μ_{1})/σ_{1}<(y_{2}−μ_{2})/σ_{2} and F_{1}(σ_{1}x+μ_{1}) crosses F_{2}(σ_{2}x+μ_{2}) exactly twice, first from above.
If p_{1}=p_{2}, then by (6)
which gives
and thus contradicts (7). That is, p_{1}=p_{2} implies (y_{1}−μ_{1})/σ_{1}=(y_{2}−μ_{2})/σ_{2}, which by (6) implies (z_{1}−μ_{1})/σ_{1}=(z_{2}−μ_{2})/σ_{2} and hence F_{1}(σ_{1}x+μ_{1})=F_{2}(σ_{2}x+μ_{2}).
This completes the proof of both (i) and (ii) because what is shown also implies that if p_{1}≮p_{2}, then F_{2}(x) is not more skewed to the right than F_{1}(x), and that if p_{1}≠p_{2}, then F_{1}(σ_{1}x+μ_{1})≠F_{2}(σ_{2}x+μ_{2}).
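The role of the probability parameter can be checked numerically. The sketch below assumes, as the inequality (y_{i}−μ_{i})/σ_{i} in the proof suggests but this excerpt does not state explicitly, that p_{i} is the probability of the lower outcome y_{i} and 1−p_{i} that of the higher outcome z_{i}; the function name and the numerical values are hypothetical. It shows that the standardized distribution depends on p alone and that both the standardized lower outcome and the standardized third moment increase with p.

```python
import math


def standardized_bernoulli(y, z, p):
    """Standardize a two-point distribution with outcome y (prob. p) and z (prob. 1 - p), y < z.

    Returns the standardized lower outcome, the standardized higher outcome,
    and the standardized third moment.
    """
    mu = p * y + (1 - p) * z
    sigma = math.sqrt(p * (1 - p)) * (z - y)
    low = (y - mu) / sigma
    high = (z - mu) / sigma
    skew = p * low ** 3 + (1 - p) * high ** 3
    return low, high, skew


# Same p, different outcomes: the standardized distributions coincide.
a = standardized_bernoulli(0.0, 10.0, 0.3)
b = standardized_bernoulli(-5.0, 2.0, 0.3)

# Larger p: the standardized lower outcome and the skewness are both larger.
c = standardized_bernoulli(1.0, 4.0, 0.7)

print("p = 0.3:", a)
print("p = 0.3:", b)
print("p = 0.7:", c)
```

Under this assumption the standardized third moment has the closed form (2p−1)/√(p(1−p)), which is increasing in p, consistent with the ordering established in the proof.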
Proof of Proposition 2

Let T(τ)≡r_{1}(r_{2}^{−1}(τ)). Then t_{1}(x) having more residual progression than t_{2}(x) is equivalent to T(τ)/τ being nonincreasing, and t_{1}(x) being more residual-concave than t_{2}(x) is equivalent to T being concave. T(τ)/τ is nonincreasing in τ if T′(τ)⩽T(τ)/τ for all τ>0.
But by the Mean-Value Theorem, for any τ>0, there exists τ̂∈(0,τ) such that T′(τ̂)=[T(τ)−T(0)]/τ.
The concavity of T and T(0)⩾0 thus imply T′(τ)⩽T′(τ̂)=[T(τ)−T(0)]/τ⩽T(τ)/τ.
That is, T(τ)/τ is nonincreasing in τ. □
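A quick numerical check of this argument is sketched below. Both functions are hypothetical examples chosen for illustration: T is concave with T(0)=1⩾0, while T_negative_intercept is concave with T(0)=−1<0. The second case indicates why the condition T(0)⩾0 is needed in the proof.

```python
import math


def T(tau):
    """Hypothetical concave map with T(0) = 1 >= 0."""
    return 1.0 + math.sqrt(tau)


def T_negative_intercept(tau):
    """Hypothetical concave map with T(0) = -1 < 0, violating the condition."""
    return math.sqrt(tau) - 1.0


def ratio_is_nonincreasing(func, grid):
    """Check whether func(tau) / tau is nonincreasing along an increasing grid."""
    ratios = [func(tau) / tau for tau in grid]
    return all(later <= earlier for earlier, later in zip(ratios, ratios[1:]))


grid = [0.5 * k for k in range(1, 201)]  # tau = 0.5, 1.0, ..., 100.0

holds_with_condition = ratio_is_nonincreasing(T, grid)
holds_without_condition = ratio_is_nonincreasing(T_negative_intercept, grid)

print("T(0) >= 0: T(tau)/tau nonincreasing?", holds_with_condition)
print("T(0) <  0: T(tau)/tau nonincreasing?", holds_without_condition)
```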
About this article
Cite this article
Chiu, W. Skewness Preference, Risk Taking and Expected Utility Maximisation. Geneva Risk Insur Rev 35, 108–129 (2010). https://doi.org/10.1057/grir.2009.9
DOI: https://doi.org/10.1057/grir.2009.9
Keywords
 skewness preference
 risk aversion
 downside risk
 moment
 gambling