The life of a people is conceived as a scheme of cooperation spread out in historical time. It is to be governed by the same conception of justice that regulates the cooperation of contemporaries.
(John Rawls, A Theory of Justice).
Abstract
Patient and Pareto responsive (pPr) societal preferences were introduced and studied in Khan and Stinchcombe (2018). This paper develops a tractable subclass of the pPr preferences that satisfy a strong equity criterion formulated to match intuitions and results for large but finite models. In population models where the number and happiness of future people are stochastic, the only optimal policies require sustainability (resp. an abundance of effort) in the presence of irreversible (resp. difficult to reverse) negative externalities suffered by future generations. Partially ordering the preferences by increasing degrees of inequality aversion over generations, more inequality averse preferences give rise to choices that are counterintuitive from a population ethics viewpoint in smaller sets of problems.
1 Introduction
In choosing present policies, there is both a moral and a pragmatic imperative to consider the welfare of those not yet born. The normative theory of intergenerational equity has occupied social choice theorists for a long time. The pressing problems of the day, including climate change and environmental sustainability, have brought generational ethics into sharp focus as a matter of immense practical importance. This paper is a part of an ongoing project that aims to a) build a coherent theoretical framework for dealing with intergenerational equity, and b) develop answers to practical questions that arise in situations involving irreversible, or difficult to reverse, changes, as they often warrant considerations of intergenerational ethics. To this end, the current paper refines and enriches the theory first proposed in Khan and Stinchcombe (2018) (henceforth KS), and illustrates the implications of said framework in two examples drawn from environmental economics and one from political economy/political philosophy.
The key takeaway from this paper is that serious consideration of intergenerational equity tends to provide qualitatively different policy prescriptions compared to standard discounted utility models. In particular, it pushes for greater sacrifices for the future in the form of more cautious and “sustainable” policies when dealing with potentially irreversible negative externalities, and these lead to very different long-run predicted outcomes. Remarkably, in the presence of irreversibility, these results hold true regardless of the “degree of discounting” in the standard models. When it comes to making decisions that have significant long-run consequences, perhaps the right question to ask is not “How much discounting?” but rather “Should we discount at all?” In the rest of the introduction we outline our approach and the structure of the paper.
1.1 Different scales
Thinking about intergenerational ethics is a problem with a very large scale.
With 500 million years left of acceptable habitat for humans on Earth, population being stable at 10 billion with an average length of life equal to 73 years, the ratio of people who will potentially live in the future to people living now is approximately 10 million to 1. (Asheim 2010)
The numbers “500 million,” “10 billion,” and “10 million” are large, but finite. Our approach is to replace the large but finite populations and time horizon by a particular kind of continuous, non-atomic (or “oceanic” in Aumann and Shapley’s (1974) evocative term) population model. Our approach to non-atomic models takes seriously the idea that infinite models should be interpretable as limits of large finite models. To guarantee this, we define our models using sequences of increasingly large finite models so that the results and definitions from the finite models carry over.
Within this class of population models, KS formulated social welfare functions that are Pareto responsive and patient in the sense of being invariant to the largest class of permutations that had been used in the literature. To the KS model, we add a stronger, ‘limit’ equity condition. It requires invariance with respect to a much larger class of permutations, a class that directly parallels the set of permutations for finite models. This guarantees equal treatment of all generations.
Taking a “pragmatic” point of view, we judge ethical assumptions by an examination of their implications in economic models.Footnote 1 Our limit equity condition delivers a subclass of the KS social welfare functions, a subclass for which one can find strong implications of the precautionary principle for irreversible or difficult-to-reverse problems. It also delivers analyses of intergenerational trade-offs that appear more sensible than some extant formulations of “Rawlsian” and other patient preferences that aim to capture intergenerational equity.
1.2 The limit equity condition
To see what is involved in our limit equity condition, let us start with large but finite models. We will work with a limit formulation of sequences of large population models, examining sequences \(I_n = \{0, 1, 2, \ldots , T_n\}\) of generations with \(T_n \rightarrow \infty\). For each finite model \(I_n\), the most basic of the equity conditions is invariance with respect to permutations, and we extend this directly to the limit population model.
A one-to-one and onto mapping \(\pi\) from a finite \(I = \{0, 1, 2, \ldots , T\}\) to itself is a bijection. Let \({{\varvec{u}}}: I \rightarrow {\mathbb {R}}\) denote the utility assignments of the population. The equity condition for the social preferences is that \({{\varvec{u}}}^\pi\) and \({{\varvec{u}}}\) should be indifferent where \({{\varvec{u}}}^\pi := (u_{\pi (0)}, u_{\pi (1)}, u_{\pi (2)}, \ldots , u_{\pi (T)})\). This has an alternative, probabilistic formulation.
With \(\Lambda _I\) denoting the uniform distribution on I, every utility profile \({{\varvec{u}}}= (u_0, \ldots , u_T)\) induces a distribution of utility \(p_{{\varvec{u}}}\) defined by letting \(p_{{\varvec{u}}}(A)\) denote the proportion of the population receiving a utility level in the set A, \(p_{{\varvec{u}}}(A) = \Lambda _I(\{t: u_t \in A\}) = \frac{1}{T+1} \#\{t: u_t \in A\}\).Footnote 2 With \({\mathcal {B}}_I\) denoting the set of bijections on I, under a uniform distribution on I, we have the following property, called homogeneity, of the finite probability space \(\{0, 1, 2, \ldots , T\}\) and the uniform distribution \(\Lambda _I\):
\[\text{for all } {{\varvec{u}}}, {{\varvec{v}}}: I \rightarrow {\mathbb {R}}, \quad p_{{\varvec{u}}}= p_{{\varvec{v}}} \text{ if and only if } {{\varvec{v}}}= {{\varvec{u}}}^\pi \text{ for some } \pi \in {\mathcal {B}}_I.\]
For finite populations, strong equity is the condition that societal preferences should be invariant to shuffles of who receives what. This is equivalent to social welfare depending only on the proportions of the population receiving the various utility levels. Specifically, preferences between \({{\varvec{u}}}\) and \({{\varvec{v}}}\) can only depend on \(p_{{{\varvec{u}}}}\) and \(p_{{{\varvec{v}}}}\). While this works for, indeed characterizes, the class of infinite population models developed here, it does not work for many other infinite population models.
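The finite-population statement can be illustrated with a short Python sketch. It is only an illustration under our own assumptions: the function names and the sample profile are hypothetical, and the code covers a small finite model rather than the limit model. It computes the induced distribution \(p_{{\varvec{u}}}\), applies a bijection, and recovers a bijection linking two profiles that share the same distribution.

```python
from collections import Counter
from fractions import Fraction

def utility_distribution(u):
    """Return p_u as a dict: utility level -> share of the population."""
    n = len(u)
    return {level: Fraction(count, n) for level, count in Counter(u).items()}

def permute(u, pi):
    """Return u^pi, whose t-th entry is u[pi[t]]."""
    return [u[pi[t]] for t in range(len(u))]

def matching_bijection(u, v):
    """Return a bijection pi with v == permute(u, pi), or None if p_u != p_v."""
    if utility_distribution(u) != utility_distribution(v):
        return None
    # Group the generations of u by the utility level they receive.
    slots = {}
    for t, level in enumerate(u):
        slots.setdefault(level, []).append(t)
    # For each generation of v, take an unused generation of u at the same level.
    return [slots[level].pop() for level in v]

# Hypothetical profile on I = {0, ..., 5}.
u = [3, 1, 3, 2, 1, 3]
reverse = [len(u) - 1 - t for t in range(len(u))]   # pi(t) = T - t
v = permute(u, reverse)
pi = matching_bijection(u, v)
```

In this sketch the reversal \(\pi (t) = T - t\) leaves the distribution unchanged, and `matching_bijection` returns `None` exactly when the two distributions differ, which is the finite homogeneity property.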
For a non-atomic probability space \((\Omega ,{\mathcal {F}},P)\), measure automorphisms are the generalization of bijections: a measurable function \(\pi :\Omega \rightarrow \Omega\) is a measure automorphism if \(P(E) = P(\pi ^{-1}(E))\) for all measurable \(E \subset \Omega\).Footnote 3 Given an automorphism \(\pi\) and defining \({{\varvec{u}}}^\pi (\omega ) = {{\varvec{u}}}(\pi (\omega ))\), the distributions of utility induced by \({{\varvec{u}}}\) and \({{\varvec{u}}}^\pi\) are equal to each other because \({{\varvec{u}}}^{-1}(A)\) always has the same P-mass as \(\pi ^{-1}({{\varvec{u}}}^{-1}(A))\) for any measurable \(A \subset {\mathbb {R}}\). For preferences that depend only on the distribution of utilities, this is the right set of transformations to consider. But indifference to such automorphisms need not capture the idea of equity that comes from invariance to bijections.
Two examples help pinpoint the difficulties. First, measure automorphisms need not be 1-to-1: if \(\Omega\) is the unit interval (0, 1] (rather than the limit-of-finite population model that we will use) and P is the uniform distribution, then the two-to-one, onto function \(\pi (\omega ) = 2 \omega \cdot 1_{(0, \frac{1}{2}]}(\omega ) + (2\omega - 1) 1_{(\frac{1}{2}, 1]}(\omega )\) is an automorphism. But a two-to-one \(\pi\) can have no interpretation as a switching of utility levels between generations. Second, 1-to-1 and onto functions need not be measure automorphisms. With the same probability space, the one-to-one, onto function \(\pi (\omega ) = \omega ^2\) is not a measure automorphism (except for the point mass distribution on \(\omega = 1\)).
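Both examples can be checked on intervals with a few lines of Python. This is a sketch under our own assumptions: the helper names are hypothetical, and only half-open intervals \((a, b] \subset (0, 1]\) are tested, which is a spot check rather than a proof of measure preservation.

```python
from fractions import Fraction
from math import sqrt

def doubling(w):
    """pi(w) = 2w on (0, 1/2] and 2w - 1 on (1/2, 1]."""
    return 2 * w if w <= Fraction(1, 2) else 2 * w - 1

def doubling_preimage_mass(a, b):
    """Uniform mass of the preimage of (a, b] under the doubling map.

    The preimage is (a/2, b/2] together with ((a+1)/2, (b+1)/2].
    """
    return (b / 2 - a / 2) + ((b + 1) / 2 - (a + 1) / 2)

def square_preimage_mass(a, b):
    """Uniform mass of the preimage of (a, b] under pi(w) = w**2,
    which is the interval (sqrt(a), sqrt(b)]."""
    return sqrt(b) - sqrt(a)

# Hypothetical test interval (1/4, 1/2].
a, b = Fraction(1, 4), Fraction(1, 2)
```

On this interval the doubling map returns mass \(b - a\) even though it sends both \(\frac{1}{4}\) and \(\frac{3}{4}\) to \(\frac{1}{2}\), while the squaring map is one-to-one but returns a different mass.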
For the non-atomic, limit-of-finite population model \({{\mathbb {I}}}\) that we use here, the appropriate equity condition is that for any almost everywhere one-to-one, onto \(\pi :{{\mathbb {I}}}\rightarrow {{\mathbb {I}}}\) and any measurable utility allocation \({{\varvec{u}}}:{{\mathbb {I}}}\rightarrow {\mathbb {R}}\), \({{\varvec{u}}}\) and \({{\varvec{u}}}^\pi\) are indifferent. This works because, in the model we use, (1) \(\pi\) is a measure automorphism iff (2) it is almost everywhere one-to-one and onto, and the model also satisfies homogeneity, (3) for any measurable \({{\varvec{u}}}, {{\varvec{v}}}:{{\mathbb {I}}}\rightarrow {\mathbb {R}}\), \(p_{{\varvec{u}}}= p_{{\varvec{v}}}\) if and only if \({{\varvec{v}}}= {{\varvec{u}}}^\pi\) for an almost everywhere one-to-one and onto function \(\pi\). Except for the need to have the “almost everywhere” qualifier to deal with null sets, this is the same homogeneity condition that finite models satisfy.
The existence of non-atomic homogeneous probability spaces was settled by von Neumann (1932). However, we use results due to Jerome Keisler (1984), who showed that when a hyperfinite set from nonstandard analysis is given the uniform (or counting) distribution, then properties (1) and (2) are equivalent, and the model is homogeneous (3). Further, from Robinson (1964, Theorem 5.1), these hyperfinite population models can always be understood as the limits of larger and larger sequences of finite sets with the uniform distribution. As we will see, the tools of nonstandard analysis allow us to analyze these models using techniques familiar from finite sets.
1.3 Outline
The next section gives the formal setting and the relevant results on patient social welfare from KS. Of particular interest is the observation that patience—formalized as indifference to the largest class of permutations used in the previous literature—is compatible with the Pareto criterion. An example shows that this definition of patience is compatible with an almost complete abrogation of equity.
The subsequent section, Sect. 3, gives the nonstandard analysis necessary to construct the limit population models behind the work in KS. Using these tools, we add the strong equity condition, and in Sect. 4 we apply the tools to two environmental economics problems and one political economy problem. One of the environmental problems involves the risk of an irreversible change; the second involves changes that are very difficult to reverse. The political economy problem involves a study of when some counterintuitive conclusions from population ethics do and do not hold within our class of social welfare functions. The final section gives a summary and discusses some of the open problems, and proofs not given in the text are gathered in the appendix.
2 Preliminaries
Equitable social preferences are those indifferent to permutations of the names of those receiving benefits. Patient preferences are those indifferent to permutations in the arrival time of benefits. For social welfare functions defined on intergenerational streams of utilities, these ideas are close to each other.
This section begins with a review of the setting and main results about large intergenerational population models used in KS. It then contrasts the KS settings and results to previous work on intergenerational equity/patience. That literature showed that it is difficult to incorporate patience, understood as a form of intergenerational equity, and still satisfy the Pareto criterion. Such findings make it hard to operationalize equitable societal preferences for intergenerational problems. And, by extension, this can turn into an argument that some form of discounting must be used.
KS gave the population model that was implicit in the literature. In this model, the purported examples of the failure of the Pareto criterion involve increasing the welfare of only a null subset of the population. KS then integrated patience/intergenerational equity and Pareto responsiveness in a class of social welfare functions. This paper adds a more thorough implementation of equity to the KS preferences.
2.1 Overview
The primitives in the model of KS are intergenerational streams of well-being, normalized to belong to \({{\varvec{W}}}\), the non-negative elements of \(\ell _\infty = \ell _\infty ({{\mathbb {N}}_0})\), \({{\mathbb {N}}_0}= \{0, 1, 2, \ldots \}\).Footnote 4 The typical element in \({{\varvec{W}}}\) is \({{\varvec{u}}}= (u_0, u_1, u_2, \ldots )\), and the norm distance between \({{\varvec{u}}}, {{\varvec{v}}}\in {{\varvec{W}}}\) is given by \(\Vert {{\varvec{u}}}- {{\varvec{v}}}\Vert = \sup _{t \in {{\mathbb {N}}_0}} |u_t - v_t|\). Both KS and this paper study preferences over the subset of \(\Delta ({{\varvec{W}}})\) supported by norm bounded sets that can be represented as \(p \succ q\) if and only if \(\int _{{{\varvec{W}}}} S({{\varvec{u}}})\,dp({{\varvec{u}}}) > \int _{{{\varvec{W}}}} S({{\varvec{u}}})\,dq({{\varvec{u}}})\) where \(S:{{\varvec{W}}}\rightarrow [0, \infty )\) is norm continuous, concave, and satisfies an additional condition meant to capture aspects of equity/patience. That additional condition is that \(S({{\varvec{u}}}) > S({{\varvec{v}}})\) when \({{\varvec{u}}}\) asymptotically first order dominates \({{\varvec{v}}}\).
Definition 2.1
A stream \({{\varvec{u}}}\in {{\varvec{W}}}\) asymptotically first order dominates a stream \({{\varvec{v}}}\in {{\varvec{W}}}\), denoted \({{\varvec{u}}}\succ _{fo} {{\varvec{v}}}\), if
\[\liminf _{T \rightarrow \infty } \frac{1}{T+1} \sum _{t=0}^{T} \left[ f(u_t) - f(v_t) \right] > 0\]
for all continuous strictly increasing \(f:{\mathbb {R}}_+ \rightarrow {\mathbb {R}}\).
Essentially, this asks that for large T, the distribution of the utilities \(\{u_t: t = 0, \ldots , T\}\) strictly first order dominates the distribution of the utilities \(\{v_t: t = 0, \ldots , T\}\). This captures a sense of patience in that it need only hold for large T, and it captures a sense of equity in that the ordering \(\succ _{fo}\) is immune to several classes of permutations.
The set \(\{\ldots , -2, -1, 0, 1, 2, \ldots \}\) is denoted \({\mathbb {Z}}\). A permutation is a 1-to-1 function \(\pi :{{\mathbb {N}}_0}\rightarrow {\mathbb {Z}}\) that is onto \({{\mathbb {N}}_0}\). Given \({{\varvec{u}}}= (u_0, u_1, u_2, \ldots ) \in \ell _\infty\) and a permutation \(\pi\), define \({{\varvec{u}}}^\pi\) as \((u_{\pi ^{-1}(0)}, u_{\pi ^{-1}(1)}, u_{\pi ^{-1}(2)}, \ldots )\). In increasing order of generality, the literature has considered the following classes of permutations, and has interpreted indifference to permutations variously as “equity,” “weak anonymity,” or “intergenerational neutrality.”
- \(\pi\) is a shift permutation if \(\pi (T) = T-F\) for some integer F.
- \(\pi\) is a bounded permutation if, for all T, \(|\pi (T) - T| \le F\) for some integer F.
- \(\pi\) is an asymptotic permutation if \(\lim _{T \rightarrow \infty } |\pi (T)-T|/T = 0\).Footnote 5
It can be shown that if \(\pi\) is an asymptotic permutation, then \({{\varvec{u}}}\succ _{fo} {{\varvec{v}}}\) iff \({{\varvec{u}}}^\pi \succ _{fo} {{\varvec{v}}}\) iff \({{\varvec{u}}}\succ _{fo} {{\varvec{v}}}^\pi\). Missing in these three classes of permutations is a parallel to the strong equity condition for finite sets of generations. The key observation is that the distribution of the utilities \(\{u_t: t = 0, \ldots , T\}\) first order dominates the distribution of the utilities \(\{v_t: t = 0, \ldots , T\}\) iff it dominates after any permutation of the finite set \(\{0, \ldots , T\}\), including those that take large t’s and switch them with small t’s, e.g. \(\pi (t) = T-t\).
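The key observation about finite horizons can be illustrated in Python. The sketch rests on a standard fact that is our addition, not a statement from KS: for two profiles of equal length, strict first order dominance of the empirical distributions is equivalent to comparing the sorted utilities position by position. The function names and sample profiles are hypothetical.

```python
def first_order_dominates(u, v):
    """True if the empirical distribution of u strictly first order dominates that of v.

    For profiles of equal length this holds iff the sorted utilities of u are at
    least those of v position by position, with at least one strict gap.
    """
    if len(u) != len(v):
        raise ValueError("profiles must cover the same generations")
    su, sv = sorted(u), sorted(v)
    return all(a >= b for a, b in zip(su, sv)) and su != sv

def reverse(u):
    """Apply pi(t) = T - t, which swaps late generations with early ones."""
    return u[::-1]

# Hypothetical utility profiles on {0, 1, 2, 3}.
u = [5, 2, 7, 4]
v = [1, 6, 4, 2]
```

Because the comparison only uses sorted values, reversing either profile, or applying any other permutation of \(\{0, \ldots , T\}\), cannot change the verdict.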
In this paper, we study population models that are limits, as \(T_n \rightarrow \infty\), of the finite sets \({{\mathbb {I}}}_n = \{0, \ldots , T_n\}\). The equity condition that we here require is indifference with respect to all limits of permutations \(\pi _n\), where each \(\pi _n\) is a permutation on \({{\mathbb {I}}}_n\). There are two additional points to be made.
- First, there is an “almost everywhere” qualifier to the “all” in the previous sentence. The sequence of permutations \(\pi _n\) need only apply outside a sequence of ‘exceptional sets.’ The exceptional sets, \(E_n \subset {{\mathbb {I}}}_n\), satisfy \(\frac{\# E_n}{T_n} \rightarrow 0\). With this qualifier, if \(\pi\) is either a shift, a bounded, or an asymptotic permutation, then the sequence of restrictions of \(\pi ^{-1}\) to \({{\mathbb {I}}}_n\) defines an almost everywhere sequence of permutations.
- Second, restricting a single \(\pi\) to the sequence \({{\mathbb {I}}}_n\) cannot e.g. interchange large t’s with small t’s when \(T_n\) is large. But a sequence \(\pi _n\) can perform this sort of interchange.
This last point is the crucial difference between what we do here and what has come before. While the literature has worked with permutations on all of \({{\mathbb {N}}_0}\), we work with sequences of permutations on sequences of approximations to \({{\mathbb {N}}_0}\). Example 2.1 shows that not having invariance with respect to this richer class of permutations allows for social preferences that totally downweight the far future. All of this is most easily seen within a tractable subclass of the KS preferences.
2.2 A tractable class of preferences
The most tractable class of preferences in KS involves non-atomic, shift-invariant probabilities, Q, on \({{\mathbb {N}}_0}\) (also known as Banach-Mazur limits when the focus is on integrals as continuous linear operators on \(\ell _\infty\)). The non-atomicity captures the ‘limit of large finite populations’ aspect of the problem, and the shift-invariance captures patience. Shift-invariance of a probability Q is the requirement that if \({{\varvec{u}}}= (u_0, u_1, u_2, \ldots )\) and \({{\varvec{v}}}= (u_1, u_2, u_3, \ldots )\), then \(\int u_t\,dQ(t) = \int v_t\,dQ(t)\). If \(g:{\mathbb {R}}_+ \rightarrow {\mathbb {R}}_+\) is bounded, then the shift-invariance of Q also delivers \(\int g(u_t)\,dQ(t) = \int g(v_t)\,dQ(t)\). Of particular note is the case that \(g = 1_A\) for \(A \subset {\mathbb {R}}\).
For each \({{\varvec{u}}}\) and Q, there is an induced distribution of generational utilities given by
\[p_{{{\varvec{u}}},Q}(A) = Q(\{t \in {{\mathbb {N}}_0}: u_t \in A\}), \quad A \subset {\mathbb {R}}.\]
The tractable subclass of KS preferences consists of those represented by utility functions of the form
\[S_{\varphi ,Q}({{\varvec{u}}}) = \int _{{\mathbb {R}}_+} \varphi (r)\,dp_{{{\varvec{u}}},Q}(r) = \int \varphi (u_t)\,dQ(t), \tag{4}\]
where \(\varphi :{\mathbb {R}}_+ \rightarrow {\mathbb {R}}_+\) is strictly increasing and concave. Integrals appear twice in rather different ways in this class of preferences: the function \({{\varvec{u}}}\mapsto S_{\varphi ,Q}({{\varvec{u}}})\) from \({{\varvec{W}}}\) to \({\mathbb {R}}_+\) is the integral, over \({\mathbb {R}}_+\), of \(\varphi (\cdot )\) with respect to \(p_{{{\varvec{u}}},Q}\); and preferences between probabilities \(\alpha\) and \(\beta\) on \({{\varvec{W}}}\) are determined by the integrals over \({{\varvec{W}}}\), \(\int _{{{\varvec{W}}}} S_{\varphi ,Q}({{\varvec{u}}}) \,d\alpha ({{\varvec{u}}})\) and \(\int _{{{\varvec{W}}}} S_{\varphi ,Q}({{\varvec{u}}}) \,d\beta ({{\varvec{u}}})\).
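A finite-horizon analogue of \(S_{\varphi ,Q}\) is easy to compute, and a small Python sketch may help fix ideas. Everything specific here is an assumption made for illustration: the horizon, the two utility streams, the uniform weights, and the choice \(\varphi (r) = \sqrt{r}\) as one strictly increasing, concave function.

```python
from math import sqrt, isclose

def welfare(u, weights, phi):
    """Finite-horizon analogue of S_{phi,Q}(u): the Q-weighted average of phi(u_t)."""
    if not isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(q * phi(x) for q, x in zip(weights, u))

T = 3
uniform = [1 / (T + 1)] * (T + 1)        # equal weight on each generation

equal_stream = [4.0, 4.0, 4.0, 4.0]
unequal_stream = [0.0, 0.0, 8.0, 8.0]    # same average utility, spread unevenly

s_equal = welfare(equal_stream, uniform, sqrt)
s_unequal = welfare(unequal_stream, uniform, sqrt)
```

With a concave \(\varphi\) the evenly spread stream receives the higher value, and with uniform weights the value is unchanged when the unequal stream is reversed in time.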
The goal is to find the implications of maximizing the utility functions in the class (4) in stochastic dynamic problems. A pair of results in Robinson (1964) make the solutions both interpretable and tractable.
- Interpretability. (Robinson 1964, Theorem 5.1) gives an alternative limit formulation for the Q’s: there is a generalized sequence (filter) of large finite population models, \(\{0, 1, 2, \ldots , T_\alpha \}\) where \(T_\alpha \rightarrow \infty\), and a sequence of probabilities, \(Q_\alpha\) on those large finite models, with the property that for each \({{\varvec{u}}}\), \(p_{{{\varvec{u}}},Q} = \lim _\alpha p_{{{\varvec{u}}},Q_\alpha }\). This means that solutions will always have interpretation as the limit of large finite horizon problems.
- Tractability. (Robinson 1964, Theorem 3.6) shows that it is possible to represent a shift-invariant probability as a distribution Q on a population model \(\{0, 1, 2, \ldots , T\}\) where T is an ‘unlimited’ or ‘infinite’ integer (from the field of mathematics known as non-standard analysis pioneered by Robinson (1966, 1996)). In particular, this means that most of the calculations can be done using techniques involving finite sums. The restriction on Q that delivers both non-atomicity and shift-invariance is that the total differences in weights given to adjoining generations must be infinitesimal, \(\sum _{t=0}^T |Q(t+1) - Q(t)| \simeq 0\). In terms of the limit formulation, this is the requirement that \(\lim _\alpha \sum _{t=0}^{T_\alpha } |Q_\alpha (t+1) - Q_\alpha (t)| = 0\).
2.3 Equity in ‘Limit’ population models
It can be shown that if Q is shift-invariant and \(\pi\) is an asymptotic permutation, then \(p_{{{\varvec{u}}},Q} = p_{{{\varvec{u}}}^\pi ,Q}\). This guarantees that the KS preferences described in (4) satisfy previous equity, weak anonymity, and intergenerational neutrality criteria. However, they do not generally satisfy the strong equity condition we study in this paper, immunity to almost-everywhere ‘limit’ permutations. We here sketch what is involved without the nonstandard analysis tools developed in the next section. Instead, we use sequences of utility vectors in \({{\varvec{W}}}\) restricted to sequences of finite horizon models while simultaneously matching them with sequences of probabilities and sequences of permutations.
Example 2.1
(Two shift invariant distributions) For a sequence \(T_n \rightarrow \infty\), let \(\Lambda _n\) denote the uniform distribution on \(I_n:= \{0, 1, \ldots , T_n\}\) so that \(\Lambda _n(t) = 1/(T_n+1)\) for \(0 \le t \le T_n\). For a sequence \(\beta _n \uparrow 1\), pick \(T_n \rightarrow \infty\) such that \(S_n:= (1-\beta _n)\sum _{t=0}^{T_n} \beta _n^t \uparrow 1\) and let \(Q_{n}\) denote the geometric distribution with parameter \(\beta _n\) conditioned to the interval \(I_n:= \{0, \ldots , T_n\}\) so that \(Q_{n}(t) = \frac{1}{S_n} (1 - \beta _n) \beta _n^t\) for \(0 \le t \le T_n\). The limit probabilities are shift invariant because \(\sum _{t \in {{\mathbb {N}}_0}} |\Lambda _n(t+1) - \Lambda _n(t)| \rightarrow 0\) and \(\sum _{t \in {{\mathbb {N}}_0}} |Q_{n}(t+1) - Q_{n}(t)| \rightarrow 0\).
A classical result tells us that the set of finitely additive probabilities is compact if we define convergence of probabilities by \(p^\alpha \rightarrow p\) iff \(p^\alpha (B) \rightarrow p(B)\) for all B in the domain of the probability. This means that the sequences of probabilities just given have accumulation points. Robinson (1964) uses nonstandard analysis tools to facilitate working with such accumulation points.
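The shift-invariance condition in Example 2.1 can be checked numerically along one concrete sequence. In the Python sketch below, the choice \(\beta _n = 1 - 1/n\) with \(T_n = n^2\) is our own hypothetical example of a sequence with \(\beta _n^{T_n} \rightarrow 0\), and the function names are illustrative. The sums only suggest the limiting behavior at finitely many n.

```python
def uniform_weights(T):
    """Lambda_n: equal weight on 0..T."""
    return [1.0 / (T + 1)] * (T + 1)

def truncated_geometric_weights(beta, T):
    """Q_n: geometric weights (1 - beta) * beta**t on 0..T, renormalized to sum to 1."""
    raw = [(1.0 - beta) * beta ** t for t in range(T + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def adjacent_variation(weights):
    """Sum over t of |Q(t+1) - Q(t)|, with Q(t) = 0 beyond the horizon."""
    padded = list(weights) + [0.0]
    return sum(abs(padded[t + 1] - padded[t]) for t in range(len(weights)))

# Hypothetical sequence: beta_n = 1 - 1/n and T_n = n**2.
results = []
for n in (10, 100, 500):
    beta, T = 1.0 - 1.0 / n, n * n
    results.append((
        adjacent_variation(uniform_weights(T)),
        adjacent_variation(truncated_geometric_weights(beta, T)),
    ))
```

For the uniform weights the sum equals \(1/(T_n+1)\). For the truncated geometric weights, which are decreasing in t, it telescopes to \(Q_n(0)\). Both shrink as n grows.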
Example 2.2
(Limit permutations) On the sequences of population models \(I_n = \{0, 1, \ldots , T_n\}\), consider the sequence of allocations \({{\varvec{u}}}_{|I_n}\) that give the early generations, \(\{0, \ldots , T_n/2\}\), a boon, with additional utility 1, and give the remaining generations no additional utility. Equity requires indifference to switching the boon between the earlier and the later generations. One of the shift invariant distributions above has this property while the other completely downweights the boon if it goes to the later generations.
(a) For the sequence \(\Lambda _n\), the limit distribution of boons associated with the allocations \({{\varvec{u}}}_{|I_n}\) is \(\frac{1}{2}\delta _1 + \frac{1}{2}\delta _0\), that is, half of the population receives the good outcome and the remaining half does not. Further, this is invariant with respect to all sequences of permutations \(\pi _n\) on \(I_n\), including those that switch the early for the later generations.
(b) For the sequence \(Q_n\), note that \((1-\beta _n) \sum _{t=0}^{T_n} \beta _n^t = (1-\beta _n^{T_n+1}) \rightarrow 1\) iff \((1-\beta _n) \sum _{t=0}^{T_n/2} \beta _n^t = (1-\beta _n^{T_n/2+1}) \rightarrow 1\). Therefore, the limit distribution of utilities associated with the allocations \({{\varvec{u}}}_{|I_n}\) is \(\delta _1\), but switching the early generations, \(\{0, \ldots , T_n/2\}\), for the later generations, \(\{T_n/2, \ldots , T_n\}\), makes the limit distribution of utilities \(\delta _0\).
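The contrast between (a) and (b) shows up already at a single large horizon. The Python sketch below is a minimal illustration under stated assumptions: the values \(n = 200\), \(\beta _n = 1 - 1/n\), and \(T_n = n^2\) are hypothetical choices for which \(\beta _n^{T_n/2}\) is negligible, and the helper names are ours.

```python
def truncated_geometric_weights(beta, T):
    """Geometric weights (1 - beta) * beta**t on 0..T, renormalized to sum to 1."""
    raw = [(1.0 - beta) * beta ** t for t in range(T + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def boon_mass(weights, boon_to_early):
    """Weight of the generations receiving the boon: 0..T/2 if early, the rest if late."""
    half = (len(weights) - 1) // 2
    early = sum(weights[: half + 1])
    return early if boon_to_early else 1.0 - early

n = 200
beta, T = 1.0 - 1.0 / n, n * n           # hypothetical sequence element
uniform = [1.0 / (T + 1)] * (T + 1)
geometric = truncated_geometric_weights(beta, T)

uniform_early = boon_mass(uniform, True)
uniform_late = boon_mass(uniform, False)
geometric_early = boon_mass(geometric, True)
geometric_late = boon_mass(geometric, False)
```

Under the uniform weights the boon carries weight close to one half whichever half of the horizon receives it. Under the truncated geometric weights it carries nearly all the weight when given early and nearly none when given late.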
This example shows that even the asymptotic discounting model yields a theory that is subject to Ramsey’s (1928) rather withering critique—when one considers the later enjoyments of groups of the same size, it “discount(s) later enjoyments in comparison with earlier ones, a practice which is ethically indefensible and arises merely from the weakness of the imagination.” Our response to this ethically indefensible position is the imposition of indifference to all permutations, and we will see that, in terms of the limit constructions just used, the only translation invariant Q’s that satisfy this criterion are the limits of the uniform distributions on \(\{0, 1, \ldots , T_n\}\). This is certainly intuitive—if we wish our criterion to treat generations equally, then it must weight them equally.
2.4 On patience and the Pareto criterion
The literature on intergenerational social welfare functions has various results interpreted to be indicating the difficulties of combining the Pareto criterion and patience. In this literature, patience is variously understood as invariance with respect to the shift, bounded, and asymptotic permutations described above. The non-atomic population perspective used in KS, and here, provides a different interpretation. It shows that the purported Pareto improvements used in this literature are best understood as boons given to null coalitions.Footnote 6
Using the shift invariance criterion from Marinacci’s (1998) development of “complete patience,” one can see how patience works with \({{\mathbb {N}}_0}\) as the model of generations. For a utility stream \({{\varvec{u}}}= (u_0, u_1, u_2, u_3, \ldots )\) and any finite F, consider the shift \({{\varvec{u}}}^F:= (u_F, u_{F+1}, u_{F+2}, u_{F+3}, \ldots )\). Patience can be understood as invariance with respect to such shifts. For example, suppose that \({{\varvec{u}}}\) is a sequence with \(4 \le u_t \le 7\) for all t. The stream \({{\varvec{u}}}\) is the F-shifted stream of, hence is indifferent to, either
\[{{\varvec{u}}}(0,F) := (\underbrace{0, \ldots , 0}_{F \text{ times}}, u_0, u_1, u_2, \ldots ) \tag{5}\]
or
\[{{\varvec{u}}}(9,F) := (\underbrace{9, \ldots , 9}_{F \text{ times}}, u_0, u_1, u_2, \ldots ). \tag{6}\]
The indifference between \({{\varvec{u}}}(0,F)\) and \({{\varvec{u}}}\) captures patience as a social willingness to wait for rewards. The indifference between \({{\varvec{u}}}(9,F)\) and \({{\varvec{u}}}\) captures a social willingness to ignore benefits accruing to a finite subset of an infinite population while waiting for the long-term pattern to start.
Invariance with respect to finite shifts and continuity seemingly lead to a violation of the Pareto criterion. Consider \({{\varvec{r}}}= (r_0,r_1, \ldots )\) with \(r_t \downarrow 0\) and compare \({{\varvec{u}}}+ {{\varvec{r}}}\) to \({{\varvec{u}}}\) assuming that preferences are represented by a uniformly continuous \(S(\cdot )\) (and KS, Theorem C shows that most of the patient social welfare functions in the literature are Lipschitz continuous). We have
\[S({{\varvec{u}}}+{{\varvec{r}}}) - S({{\varvec{u}}}) = \underbrace{\left[ S({{\varvec{u}}}+{{\varvec{r}}}) - S(({{\varvec{u}}}+{{\varvec{r}}})^F) \right] }_{= 0} + \underbrace{\left[ S(({{\varvec{u}}}+{{\varvec{r}}})^F) - S({{\varvec{u}}}^F) \right] }_{\rightarrow 0} + \underbrace{\left[ S({{\varvec{u}}}^F) - S({{\varvec{u}}}) \right] }_{= 0}.\]
Indifference to finite shifts gives both of the “\(= 0\)” conclusions, and the “\(\rightarrow 0\)” conclusion follows from \(\Vert {{\varvec{u}}}^F - ({{\varvec{u}}}+{{\varvec{r}}})^F\Vert = r_F \downarrow 0\). Thus, a preference relation represented by an \(S(\cdot )\) that is both shift invariant and uniformly continuous must be indifferent to improvements in an allocation that are modeled by \({{\varvec{r}}}\).
For any shift invariant Q, the allocational increase represented by \({{\varvec{r}}}\) has the property that for any \(\epsilon > 0\), the Q-mass of the population receiving less than \(\epsilon\) is equal to 1. Put more bluntly, \({{\varvec{r}}}\) represents a positive boon to only a null subset of the population. Since the earliest uses of non-atomic population models, e.g. Hildenbrand (1969), increasing the utility of a null subset of the population does not count as a Pareto improvement.
The limit formulations of null and substantial coalitions in KS are as follows: \(N \subset {{\mathbb {N}}_0}\) is null if \(\limsup _T \frac{1}{T+1} \sum _{t=0}^T 1_N(t) = 0\), and \(S \subset {{\mathbb {N}}_0}\) is substantial if \(\liminf _T \frac{1}{T+1} \sum _{t=0}^T 1_S(t) > 0\). From KS, Theorem A, the preferences satisfying respect for asymptotic first order dominance both ignore boons to null coalitions and respond to boons to substantial coalitions, i.e. they are Pareto responsive.
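The two definitions can be illustrated by computing the finite-horizon shares \(\frac{1}{T+1} \sum _{t=0}^T 1_N(t)\) for two example coalitions. The coalitions below (perfect squares and even-numbered generations) and the horizons are our own hypothetical choices, and finite horizons can only suggest the limits in the definitions.

```python
from math import isqrt

def density(indicator, T):
    """Share of generations 0..T that belong to the coalition."""
    return sum(1 for t in range(T + 1) if indicator(t)) / (T + 1)

def is_square(t):
    """Generations indexed by perfect squares: an infinite but null coalition."""
    return isqrt(t) ** 2 == t

def is_even(t):
    """Even-numbered generations: a substantial coalition."""
    return t % 2 == 0

horizons = (10**2, 10**4, 10**6)
square_density = [density(is_square, T) for T in horizons]
even_density = [density(is_even, T) for T in horizons]
```

The share of perfect squares is \((\lfloor \sqrt{T} \rfloor + 1)/(T+1)\), which tends to zero, so this infinite coalition is null. The share of even generations stays at or above one half, so that coalition is substantial.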
One can be very uncomfortable with the conclusion that \({{\varvec{u}}}(0,F)\), \({{\varvec{u}}}\) and \({{\varvec{u}}}(9,F)\) from (5) and (6) are indifferent, especially if one belongs to the first F generations. This is a cost of moving to a nonatomic population model, a model in which there are many null sets, even many infinite null sets. Applied to maximization problems, this indifference is a symptom of what is called “underselectiveness” in the parts of the operations research literature that study the problem of maximizing the long run average performance of a stochastic dynamic system. For such problems, the outcome \({{\varvec{u}}}\) is optimal iff both \({{\varvec{u}}}(0,F)\) and \({{\varvec{u}}}(9,F)\) are also optimal.
An alternative approach is to abandon shift invariance as a patience criterion and build in patience and respect for the Pareto criterion through other methods. Jonsson and Voorneveld (2018) study a “limit of discounted utility” ordering on \({{\varvec{W}}}\) given by \({{\varvec{u}}}\succsim _{LDU} {{\varvec{v}}}\) if
\[\liminf _{\delta \uparrow 1} \sum _{t=0}^{\infty } \delta ^t (u_t - v_t) \ge 0.\]
As shown above, this is subject to Ramsey’s criticism, at least if one takes seriously the idea that the infinite models should be interpretable as limits of large finite models. However, applied to \({{\varvec{u}}}(0,F)\), \({{\varvec{u}}}\) and \({{\varvec{u}}}(9,F)\) from (5) and (6), we have \({{\varvec{u}}}(9,F) \succ _{LDU} {{\varvec{u}}}\succ _{LDU} {{\varvec{u}}}(0,F)\) which may accord better with one’s intuition about what “should be” the case.
However, this respect for Pareto dominance in the classical sense rather than in the KS nonatomic population sense means that its use in maximization problems can be quite difficult. For example, one needs to evaluate outcomes involving randomness, so the relation \(\succsim _{LDU}\) needs to be extended to distributions on \({{\varvec{W}}}\). But this could be problematic: if \({\varvec{U}}\) and \({\varvec{V}}\) are independent random points in \({{\varvec{W}}}\) with the \(\{u_t: t \in {{\mathbb {N}}_0}\}\) and \(\{v_t: t \in {{\mathbb {N}}_0}\}\) both i.i.d. and having non-degenerate distributions with the same mean, then the probability that \({\varvec{U}} \succsim _{LDU} {\varvec{V}}\) or that \({\varvec{V}} \succsim _{LDU} {\varvec{U}}\) is equal to 0, that is, they are non-comparable with probability 1.Footnote 7
3 The infinite population models
This section develops the basic nonstandard analysis needed to represent the ‘limit’ objects we use in our analysis. Of central interest are ‘limit’ versions of large populations, \(\{0, 1, \ldots , T_n\}\), \(T_n \rightarrow \infty\), of ‘limit’ probabilities and ‘limit’ permutations on the large population ‘limit,’ and of ‘limit’ allocations of utilities for those large populations. The general method of construction of a ‘limit’ object in a set X is as follows: it starts with the set of sequences in X; defines an equivalence relation on the set of sequences; defines the set of equivalence classes to be the ‘limit’ objects; and denotes the set of these limit objects as \({}^*\!X\), read as “star X.” For a simple example, if \(X = {\mathbb {R}}\), then \({}^*{\mathbb {R}}\) is the set of nonstandard real numbers. For a more complicated example, if \(X = {\mathcal {P}}_{Fin}({{\mathbb {N}}_0})\) is the set of finite subsets of \({{\mathbb {N}}_0}= \{0, 1, 2, \ldots \}\), then our ‘limit’ population models belong to \({}^*\!{\mathcal {P}}_{Fin}({{\mathbb {N}}_0})\), i.e. they are subsets of \({}^*{{\mathbb {N}}_0}\).
Notationally, a sequence in X can be denoted \(n \mapsto x_n\) with each \(x_n \in X\) when the focus is on a sequence as a function from \({\mathbb {N}}\) to X, or as \((x_1, x_2, x_3, \ldots )\), when the focus is on a sequence as an ordered list, or as \(\{x_n: {n \in {\mathbb {N}}}\}\) when the focus is on a sequence as an indexed set. The set of all sequences in X is denoted \(X^{\mathbb {N}}\). This section begins with the definition of the equivalence relation, works through the basic properties of the construction in the most familiar case, the real numbers, \({\mathbb {R}}\) and its ‘limit’ or ‘nonstandard’ version, \({}^*{\mathbb {R}}\). Following this, we develop the other tools needed for our results and applications.
To be clear, we use \({\mathbb {N}}\) as an index set for the construction of limit objects, and we use \({{\mathbb {N}}_0}\) to index the generations.
3.1 The equivalence relation
We will use a finitely additive \(\{0, 1\}\)-valued probability, denoted \(\mu\), on the index set, \({\mathbb {N}}\), and define two sequences \((x_1, x_2, x_3, \ldots )\) and \((y_1, y_2, y_3, \ldots )\) to be equivalent in \(X^{\mathbb {N}}\) if \(\mu (\{{n \in {\mathbb {N}}}: x_n = y_n\}) = 1\). We will then identify each equivalence class as a point in our new “nonstandard” space, denoted \({}^*\!X\). By doing this systematically, we can extend operations such as addition, multiplication, and division, and relations such as “greater than” or “a permutation of” to these new spaces. All of this starts with an examination of the properties of \(\mu\). Let \({\mathcal {P}}(X)\) denote the class of all subsets of a set X (aka the power set of X).
Definition 3.1
A function \(\mu :{\mathcal {P}}({\mathbb {N}}) \rightarrow [0, 1]\) is a purely finitely additive, zero–one probability if
-
(1)
for all \(A \subset {\mathbb {N}}\), \(\mu (A) = 0\) or \(\mu (A) = 1\);
-
(2)
\(\mu (A \cup B) = \mu (A) + \mu (B)\) for all disjoint \(A,B \subset {\mathbb {N}}\);
-
(3)
\(\mu ({\mathbb {N}}) = 1\); and
-
(4)
\(\mu (A) = 0\) if \(A \subset {\mathbb {N}}\) is finite.Footnote 8
By induction, we can replace (2) by (2\('\)), \(\mu (\cup _{k=1}^K E_k) = \sum _{k=1}^K \mu (E_k)\) for all finite collections of disjoint sets \(\{E_k: k = 1, \ldots , K\}\). Combined with (1) and (3), if \(\{E_k: k = 1, \ldots , K\}\) is a partition of \({\mathbb {N}}\), then \(\mu (E_k) = 1\) for one and only one of the sets \(E_k\).
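The step from (1), (2\('\)) and (3) to the ‘one and only one’ conclusion can be written out as a single line. The display below is only a worked restatement of the argument just given, with the same symbols:

```latex
\[
1 \;=\; \mu({\mathbb{N}}) \;=\; \mu\Bigl(\bigcup_{k=1}^{K} E_k\Bigr) \;=\; \sum_{k=1}^{K} \mu(E_k),
\qquad \mu(E_k) \in \{0, 1\} \ \text{for each } k .
\]
```

A sum of zeros and ones equals 1 only if exactly one term is 1 and all the others are 0, which is the claim.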
Definition 3.2
For any set X, two sequences \((x_1, x_2, x_3, \ldots )\) and \((y_1, y_2, y_3, \ldots )\) in \(X^{\mathbb {N}}\) are equivalent, denoted
\[(x_1, x_2, x_3, \ldots ) \sim (y_1, y_2, y_3, \ldots ),\]
if they are \(\mu\)-almost everywhere equal, that is, if \(\mu (\{{n \in {\mathbb {N}}}: x_n = y_n\}) = 1\).
Since the sets \(\{{n \in {\mathbb {N}}}: x_n = y_n\}\) and \(\{{n \in {\mathbb {N}}}: x_n \ne y_n\}\) partition \({\mathbb {N}}\), one and only one of them has \(\mu\)-mass 1, and an elementary check of the properties of \(\mu\) yields the following. For completeness, the proofs of this and other results not given in the text are gathered in the appendix.
Lemma 3.1
The relation \(\sim\) is an equivalence relation on \(X^{\mathbb {N}}\).
Equivalence classes are central to the development, and they have their own notation: the equivalence class of an \((x_1, x_2, x_3, \ldots ) \in X^{\mathbb {N}}\) is defined as the set \(\{(y_1, y_2, y_3, \ldots ) \in X^{\mathbb {N}}: (x_1, x_2, x_3, \ldots ) \sim (y_1, y_2, y_3, \ldots )\}\), and it is denoted \(\langle x_1, x_2, x_3, \ldots \rangle\).
Definition 3.3
The nonstandard version of a set X is denoted as \({}^*\!X\) and defined as \(\{ \langle x_1, x_2, x_3, \ldots \rangle : (x_1, x_2, x_3, \ldots ) \in X^{\mathbb {N}}\}\).
We will use this construction for sets X of various degrees of complexity. When X is the set of finite subsets of \({{\mathbb {N}}_0}\), \({}^*\!X\) will contain our population models, \({{\mathbb {I}}}\). When X is the set of probabilities on finite subsets of \({{\mathbb {N}}_0}\), we will have distributions on our population model. The value of such a probability is the equivalence class of a sequence of numbers in [0, 1], that is, it is a number in \({}^*[0, 1]\). As a preview of the developments below: the Unicity Lemma (Lemma 3.3 below) shows how to go from probabilities taking values in \({}^*[0, 1]\) to probabilities taking values in [0, 1]; and Loeb’s Theorem (Theorem B below) shows how to extend these to the appropriate \(\sigma\)-field of subsets of \({{\mathbb {I}}}\).
3.2 Expansions and unicity
There is a natural embedding of X in \({}^*\!X\): every \(x \in X\) is identified as the equivalence class of the constant sequence \((x, x, x, \ldots )\). The corresponding point in \({}^*\!X\) is still denoted x, that is, \(x = \langle x, x, x, \ldots \rangle \in {}^*\!X\). This leads to the distinction between standard objects and nonstandard objects: if \(x \in X\), then the point \(x = \langle x, x, x, \ldots \rangle \in {}^*\!X\) is called standard; and if \(x = \langle x_1, x_2, x_3, \ldots \rangle \in {}^*\!X\) is not standard, then it is called nonstandard.
The next two results use the properties of \(\mu\) on partitions of the index set \({\mathbb {N}}\) in a central way. The first tells us when to expect new nonstandard objects to exist in \({}^*\!X\), and the second is a ‘unicity’ result.
Lemma 3.2
Every \(x \in {}^*\!A\) is standard iff A is finite.
Proof
Suppose that \(A = \{a_k: k = 1, \ldots , K\}\) is finite and that \(x = \langle x_1, x_2, x_3, \ldots \rangle \in {}^*\!A\). For each \(k = 1, \ldots , K\), define \(E_k:= \{{n \in {\mathbb {N}}}: x_n = a_k\}\). The \(E_k\) form a partition of \({\mathbb {N}}\). Hence one and only one of them, say \(E_{k'}\) has \(\mu (E_{k'}) = 1\), that is, \(x = a_{k'}\).
Suppose now that A is infinite. It contains a set \(\{a_n: {n \in {\mathbb {N}}}\}\) with \(a_n \ne a_m\) for \(n \ne m\). Define \(x = \langle a_1, a_2, a_3, \ldots \rangle\). We have \(x \ne a\) for any \(a \in A\) because \(\{{n \in {\mathbb {N}}}: a_n = a\}\) contains at most 1 element, hence has \(\mu\)-mass 0. \(\square\)
The next result shows that for every \(x \in {}^*[0, 1]\), there is a unique number \(h \in [0, 1]\) such that for all standard \(\epsilon > 0\), \(|h - x| < \epsilon\). Our probabilities will take values in \({}^*[0, 1]\), and this result allows us to change them to probabilities taking values in [0, 1]. This unicity result is part of the ‘genius’ of the \(\mu\)-almost everywhere construction: it shows that even if \(n \mapsto x_n\) is a sequence with \(\liminf _n x_n < \limsup _n x_n\), the equivalence class \(\langle x_1, x_2, x_3, \ldots \rangle\) is as close to a standard number as one could ever hope.Footnote 9 We state the result for the interval [0, 1], but the argument clearly applies to any interval [a, b].
Lemma 3.3
(Unicity) If \(\mu(\{n \in \mathbb{N}: x_n \in [0, 1]\}) = 1\) then there exists a unique \(h \in [0, 1]\) such that for all standard \(\epsilon > 0\), \(\mu(\{n \in \mathbb{N}: |x_n - h| < \epsilon\}) = 1\).
Proof
For each \(m \in {\mathbb {N}}\), the interval [0, 1] can be covered by the \(2^m + 1\) disjoint half-open dyadic intervals \((-\frac{1}{2^m}, 0]\), \((\frac{0}{2^m}, \frac{1}{2^m}]\), \((\frac{1}{2^m}, \frac{2}{2^m}]\), etc. Denote these as \(A_{m,k}\) and define \(E_{m,k} = \{{n \in {\mathbb {N}}}: x_n \in A_{m,k}\}\). This is a partition of \({\mathbb {N}}\), hence \(\mu (E_{m,k}) = 1\) for exactly one k, denoted k(m). For each m, we have \(A_{m+1,k(m+1)} \subset A_{m,k(m)}\), and as the diameter of \(A_{m,k(m)}\) decreases to 0, there is a unique h in the intersection of the closures of the \(A_{m,k(m)}\). For any standard \(\epsilon > 0\), once \(1/2^m < \epsilon\), we have \(E_{m,k(m)} \subset \{{n \in {\mathbb {N}}}: |x_n - h| < \epsilon \}\) so \(\mu (\{{n \in {\mathbb {N}}}: |x_n - h| < \epsilon \}) = 1\). \(\square\)
3.3 Monads and ‘Limit’ objects in \({\mathbb {R}}\)
The set \({}^*{\mathbb {R}}\) of all \(\sim\)-equivalence classes of sequences in \({\mathbb {R}}^{\mathbb {N}}\) is called the set of hyperreals. We directly extend the relation “<” and the operations of addition and multiplication from \({\mathbb {R}}\) to \({}^*{\mathbb {R}}\) using the “\(\mu\)-almost everywhere” idea as follows: \(\langle x_1, x_2, x_3, \ldots \rangle < \langle y_1, y_2, y_3, \ldots \rangle\) iff \(\mu (\{{n \in {\mathbb {N}}}: x_n < y_n\}) = 1\); \(\langle x_1, x_2, x_3, \ldots \rangle + \langle y_1, y_2, y_3, \ldots \rangle = \langle x_1+y_1, x_2+y_2, x_3+y_3, \ldots \rangle\); and \(\langle x_1, x_2, x_3, \ldots \rangle \cdot \langle y_1, y_2, y_3, \ldots \rangle = \langle x_1 \cdot y_1, x_2 \cdot y_2, x_3 \cdot y_3, \ldots \rangle\). In principle, the relation should be written “\(\,{}^*\!\!<\),” and the operations should be written “\(\,{}^*\!\!+\)” and “\(\,{}^*\!\,\cdot\),” but the notational burden is too high, so we continue to use “<,” “\(+\)” and “\(\,\cdot\).”
Remember that 0 and 1 are the equivalence classes of the sequences constant at 0 and 1 respectively. We define \(x \in {}^*{\mathbb {R}}\) to be strictly positive if \(0 < x\) (i.e. \(\mu (\{{n \in {\mathbb {N}}}: 0 < x_n\}) = 1\)). It is easy to check that properties of \(\mu\) deliver the following elementary facts: if x is strictly positive, then for all \(y \in {}^*{\mathbb {R}}\), \(y < y+x\); \(y + x = y\) for all \(y \in {}^*{\mathbb {R}}\) iff \(x = 0\); and \(y \cdot x = y\) for all \(y \in {}^*{\mathbb {R}}\) iff \(x = 1\).
Our first example of extending an \({\mathbb {R}}\)-valued function on \({\mathbb {R}}\) is the absolute value function: for \(x = \langle x_1, x_2, x_3, \ldots \rangle\), \(|x| = \langle |x_1|, |x_2|, |x_3|, \ldots \rangle\). It is worth bearing in mind the following examples: \(dx = \langle 1, \frac{1}{2}, \frac{1}{3}, \ldots \rangle\), which satisfies \(0< |dx| < \epsilon\) for every standard positive \(\epsilon\) as \(\{{n \in {\mathbb {N}}}: 0< |\frac{1}{n}| < \epsilon \}\) has a finite complement and \(\mu (A) = 0\) for all finite sets A; \(y = \frac{1}{dx} = \langle 1, 2, 3, \ldots \rangle\), which satisfies \(|y| > B\) for every standard positive B for the same reason; and \(z = \langle z_1, z_2, z_3, \ldots \rangle\) where \(n \mapsto z_n\) is a bounded sequence, hence satisfies \(|z| \le B\) for some standard positive B. By the Unicity Lemma, there is a unique \(h \in {\mathbb {R}}\) such that \(|z-h| < \epsilon\) for all standard \(\epsilon > 0\).
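The componentwise definitions can be tried out on representatives of equivalence classes. The following is a minimal sketch, not part of the formal development: the helper names (`const`, `add`, `mul`, `absval`, `holds_on_window`) are hypothetical, and a finite window of indices can only suggest that a set is cofinite. It cannot stand in for \(\mu\) on sets that are neither finite nor cofinite.

```python
from fractions import Fraction

# A representative of a hyperreal is modelled as a function n -> Fraction, n = 1, 2, 3, ...
def const(c):
    """Representative of the standard number c: the constant sequence (c, c, c, ...)."""
    return lambda n: Fraction(c)

def add(x, y):
    """Componentwise sum of two representatives."""
    return lambda n: x(n) + y(n)

def mul(x, y):
    """Componentwise product of two representatives."""
    return lambda n: x(n) * y(n)

def absval(x):
    """Componentwise absolute value."""
    return lambda n: abs(x(n))

def holds_on_window(pred, start, stop):
    """Check pred(n) for start <= n < stop (a finite proxy only, see the caveat above)."""
    return all(pred(n) for n in range(start, stop))

def dx(n):
    return Fraction(1, n)   # representative of <1, 1/2, 1/3, ...>

def y(n):
    return Fraction(n)      # representative of 1/dx = <1, 2, 3, ...>

eps = Fraction(1, 100)      # one particular standard epsilon
B = 1000                    # one particular standard bound

# 0 < |dx_n| < eps fails only for the finitely many n <= 100.
dx_is_small = holds_on_window(lambda n: 0 < absval(dx)(n) < eps, 101, 5000)
# |y_n| > B fails only for the finitely many n <= 1000.
y_is_large = holds_on_window(lambda n: absval(y)(n) > B, 1001, 5000)
# dx * y equals the constant sequence 1 at every index.
product_is_one = holds_on_window(lambda n: mul(dx, y)(n) == const(1)(n), 1, 5000)
```

In each case the exceptional set of indices is finite, so it has \(\mu\)-mass 0 by property (4) of Definition 3.1, whatever \(\mu\) is chosen.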
Definition 3.4
The infinitesimals are the \(x \in {}^*{\mathbb {R}}\) that satisfy, for all standard \(\epsilon > 0\), \(|x| < \epsilon\) so that 0 is the only standard infinitesimal; the limited or finite elements are the \(x \in {}^*{\mathbb {R}}\) that satisfy, for some standard \(B > 0\), \(|x| \le B\); and the unlimited or infinite elements are the \(x \in {}^*{\mathbb {R}}\) such that for all standard \(B > 0\), \(|x| > B\).
The infinitesimals are limited because they satisfy e.g. \(|x| \le 1\). Further, the product or sum of two infinitesimals is also infinitesimal because, for arbitrary standard strictly positive \(\epsilon < 1\), if \(|x| < \epsilon\) and \(|y| < \epsilon\) in \({}^*{\mathbb {R}}\), then \(|xy| < \epsilon\) and \(|x+y| < 2 \epsilon\). In more detail, the statement about the absolute value of the sum follows from setting \(N_x = \{{n \in {\mathbb {N}}}: |x_n| < \epsilon \}\) and \(N_y = \{{n \in {\mathbb {N}}}: |y_n| < \epsilon \}\), then noting that \(\mu (N_x) = \mu (N_y) = 1\) implies that \(\mu (N_x \cap N_y) = 1\), and that \(N_x \cap N_y \subset \{{n \in {\mathbb {N}}}: |x_n+y_n| < 2 \epsilon \}\).
For \(x, y \in {}^*{\mathbb {R}}\), we write \(x \simeq y\) if \(x-y\) is infinitesimal, equivalently, if \((x-y) \simeq 0\).
Definition 3.5
For \(r \in {\mathbb {R}}\), the set of \(r' \in {}^*{\mathbb {R}}\) with \(r-r' \simeq 0\) is called the monad of r, written \(\textrm{mon}(r) = \{r' \in {}^*{\mathbb {R}}: r \simeq r'\}\). If \(r' \in \textrm{mon}(r)\) for some \(r \in {\mathbb {R}}\), then we write \(r = \textrm{st}(r')\) or \(r = {}^\circ r'\) and say that r is the standard part of \(r'\) and that \(r'\) is nearstandard.
Lemma 3.4
If it exists, then the standard part of an \(r' \in {}^*{\mathbb {R}}\) is unique.
Proof
From the triangle inequality, \(|r'-r| + |r'-s| \ge |r-s|\), so if r and s are both standard parts of \(r'\), then \(|r-s| \simeq 0\). Since \(|r-s|\) is standard and the only standard infinitesimal is 0, we have \(r = s\). \(\square\)
We have used the triangle inequality and the relation “\(\ge\)” here. The triangle inequality holds because \(\mu (\{{n \in {\mathbb {N}}}: |x_n - y_n| + |y_n - z_n| \ge |x_n - z_n|\}) = 1\) for any sequences \(n \mapsto x_n\), \(n \mapsto y_n\) and \(n \mapsto z_n\), and the relation \(\ge\) is defined, in what should begin to look like the “usual procedure,” by \(x \ge y\) in \({}^*{\mathbb {R}}\) if \(\mu (\{{n \in {\mathbb {N}}}: x_n \ge y_n\}) = 1\).
The condition that the standard part exists is simple.
Lemma 3.5
A hyperreal \(r' \in {}^*{\mathbb {R}}\) is limited iff \(\textrm{st}(r')\) exists iff \(r'\) is nearstandard.
The argument for this uses the logic of the Unicity Lemma above and we sketch it here: if \(r'\) is limited, then for some standard \(B > 0\), \(-B \le r' \le +B\), the interval \([-B, +B]\) can be covered by an increasingly fine sequence of half-open intervals, and \(\textrm{st}(r')\) belongs to the intersection of the closure of a nested subsequence of these intervals; if \(\textrm{st}(r') = r\), then \(r'\) is nearstandard; and if \(r'\) is nearstandard and \(\textrm{st}(r') = r\), then we can take \(B = |r|+1\) to show that \(r'\) is limited.
The idea of infinitesimals has a long and productive history (see e.g. (Robinson 1996, Ch. X)). Intuitively: a function \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is continuous at \(x \in {\mathbb {R}}\) if for every non-zero infinitesimal dx, \(f(x + dx) \simeq f(x)\), and \(f'(x) = r\) if \(\frac{f(x+dx)-f(x)}{dx} \simeq r\). To make these and related ideas precise we need to extend functions on \({\mathbb {R}}\) to functions on \({}^*{\mathbb {R}}\).
3.4 ‘Limit’ functions and ‘Limit’ sets
For any set X, any \(A \subset {X}\) can be enlarged to \({}^*\!A \subset {}^*\!X\) by applying the “\(\mu\)-almost everywhere” construction as follows: for \((x_1, x_2, x_3, \ldots ) \in X^{\mathbb {N}}\), \(\langle x_1, x_2, x_3, \ldots \rangle \in {}^*\!A\) iff \(\mu (\{{n \in {\mathbb {N}}}: x_n \in A\}) = 1\). For example, regarding the relation “<” as a subset of \({\mathbb {R}}\times {\mathbb {R}}\), we have \((x,y) \in ({}^*\!\!<)\) iff \(\mu (\{{n \in {\mathbb {N}}}: (x_n,y_n) \in \, < \}) = 1\) iff \(\mu (\{{n \in {\mathbb {N}}}: x_n < y_n\}) = 1\), which agrees with the definition given above.
The graph of a function \(f:X \rightarrow Y\) is a subset of \(X \times Y\), that is \(gr(f) = \{(x,y): x \in X, \ y \in Y, \ \hbox {and} \ y = f(x)\}\). Taking \({}^*\!gr(f)\) as the definition of the function \({}^*\!f: {}^*\!X \rightarrow {}^*\!Y\), we have
\[{}^*\!f(\langle x_1, x_2, x_3, \ldots \rangle ) = \langle f(x_1), f(x_2), f(x_3), \ldots \rangle .\]
If it is clear from context that the domain is \({}^*\!X\) rather than X and the range is \({}^*\!Y\) rather than Y, the function \({}^*\!f\) may be denoted by f.
A sequence in X can be regarded as a function \(n \mapsto x_n\) from \({\mathbb {N}}\) to X. Its extension is a function from \({}^*{\mathbb {N}}\) to \({}^*\!X\), sometimes known as a hypersequence. Limit properties of a sequence are determined by the values of the hypersequence at unlimited elements of \({}^*{\mathbb {N}}\). The leading example of this can be used to give infinitesimal formulations of continuity and differentiability.
Lemma 3.6
For a bounded sequence \(n \mapsto r_n\) in \({\mathbb {R}}\) and \(r, s, t \in {\mathbb {R}}\): \(r \le \liminf _n r_n\) iff for all unlimited \(N \in {}^*{\mathbb {N}}\), \(r \le \textrm{st}(r_N)\); \(s = \lim _n r_n\) iff for all unlimited \(N \in {}^*{\mathbb {N}}\), \(\textrm{st}(r_N) = s\); and \(\limsup _n r_n \le t\) iff for all unlimited \(N \in {}^*{\mathbb {N}}\), \(\textrm{st}(r_N) \le t\).
We will have bounded sequences of utilities \({{\varvec{u}}}\), a population model \({{\mathbb {I}}}= \langle I_1, I_2, I_3, \ldots \rangle\), \(I_n = \{0, 1, \ldots , T_n\}\) and probabilities P on \({{\mathbb {I}}}\). The “\(\mu\)-almost everywhere” construction gives the integral of \({{\varvec{u}}}\) against P the form of a finite sum. Let \({{\varvec{u}}}_{|I_n}\) denote the restriction of the sequence \({{\varvec{u}}}\) to the interval \(I_n\) and let \(\Lambda _n\) denote the uniform distribution on \(I_n\). The average, or integral, of \({{\varvec{u}}}_{|I_n}\) with respect to \(\Lambda _n\) over the set \(I_n\) is \(\frac{1}{T_n+1} \sum _{t=0}^{T_n} u_t\). With \(\Lambda\) denoting the ‘limit’ version of the uniform probability \(\langle \Lambda _1, \Lambda _2, \Lambda _3, \ldots \rangle\), the integral of \({{\varvec{u}}}\) with respect to \(\Lambda\) is \(\textrm{st}(\frac{1}{T+1} \sum _{t=0}^{T} u_t)\) where \(T = \langle T_1, T_2, T_3, \ldots \rangle\) and the summation is defined as an extension using, as usual, the “\(\mu\)-almost everywhere” construction. The argument used for the Unicity Lemma and for Lemma 3.5 shows that this standard part is well-defined.
We hope it is becoming clear that the “\(\mu\)-almost everywhere” construction can be widely applied. Being able to see what is happening without all of the sequences will be a drastic simplification. We now begin to develop the tools that allow it.
3.5 The transfer principle and internal sets
Despite its simplicity, the transfer principle has proved to be immensely useful.
Lemma 3.7
(A Simple Transfer Principle) For a set X and \(A, B \subset X\), \(A \subset B\) iff \({}^*\!A \subset {}^*\!B\).
Proof
If \(A \subset B\) and \(a = \langle a_1, a_2, a_3, \ldots \rangle \in {}^*\!A\), then \(\mu (\{{n \in {\mathbb {N}}}: a_n \in A\}) = 1\), and since \(A \subset B\), \(\mu (\{{n \in {\mathbb {N}}}: a_n \in B\}) = 1\), so \(a \in {}^*\!B\). Conversely, if there exists \(x \in A\) with \(x \not \in B\), then the standard point \(x = \langle x, x, x, \ldots \rangle\) belongs to \({}^*\!A\) but not to \({}^*\!B\), so \({}^*\!A \not \subset {}^*\!B\). \(\square\)
This is very useful because a formal statement “if \({\mathbb {A}}\) holds, then \({\mathbb {B}}\) holds” can be rewritten as a subset relation, \(A \subset B\). This set theory rewrite has A denoting the set of instances in which the statement “\({\mathbb {A}}\)” holds and B the set of instances in which the statement “\({\mathbb {B}}\)” holds. Viewed this way, the transfer principle says, loosely, that, “a statement is true in the standard model iff the corresponding statement ‘with stars everywhere’ is true in the nonstandard model.”
Most often, the sets A and B will themselves be collections of sets. The subtlety arises with the need to account for the implications of having the “stars” before A and B. An example makes the point.
Example 3.1
Let \({\mathcal {A}}\) denote the class of non-empty subsets of \({\mathbb {R}}\) that are bounded below, and let \({\mathcal {B}}\) denote the class of non-empty subsets of \({\mathbb {R}}\) that have a greatest lower bound (aka \(\inf\)). The statement that a non-empty subset of \({\mathbb {R}}\) that is bounded below has a greatest lower bound is \({\mathcal {A}}\subset {\mathcal {B}}\).
Let \(A \subset {}^*{\mathbb {R}}\) denote the set of standard numbers in \({}^*{\mathbb {R}}\) that are strictly positive. It is bounded below, but it does not have a greatest lower bound in \({}^*{\mathbb {R}}\): no strictly positive standard \(\epsilon > 0\) is a lower bound; any strictly positive \(\epsilon \simeq 0\) is a lower bound, but then so is \(2 \cdot \epsilon\), so no lower bound is the greatest. Hence the claim \(A \in {}^*{\mathcal {B}}\) would be a mistake.
The mistake is due to the fact that the set A in Example 3.1 does not belong to \({}^*\!{\mathcal {A}}\), i.e. it is not of the form \(\langle A_1, A_2, A_3, \ldots \rangle\) for a sequence of subsets of \({\mathbb {R}}\). There is a name for the particular kind of sets needed to avoid such mistakes. The set A in the previous example is not an internal subset of \({}^*{\mathbb {R}}\), and the transfer principle, in its statement, only concerns the internal sets.
Definition 3.6
If X is a set and \({\mathcal {P}}(X)\) is the class of all subsets of X, then the internal subsets of \({}^*\!X\) are the elements of \({}^*\!{\mathcal {P}}(X)\).
In terms of the \(\mu\)-almost everywhere construction, A is an internal subset of \({}^*\!X\) iff \(A = \langle A_1, A_2, A_3, \ldots \rangle\) where \(\mu (\{{n \in {\mathbb {N}}}: A_n \in {\mathcal {P}}(X)\}) = 1\). For us, the most useful internal sets are the hyperfinite ones.
Definition 3.7
If X is a set and \({\mathcal {P}}_{Fin}(X)\) is the class of non-empty finite subsets of X, then an internal subset of \({}^*\!X\) is hyperfinite if it belongs to \({}^*\!{\mathcal {P}}_{Fin}(X)\).
Taking \(X = {{\mathbb {N}}_0}\), our population models are the hyperfinite sets of the form \({{\mathbb {I}}}= \langle I_1, I_2, I_3, \ldots \rangle\) where \(I_n = \{0, 1, \ldots , T_n\}\) and \(T_n \rightarrow \infty\). Two properties of such hyperfinite sets play a role below. First, for any \(t \in {{\mathbb {N}}_0}\), \(\mu (\{{n \in {\mathbb {N}}}: t \in I_n\}) = 1\), that is, our model of the infinite population contains each and every \(t \in {{\mathbb {N}}_0}\). And second, despite containing every \(t \in {{\mathbb {N}}_0}\), \({{\mathbb {I}}}\) “acts like” a finite set, e.g. there is an integer \(N \in {}^*{\mathbb {N}}\) and bijection between \(\{1, \ldots , N\}\) and \({{\mathbb {I}}}\). This follows by transfer: let \({\mathcal {A}}\) denote the subsets \(A \subset {{\mathbb {N}}_0}\) for which there is a bijection between A and some initial segment \(\{1, \ldots , N\}\) and let \({\mathcal {B}}\) denote the finite non-empty subsets of \({{\mathbb {N}}_0}\); we have \({\mathcal {A}}\subset {\mathcal {B}}\subset {\mathcal {A}}\), and by transfer, \({}^*\!{\mathcal {A}}\subset {}^*\!{\mathcal {B}}\subset {}^*\!{\mathcal {A}}\).
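The first property can be checked on the finite approximations: for a fixed generation t, the set of indices n with \(t \in I_n\) is cofinite, so it has \(\mu\)-mass 1 for any admissible \(\mu\). The sketch below assumes a hypothetical horizon sequence \(T_n = n^2\); any nondecreasing, unbounded choice would do, and the helper names are illustrative.

```python
def horizon(n):
    """Hypothetical horizon sequence T_n = n^2 (an assumption made for illustration)."""
    return n * n

def population(T_n):
    """The finite population I_n = {0, 1, ..., T_n}."""
    return range(T_n + 1)

def first_index_containing(t, T):
    """Smallest n >= 1 with t in I_n, assuming n -> T(n) is nondecreasing and unbounded."""
    n = 1
    while t > T(n):
        n += 1
    return n

t = 10
n0 = first_index_containing(t, horizon)

# t is missing from I_n only for the finitely many n < n0 ...
missing = [n for n in range(1, 200) if t not in population(horizon(n))]
# ... and each I_n is in bijection with {1, ..., N} for N = T_n + 1.
size_of_I3 = len(population(horizon(3)))
```

The exceptional indices form a finite set, which is why each \(t \in {{\mathbb {N}}_0}\) belongs to \({{\mathbb {I}}}\).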
3.6 Probabilities on hyperfinite sets
Before developing probabilities on hyperfinite sets, we review the fundamental properties of probabilities. The domain of a probability P is the class of sets B for which the probability, P(B), is defined. The domain is always a field, and most often, it has an additional property that makes it a \(\sigma\)-field.
Definition 3.8
A class \({\mathcal {F}}\subset {\mathcal {P}}(\Omega )\) is a field if it satisfies (a), (b), and (c); if it also satisfies (d), then it is a \(\sigma\)-field.
-
(a)
\(\emptyset , \Omega \in {\mathcal {F}}\).
-
(b)
For \(A \in {\mathcal {F}}\), \(A^c \in {\mathcal {F}}\).
-
(c)
For any finite \(\{A_k: k = 1, \ldots , K\} \subset {\mathcal {F}}\), \(\cup _{k=1}^K A_k \in {\mathcal {F}}\).
-
(d)
For any countable \(\{A_k: k \in {\mathbb {N}}\} \subset {\mathcal {F}}\), \(\cup _{k \in {\mathbb {N}}} A_k \in {\mathcal {F}}\).
We know that \((\cup _i E_i)^c = \cap _i E_i^c\) and that a field is closed under complements. Hence, the assumption of closure under finite or countable unions could as well be written as closure under finite or countable intersections.
Lemma 3.8
If A is an internal subset of \({}^*\!X\) for some set X, then the class of internal subsets of A forms a field.
We will soon show that the class of internal sets is not a \(\sigma\)-field except when X is finite, in which case the distinction between a field and a \(\sigma\)-field is moot. The following set difference and symmetric set difference will be of use.
Definition 3.9
The set difference of sets \(A, B \subset X\) is denoted \(A {\setminus } B\) and defined as \(\{x \in X: x \in A, \ x \not \in B\}\), i.e. as \(A \cap B^c\). The symmetric difference of two sets A and B is denoted \(A \Delta B\) and defined as \((A \cap B^c) \cup (B \cap A^c) = (A {\setminus } B) \cup (B {\setminus } A)\).
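On finite sets these operations, and the De Morgan identity used above to trade unions for intersections, can be checked mechanically. The sketch below uses Python's built-in `frozenset`; the sets `X`, `A`, `B` and the helper names are hypothetical examples.

```python
def complement(A, X):
    """Complement of A relative to the ambient set X."""
    return X - A

def set_difference(A, B, X):
    """A \\ B written as A intersected with the complement of B."""
    return A & complement(B, X)

def symmetric_difference(A, B, X):
    """(A \\ B) union (B \\ A)."""
    return set_difference(A, B, X) | set_difference(B, A, X)

X = frozenset(range(10))
A = frozenset({0, 1, 2, 3})
B = frozenset({2, 3, 4, 5})

sym = symmetric_difference(A, B, X)

# De Morgan: the complement of a union is the intersection of the complements.
de_morgan_holds = complement(A | B, X) == complement(A, X) & complement(B, X)
```

Here `sym` agrees with Python's own `A ^ B`, and the De Morgan check is the finite case of the identity \((\cup _i E_i)^c = \cap _i E_i^c\).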
It is the closure of a \(\sigma\)-field under countable unions and intersections that guarantees that limit events have probabilities. For example, the strong law of large numbers is the statement that with probability 1, the sample average of an i.i.d. sequence will converge to the theoretical average. The point is that we must assign probability to the event that the sample average converges, and this event is only expressible using countable unions and intersections.
There is a property of probabilities, countable additivity, that is complementary to the domain being a \(\sigma\)-field.
Definition 3.10
For a \(\sigma\)-field \({\mathcal {F}}\) of subsets of a set \(\Omega\), a finitely additive probability is a function \(P:{\mathcal {F}}\rightarrow [0, 1]\) that satisfies (a) and (b), and if it also satisfies (c), then it is a countably additive probability.
-
(a)
\(P(\Omega ) = 1\).
-
(b)
For finite disjoint collections \(\{A_k: k = 1, \ldots , K\} \subset {\mathcal {F}}\), \(P(\cup _{k=1}^K A_k) = \sum _{k=1}^K P(A_k)\).
-
(c)
For countable disjoint collections \(\{A_n: {n \in {\mathbb {N}}}\} \subset {\mathcal {F}}\), \(P(\cup _{{n \in {\mathbb {N}}}} A_n) = \sum _{{n \in {\mathbb {N}}}} P(A_n)\).
The vast majority of probability theory work uses countably additive probabilities on \(\sigma\)-fields. Without countable additivity, basic limit results such as the weak and the strong law of large numbers, or even the Borel-Cantelli lemma do not hold. To do the work we intend to with our model, countable additivity and \(\sigma\)-fields are needed.
3.7 Saturation and internal sets
In \({\mathbb {R}}\), the intersection of the nested sequence of sets \((0, \frac{1}{k})\), \(k \in {\mathbb {N}}\), is empty. By contrast, in \({}^*{\mathbb {R}}\), the intersection of the nested sequence of internal sets \({}^*(0, \frac{1}{k})\), \(k \in {\mathbb {N}}\), is the non-empty set of strictly positive infinitesimals. More generally, the countable intersection of a decreasing sequence of internal sets is always non-empty.
Theorem A
If \(A^1 \supset A^2 \supset \ldots \supset A^k \supset \ldots\) is a decreasing sequence of non-empty internal subsets of an internal set \({}^*\!X\), then
\[\bigcap _{k \in {\mathbb {N}}} A^k \ne \emptyset .\]
This is called the countable saturation property of internal sets. There are strong parallels between non-empty internal subsets of an internal set and non-empty compact subsets of a metric space. If the \(A^k\) were a nested sequence of non-empty compact subsets of a metric space, then the same non-empty intersection conclusion would hold. One can see a bit more of the parallel in the proofs found in the literature; references are provided in the appendix. The proof is a variant of the diagonalization argument used to show that countable products of compact metric spaces are compact.
For our purposes, the following two implications of Theorem A will be most useful.
Corollary A.1
(Spillover) Suppose that A is an internal subset of \({}^*{{\mathbb {N}}_0}\). If A contains arbitrarily large limited integers, then it contains an unlimited integer, and if A contains arbitrarily small unlimited integers, then it contains a limited integer.
The proof of the first part uses countable saturation on the sets \(B^k:= \{t \in A: t \ge k\}\); the proof of the second part uses transfer on the statement that every non-empty subset of \({{\mathbb {N}}_0}\) contains its greatest lower bound, i.e. has a least element. The next Corollary gives the sense in which the field of internal subsets of \({{\mathbb {I}}}\) is as far as possible from being a \(\sigma\)-field.
Corollary A.2
If \(\{A^k: k \in {\mathbb {N}}\}\) is a nested decreasing collection of internal subsets of an internal set \({}^*\!X\), then \(\bigcap _{k \in {\mathbb {N}}} A^k\) is internal iff it is equal to \(\bigcap _{k \le K} A^k\) for some \(K \in {\mathbb {N}}\), and if \(\{A^k: k \in {\mathbb {N}}\}\) is a nested increasing collection of internal sets, then \(\bigcup _{k \in {\mathbb {N}}} A^k\) is internal iff it is equal to \(\bigcup _{k \le K} A^k\) for some \(K \in {\mathbb {N}}\).
3.8 ‘Limit’ probabilities and Loeb measures
We find ourselves in the following situation: we have an internal set \({{\mathbb {I}}}= \langle I_1, I_2, I_3, \ldots \rangle\); we have internal \({}^*[0, 1]\)-valued probabilities on \({{\mathbb {I}}}\), \(P = \langle P_1, P_2, P_3, \ldots \rangle\) where each \(P_n\) is a probability on the finite set \(I_n\); and for each internal \(A = \langle A_1, A_2, A_3, \ldots \rangle\) in the field, not \(\sigma\)-field, of internal subsets of \({{\mathbb {I}}}\), we have the [0, 1]-valued, finitely additive probability \(P(A) = {}^\circ \langle P_1(A_1), P_2(A_2), P_3(A_3), \ldots \rangle\) (well-defined by the Unicity Lemma). The finite additivity of \(P(\cdot )\) arises because each \(P_n\) defining P is a probability on a finite set. But \(P(\cdot )\) is not countably additive on a \(\sigma\)-field of sets because it is not defined for anything but internal sets, and Corollary A.2 tells us that the class of internal sets is not a \(\sigma\)-field. It was Loeb’s pioneering work (Loeb 1971, 1975) that allowed us to take a finitely additive probability P on a field of internal sets and extend it to a countably additive probability, still denoted P, on the \(\sigma\)-field generated by the field of internal sets.Footnote 10
Theorem B
(Loeb) If X is an internal set, \(\mathcal{X}^{\,\circ}\) is the field of internal subsets of X, \({\mathcal {X}}\) is the \(\sigma\)-field generated by \(\mathcal{X}^{\,\circ}\), and \(P:\mathcal{X}^{\,\circ} \rightarrow [0, 1]\) is a finitely additive probability, then
-
(1)
P has a unique countably additive extension, also denoted P, from \(\mathcal{X}^{\,\circ}\) to \({\mathcal {X}}\), and
-
(2)
for any \(B \in {\mathcal {X}}\), there is an internal \(B^\circ \in \mathcal{X}^{\,\circ}\) with \(P(B \Delta B^\circ ) = 0\).
Corollary A.2 told us that the class of internal sets fails to be a \(\sigma\)-field by not containing any countable unions or intersections that are not also finite unions or intersections. This seems to be saying that the field of internal sets is “as far from” being a \(\sigma\)-field as possible. But the last part of Theorem B tells us that for probability theory, the difference does not matter, and the distance is “as small as possible.”
Fix a hyperfinite set \({{\mathbb {I}}}\), let \(\mathcal{I}^{\,\circ}\) denote the field of internal subsets of \({{\mathbb {I}}}\) and \({\mathcal {I}}\) the \(\sigma\)-field generated by \(\mathcal{I}^{\,\circ}\). The uniform distribution on the measure space \(({{\mathbb {I}}}, {\mathcal {I}})\) is the Loeb extension of the finitely additive probability \(\Lambda (A) = {}^\circ (\#A/\#{{\mathbb {I}}})\).Footnote 11 If \({{\varvec{u}}}:{{\mathbb {I}}}\rightarrow {\mathbb {R}}\) is a measurable function, i.e. \({{\varvec{u}}}^{-1}((-\infty , r]) \in {\mathcal {I}}\) for all \(r \in {\mathbb {R}}\), then it induces the distribution having the cdf \(F_{{{\varvec{u}}}}(r):= P(\{t \in {{\mathbb {I}}}: {{\varvec{u}}}(t) \le r\})\).
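For a single finite population \(I_n\), the counting formula \(\#A/\#I_n\) and the induced cdf can be computed exactly. The sketch below is a finite analogue only, taking the probability to be the uniform one; the population size, the utility allocation `u`, and the function names are hypothetical.

```python
from fractions import Fraction

def uniform_mass(A, I):
    """Finite analogue of Lambda: (#A / #I) for A a subset of the finite population I."""
    members = set(I)
    return Fraction(len(set(A) & members), len(members))

def cdf(u, I, r):
    """Fraction of generations t in I with u(t) <= r."""
    return uniform_mass([t for t in I if u(t) <= r], I)

I = range(12)          # hypothetical population {0, ..., 11}

def u(t):
    return t % 3       # hypothetical utility allocation with values 0, 1, 2

cdf_values = [cdf(u, I, r) for r in (-1, 0, 1, 2)]
```

Each utility level is received by one third of the population, so the cdf climbs in steps of 1/3. In the hyperfinite model the same counting ratio is computed in \({}^*[0, 1]\) and then passed through the standard part map.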
The following example of the \(\mu\)-almost everywhere construction is central to what we do.
Definition 3.11
An internal bijection on \({{\mathbb {I}}}= \langle I_1, I_2, I_3, \ldots \rangle\) is the equivalence class \(\pi = \langle \pi _1, \pi _2, \pi _3, \ldots \rangle\) where \(\mu (\{{n \in {\mathbb {N}}}: \pi _n:I_n \leftrightarrow I_n \ \hbox {is a bijection}\}) = 1\).
If \(\pi :{{\mathbb {I}}}\leftrightarrow {{\mathbb {I}}}\) is an internal bijection, then B and \(\pi ^{-1}(B)\) have the same cardinality for any internal \(B \in \mathcal{I}^{\,\circ}\), hence \(\Lambda (B) = \Lambda (\pi ^{-1}(B))\). Using Theorem B(2), this extends to \(B \in {\mathcal {I}}\), implying that the measurable \(t \mapsto {{\varvec{v}}}(t):= {{\varvec{u}}}(\pi (t))\) induces the same distribution as \({{\varvec{u}}}\). The reverse is also true.
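Both directions have elementary finite-population counterparts, sketched below for a single \(I_n\). This is an illustration under stated assumptions, not the hyperfinite argument: the population size, the random seed, the utility values and the helper names are all hypothetical.

```python
import random
from collections import Counter

def compose(u, pi):
    """The allocation v(t) = u(pi(t)), with u and pi given as lists indexed by t."""
    return [u[pi[t]] for t in range(len(pi))]

def matching_bijection(u, v):
    """Return a permutation rho with v[t] == u[rho[t]] for all t.

    Assumes u and v have the same value counts (the same empirical distribution).
    """
    order_u = sorted(range(len(u)), key=lambda t: u[t])
    order_v = sorted(range(len(v)), key=lambda t: v[t])
    rho = [None] * len(u)
    for t_v, t_u in zip(order_v, order_u):
        rho[t_v] = t_u      # pair the i-th smallest entry of v with that of u
    return rho

rng = random.Random(0)
size = 100                                          # hypothetical population {0, ..., 99}
u = [rng.choice([0.0, 0.5, 1.0]) for _ in range(size)]

pi = list(range(size))
rng.shuffle(pi)                                     # a bijection of the population
v = compose(u, pi)

# Forward direction: permuting generations leaves the distribution unchanged.
same_distribution = Counter(u) == Counter(v)

# Reverse direction: equal distributions are related by some bijection.
rho = matching_bijection(u, v)
recovered = compose(u, rho)
```

The recovered permutation `rho` need not equal `pi` when utility levels repeat; it only has to reproduce `v`.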
Jerome Keisler (1984) systematically extended Anderson’s (1976) hyperfinite treatment of Brownian motion to a hyperfinite treatment of more general stochastic processes. We will borrow a representation result that he developed for the solutions to stochastic differential equations and adapt it to hyperfinite population models.
Definition 3.12
A probability space \((\Omega , {\mathcal {F}}, P)\) is homogenous if two random variables \(X, Y:\Omega \rightarrow {\mathbb {R}}\) induce the same distribution iff there is a measurable bijection \(\pi :\Omega \leftrightarrow \Omega\) such that \(X = Y \circ \pi\) P-almost everywhere.
The following is (Jerome Keisler 1984, Theorem 9.2, p. 134), and it delivers the ‘limit’ version of the homogeneity property discussed for finite sets in §1.
Theorem C
(Keisler) The probability space \(({{\mathbb {I}}}, {\mathcal {I}}, \Lambda )\) is homogenous and the bijections (in Definition 3.12) can be taken to be internal.
The uniform distribution in this result can be changed and still have homogeneity, but the change must be infinitesimal. One can show that if P is an internal \({}^*[0, 1]\)-valued probability on the internal subsets of \({{\mathbb {I}}}\), then, with P denoting its extension to \({\mathcal {I}}\), \(({{\mathbb {I}}}, {\mathcal {I}}, P)\) is homogenous iff \(\sum _t |P(t) - \Lambda (t)| \simeq 0\). In particular, the extension must equal \(\Lambda\).
An internal mapping is a \(\Lambda\)-almost everywhere bijection if it is a bijection except perhaps on an internal set of exceptions, \(E = \langle E_1, E_2, E_3, \ldots \rangle\), with \(\Lambda (E) = 0\). One could as easily state Keisler’s theorem using \(\Lambda\)-almost everywhere bijections instead of asking that \({{\varvec{v}}}= {{\varvec{u}}}\circ \pi\) on a set having \(\Lambda\)-mass 1.
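For a finite population, the homogeneity property can be checked directly. The following is a minimal sketch, not part of Keisler's argument: it assumes two finite utility lists `u` and `v` with the same empirical distribution, and the hypothetical helper `matching_bijection` builds a bijection \(\pi\) with \(v(t) = u(\pi (t))\) by matching order statistics.

```python
from collections import Counter


def same_distribution(u, v):
    """Two finite lists induce the same empirical distribution iff their multisets agree."""
    return Counter(u) == Counter(v)


def matching_bijection(u, v):
    """Return pi (as a list) with v[t] == u[pi[t]] for all t, matching order statistics."""
    if not same_distribution(u, v):
        raise ValueError("u and v do not induce the same distribution")
    order_u = sorted(range(len(u)), key=lambda t: u[t])  # indices of u, smallest value first
    order_v = sorted(range(len(v)), key=lambda t: v[t])  # indices of v, smallest value first
    pi = [0] * len(u)
    for k in range(len(u)):
        # the k-th smallest entry of v is matched with the k-th smallest entry of u
        pi[order_v[k]] = order_u[k]
    return pi


# Illustrative data only.
u = [3.0, 1.0, 2.0, 1.0]
v = [1.0, 1.0, 3.0, 2.0]
pi = matching_bijection(u, v)
```

The converse direction is immediate in the finite case: composing `u` with any permutation leaves the multiset of values, and hence the empirical cdf, unchanged.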
3.9 Invariance with respect to internal bijections
Utility allocations for the population model \(({{\mathbb {I}}}, {\mathcal {I}}, \Lambda )\) are bounded, \({\mathcal {I}}\)-measurable functions \({{\varvec{u}}}:{{\mathbb {I}}}\rightarrow [0, \infty )\). The set of all bounded, measurable utility allocations is denoted \({{\varvec{W}}}_{{\mathbb {I}}}\). The \(L_\infty\)-norm is defined by \(\Vert {{\varvec{u}}}\Vert _\infty = \inf \{r \ge 0: \Lambda (\{t \in {{\mathbb {I}}}: |{{\varvec{u}}}(t)| \le r\}) = 1\}\), and the associated distance is \(d({{\varvec{u}}}, {{\varvec{v}}}) = \Vert {{\varvec{u}}}- {{\varvec{v}}}\Vert _\infty\). The domain for preferences is \({\mathcal {M}}_{{\mathbb {I}}}\), the set of countably additive Borel measures, q, on \({{\varvec{W}}}_{{\mathbb {I}}}\) that put mass 1 on norm bounded sets, \(q(\{{{\varvec{u}}}: \Vert {{\varvec{u}}}\Vert \le B\}) = 1\) for some B.
We impose the following on a preference relation \(\succsim\) on \({\mathcal {M}}_{{\mathbb {I}}}\) with strict preference \(\succ\).
-
Postulate I.
Weak Order. \(\succ\) is an asymmetric weak order.
-
Postulate II.
Independence. For all \(p,q,r \in {\mathcal {M}}_{{\mathbb {I}}}\) and all \(\alpha \in (0, 1)\), if \(p \succ q\), then \(\alpha p + (1-\alpha ) r \succ \alpha q + (1-\alpha ) r\).
-
Postulate III.
Continuity. For all \(q \in {\mathcal {M}}_{{\mathbb {I}}}\), the sets \(\{p \in {\mathcal {M}}_{{\mathbb {I}}}: p \succ q\}\) and \(\{p \in {\mathcal {M}}_{{\mathbb {I}}}: p \prec q\}\) are open.
-
Postulate IV.
Inequality aversion. For any \({{\varvec{u}}},{{\varvec{v}}}\in {{\varvec{W}}}_{{\mathbb {I}}}\) and any \(0< \alpha < 1\), the distribution putting mass 1 on \(\alpha {{\varvec{u}}}+ (1-\alpha ){{\varvec{v}}}\) is weakly preferred to the distribution putting mass \(\alpha\) on \({{\varvec{u}}}\) and \((1-\alpha )\) on \({{\varvec{v}}}\).
-
Postulate V.
Monotonicity. If \({{\varvec{u}}}\ge {{\varvec{v}}}\) and \(\Lambda (\{{{\varvec{u}}}> {{\varvec{v}}}\}) > 0\), then \({{\varvec{u}}}\succ {{\varvec{v}}}\).
-
Postulate VI.
Strong equity. For any internal bijection \(\pi :{{\mathbb {I}}}\leftrightarrow {{\mathbb {I}}}\), \({{\varvec{u}}}\sim {{\varvec{u}}}^\pi\).
The first four Postulates are directly from Fishburn’s (1982, Theorem 4, Ch. 3) work on expected utility preferences over distributions on convex subsets of vector spaces. On their own, they guarantee that preferences have a continuous, concave expected utility representation. The monotonicity assumption guarantees Pareto responsiveness (see Sect. 2.4 above).
The strong equity assumption is the essential ingredient. It guarantees that the enjoyments of later generations are equally weighted with the enjoyments of earlier ones.Footnote 12
Theorem D
A preference relation \(\succsim\) on \({\mathcal {M}}_{{\mathbb {I}}}\) satisfies Postulates I - VI if and only if there exists a \(S:{{\varvec{W}}}_{{\mathbb {I}}}\rightarrow [0, \infty )\) such that \([p \succsim q] \Leftrightarrow [\int S({{\varvec{u}}})\,dp({{\varvec{u}}}) \ge \int S({{\varvec{u}}})\,dq({{\varvec{u}}})]\) where \(S({{\varvec{u}}}) = \int \varphi (u_t)\,d\Lambda (t)\) with \(\varphi :[0,\infty ) \rightarrow [0, \infty )\) a continuous, increasing, concave function and \(\Lambda\) the uniform distribution on \(\mathbb{I}\).
Two comments are in order.
The function \(\varphi (\cdot )\) is not uniquely determined. Rather, it captures the inequality aversion of the social welfare function.Footnote 13 Two of the applications in Sect. 4 investigate the implications of differing degrees of inequality aversion. The first finds that more inequality aversion implies that optimal efforts to avoid or to recover from future disasters increase as \(\varphi (\cdot )\) becomes more concave. The second, Lemma 4.4, shows that (one form of) the repugnant conclusion holds in fewer instances when there is more inequality aversion.
The tractable class of preferences identified in KS was of the form \(S({{\varvec{u}}}) = \int \varphi (u_t)\,dQ(t)\) where \(Q = \langle Q_1, Q_2, Q_3, \ldots \rangle\) is the limit population measure associated with a sequence of probabilities \(Q_n\) having the property that \(\lim _n \sum _{t=0}^\infty |Q_n(t+1) - Q_n(t)| = 0\). Here Postulate VI imposes the restriction that the measure Q must be the limit of uniform distributions on intervals, thus what was a special case of the subclass of tractable preferences identified in KS becomes the only possible form.
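A finite-population analog may help fix ideas about the representation in Theorem D. The sketch below is an illustration under stated assumptions, not the hyperfinite construction itself: it replaces \(\Lambda\) by the uniform average over a finite list of generations, takes \(\varphi = \sqrt{\cdot }\) as an example of a continuous, increasing, concave function, and uses made-up utility values. It checks permutation invariance (Postulate VI) and the inequality aversion of Postulate IV for a degenerate mixture.

```python
import math


def social_welfare(u, phi):
    """Finite analog of S_phi: the uniform average of phi(u_t) over generations."""
    return sum(phi(x) for x in u) / len(u)


# Illustrative utility streams; v swaps the first and last generations of u.
u = [0.0, 4.0, 16.0]
v = [16.0, 4.0, 0.0]
alpha = 0.5

# Pointwise mixture alpha*u + (1-alpha)*v, received with certainty.
mix = [alpha * a + (1 - alpha) * b for a, b in zip(u, v)]

s_u = social_welfare(u, math.sqrt)      # welfare of u
s_v = social_welfare(v, math.sqrt)      # welfare of the permuted stream
s_mix = social_welfare(mix, math.sqrt)  # welfare of the sure mixture

# Expected welfare of the lottery putting mass alpha on u and 1-alpha on v.
lottery = alpha * s_u + (1 - alpha) * s_v
```

With a concave `phi`, Jensen's inequality gives `s_mix >= lottery`, which is the finite counterpart of Postulate IV.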
3.10 Ergodicity
For many applications, the outcomes are random, but still regular in the following sense.
Definition 3.13
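A stream of utilities, \({{\varvec{u}}}\), is ergodic with occupation measure \(\nu (\cdot |{{\varvec{u}}})\) if for any sequence \(T_n \rightarrow \infty\), the empirical cdfs of the utilities up till \(T_n\),
\[F_{T_n}(r) := \frac{1}{T_n+1}\, \#\{t \le T_n: u_t \le r\},\]
converge weakly to \(\nu (\cdot |{{\varvec{u}}})\), i.e. for all bounded continuous f,
\[\lim _{n \rightarrow \infty } \frac{1}{T_n+1} \sum _{t=0}^{T_n} f(u_t) = \int f(r)\,d\nu (r|{{\varvec{u}}}).\]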
The Hardy-Littlewood Tauberian theorem tells us that if \({{\varvec{u}}}\) is a bounded sequence of numbers, then \(\lim _{\beta \uparrow 1} (1-\beta ) \sum _{t=0}^\infty u_t \beta ^t\) exists if and only if \(\lim _{T \uparrow \infty } \frac{1}{T+1} \sum _{t=0}^T u_t\) exists, and when they exist, the limits are equal. Combined with the nonstandard characterizations of \(\liminf _n r_n\), \(\limsup _n r_n\), and \(\lim _n r_n\) given in Lemma 3.6, we have the following result. For \(\beta \in {}^*[0, 1)\), \(Q_\beta\) is the geometric distribution on \({{\mathbb {N}}_0}\) with parameter \(\beta\), i.e. \(Q_\beta (t) = (1-\beta ) \beta ^t\).
Lemma 3.9
The following are equivalent:
-
(a)
\({{\varvec{u}}}\) is ergodic with occupation measure \(\nu\);
-
(b)
for any \(Q_\beta\), \(\beta \simeq 1\) and any measurable \(E \subset {\mathbb {R}}\), \(Q_\beta (\{t: u_t \in E\}) = \nu (E)\); and
-
(c)
for any uniform distribution \(\Lambda\) on a hyperfinite interval and any measurable \(E \subset {\mathbb {R}}\), \(\Lambda (\{t: u_t \in E\}) = \nu (E)\).
For stochastic dynamic programming problems with outcomes that belong to \({{\textbf {Erg}}}\) with probability 1, this result tells us that maximizing limit discounted sums and maximizing limit average payoffs will yield the same policies. We will see this at work in the first two applications in Sect. 4.
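The agreement between limit discounted sums and limit averages can be seen numerically. The following is a rough sketch with standard (not hyperfinite) numbers: the periodic stream, the discount factor 0.999, and the truncation horizon are all illustrative assumptions, and the function names are hypothetical.

```python
def cesaro_mean(u, T):
    """Average payoff (1/(T+1)) * sum_{t=0}^{T} u(t)."""
    return sum(u(t) for t in range(T + 1)) / (T + 1)


def abel_mean(u, beta, horizon):
    """Truncation of (1-beta) * sum_t u(t) * beta**t at `horizon` terms.

    For a stream bounded by 1, the neglected tail is at most beta**horizon.
    """
    total = 0.0
    weight = 1.0 - beta  # weight on generation 0
    for t in range(horizon):
        total += weight * u(t)
        weight *= beta   # weight on generation t+1
    return total


def stream(t):
    """Illustrative bounded stream: utility 1 every third generation, 0 otherwise."""
    return 1.0 if t % 3 == 0 else 0.0


avg = cesaro_mean(stream, 99_999)
disc = abel_mean(stream, 0.999, 100_000)
```

For this stream the discounted sum has the closed form \(1/(1 + \beta + \beta ^2)\), which tends to the long-run average \(1/3\) as \(\beta \uparrow 1\).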
3.11 Stochastic population models
Suppose now that for each \(t \in \{0, 1, \ldots , T\}\), there is a newly born population of size \(I_t\), T and each \(I_t\) an unlimited integer. Each i in the cohort \(I_t\) lives from t to \(t+A_i\), \(A_i\) random.Footnote 14 Policies affect the joint distribution of the \(I_t\), the \(A_i\), and other aspects of the quality of life for \(i \in I_t\). These determine the utility \(u_{i,t}\) for each \(i \in I_t\). To extend the social welfare functions of Theorem D to this class of population models, we replace the \({{\varvec{u}}}= (u_0, u_1, u_2, \ldots , u_T)\), T an infinite integer, with
\[{{\varvec{u}}}= \big ((u_{i,0})_{i \in I_0}, (u_{i,1})_{i \in I_1}, \ldots , (u_{i,T})_{i \in I_T}\big ). \tag{13}\]
If we take \({{\mathbb {I}}}\) to be the union of the populations \(I_t\) and P to be the uniform distribution on \({{\mathbb {I}}}\), Theorems B and D apply directly. Now however, the permutations can switch people in the same cohort, or switch people across cohorts. In this setting, equity requires invariance with respect to such permutations.Footnote 15
There is a long-standing distinction between studies of equity that focus on current, intra-generational issues and those that focus on future, inter-generational issues (e.g. Tremmel’s introduction in Tremmel (2018)). In this model, invariance with respect to this class of permutations mixes these considerations into a single framework. This allows, for example, investigations of how wide expansions of access to resources in the current generation, as currently happening in Bangladesh, China and India, can lead to changes in intergenerational allocations of welfare. For the applications in the next section, we will work with the hyperfinite interval population models \({{\mathbb {I}}}= \{0, 1, 2, \ldots , T\}\), T unlimited. At the cost of adapting the models to the cohort formulations in (13), similar analyses can be done.
In stochastic models, policies affect the set of future people who are born. This leads to the interpretational question about what the permutations mean when they may apply to different people.Footnote 16 Our interpretation comes from the Dietz and Asheim (2012) ex post approach. In the end, some set of people will be born into and live in different settings. We require that, conditional on whatever the set of people being born ends up being, our welfare criterion is immune to permuting who gets what within that set. This highlights an uncomfortable aspect of all of the social welfare functions with permutation invariance: cross-generational permutations are purely fictional. Just as the choices of future generations do not lead to external effects on current generations, there is no way to permute people across generations.
4 Applications
We give two environmental and one political economy application of the social welfare functions \(S_\varphi ({{\varvec{u}}}) = \int \varphi (u_t)\,d\Lambda (t)\), \(\Lambda\) the uniform on \({{\mathbb {I}}}\). The first application studies the \(S_\varphi\)-optimal level of risk exposure to an irreversible, negative change, and finds that for our equitable preferences, indeed for all of the patient utility functions in the literature that can be applied to stochastic models, none of the early generations should ever run the risk. By contrast, with any finite level of discounting, the optimal cumulative risk guarantees that the irreversible change will happen, and that the future will be, by this measure, impoverished.
The second application studies the \(S_\varphi\)-optimal level of effort to be made to avoid long-lasting and expensive-to-reverse decisions. Here, the degree of inequality aversion encoded in the concavity of \(\varphi (\cdot )\) determines the appropriate levels of sacrifice for the future, and increases in the degree of inequality aversion increase the optimal levels. This is in sharp contrast to the optimal policies for the ‘Rawlsian’ social welfare function, but agrees with many of the other patient preferences. The ‘Rawlsian’ social welfare functions are infinitely inequality averse: they depend only on the welfare of the worst off. Policies that maximize the utility of the worst off generations may require no sacrifice at all for the benefit of future generations.
The third application studies versions of two much-discussed conclusions in population ethics: the ‘sadistic conclusion’ and the ‘mere addition paradox’. Higher levels of inequality aversion in \(\varphi (\cdot )\) mean that there are fewer instances in which \(S_\varphi\)-optimal choices are those that may lead to these counterintuitive conclusions.
4.1 Other patient/equitable preferences
We will use the tools developed here to examine the performance of several previous proposals for patient, or intergenerationally equitable, social welfare preferences. The KS preferences include inequality aversion over generational utilities. With the exception of the Rawlsian preferences, which are effectively infinitely inequality averse, the others work off of variants of limit average utilities.
Theorem C in KS shows that for the following 4 social welfare functionals, \(S_1, \ldots , S_4\), there are sets of translation invariant probabilities \(TI(1), \ldots , TI(4)\) such that \(S_k({{\varvec{u}}}) = \min _{Q \in TI(k)} \int {{\varvec{u}}}\,dQ\). This result allows us to more easily examine optimal policies for these preferences.Footnote 17
-
(1)
Limits of discounted utility, \(S_{1}({{\varvec{u}}}) = \liminf _{\beta \uparrow 1} (1-\beta ) \sum _{t=0}^\infty u_t \beta ^t\).
-
(2)
Tail patient payoffs, \(S_2({{\varvec{u}}}) = \liminf _{T \uparrow \infty } \inf _{j \ge 0} \frac{1}{T+1} \sum _{t=0}^T u_{j+t}\).
-
(3)
\(\epsilon\)-tail patient payoffs, \(S_3({{\varvec{u}}}) = \liminf _{\epsilon \downarrow 0} \liminf _{T \rightarrow \infty } \frac{1}{\epsilon T} \sum _{t=(1-\epsilon )T}^T u_t\).
-
(4)
Liminf average payoffs, \(S_4({{\varvec{u}}}) = \liminf _T \frac{1}{T+1} \sum _{t=0}^T u_t = S_{{\mathfrak {U}}}({{\varvec{u}}})\).
The preferences in (1) have been extensively used in the analysis of Folk Theorems in game theory (e.g. Fudenberg and Tirole 1991, Ch. 5, §1). The preferences in (4) have been extensively used in operations research applications, and (2) and (3) are variants of these preferences that look to put more weight on the far distant tail/future.Footnote 18 There are two more social welfare functions that have been used in the economic theory literature on patient or intergenerationally equitable preferences.Footnote 19
(5) ‘Rawlsian’ preferences, \(S_5({{\varvec{u}}}) = \inf _t u_t\).
(6) Long run ‘Rawlsian’ preferences, \(S_6({{\varvec{u}}}) = \liminf _t u_t\).
These social welfare functions pay no attention to generations whose utility is above \(\inf _t u_t\) or above \(\liminf _t u_t\). For applications, this will matter.Footnote 20
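A small numerical illustration of this insensitivity follows. It is only a sketch: the two utility streams are made up, and the finite-horizon functions `average_payoff` and `rawlsian` are hypothetical stand-ins for \(S_4\) and \(S_5\), which are defined through limits over infinitely many generations.

```python
def average_payoff(u):
    """Finite-horizon stand-in for S_4: the average utility over the listed generations."""
    return sum(u) / len(u)


def rawlsian(u):
    """Finite-horizon stand-in for S_5: the utility of the worst-off generation."""
    return min(u)


T = 1000
# In both streams generation 0 has utility 1; only later generations differ.
poor = [1.0] * (T + 1)
rich = [1.0] + [5.0] * T

same_rawlsian_value = rawlsian(poor) == rawlsian(rich)
average_gap = average_payoff(rich) - average_payoff(poor)
```

The Rawlsian criterion ranks the two streams as equal, while the average criterion registers the improvement enjoyed by every generation after the first.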
4.2 Species extinction tipping points
We give a very simplified fishery model in which different generations balance current exploitation of a resource against the risks of extinction.
Example 4.1
Suppose there are two possible states, f and e, corresponding to a fishery being viable and the fish being extinct. We suppose further that the sets of available actions to each generation are \(A(f) = [0, 1]\) if the fishery is viable, and \(A(e) = \{0\}\) if the fish are extinct. Generational utilities after extinction are \(u(e,0) = 0\). In the viable state, f, the choice of \(a \in [0, 1]\) corresponds to the degree of current exploitation of the resource. We assume that \(u(f,a) > 0\) and that \(\partial u(f,a)/\partial a > 0\). But higher actions also make it more likely that the fish will become extinct. Extinction is absorbing, \(p_{e,e}(1) = 1\). Assume that the probability of extinction (moving from f to e) as a function of a is given by
\[p_{f,e}(a) = \begin{cases} 0 & \text{if } a \le a^\circ , \\ g(a - a^\circ ) & \text{if } a > a^\circ , \end{cases}\]
where \(g(0) = 0\), \(g(\cdot )\) is positive and strictly increasing, and \(g'(0) = 0\).Footnote 21
4.2.1 The unique patient outcome avoids extinction
Extraction at the rate \(a^\circ\) keeps the fishery safe. For this sustainable policy, the population utility, as measured by \(S_\varphi (\cdot )\), is the utility associated with permanent repetition of the per generation utility \(\varphi (u(a^\circ ))\). The question is, can one do better? The answer is, “No,” and the answer is the same for any of the patient preferences given in (1)-(6) above.Footnote 22
Lemma 4.1
For the \(S_\varphi (\cdot )\) preferences, and for any of the social welfare functions (1)-(6) given above, in any optimal policy, the social welfare is bounded above by the per period welfare of the long-run sustainable policy \(a_t \equiv a^\circ\).
4.2.2 Disastrous discounting
Using any standard discount factor \(\beta < 1\) to discount future utilities leads to ‘optimal’ policies that destroy the fishery, an outcome that minimizes any of the patient/equitable social welfare functions.Footnote 23 There is a discontinuity between the occupation measures: the fish are long-run extinct with standard discounting; and the fish are long-run viable for any of the patient preferences. However, the discontinuity is a bit less sharp than this seems to imply. As \(\beta \uparrow 1\), the associated random time until extinction, \(\tau _\beta\), has the property that \(Prob(\tau _\beta > N) \rightarrow 1\) for all standard N.
Lemma 4.2
For any standard discount factor \(\beta < 1\), maximizing the welfare function \((1-\beta )\sum _{t=0}^\infty \varphi (u_t) \beta ^t\) leads to certain extinction, which minimizes any of the patient/equitable social welfare functions, but as \(\beta \uparrow 1\), \(Prob(\tau _\beta > N) \rightarrow 1\) for all standard \(N\).
There is a fundamental asymmetry between present and future generations. The actions of the current generation impose risks on future generations, but there are no risks that the future generations can impose on the present generation. As Dierksmeier (2006) argues, “Rawls’ attempt to derive the notion of rights out of a conception of reciprocal arrangements to enhance the individuals’ self-interests \(\ldots\) cannot provide a satisfactory foundation for the rights of future generations.” Basic properties of geometric growth show that any standard level of discounting very strongly downweights the far future. The present optimal actions for a discounted social welfare function involve risks that no-one in the future would tolerate, if only they had an effective method to protest.
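How strongly discounting downweights the post-extinction future can be illustrated with a deliberately crude sketch. It compares just two fixed stationary policies rather than solving the optimization in Lemma 4.2, and every number in it is an assumption: a constant per-generation extinction probability `hazard`, a constant welfare `utility` (standing for \(\varphi (u)\)) while the fishery is viable, and welfare normalized to 0 after extinction.

```python
def discounted_value(beta, utility, hazard):
    """Expected discounted welfare of a stationary policy.

    Generation t enjoys `utility` only if the fishery has survived t transitions,
    which has probability (1-hazard)**t, so the value is the geometric series
    (1-beta) * sum_t (beta*(1-hazard))**t * utility, given here in closed form.
    """
    return (1.0 - beta) * utility / (1.0 - beta * (1.0 - hazard))


def survival_probability(hazard, n):
    """Probability that extinction has not occurred within the first n transitions."""
    return (1.0 - hazard) ** n


# Hypothetical policies: (welfare per viable generation, extinction hazard).
sustainable = {"utility": 1.0, "hazard": 0.0}
exploit = {"utility": 2.0, "hazard": 0.1}

impatient = {name: discounted_value(0.5, **p)
             for name, p in (("sustainable", sustainable), ("exploit", exploit))}
patient = {name: discounted_value(0.999, **p)
           for name, p in (("sustainable", sustainable), ("exploit", exploit))}
```

At \(\beta = 0.5\) the exploiting policy has the higher discounted value, while at \(\beta = 0.999\) its value is close to 0 and the sustainable policy dominates it. For any fixed positive hazard the value tends to 0 as \(\beta \uparrow 1\).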
4.3 World-wide climate catastrophe
The previous application studied extinction, and extinction is forever. We now turn to the study of changes that are reversible, but only at great cost, this in a drastically simplified model in which the richness of the biosphere is a crucial ingredient to human welfare.Footnote 24
Example 4.2
The world’s ecosystem can be in a livable state, L, or a crippled state, C. In C, the seas, forests and the biota that survive are unable to produce oxygen and resources in the amounts humans have become accustomed to. In L, the seas and forests are able to produce oxygen concentrations and resources that can support life as we currently know it. Payoffs and actions capture the following tradeoffs: a generation in L can sacrifice present utility in order to lower the transition probability, r, from L to C, notationally \(u_L'(r) > 0\); and a generation in C can sacrifice present utility to increase the transition probability, s, back from C to L, notationally \(u_C'(s) < 0\). We assume that \(\min _r u_L(r) \gg \max _s u_C(s)\), that is, it is much worse to be living on a planet with a crippled ecosystem.
4.3.1 Analysis
There is no loss in examining stationary policies, those specified by the pair (r, s). Stationary policies lead to a Markov chain of outcomes with the steady state distribution spending \(\frac{s}{r+s}\) of the time in L and \(\frac{r}{r+s}\) of the time in C. Because the outcome is Markovian, it can be shown that the problems of maximizing the preferences given in (1), (2), (3), and (4) above reduce to maximizing long run average utility,Footnote 25
\[\max _{r,s}\ \frac{s}{r+s}\, u_L(r) + \frac{r}{r+s}\, u_C(s).\]
In a similar fashion, the problem of maximizing \(S_\varphi (\cdot )\) reduces to
\[\max _{r,s}\ \frac{s}{r+s}\, \varphi (u_L(r)) + \frac{r}{r+s}\, \varphi (u_C(s)).\]
The FOCs for interior solutions for the first utility functions are
and for \(S_\varphi (\cdot )\), they are
To interpret these: on the left-hand sides, s is the probability of transitioning from C to L in a generation while \(\left[ u'_L(r) - u'_C(s) \right]\) or \(\left[ \varphi '(u_L(r)) u'_L(r) - \varphi '(u_C(s)) u'_C(s) \right]\) is the difference of two terms, the marginal benefit of the current activities that tip the world toward disaster, and the marginal cost of the activities that make it possible for future generations to recover; the right-hand sides are the difference in per generation utility between being in the livable state and the crippled state.
Because \(\varphi (\cdot )\) is concave, comparing (17) to (18) shows that, since the weight on the \(u'_L(r)\) term relative to the \(u'_C(s)\) term declines, we expect the optimal r to be smaller and s to be larger when the social welfare function dislikes inequality. Put simply, more inequality aversion in the intergenerational utility function means that optimal policies make more effort to avoid inequality. We suspect that in many sensibly calibrated versions of this model, if there are non-boundary limits on s and r, then the solutions will involve s being as high as possible and r as low. Our preliminary investigation of boundary solutions indicates that the same basic analysis holds, although the comparative statics of increases in s or decreases in r are replaced by higher Lagrangean multipliers on the constraints. This too is useful information: the higher the multiplier on a constraint, the more important it is to loosen it.
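The steady-state reasoning behind this analysis can be sketched in a few lines. Everything specific below is an assumption made for illustration: the linear payoff functions `u_L` and `u_C`, the numerical policy \((r, s) = (0.1, 0.3)\), and the function names. The sketch only evaluates the long-run welfare of a given stationary policy; it does not solve for the optimum.

```python
import math


def stationary_distribution(r, s):
    """Long-run fractions of time in (L, C) for the chain with P(L->C)=r and P(C->L)=s."""
    return s / (r + s), r / (r + s)


def stationary_welfare(r, s, u_L, u_C, phi=lambda x: x):
    """Long-run average of phi(utility) under the stationary policy (r, s)."""
    pi_L, pi_C = stationary_distribution(r, s)
    return pi_L * phi(u_L(r)) + pi_C * phi(u_C(s))


# Hypothetical payoffs: tolerating more risk r raises utility in L,
# while more recovery effort s lowers utility in C.
def u_L(r):
    return 10.0 + 20.0 * r


def u_C(s):
    return 2.0 - 4.0 * s


r, s = 0.1, 0.3
pi_L, pi_C = stationary_distribution(r, s)

# Invariance check: one step of the chain leaves the mass on L unchanged.
mass_on_L_after_one_step = pi_L * (1.0 - r) + pi_C * s

linear_welfare = stationary_welfare(r, s, u_L, u_C)
concave_welfare = stationary_welfare(r, s, u_L, u_C, phi=math.sqrt)
```

Passing a more concave `phi` changes how heavily the time spent in the crippled state weighs in the objective, which is the channel through which inequality aversion affects the preferred \((r, s)\).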
4.3.2 Disastrous ‘Rawlsian’ preferences
In this model, the patient utility functions call for current sacrifice to avoid hurting future generations. The directive to “maximize the welfare of the population’s worst off” embodies an extreme aversion to inequality and is often understood as morally sound. In this context, the implied complete disregard of everyone who is not worst off is puzzling, and in this class of models, it can lead to disastrous policy recommendations. Disasters are recurrent if the minimal possible r and s are strictly positive, and maximizing the utility functions \(S_{5}({{\varvec{u}}}) = \inf _t u_t\) and \(S_{6}({{\varvec{u}}}) = \liminf _t u_t\) in recurrent versions of the model has the following implications.
Lemma 4.3
If the minimal possible r and s are strictly positive, then optimal policies for either \(S_{5}(\cdot )\) or \(S_{6}(\cdot )\) involve no sacrifice in state C, and any feasible choice is optimal in state L.
The intuition is simple: the crippled state will happen infinitely often with probability 1; these social welfare functions only care about the utility of generations in that state; hence any choice is optimal in state L; and to maximize the utility of the generations in state C, those generations should make no sacrifices for the future.
4.4 Some counterintuitive conclusions in population ethics
The utility functions under study are of the form \(S({{\varvec{u}}}) = \int _{{{\mathbb {I}}}} \varphi (u_t) \, d\Lambda (t)\) where \(\Lambda\) is the uniform distribution on \({{\mathbb {I}}}\). The function \(\varphi (\cdot )\) encodes inequality aversion.
4.4.1 The sadistic conclusion and the mere addition paradox
Consider a choice between two policies, both from the same starting point of a population that is very, very well off: one policy leads to a small number of truly miserable people being added to the future population; and one leads to a large number of well off, but not very very well off, being added to the same population.Footnote 26 There are three situations to compare: the status quo (SQ); the addition of a small number of truly miserable in the future (TM); and the addition of a larger number of well off in the future (WO).
-
The ‘sadistic conclusion’ is the observation that a social welfare function could prefer (TM), a small number of miserable lives, to (WO), the larger number of well off lives.
-
The ‘mere addition paradox’ is that a social welfare function could prefer a status quo policy over either addition of new people.
Arguing that coming to the sadistic conclusion disqualifies a social welfare function is, at its core, an argument that only a Rawlsian “choose to maximize the welfare of the worst off” criterion is acceptable.Footnote 27 On the other hand, such a criterion can, as we saw just above, lead to disastrously bad policy recommendations. It also advocates for the status quo policy, and this runs afoul of the second observation.Footnote 28
Our equitable social welfare functions, \({{\varvec{u}}}\mapsto S_\varphi ({{\varvec{u}}}) = \int _{{{\mathbb {I}}}} \varphi (u_t)\,d\Lambda (t)\), also choose the status quo policy, basically because \(\varphi (\cdot )\) is monotonic and whatever the size of the population, we normalize its mass to 1. In this sense, we have expanded the scope of long run average utility as a social welfare criterion by incorporating both inequality aversion and the ability to study long-run problems. But we have not changed the basic properties of social welfare functions that incorporate averaging.
4.4.2 When is (TM) preferred to (WO)?
We now examine conditions under which \(S_\varphi (\cdot )\) prefers (TM) to (WO) or the reverse. The result hinges on the degree of intergenerational inequality aversion, i.e. the degree of concavity of \(\varphi (\cdot )\).
Fix utilities \(0 \le u_{TM} \ll u_{WO} \ll u_{VWO}\) where \(u_{VWO}\) is the utility of the very well off. We are comparing the (TM) situation, \((\alpha , 0, 1-\alpha )\), i.e. \(\alpha\) of the population at \(u_{TM}\) and the rest at \(u_{VWO}\), to the (WO) situation, \((0, \beta , 1 - \beta )\), i.e. \(\beta\) of the population at \(u_{WO}\) and the rest at \(u_{VWO}\). The \(\alpha\) and \(\beta\) can be understood as proportions of future generations at the different utilities, or, using the stochastic population model discussed in Sect. 3.11, as the proportions of future people at the different utilities.
Associated with a concave \(\varphi :[0, \infty ) \rightarrow [0, \infty )\) is the social welfare function \(S_\varphi :{{\varvec{W}}}\rightarrow [0, \infty )\). For each such \(\varphi\), the set \(\textrm{Sit}(\varphi ) = \{(\alpha , \beta ): \alpha \varphi (u_{TM}) + (1-\alpha ) \varphi (u_{VWO}) > \beta \varphi (u_{WO}) + (1-\beta ) \varphi (u_{VWO})\}\) represents situations in which \(S_\varphi (\cdot )\) yields the sadistic conclusion above. Recall that a concave increasing function f is more concave than a concave increasing function g if there is a concave increasing h such that \(f(r) = h(g(r))\). The more concave is \(\varphi\), the more inequality averse is \(S_\varphi (\cdot )\). The following is an elementary result for concave functions. It tells us that higher inequality aversion lessens the ‘sadistic’ instances.
Lemma 4.4
If \(\varphi\) is more concave than \(\xi\), then \(\textrm{Sit}(\varphi ) \subset \textrm{Sit}(\xi )\).
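A concrete instance of Lemma 4.4 can be checked numerically. The sketch below uses assumed values throughout: utilities \(u_{TM} = 0\), \(u_{WO} = 64\), \(u_{VWO} = 100\), proportions \(\alpha = 0.03\) and \(\beta = 0.1\), and the pair \(\xi (r) = r\), \(\varphi (r) = \sqrt{r}\), where \(\varphi\) is more concave than \(\xi\) because \(\varphi = h \circ \xi\) with \(h = \sqrt{\cdot }\). The function name `in_sit` is hypothetical.

```python
import math


def in_sit(phi, alpha, beta, u_tm, u_wo, u_vwo):
    """True if (alpha, beta) lies in Sit(phi), i.e. S_phi strictly prefers (TM) to (WO)."""
    welfare_tm = alpha * phi(u_tm) + (1 - alpha) * phi(u_vwo)
    welfare_wo = beta * phi(u_wo) + (1 - beta) * phi(u_vwo)
    return welfare_tm > welfare_wo


def identity(x):
    return x


# Illustrative utilities and population proportions.
u_tm, u_wo, u_vwo = 0.0, 64.0, 100.0
alpha, beta = 0.03, 0.1

sadistic_under_linear = in_sit(identity, alpha, beta, u_tm, u_wo, u_vwo)
sadistic_under_sqrt = in_sit(math.sqrt, alpha, beta, u_tm, u_wo, u_vwo)
```

With these numbers the linear criterion prefers (TM) when \(\alpha /\beta < 0.36\) and the square-root criterion only when \(\alpha /\beta < 0.2\), so the pair \((0.03, 0.1)\) belongs to \(\textrm{Sit}(\xi )\) but not to \(\textrm{Sit}(\varphi )\).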
The contrast between the version of average utilitarianism that we are using and classical utilitarianism is quite striking.Footnote 29 Let \(N_{VWO}\), \(N_{TM}\), and \(N_{WO}\) denote the numbers of people in the population that are, or will be, very well off, truly miserable, and well off, respectively. The question of whether (TM) is preferred to (WO) becomes whether or not the inequality
\[N_{VWO}\, u_{VWO} + N_{TM}\, u_{TM} > N_{VWO}\, u_{VWO} + N_{WO}\, u_{WO}\]
holds. After rearrangement, the question is whether or not \((N_{TM}/N_{WO}) > (u_{WO}/u_{TM})\). There are two results. First, if \(u_{TM} > 0\) shrinks, then the class of situations in which (TM) is preferred to (WO) shrinks. Second, if one uses composition with increasing concave transformations h having the property that \(h(0) = 0\) as the definition of more inequality averse, then more inequality aversion increases rather than decreases the class of situations in which this ‘sadistic’ conclusion holds.
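The second result can be illustrated with a short sketch. The numbers are assumptions chosen for easy arithmetic (\(u_{TM} = 1\), \(u_{WO} = 64\), \(N_{TM} = 20\), \(N_{WO} = 1\)), \(h = \sqrt{\cdot }\) serves as the increasing concave transformation with \(h(0) = 0\), and `classical_prefers_tm` is a hypothetical name. The common term \(N_{VWO}\, u_{VWO}\) cancels from the comparison and is omitted.

```python
import math


def classical_prefers_tm(n_tm, n_wo, u_tm, u_wo, h=lambda x: x):
    """True if total utility is strictly higher when adding the (TM) group than the (WO) group.

    `h` is applied to individual utilities before summing.
    """
    return n_tm * h(u_tm) > n_wo * h(u_wo)


# Illustrative values only.
n_tm, n_wo = 20, 1
u_tm, u_wo = 1.0, 64.0

prefers_tm_linear = classical_prefers_tm(n_tm, n_wo, u_tm, u_wo)
prefers_tm_concave = classical_prefers_tm(n_tm, n_wo, u_tm, u_wo, h=math.sqrt)
```

Under the untransformed sum, (TM) is preferred only when \(N_{TM}/N_{WO} > 64\); after the square-root transformation the threshold falls to 8, so the ratio of 20 used here switches the ranking in favor of (TM).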
Classical utilitarianism cannot be invariant to positive affine transformations. If one is maximizing \(\sum _{i \in {{\mathbb {I}}}} u_i\) across policies that affect the size of \({{\mathbb {I}}}\), then making the \(u_i\) negative by subtracting a positive constant asks for policies that deliver a world without humans. Thus, the concave transformations \(h(\cdot )\) are changing the total desirability of humans as well as representing more inequality averse preferences over distributions of utilities. Dasgupta (2001, Ch. 14.2) examines the implications of classical utilitarianism for the optimal size of the population. Later in the chapter (p. 221), he ends with a question of “why the attitude towards equality influences the optimum number of lives.” The conflation of the value of humans with inequality aversion seems to provide an answer.
5 Conclusions
Before KS, the literature on intergenerationally equitable social preference orderings had documented the difficulties in operationalizing patience, understood as a form of indifference to permutations, and simultaneously satisfying the Pareto criterion. Given the centrality of the Pareto criterion to welfare analyses, this may give an impression that patient and equitable societal preferences cannot be sensibly implemented for intergenerational problems. And, by extension, that some form of discounting must be used. While there are many arguments for discounting, we do not believe that this is one of them. Viewing the set of generations as a non-atomic measure space, something implicit but not explicit in previous literature, KS showed that the purported examples of the failure of the Pareto criterion involved increasing the welfare of a null subset of the population.
KS also integrated patience/intergenerational equity and Pareto responsiveness in a class of social welfare functions that are indifferent to the broadest class of permutations that had been used in the literature, what we call the asymptotic or \({\scriptstyle {\mathcal {O}}}(T)\) permutations. But we now believe that this class of permutations is much narrower than is suitable for use in a definition of patience/intergenerational equity. As Example 2.1 shows, it allows for welfare functions that can be lowered by switching benefits to later generations from an equal number of early generations.
This paper, by contrast, asks for invariance of the intergenerational preference ordering with respect to all almost everywhere permutations. Our class of permutations can switch e.g. the first half of the generations for the second half because we take seriously the idea that our limit models should “look like” and “act like” large but finite models. The results of imposing this strong form of equity are in Theorems B and D. For strong equity to hold, the hyperfinite model \({{\mathbb {I}}}\) that replaces \({{\mathbb {N}}_0}\), the old model of the generations, must be given the uniform (or counting) distribution.
One might worry that working strong equity into social preferences would make the resulting social welfare functions intractable. But the opposite turns out to be true. In a model with potentially avoidable risks of species extinctions, the optimal policy for all of the equitable preferences given here calls for sustainability. In a model with partially avoidable risks of huge downturns in human welfare, the equitable preferences call for sacrifices both to avoid the downturn and to speed up the return to a better state. It is also possible to perform comparative statics analyses. In the model with partially avoidable risks, more inequality aversion calls for more efforts to avoid downturns and speed up the return to a better state. And more inequality averse intergenerational preferences more often avoid choices that are problematic in terms of population ethics, like the sadistic conclusion or the mere addition paradox.
Going forward, there are a number of open problems, some purely theoretical in nature, and others containing a mix of theoretical and practical issues. On the purely theoretical side are issues related to what is called the problem of “underselectiveness” in the studies of maximization of the long run average performance of a system. In the context of intergenerational equity, this can arise as the weak optimality of both “present profligacy” and Chichilnisky’s (1996) “dictatorship of the future.” The profligacy of the present arises if some finite number of early generations leave the world in so bad a situation that future generations must make immense sacrifices to recover. The “dictatorship of the future” arises if a finite number of early generations must make immense sacrifices in order that future generations have a better life. For a social welfare function that is invariant with respect to reversible changes in the utility of finitely many generations, both profligacy and dictatorship are at least weakly optimal. The most extreme form of underselectiveness showed up in the case of the infinitely inequality averse ‘Rawlsian’ preferences applied to models with recurrent, even if rare, disasters. In such cases, optimal policies for the social welfare functions \(\inf _t u_t\) and \(\liminf _t u_t\) call for no effort to be spent on recovering from disasters, and they are completely mute on efforts to avoid future disasters.
We have preliminary results indicating that taking account of infinitesimal differences in payoffs may solve these problems. There is a direct way to see why our social welfare functions are underselective. They are of the form \(S({{\varvec{u}}}) = \textrm{st}(\frac{1}{T+1} \sum _{t=0}^T \varphi (u_t))\), and taking the standard part strips out the infinitesimal differences in payoffs that arise when finitely many generations have their utility changed. Maximizing without first taking the standard part results in more complicated calculations but, in some models, little change in the “basic aspects of” the solutions. An additional virtue of such an approach is that it could, potentially, allow us to find a version of social welfare functions that respect the classic Pareto criterion, as in Jonsson and Voorneveld’s (2018) “limit of discounted utility” ordering. However, we do not yet have a full characterization of problems in which the basic aspects are stable with respect to accounting for infinitesimal differences.
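The stripping effect of the standard part can be illustrated with a large finite horizon standing in for the unlimited T. The following is only a finite sketch under stated assumptions: \(\varphi\) is taken to be the square root, the first k generations have their utility lowered, and the function names and numbers are hypothetical rather than drawn from the models above.

```python
import math

def average_welfare(u, phi=math.sqrt):
    """Finite-horizon analogue of (1/(T+1)) * sum_{t=0}^{T} phi(u_t)."""
    return sum(phi(x) for x in u) / len(u)

def welfare_gap(T, k=10, base=4.0, low=1.0):
    """Gap between a constant stream and one in which the first k
    generations receive `low` instead of `base` (hypothetical numbers)."""
    constant = [base] * (T + 1)
    perturbed = [low] * k + [base] * (T + 1 - k)
    return average_welfare(constant) - average_welfare(perturbed)

# The gap equals k * (phi(base) - phi(low)) / (T + 1), so it shrinks like 1/T.
gaps = {T: welfare_gap(T) for T in (10**2, 10**4, 10**6)}
for T, gap in gaps.items():
    print(T, gap)
```

For an unlimited T the corresponding gap is a positive infinitesimal, so its standard part is zero; this is the sense in which the two streams end up being ranked as indifferent.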
In a similar fashion, in Example 4.2, we saw that making \(\varphi (\cdot )\) more inequality averse led to policies calling for more efforts to avoid and recover from worldwide catastrophe, whereas the patient, infinitely inequality averse preferences, \(\liminf _t u_t\), provide little useful policy guidance. By scaling a non-standard \(\varphi (\cdot )\) to have the risk aversion coefficient, \(- \varphi ''(r)/\varphi '(r)\), infinite, it may be possible to recover something like the \(\liminf _t u_t\) preferences, but retain the higher selectivity of our social welfare functions.
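A finite sketch may help fix ideas about the role of the coefficient \(- \varphi ''(r)/\varphi '(r)\). It assumes, purely for illustration, the family \(\varphi _k(r) = 1 - e^{-kr}\), for which the coefficient is the constant k, and a hypothetical finite stream; neither is taken from Example 4.2. The quantity computed is the equally distributed equivalent \(\varphi _k^{-1}\) of the average of \(\varphi _k(u_t)\), which moves from the mean of the stream toward its minimum as k grows.

```python
import math

def equally_distributed_equivalent(u, k):
    """phi_k^{-1} of the average of phi_k(u_t) for phi_k(r) = 1 - exp(-k r).

    phi_k has constant coefficient -phi''/phi' = k. The computation is
    shifted by min(u) so that large k does not underflow.
    """
    m = min(u)
    mean_exp = sum(math.exp(-k * (x - m)) for x in u) / len(u)
    return m - math.log(mean_exp) / k

# Hypothetical stream: a good state most of the time, a rare disaster.
stream = [3.0] * 99 + [1.0]

values = [equally_distributed_equivalent(stream, k) for k in (0.01, 1.0, 10.0, 100.0)]
for v in values:
    print(round(v, 4))
```

In this sketch a larger k places more weight on the worst-off generation, which is the finite counterpart of approaching the \(\inf _t u_t\) or \(\liminf _t u_t\) criteria while still ranking streams that share the same minimum.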
Another set of theoretical concerns relate to the ethics of demanding large sacrifices from the earlier generations (see e.g. Portney and Weyant 2013). As Fleurbaey and Tungodden (2010) show, such ethical dilemmas seem to be an intrinsic feature of aggregative models. In the context of our project, exploring the implications of adopting additional ethical postulates that prohibit “undue” hardship for any generation is a fascinating open question. There is a related set of questions regarding the implementability of such prescriptions for generational sacrifice in decentralized settings.
Problems that contain a mix of theoretical and practical issues stem from the observation that planning for even a 200-year time horizon is not possible at any serious level of granularity. Still, increasing the set of feasible choices for future generations, even if it costs the current generation, seems intuitively optimal, provided only that one weights the well-being of far future generations. These include the sorts of sacrifices that make the survival of a knowledge based society more likely, and a return to it possible if it should falter.
Notes
From Charles Sanders Peirce (1878), “\(\ldots\) the practical effects of the objects of your conception \(\ldots\) is the whole of your conception of the object.” For us the ‘practical effects’ are examined within economic models. This is a viewpoint that has been expressed with particular clarity by Dasgupta and Heal (1979, Ch. 10, §4, p. 311), “exercises in optimal planning \(\ldots\) enable us to see in what way the implications of various ethical norms differ. It is in this sense that it is a legitimate exercise to revise or criticize ethical norms in the light of their implications.” See also Atkinson (2001), “by applying our ethical criteria to concrete economic models, we learn about their consequences, and this may change our views about their attractiveness.”
For a non-empty finite set S, \(\#S\) denotes its cardinality, i.e. the integer N for which there is a one-to-one and onto \(f:S \leftrightarrow \{1, \ldots , N\}\).
Throughout, we use the convention that “\(A \subset B\)” holds also when the sets A and B are equal.
See Blackorby et al. (1995) for a discussion of these normalizations.
In the classic Landau notation for asymptotic analysis, \(|\pi (T) - T|\) is \({\scriptstyle {\mathcal {O}}}(T)\). What we here call asymptotic permutations are called “bounded” in Lauwers (1998) if they are also one-to-one and onto mappings from \({{\mathbb {N}}_0}\) to \({{\mathbb {N}}_0}\).
The puzzle was first pointed out by Diamond (1965), who showed that there is no sup norm continuous function on the space of sequences of utilities that is simultaneously strongly Pareto and indifferent, for every t, to swapping the utilities of the first and the t’th generations. Basu and Mitra (2003) show that the same is true if “continuous” is replaced by “measurable.” Fleurbaey and Michel (2003), Bossert et al. (2007), and Basu and Mitra (2007) contain extensions and further results, both positive and negative. Zame (2007) showed that the graph of any preference relation on \([0, 1]^{{\mathbb {N}}_0}\) that respects the Pareto principle and translation invariance must be drastically non-measurable, having inner measure 0 and outer measure 1 in the obvious measure. Asheim (2010) provides an extensive review of this literature.
The argument runs as follows. The events that \({\varvec{U}} \succsim _{LDU} {\varvec{V}}\) or \({\varvec{V}} \succsim _{LDU} {\varvec{U}}\) belong to the tail \(\sigma\)-field. By Kolmogorov’s 0-1 law, the events either have probability 0 or probability 1. The random walks \(S_n:= \sum _{t=0}^n (u_t - v_t)\) and \(T_n:= \sum _{t=0}^n (v_t - u_t)\) satisfy \(\liminf _n S_n = \liminf _n T_n=-\infty\). Therefore, \(\liminf _{\beta \uparrow 1} \sum _{t=0}^\infty (u_t - v_t)\beta ^t = -\infty\), but also \(\liminf _{\beta \uparrow 1} \sum _{t=0}^\infty (v_t - u_t)\beta ^t = -\infty\), and the comparability events have probability 0.
The existence of such probabilities requires the Axiom of Choice. One development takes point masses as points in \({\mathbb {P}}:= \{0, 1\}^{{\mathcal {P}}({\mathbb {N}})}\), the set of all functions from the subsets of \({{\mathbb {N}}_0}\) to the two point set \(\{0, 1\}\). By the Axiom of Choice, in its logically equivalent form of Tychonoff’s theorem, \({\mathbb {P}}\) is compact in the product topology so that infinite sets have accumulation points, and the set of zero–one probabilities is closed. We can take \(\mu\) to be any accumulation point of the set of point masses, \(\{\delta _n: {n \in {\mathbb {N}}}\}\) where \(\delta _n(E):= 1_E(n)\).
A central result in nonstandard analysis is known as Robinson’s theorem. Applied to metric spaces, it says that X is compact if and only if for every \(x \in {}^*\!X\), there is an \(h \in X\) such that for all standard \(\epsilon > 0\), \(d(x,h) < \epsilon\).
If \({\mathcal {E}}\) is a class of subsets of \(\Omega\), then the \(\sigma\)-field that it generates is the smallest \(\sigma\)-field containing \({\mathcal {E}}\), which always exists because the class of \(\sigma\)-fields is closed under arbitrary intersection.
This is often called the “Loeb counting measure.” For \(A = \langle A_1, A_2, A_3, \ldots \rangle\), the cardinality of A is denoted \(\#A\) and defined as \(N = \langle N_1, N_2, N_3, \ldots \rangle\) where \(N_n = \#A_n\) is the cardinality of \(A_n\) and \({{\mathbb {I}}}= \langle I_1, I_2, I_3, \ldots \rangle\) has the cardinality \(M = \langle \#I_1, \#I_2, \#I_3, \ldots \rangle\).
There is some flexibility to slightly strengthen the assumption, but the strengthening adds nothing of any consequence. Because preferences are continuous with respect to the \(L_\infty\)-norm distance, almost everywhere equality of \({{\varvec{u}}}\) and \({{\varvec{v}}}\) implies indifference; thus, one could as well work with \(\pi\)’s that are \(\Lambda\)-almost everywhere equal to an internal bijection.
As a referee noted, ‘generalized utilitarianism’ as in Grant et al. (2010) incorporates risk aversion which in turn leads to similar functional forms. However, we implement this in an intertemporal, infinite population setting. The similarities to functional forms aside, thinking about Harsanyi-type identity lotteries in an intertemporal setting has additional substantive implications, as the externalities can only be imposed on future generations, and not the other way around.
Here we follow the convention that integers and the set of integers preceding them are one and the same. E.g. 3 and the set \(\{0, 1, 2\}\) are the same.
As a referee has pointed out, it may not be clear what an internal union of a class of internal sets is. Starting with a set X, \({\mathcal {P}}(X)\) denotes the class of subsets of X, and \({\mathcal {P}}^{(2)}(X) = {\mathcal {P}}({\mathcal {P}}(X))\) denotes the class of classes of subsets of X. “Union” is a function mapping \({\mathcal {P}}^{(2)}(X)\) to \({\mathcal {P}}(X)\) defined \(\bigcup ({\mathcal {E}}) = \{x \in X: (\exists E \in {\mathcal {E}})[x \in E]\}\) for each \({\mathcal {E}}\in {\mathcal {P}}^{(2)}(X)\). The internal union of an internal collection of sets is the \({}^*\)’d version of this function: an internal collection of sets is an \({\mathcal {E}}= \langle {\mathcal {E}}_1, {\mathcal {E}}_2, {\mathcal {E}}_3, \ldots \rangle \in {}^*\!{\mathcal {P}}^{(2)}(X)\), i.e. \(\mu (\{{n \in {\mathbb {N}}}: {\mathcal {E}}_n \in {\mathcal {P}}^{(2)}(X)\}) = 1\); an internal set \(E = \langle E_1, E_2, E_3, \ldots \rangle \in {\mathcal {E}}\) iff \(\mu (\{{n \in {\mathbb {N}}}: E_n \in {\mathcal {E}}_n\}) = 1\); and finally, for each internal collection \({\mathcal {E}}\), \(\bigcup ({\mathcal {E}}) = \{x \in {}^*\!X: (\exists E \in {\mathcal {E}})[x \in E]\}\). For this model, one starts with X as the set of all possible people who might eventually exist; for each \(t \in T\), one has an internal set \(I_t\), and \({{\mathbb {I}}}\) is the union of the \(I_t\). One replaces union with intersection by changing “\(\exists\)” to “\(\forall\).”
We are grateful to a referee for pointing this out.
Lauwers (1998) uses Mokobodzki’s medial limits as developed by Meyer (1973) to give social welfare functions that give equal weight to subsets of the population having equal long run densities. Lauwers does this using linear functionals defining and defined by translation invariant probabilities, i.e. by taking the sets TI to be singletons. KS §3.5, Lemma 8 uses the nonstandard analysis tools exposited here to show that a much broader class of these preferences must ignore distributional concerns in the sense that they are indifferent between, e.g. the streams of utility \({{\varvec{u}}}= (1,3,1,3,1,3,\ldots )\), \({{\varvec{v}}}= (3,1,3,1,3,1,\ldots )\), and \((\frac{1}{2}{{\varvec{u}}}+ \frac{1}{2}{{\varvec{v}}}) = (2,2,2,2,2,2,\ldots )\).
A more extensive discussion of these and related social welfare functions is in §3.5–6 of KS.
As one referee rightly pointed out, this is by no means an exhaustive list of all the different social welfare functions in the literature that could be considered or compared; we made a selection to highlight the contrasts with our results.
It seems unfair to Rawls’ work to name these preferences after him, but the frequency of this usage must be a variant of the Law of Eponymy.
While this model has a very simple, two state dynamic structure to the fish population, hence it applies more to shrimp than to, say, tuna, the essential lessons remain valid with more complicated population dynamics. See Huang and Smith (2014, §1) for the bioeconomic appropriateness of modeling shrimp as an annual industry with simpler dynamics.
From the Tauberian Lemma, 3.9, we expect this answer for limit discounted utility functions and the \(S_\varphi (\cdot )\) utility functions, but the other utility functions require a different argument.
The ideas in this model have several sources. The first one is Ehrlich and Ehrlich’s (1981) analogy between species in an ecosystem and rivets in an airplane: some rivets are redundant and one would not miss them if they are lost, but a large enough cumulative loss of rivets leads to a crash. More specific worries include the production of oxygen by natural systems: Baccini et al. (2017) show, more precisely than before, that ongoing degradation of tropical forests has made them into a large net carbon source; and the majority of the studies of the effects of warmer and more acidic oceans on phytoplankton suggest that roughly two thirds of global production of oxygen by phytoplankton may disappear by the end of the current century. Finally, Costanza et al. (1997) systematically under-estimate the value of the non-marketed services that humanity receives, yearly, from the world’s ecosystem, and arrive at a conservative figure of 1.8 times the yearly world GDP.
Applied to \({{\varvec{u}}}\in {{\textbf {Erg}}}\), two shift invariant probabilities, the geometric with parameter \(\beta \simeq 1\) and the uniform on \(\{0, 1, \ldots , T\}\), T unlimited, have the same occupation measures; see the Tauberian Lemma 3.9. There is a subset, \({{\textbf {Erg}}}' \subset {{\textbf {Erg}}}\), with the property that, applied to all \({{\varvec{u}}}' \in {{\textbf {Erg}}}'\), all shift invariant probabilities yield the same occupation measure. But we do not have a useful characterization of \({{\textbf {Erg}}}'\).
These are related to a general observation in population ethics called the ‘repugnant conclusion’. Spears and Budolfson (2021) contains a discussion and exhaustive analysis of the variant forms that the ‘repugnant conclusion’ takes, including extensive references to the literature on its development. They show that societal preferences over variable populations cannot escape ‘repugnance’ in one form or another.
See Dierksmeier (2006) on the foundations of this social welfare function in intergenerational settings.
As noted, this is consistent with Spears and Budolfson (2021) who show that social welfare functions cannot escape ‘repugnance’ in one form or another.
We are grateful to a referee for pointing this out.
References
Anderson RM (1976) A non-standard representation for Brownian motion and Itô integration. Israel J Math 25(1–2):15–46
Asheim GB (2010) Intergenerational equity. Ann Rev Econ 2(1):197–222
Asheim GB, Mitra T (2010) Sustainability and discounted utilitarianism in models of economic growth. Math Soc Sci 59(2):148–169
Atkinson AB (2001) The strange disappearance of welfare economics. Kyklos 54(2–3):193–206
Aumann RJ, Shapley LS (1974) Values of non-atomic games. Princeton University Press, Princeton, N.J. (A Rand Corporation Research Study)
Baccini A, Walker W, Carvalho L, Farina M, Sulla-Menashe D, Houghton RA (2017) Tropical forests are a net carbon source based on aboveground measurements of gain and loss. Science 358(6360):230–234
Basu K, Mitra T (2003) Aggregating infinite utility streams with intergenerational equity: the impossibility of being Paretian. Econometrica 71(5):1557–1563
Basu K, Mitra T (2007) Possibility theorems for equitably aggregating infinite utility streams. In: Intergenerational equity and sustainability, pp 69–84. Springer, New York
Blackorby C, Bossert W, Donaldson D (1995) Intertemporal population ethics: critical-level utilitarian principles. Econometrica 63(6):1303–1320
Bossert W, Sprumont Y, Suzumura K (2007) Ordering infinite utility streams. J Econ Theory 135(1):579–589
Chichilnisky G (1996) An axiomatic approach to sustainable development. Soc Choice Welf 13(2):231–257
Corbae D, Stinchcombe MB, Zeman J (2009) An introduction to mathematical analysis for economic theory and econometrics. Princeton University Press, Princeton, N.J
Costanza R, d’Arge R, de Groot R, Farber S, Grasso M, Hannon B, Limburg K, Naeem S, O’Neill RV, Paruelo J, Raskin RG, Sutton P, van den Belt M (1997) The value of the world’s ecosystem services and natural capital. Nature 387(6630):253–260
Dasgupta P et al (2001) Human well-being and the natural environment. Oxford University Press, Oxford
Dasgupta PS, Heal GM (1979) Economic theory and exhaustible resources. Cambridge economic handbooks. Cambridge University Press, Cambridge
Diamond PA (1965) The evaluation of infinite utility streams. Econometrica 33(1):170–177
Dierksmeier C (2006) John Rawls on the rights of future generations. In: Tremmel JC (ed), Handbook of Intergenerational Justice, pages 72–85. Edward Elgar Publishing
Dietz S, Asheim GB (2012) Climate policy under sustainable discounted utilitarianism. J Environ Econ Manag 63(3):321–335
Ehrlich PR, Ehrlich AH (1981) Extinction: the causes and consequences of the disappearance of species. Random House, New York
Fishburn PC (1982) The foundations of expected utility, vol 31. Theory and decision library. D. Reidel Publishing Co., Dordrecht
Fleurbaey M, Michel P (2003) Intertemporal equity and the extension of the Ramsey criterion. J Math Econ 39(7):777–802
Fleurbaey M, Tungodden B (2010) The tyranny of non-aggregation versus the tyranny of aggregation in social choices: a real dilemma. Econ Theor 44(3):399–414
Fudenberg D, Tirole J (1991) Game theory. MIT Press, Cambridge, MA
Goldblatt R (1998) Lectures on the hyperreals, Graduate Texts in Mathematics. An introduction to nonstandard analysis, vol 188. Springer, New York
Grant S, Kajii A, Polak B, Safra Z (2010) Generalized utilitarianism and Harsanyi’s impartial observer theorem. Econometrica 78(6):1939–1971
Hildenbrand W (1969) Pareto optimality for a measure space of economic agents. Int Econ Rev 10(3):363–372
Huang L, Smith MD (2014) The dynamic efficiency costs of common-pool resource exploitation. Am Econ Rev 104(12):4071–4103
Jonsson A, Voorneveld M (2018) The limit of discounted utilitarianism. Theor Econ 13(1):19–37
Keisler HJ (1984) An infinitesimal approach to stochastic analysis. Mem Am Math Soc 48(297):x+184
Keisler HJ (1988) Infinitesimals in probability theory. In: Nonstandard analysis and its applications (Hull, 1986), volume 10 of London Math. Soc. Stud. Texts, pages 106–139. Cambridge Univ. Press, Cambridge
Khan U, Stinchcombe MB (2018) Planning for the long run: programming with patient, Pareto responsive preferences. J Econ Theory 176:444–478
Lauwers L (1998) Intertemporal objective functions: strong Pareto versus anonymity. Math Soc Sci 35(1):37–55
Lindstrøm T (1988) An invitation to nonstandard analysis. In: Nonstandard analysis and its applications, volume 10 of London Math. Soc. Stud. Texts, pages 1–105. Cambridge Univ. Press, Cambridge
Loeb PA (1971) A nonstandard representation of measurable spaces and \(L_{\infty }\). Bull Am Math Soc 77:540–544
Loeb PA (1975) Conversion from nonstandard to standard measure spaces and applications in probability theory. Trans Am Math Soc 211:113–122
Marinacci M (1998) An axiomatic approach to complete patience and time invariance. J Econ Theory 83(1):105–144
Meyer P-A (1973) Limites médiales d’après Mokobodzki In: Séminaire de Probabilités, VII (Univ. Strasbourg, Année Universitaire 1971–1972), pages 198–204. Lecture Notes in Math., Vol. 321. Springer, Berlin
Portney PR, Weyant JP (2013) Discounting and intergenerational equity. Routledge, England
Ramsey FP (1928) A mathematical theory of saving. Econ J 38:543–559
Robinson A (1964) On generalized limits and linear functionals. Pac J Math 14:269–283
Robinson A (1966) Non-standard analysis. North-Holland Publishing Co., Amsterdam
Robinson A (1996) Non-standard analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton, NJ, Reprint of the second (1974) edition. With a foreword by Wilhelmus A. J. Luxemburg
Spears D, Budolfson M (2021) Repugnant conclusions. Soc Choice Welf pages 1–22
Tremmel JC (ed) (2018) Handbook of Intergenerational Justice. Elgar original reference. Edward Elgar, Cheltenham, UK
von Neumann J (1932) Einige Sätze über messbare Abbildungen. Ann Math (2) 33(3):574–586
Zame WR (2007) Can intergenerational equity be operationalized? Theor Econ 2:187–202
We have received helpful comments and questions from seminar participants at UC Davis, the 2019 summer Econometric Society meetings, the 2019 SAET meetings, and the Social choice, climate change, and population conference at UT Austin. Exceptionally thorough and insightful comments from referees are gratefully acknowledged. And special thanks to the memory of Leo Simon (dec. January 2022), who taught one of the authors the art and the power of being productively puzzled.
Appendix A. Proofs not in the text
Proof of Lemma 3.1
Because \(x_n = x_n\) and \(x_n = y_n\) iff \(y_n = x_n\), we have both \((x_1, x_2, x_3, \ldots ) \sim (x_1, x_2, x_3, \ldots )\) and \([(x_1, x_2, x_3, \ldots ) \sim (y_1, y_2, y_3, \ldots )]\) iff \([(y_1, y_2, y_3, \ldots ) \sim (x_1, x_2, x_3, \ldots )]\). To show transitivity, suppose that \((x_1, x_2, x_3, \ldots ) \sim (y_1, y_2, y_3, \ldots )\) and \((y_1, y_2, y_3, \ldots ) \sim (z_1, z_2, z_3, \ldots )\). Let \(E_{x,y} = \{{n \in {\mathbb {N}}}: x_n = y_n\}\), \(E_{y,z} = \{{n \in {\mathbb {N}}}: y_n = z_n\}\), and \(E_{x,z} = \{{n \in {\mathbb {N}}}: x_n = z_n\}\). Note that \(\mu (E_{x,y}) = \mu (E_{y,z}) = 1\), so that \(\mu (E_{x,y} \cap E_{y,z}) = 1\). Since \(E_{x,y} \cap E_{y,z} \subset E_{x,z}\), \(\mu (E_{x,z}) = 1\). \(\square\)
Proof of Lemma 3.6
We prove the statement about limits first. Suppose first that \(s = \lim _n r_n\), i.e. for all standard \(\epsilon > 0\), there exists \(N_\epsilon \in {\mathbb {N}}\) such that for all \(n \ge N_\epsilon\), \(|r_n - s| < \epsilon\). Let \(N = \langle N_1, N_2, N_3, \ldots \rangle\) be an unlimited number so that \(r_N = \langle r_{N_1}, r_{N_2}, r_{N_3}, \ldots \rangle\). Because N is unlimited, \(\mu (\{k \in {\mathbb {N}}: N_k \ge N_\epsilon \}) = 1\), hence \(\mu (\{k \in {\mathbb {N}}: |r_{N_k} - s| < \epsilon \}) = 1\). Since \(\epsilon\) was arbitrary, \(s = \textrm{st}(r_N)\).
Suppose now that \(s \ne \lim _n r_n\), i.e. there exists a standard \(\epsilon > 0\) such that there is an infinite set of integers \(N_1, N_2, N_3, \ldots\) such that \(|s - r_{N_k}| \ge \epsilon\). If we take \(N = \langle N_1, N_2, N_3, \ldots \rangle\), then \(|r_N - s| \ge \epsilon\).
We now give the argument for \(\liminf _n r_n\); the argument for \(\limsup _n r_n\) is directly parallel. Let \({\underline{r}} = \liminf _n r_n\). No subsequence \(r_{n_k}\) can be less than any \({\underline{r}} - \epsilon\) for infinitely many \(n_k\), so \({\underline{r}} \le \textrm{st}(r_N)\) for any unlimited N, hence if \(r \le \liminf _n r_n\), then \(r \le \textrm{st}(r_N)\) for any unlimited N. On the other hand, if \(r > \liminf _n r_n\), then for some standard \(\epsilon > 0\), there are infinitely many \(n_k\) such that \(r > r_{n_k} + \epsilon\). Taking \(N = \langle n_1, n_2, n_3, \ldots \rangle\) completes the proof. \(\square\)
Proof of Lemma 3.8
Properties (a) and (b) are immediate. Let \(\{A_k: k = 1, \ldots , K\}\) denote a finite collection of internal subsets of \({}^*\!X\). For each k, we have \(A_k = \langle A_{k,1}, A_{k,2}, A_{k,3}, \ldots \rangle\) and \(\mu (E_k) = 1\) where \(E_k = \{{n \in {\mathbb {N}}}: A_{k,n} \in X\}\). For \(E:= \cap _{k=1}^K E_k\), we have \(\mu (E) = 1\), hence \(\mu (\{{n \in {\mathbb {N}}}: \cup _{k=1}^K A_{k,n} \in X\}) = 1\). \(\square\)
The following is a central result in the development of nonstandard analysis.
Theorem A
If \(A^1 \supset A^2 \supset \ldots \supset A^k \supset \ldots\) is a decreasing sequence of non-empty internal subsets of an internal set \({}^*\!X\), then \(\bigcap _k A^k \ne \emptyset\).
Its proof can be found in many introductions (e.g. Lindstrøm (1988, Theorem I.2.5, p. 12), Goldblatt (1998, Theorem 11.10.1, p. 138), or Corbae et al. (2009, Theorem 11.2.4, p. 634)). Usually, the proof is a variant of the classical diagonalization arguments, but, per Goldblatt, it is “not easy to motivate intuitively.”
Proof of Corollary A.1
Suppose that A contains arbitrarily large limited integers. For each \(k \in {\mathbb {N}}\), let \(B^k = \{t \in A: t \ge k\}\). This is a nested sequence of non-empty internal sets, hence \(\bigcap _k B^k \ne \emptyset\), and any element of the intersection is an unlimited element of A. Now suppose that the non-empty, internal \(A \subset {{\mathbb {N}}_0}\) contains arbitrarily small unlimited integers. By transfer of the statement that every non-empty subset of \({{\mathbb {N}}_0}\) contains its lower bound, A contains its lower bound. And that lower bound cannot be unlimited. \(\square\)
Proof of Corollary A.2
If \(K \in {\mathbb {N}}\), then, because the class of internal subsets of \({}^*\!X\) is a field, both \(\bigcap _{k \le K} A^k\) and \(\bigcup _{k \le K} A^k\) are internal. Suppose that \(A:= \bigcap _k A^k\) is internal and is not equal to \(\bigcap _{k \le K} A^k\) for any \(K \in {\mathbb {N}}\). Then the nested collection \(B^K:= A^K {\setminus } A\) is a nested collection of internal sets so that \(\bigcap _K B^K\) is not empty, contradicting the definition of A. The argument for unions follows by taking complements. \(\square\)
As there are many proofs of Theorem B in the literature, we only sketch it here. Of the proofs of which we are aware, the shortest and cleanest is still Loeb’s (1975) appeal to Carathéodory’s extension theorem, a central result in measure theoretic probability. We give a sketch of his arguments below. There are two proofs that do not appeal to Carathéodory’s theorem: the most leisurely and thorough is in Goldblatt (1998, Ch. 16), while the one in Lindstrøm (1988, Ch. II.2) is more suited to those with a familiarity with the basics of measure theoretic probability.
Sketch of a Proof of Theorem B. The class of internal sets is a field and not a \(\sigma\)-field: Corollary A.2 shows that the class of internal sets is not closed under countable unions or intersections unless these reduce to finite unions or intersections; the probability \(E \mapsto \textrm{st}(P(E))\) is finitely additive on the field of internal sets; by the second point, if \(E_n\) is a sequence of internal sets with \(E_n \downarrow \emptyset\), then, by countable saturation, \(P(E_n) \downarrow 0\) because \(E_n = \emptyset\) for all sufficiently large n; this means that Carathéodory’s extension theorem (see e.g. Corbae et al. (2009, Theorem 7.6.2, p. 412)) applies and P has a unique countably additive extension to the (P-completion of the) \(\sigma\)-field generated by the internal sets, proving (1). In the general case, Carathéodory’s extension theorem gives, for every \(E \in {\mathcal {X}}\), a sequence \(E_n\) of elements of \(\mathcal{X}^{\,\circ}\) with \(P(E \Delta E_n) \rightarrow 0\). The exact approximation by internal sets is an application of saturation and overspill applied to the sequence \(n \mapsto E_n\) where \(n \in {\mathbb {N}}\) and \(E_n\) is an internal set in \(\mathcal{X}^{\,\circ}\). One can show that there exists an infinite N such that \(n \mapsto E_n\) for \(n \in \{1, 2, \ldots , N\}\) agrees with the original sequence, \(E_N\) is internal, and \(P(E \Delta E_N) \simeq 0\). \(\square\)
For a proof of Keisler’s Theorem, C, applicable to all complete and separable metric spaces, we refer the reader to the original source, Keisler (1984, Theorem 9.2, p. 134), or, alternatively, to his exposition in Keisler (1988). In the \({\mathbb {R}}\)-valued case, the arguments are a bit simpler.
Proof of Theorem C
If there is an internal automorphism \(\pi\) such that \({{\varvec{v}}}= {{\varvec{u}}}^\pi\) \(\Lambda\)-almost everywhere, then they must induce the same distribution. Assume now that \({{\varvec{u}}}\) and \({{\varvec{v}}}\) induce the same distribution. Let \(\{G_n: {n \in {\mathbb {N}}}\}\) be a countable collection of open subsets of \({\mathbb {R}}\) that form a basis for the Euclidean topology on \({\mathbb {R}}\), e.g. let \(G_n\) enumerate the set of open balls with rational centers and positive rational radii. Inductively define internal sets \(C_n,D_n\) such that \(P(C_n \Delta {{\varvec{u}}}^{-1}(G_n)) = 0\), \(P(D_n \Delta {{\varvec{v}}}^{-1}(G_n)) = 0\), and each Boolean combination of the collection \(C_1, \ldots , C_n\) has the same internal cardinality as the corresponding Boolean combination of the collection \(D_1, \ldots , D_n\). For each \({n \in {\mathbb {N}}}\), there exists an internal bijection \(\pi _n\) from \({{\mathbb {I}}}\) to \({{\mathbb {I}}}\) that maps each \(C_m\) onto the corresponding \(D_m\) for each \(m \le n\). By Theorem A, there is an internal bijection, \(\pi\), from \({{\mathbb {I}}}\) to \({{\mathbb {I}}}\) that maps all of the \(C_n\) onto the corresponding \(D_n\). By construction, \({{\varvec{v}}}= {{\varvec{u}}}^\pi\). \(\square\)
Proof of Theorem D
If the preferences have the form given, Postulates I–VI are immediate.
The first four Postulates are directly from Fishburn’s (1982, Theorem 4, Ch. 3) work on expected utility preferences over distributions on convex subsets of vector spaces. Thus, Postulates I–IV imply the existence of a continuous, concave \(S:{{\varvec{W}}}_{{\mathbb {I}}}\rightarrow [0, \infty )\) representing \(\succsim\). By monotonicity, there is no loss in the normalization \(S({\varvec{0}}) = 0\). By Theorem C, \({{\varvec{u}}}\mapsto S({{\varvec{u}}})\) can depend only on the induced distribution \(p_{{\varvec{u}}}(E) = \Lambda (\{t \in {{\mathbb {I}}}: u_t \in E\})\). For each \(r \in [0, \infty )\), let \({{\varvec{u}}}_r\) denote the function constant at r, define \(\varphi (r) = S({{\varvec{u}}}_r)\). \(\square\)
Proof of Lemma 3.9
By Lemma 3.6, \({{\varvec{u}}}\) is ergodic with occupation measure \(\nu\) iff for all unlimited T, \(\frac{1}{T+1} \sum _{t=0}^T 1_{E}(u_t) \simeq \nu (E)\), that is, iff for any uniform distribution \(\Lambda\) on a hyperfinite interval and any measurable \(E \subset {\mathbb {R}}\), \(\Lambda (\{t: u_t \in E\}) = \nu (E)\). Again by Lemma 3.6, the equality of each \(Q_\beta (\{t: u_t \in E\})\) and each \(\Lambda (\{t: u_t \in E\}) = \nu (E)\) is the Hardy-Littlewood Tauberian theorem. \(\square\)
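The agreement between uniform and geometric weights can be checked numerically in a finite setting. The following is a sketch only, with a large standard T and a \(\beta\) close to 1 standing in for the unlimited T and \(\beta \simeq 1\) of the lemma; the periodic stream, the set E, and the function names are hypothetical.

```python
def cesaro_frequency(u, E, T):
    """Fraction of generations t in {0, ..., T} with u(t) in E."""
    return sum(1 for t in range(T + 1) if u(t) in E) / (T + 1)

def abel_frequency(u, E, beta, horizon):
    """Geometric weight (1 - beta) * beta**t placed on {t : u(t) in E},
    truncated at `horizon` (the neglected tail has mass beta**(horizon+1))."""
    total, weight = 0.0, 1.0 - beta
    for t in range(horizon + 1):
        if u(t) in E:
            total += weight
        weight *= beta
    return total

# Hypothetical periodic stream: utility 1 every third generation, else 3.
u = lambda t: 1 if t % 3 == 0 else 3
E = {1}

cesaro = cesaro_frequency(u, E, T=300_000)
abel = abel_frequency(u, E, beta=0.9999, horizon=300_000)
print(cesaro, abel)
```

Both numbers are close to the occupation measure 1/3 of the set E for this stream, and they move closer together as T grows and \(\beta\) approaches 1.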
Proof of Lemma 4.1
Associated with a policy \(a_t\) are probabilities of extinction, \(q_t = p_{f,e}(a_t)\). This yields a random waiting time \(\varvec{\tau }\) until the last period in which the fishery is viable so that \(u_t = u(f,a_t)\) for \(t \le \varvec{\tau }\) and \(u_t = 0\) for \(t > \varvec{\tau }\). Let Q be an internal shift invariant probability on a hyperfinite interval \(\{0, 1, \ldots , T\}\), i.e. one satisfying \(\sum _{t=0}^T Q(t) \simeq 1\) and \(\sum _t |Q(t+1) - Q(t)| \simeq 0\). Fix a feasible policy \(a_t\) with associated utilities \(u_t = u(f,a_t)\) if \(t \le \varvec{\tau }\) and equal to 0 otherwise.
Claim. \(E\,{}^\circ \sum _{t=0}^{\varvec{\tau }} u(a_t)Q(t) \le u^\circ\) where \(u^\circ = u(f,a^\circ )\) is the sustainable utility achievable with the policy \(a_t \equiv a^\circ\).
Let B be an upper bound for per period utility, it is sufficient to show that for all standard N, \(E\,{}^\circ \sum _{t=0}^{\varvec{\tau }} u(a_t)Q(t) \le u^\circ + B/N\). Since Q is non-atomic, for every standard N, we can subdivide \(\{0, 1, \ldots , T\}\) into intervals \(E_1 = \{0, \ldots , T_1-1\}\), \(E_2 = \{T_1, \ldots , T_2\}\), \(\ldots\), \(E_N = \{T_{N-1}, T_N\}\), such that \({}^\circ Q(E_n) = 1/N\) for \(n = 1, \ldots , N\). We show, if
for some standard \(\epsilon > 0\), then \({}^\circ Prob({\varvec{\tau }} \ge T_n) = 0\). This means that one can only achieve utility greater than \(u^\circ\) for at most a set of generations having Q-mass 1/N. Since that utility is bounded above by B, this will complete the argument for the Claim. If the inequality in (21) holds, then there exists a standard \(\delta > 0\) such that the set of \(t \in E_n\) such that \(q_t \ge \delta\) has unlimited cardinality, say M. This means that the probability that \({\varvec{\tau }} \ge T_n\), which is bounded above by \(\Pi _{t \in E_n} (1 - q_t)\), is itself bounded above by \((1 - \delta )^M \simeq 0\).
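The last step, that a block containing many periods with extinction risk at least \(\delta\) is survived with vanishing probability, has a simple finite counterpart. The sketch below uses a hypothetical \(\delta\) and hypothetical block lengths; it only illustrates the bound \(\Pi _{t \in E_n} (1 - q_t) \le (1 - \delta )^M\) and is not part of the argument.

```python
import math

def survival_bound(q):
    """Upper bound prod_t (1 - q_t) on the probability of surviving
    every period in a block with per-period extinction risks q_t."""
    return math.prod(1.0 - qt for qt in q)

delta = 0.01  # hypothetical lower bound on the risky periods' q_t
for M in (10, 100, 1_000, 10_000):
    # M risky periods with q_t = delta, interleaved with M safe periods (q_t = 0).
    block = [delta, 0.0] * M
    print(M, survival_bound(block), (1 - delta) ** M)
```

Safe periods leave the product unchanged, so only the number M of risky periods matters; for unlimited M the bound is infinitesimal.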
KS Theorem C shows that for the social welfare functions \(S_k(\cdot )\), \(k = 1, \ldots , 4\), there exist sets of shift invariant probabilities TI(k), with the property that the social welfare functions evaluated at \({{\varvec{u}}}\in {{\varvec{W}}}\) are given by the minimal values of the integral \(\int _{{{\mathbb {N}}_0}} u_t\,dQ(t)\) where Q belongs to TI(k). Therefore, for these preferences, the maximal social welfare arises from sustainable policies. Since \(\Lambda\) is also shift invariant, the maximal \(S_\varphi (\cdot )\) social welfare is bounded above by \(\varphi (u^\circ )\) by the same argument.
For the ‘Rawlsian’ preferences, with social welfare function \(\inf _t u_t\), along paths in which the fish go extinct, the social welfare is 0. Hence, to maximize \(\inf _t u_t\), there must be a probability 0 of extinction. Let \(r = \inf _t u_t\) and fix an arbitrary unlimited T. The internal subset of \({}^*{\mathbb {R}}_{++}\) given by \(\{\delta > 0: |r - \min \{u_t: t \in \{0, 1, \ldots , T\}\}| < \delta \}\) contains all strictly positive standard \(\delta > 0\), hence contains an infinitesimal. This means that we can work with the population model \(\{0, \ldots , T\}\) to analyze \(\inf _t u_t\). To raise utility \(\inf _t u_t\) above \(u^\circ\) by some standard amount, we must have all generations \(t \in \{0, 1, \ldots , T\}\) receiving at least \(u^\circ + \epsilon\) for some standard \(\epsilon > 0\). But as we have seen above, this means that extinction must happen with probability 1 in the set \(\{0, 1, \ldots , T/2\}\). The arguments for the long run ‘Rawlsian’ preferences, with social welfare function \(\liminf _t u_t\), are parallel. \(\square\)
Proof of Lemma 4.2
For comparability with the other utility functions, we use u(a) for \(\varphi (u(a))\). For any \(\beta\), the optimal policy is stationary at some a. Constant use of the action a has value \(V_\beta (a)\) satisfying \(V_\beta (a) = (1-\beta ) u(a) + \beta \left( p(a) \cdot 0 + (1-p(a))V_\beta (a)\right)\) where p(a) is the probability of extinction in each generation using action a. Solving for \(V_\beta (a)\) gives \(V_\beta (a) = u(a) \cdot \left[ \frac{(1-\beta )}{1 - \beta (1-p(a))} \right]\). The comparative statics of the problem \(\max _{a \ge a^\circ } V_\beta (a)\) are a bit tedious but fairly clear, and we just report the results: \(a^*_\beta > a^\circ\); \(\beta \mapsto a^*_\beta\) is decreasing in \(\beta\); if \({}^\circ \beta < 1\), then \({}^\circ (a^*_\beta - a^\circ ) > 0\), so that \(E\,\tau _\beta < \infty\); and for \(\beta \simeq 1\), \(a^*_\beta \simeq a^\circ\) so that \(Prob(\tau _\beta > N) \rightarrow 1\) for all standard N. \(\square\)
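The reported comparative statics can be explored numerically for specific primitives. The sketch below is illustrative only: the utility `u(a) = 1 + a`, the extinction probability `p(a) = a**2`, the sustainable action `A_SUSTAINABLE = 0`, the action grid and the list of discount factors are all hypothetical choices, not the paper's specification. It evaluates the closed form for \(V_\beta (a)\) and locates the maximizer by grid search.

```python
def stationary_value(a, beta, u, p):
    """Closed form V_beta(a) = u(a) * (1 - beta) / (1 - beta * (1 - p(a)))."""
    return u(a) * (1.0 - beta) / (1.0 - beta * (1.0 - p(a)))


def best_stationary_action(beta, u, p, grid):
    """Grid maximizer of V_beta over the feasible stationary actions."""
    return max(grid, key=lambda a: stationary_value(a, beta, u, p))


A_SUSTAINABLE = 0.0  # hypothetical sustainable action a°, where extinction risk is zero


def u(a):
    """Hypothetical per-period utility, increasing in the action."""
    return 1.0 + a


def p(a):
    """Hypothetical per-period extinction probability, zero at the sustainable action."""
    return a * a


GRID = [i / 10_000 for i in range(10_001)]   # actions between 0 and 1
BETAS = [0.5, 0.9, 0.99, 0.999]

optimal = {beta: best_stationary_action(beta, u, p, GRID) for beta in BETAS}
for beta, a_star in optimal.items():
    value = stationary_value(a_star, beta, u, p)
    print(f"beta = {beta}: a* = {a_star:.4f}, V = {value:.4f}")
```

For these assumed primitives the first order condition reduces to \(a^*_\beta = \beta ^{-1/2} - 1\), so the grid maximizers are strictly above the sustainable action, decrease in \(\beta\), and approach it as \(\beta\) approaches 1, in line with the pattern reported above.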
Proof of Lemma 4.3
For all policies, it is a probability 1 event that there will be infinitely many periods in the bad state. Therefore, across all policies, the maximized expected values of \(S_{5}\) and \(S_{6}\) are constant in r and are both equal to \(u_C(0)\), which is the highest possible utility for the generations in the bad state, i.e. their utility when they make no sacrifice for later generations to return to the better state. \(\square\)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Khan, U., Stinchcombe, M.B. Intergenerational equity and sustainability: a large population approach. Soc Choice Welf (2023). https://doi.org/10.1007/s00355-023-01450-w