1 Introduction

In choosing present policies, there is both a moral and a pragmatic imperative to consider the welfare of those not yet born. The normative theory of intergenerational equity has occupied social choice theorists for a long time. The pressing problems of the day, including climate change and environmental sustainability, have brought generational ethics into sharp focus as a matter of immense practical importance. This paper is a part of an ongoing project that aims to a) build a coherent theoretical framework for dealing with intergenerational equity, and b) develop answers to practical questions that arise in situations involving irreversible, or difficult to reverse, changes, as they often warrant considerations of intergenerational ethics. To this end, the current paper refines and enriches the theory first proposed in Khan and Stinchcombe (2018) (henceforth KS), and illustrates the implications of said framework in two examples drawn from environmental economics and one from political economy/political philosophy.

The key takeaway from this paper is that serious consideration of intergenerational equity tends to yield qualitatively different policy prescriptions than standard discounted utility models. In particular, it pushes for greater sacrifices for the future in the form of more cautious and “sustainable” policies when dealing with potentially irreversible negative externalities, and this leads to very different predicted long-run outcomes. Remarkably, in the presence of irreversibility, these results hold regardless of the “degree of discounting” in the standard models. When it comes to making decisions that have significant long-run consequences, perhaps the right question to ask is not “How much discounting?” but “Should we discount at all?” In the rest of the introduction we outline our approach and the structure of the paper.

1.1 Different scales

Thinking about intergenerational ethics is a problem with a very large scale.

With 500 million years left of acceptable habitat for humans on Earth, population being stable at 10 billion with an average length of life equal to 73 years, the ratio of people who will potentially live in the future to people living now is approximately 10 million to 1. (Asheim 2010)

The numbers “500 million,” “10 billion” and “10 million” are large, but finite. Our approach is to replace the large but finite populations and time horizon by a particular kind of continuous, non-atomic (or “oceanic” in Aumann and Shapley’s (1974) evocative term) population model. Our approach to non-atomic models takes seriously the idea that infinite models should be interpretable as limits of large finite models. To guarantee this, we define our models using sequences of increasingly large finite models so that the results and definitions from the finite models carry over.

Within this class of population models, KS formulated social welfare functions that are Pareto responsive and patient in the sense of being invariant to the largest class of permutations that had been used in the literature. To the KS model, we add a stronger, ‘limit’ equity condition. It requires invariance with respect to a much larger class of permutations, a class that directly parallels the set of permutations for finite models. This guarantees equal treatment of all generations.

Taking a “pragmatic” point of view, we judge ethical assumptions by an examination of their implications in economic models.Footnote 1 Our limit equity condition delivers a subclass of the KS social welfare functions, a subclass for which one can find strong implications of the precautionary principle for irreversible or difficult-to-reverse problems. It also delivers analyses of intergenerational trade-offs that appear more sensible than some extant formulations of “Rawlsian” and other patient preferences that aim to capture intergenerational equity.

1.2 The limit equity condition

To see what is involved in our limit equity condition, let us start with large but finite models. We will work with a limit formulation of sequences of large population models, examining sequences \(I_n = \{0, 1, 2, \ldots , T_n\}\) of generations with \(T_n \rightarrow \infty\). For each finite model \(I_n\), the most basic of the equity conditions is invariance with respect to permutations, and we extend this directly to the limit population model.

A one-to-one and onto mapping \(\pi\) from a finite \(I = \{0, 1, 2, \ldots , T\}\) to itself is a bijection. Let \({{\varvec{u}}}: I \rightarrow {\mathbb {R}}\) denote the utility assignments of the population. The equity condition for the social preferences is that \({{\varvec{u}}}^\pi\) and \({{\varvec{u}}}\) should be indifferent where \({{\varvec{u}}}^\pi := (u_{\pi (0)}, u_{\pi (1)}, u_{\pi (2)}, \ldots , u_{\pi (T)})\). This has an alternative, probabilistic formulation.

With \(\Lambda _I\) denoting the uniform distribution on I, every utility profile \({{\varvec{u}}}= (u_0, \ldots , u_T)\) induces a distribution of utility \(p_{{\varvec{u}}}\) defined by letting \(p_{{\varvec{u}}}(A)\) denote the proportion of the population receiving a utility level in the set A, \(p_{{\varvec{u}}}(A) = \Lambda _I(\{t: u_t \in A\}) = \frac{1}{T+1} \#\{t: u_t \in A\}\).Footnote 2 With \({\mathcal {B}}_I\) denoting the set of bijections on I, under a uniform distribution on I, we have the following property, called homogeneity, of the finite probability space \(\{0, 1, 2, \ldots , T\}\) and the uniform distribution \(\Lambda _I\):

$$\begin{aligned} {[} p_{{{\varvec{u}}}} = p_{{{\varvec{v}}}} ] \ \Leftrightarrow \ (\exists \pi \in {\mathcal {B}}_I) [{{\varvec{u}}}= {{\varvec{v}}}^\pi ]. \end{aligned}$$
(1)

For finite populations, strong equity is the condition that societal preferences should be invariant to shuffles of who receives what. This is equivalent to social welfare depending only on the proportions of the population receiving the various utility levels. Specifically, preferences between \({{\varvec{u}}}\) and \({{\varvec{v}}}\) can only depend on \(p_{{{\varvec{u}}}}\) and \(p_{{{\varvec{v}}}}\). While this works for, indeed characterizes, the class of infinite population models developed here, it does not work for many other infinite population models.
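The finite homogeneity property (1) can be illustrated with a short sketch. The function names below are hypothetical and not from the paper; the sketch assumes utility profiles are given as equal-length lists and uses the convention \({{\varvec{u}}}= {{\varvec{v}}}^\pi\), i.e. \(u_t = v_{\pi (t)}\). It builds a bijection exactly when the two induced distributions coincide.

```python
from collections import Counter
from fractions import Fraction

def utility_distribution(u):
    """Return p_u: the share of the population at each utility level."""
    n = len(u)
    return {level: Fraction(count, n) for level, count in Counter(u).items()}

def matching_permutation(u, v):
    """Return a list pi with u[t] == v[pi[t]] for all t, or None if p_u != p_v.

    Assumption: profiles of different lengths are treated as not comparable.
    """
    if len(u) != len(v) or utility_distribution(u) != utility_distribution(v):
        return None
    # Group the positions of v by utility level, then hand them out to u.
    slots = {}
    for t, level in enumerate(v):
        slots.setdefault(level, []).append(t)
    return [slots[level].pop() for level in u]

u = [3, 1, 1, 2]
v = [1, 2, 3, 1]
pi = matching_permutation(u, v)
print(pi, utility_distribution(u) == utility_distribution(v))
```

When the distributions differ, no such bijection exists and the sketch returns `None`, which is the other direction of the equivalence in (1).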

For a non-atomic probability space \((\Omega ,{\mathcal {F}},P)\), measure automorphisms are the generalization of bijections: a measurable function \(\pi :\Omega \rightarrow \Omega\) is a measure automorphism if \(P(E) = P(\pi ^{-1}(E))\) for all measurable \(E \subset \Omega\).Footnote 3 Given an automorphism \(\pi\) and defining \({{\varvec{u}}}^\pi (\omega ) = {{\varvec{u}}}(\pi (\omega ))\), the distributions of utility induced by \({{\varvec{u}}}\) and \({{\varvec{u}}}^\pi\) are equal to each other because \({{\varvec{u}}}^{-1}(A)\) always has the same P-mass as \(\pi ^{-1}({{\varvec{u}}}^{-1}(A))\) for any measurable \(A \subset {\mathbb {R}}\). For preferences that depend only on the distribution of utilities, this is the right set of transformations to consider. But indifference to such automorphisms need not capture the idea of equity that comes from invariance to bijections.

Two examples help pinpoint the difficulties. First, measure automorphisms need not be 1-to-1: if \(\Omega\) is the unit interval (0, 1] (rather than the limit-of-finite population model that we will use) and P is the uniform distribution, then the two-to-one, onto function \(\pi (\omega ) = 2 \omega \cdot 1_{(0, \frac{1}{2}]}(\omega ) + (2\omega - 1) 1_{(\frac{1}{2}, 1]}(\omega )\) is an automorphism. But a two-to-one \(\pi\) can have no interpretation as a switching of utility levels between generations. Second, 1-to-1 and onto functions need not be measure automorphisms. With the same probability space, the one-to-one, onto function \(\pi (\omega ) = \omega ^2\) is not a measure automorphism (except for the point mass distribution on \(\omega = 1\)).
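Both examples can be checked on intervals \((a, b] \subset (0, 1]\), whose uniform mass is \(b - a\). The following is a minimal sketch with hypothetical function names; the preimages are computed by hand in the comments rather than derived by the code.

```python
from fractions import Fraction
from math import sqrt

def doubling_preimage_mass(a, b):
    """Uniform mass of pi^{-1}((a, b]) for pi(w) = 2w on (0, 1/2] and 2w - 1 on (1/2, 1]."""
    # The preimage is (a/2, b/2] together with ((a+1)/2, (b+1)/2].
    return (b / 2 - a / 2) + ((b + 1) / 2 - (a + 1) / 2)

def square_preimage_mass(a, b):
    """Uniform mass of pi^{-1}((a, b]) for pi(w) = w**2, i.e. of (sqrt(a), sqrt(b)]."""
    return sqrt(b) - sqrt(a)

a, b = Fraction(1, 4), Fraction(1, 2)
print(doubling_preimage_mass(a, b) == b - a)   # mass preserved, though pi is two-to-one
print(square_preimage_mass(a, b))              # differs from b - a, though pi is a bijection
```

The two-to-one map preserves the mass of every such interval, while the bijection \(\omega \mapsto \omega ^2\) does not, which is the mismatch the text describes.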

For the non-atomic, limit-of-finite population model \({{\mathbb {I}}}\) that we use here, the appropriate equity condition is that for any almost everywhere one-to-one, onto \(\pi :{{\mathbb {I}}}\rightarrow {{\mathbb {I}}}\) and any measurable utility allocation \({{\varvec{u}}}:{{\mathbb {I}}}\rightarrow {\mathbb {R}}\), \({{\varvec{u}}}\) and \({{\varvec{u}}}^\pi\) are indifferent. This works because, in the model we use, (1) \(\pi\) is a measure automorphism iff (2) it is almost everywhere one-to-one and onto, and the model also satisfies homogeneity, (3) for any measurable \({{\varvec{u}}}, {{\varvec{v}}}:{{\mathbb {I}}}\rightarrow {\mathbb {R}}\), \(p_{{\varvec{u}}}= p_{{\varvec{v}}}\) if and only if \({{\varvec{v}}}= {{\varvec{u}}}^\pi\) for an almost everywhere one-to-one and onto function \(\pi\). Except for the need to have the “almost everywhere” qualifier to deal with null sets, this is the same homogeneity condition that finite models satisfy.

The existence of non-atomic homogeneous probability spaces was settled by von Neumann (1932). However, we use results due to Jerome Keisler (1984), who showed that when a hyperfinite set from nonstandard analysis is given the uniform (or counting) distribution, then properties (1) and (2) are equivalent, and the model is homogeneous (3). Further, from Robinson (1964, Theorem 5.1), these hyperfinite population models can always be understood as the limits of larger and larger sequences of finite sets with the uniform distribution. As we will see, the tools of nonstandard analysis allow us to analyze these models using techniques familiar from finite sets.

1.3 Outline

The next section gives the formal setting and the relevant results on patient social welfare from KS. Of particular interest is the observation that patience—formalized as indifference to the largest class of permutations used in the previous literature—is compatible with the Pareto criterion. An example shows that this definition of patience is compatible with an almost complete abrogation of equity.

The subsequent section, Sect. 3, gives the nonstandard analysis necessary to construct the limit population models behind the work in KS. Using these tools, we add the strong equity condition, and in Sect. 4 we apply the tools to two environmental economics problems and one political economy problem. One of the environmental problems involves the risk of an irreversible change; the second involves changes that are very difficult to reverse. The political economy problem involves a study of when some counterintuitive conclusions from population ethics do and do not hold within our class of social welfare functions. The final section gives a summary and discusses some of the open problems, and proofs not given in the text are gathered in the appendix.

2 Preliminaries

Equitable social preferences are those indifferent to permutations of the names of those receiving benefits. Patient preferences are those indifferent to permutations in the arrival time of benefits. For social welfare functions defined on intergenerational streams of utilities, these ideas are close to each other.

This section begins with a review of the setting and main results about large intergenerational population models used in KS. It then contrasts the KS settings and results to previous work on intergenerational equity/patience. That literature showed that it is difficult to incorporate patience, understood as a form of intergenerational equity, and still satisfy the Pareto criterion. Such findings make it hard to operationalize equitable societal preferences for intergenerational problems. And, by extension, this can turn into an argument that some form of discounting must be used.

KS gave the population model that was implicit in the literature. In this model, the purported examples of the failure of the Pareto criterion involve increasing the welfare of only a null subset of the population. KS then integrated patience/intergenerational equity and Pareto responsiveness in a class of social welfare functions. This paper adds a more thorough implementation of equity to the KS preferences.

2.1 Overview

The primitives in the model of KS are intergenerational streams of well-being, normalized to belong to \({{\varvec{W}}}\), the non-negative elements of \(\ell _\infty = \ell _\infty ({{\mathbb {N}}_0})\), \({{\mathbb {N}}_0}= \{0, 1, 2, \ldots \}\).Footnote 4 The typical element in \({{\varvec{W}}}\) is \({{\varvec{u}}}= (u_0, u_1, u_2, \ldots )\), and the norm distance between \({{\varvec{u}}}, {{\varvec{v}}}\in {{\varvec{W}}}\) is given by \(\Vert {{\varvec{u}}}- {{\varvec{v}}}\Vert = \sup _{t \in {{\mathbb {N}}_0}} |u_t - v_t|\). Both KS and this paper study preferences over the subset of \(\Delta ({{\varvec{W}}})\) supported by norm bounded sets that can be represented by \(p \succ q\) if and only if \(\int _{{{\varvec{W}}}} S({{\varvec{u}}})\,dp({{\varvec{u}}}) > \int _{{{\varvec{W}}}} S({{\varvec{u}}})\,dq({{\varvec{u}}})\) where \(S:{{\varvec{W}}}\rightarrow [0, \infty )\) is norm continuous, concave, and satisfies an additional condition meant to capture aspects of equity/patience. That additional condition is that \(S({{\varvec{u}}}) > S({{\varvec{v}}})\) when \({{\varvec{u}}}\) asymptotically first order dominates \({{\varvec{v}}}\).

Definition 2.1

A stream \({{\varvec{u}}}\in {{\varvec{W}}}\) asymptotically first order dominates a stream \({{\varvec{v}}}\in {{\varvec{W}}}\), denoted \({{\varvec{u}}}\succ _{fo} {{\varvec{v}}}\), if

$$\begin{aligned} \textstyle \liminf _T \textstyle \frac{1}{T+1} \textstyle \sum _{t=0}^T (f(u_t) - f(v_t)) > 0 \end{aligned}$$
(2)

for all continuous strictly increasing \(f:{\mathbb {R}}_+ \rightarrow {\mathbb {R}}\).

Essentially, this asks that for large T, the distribution of the utilities \(\{u_t: t = 0, \ldots , T\}\) strictly first order dominates the distribution of the utilities \(\{v_t: t = 0, \ldots , T\}\). This captures a sense of patience in that it need only hold for large T, and it captures a sense of equity in that the ordering \(\succ _{fo}\) is immune to several classes of permutations.
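A finite-horizon reading of condition (2) can be sketched numerically. This is only suggestive: the code evaluates the average gap at a few horizons and for a few increasing functions, whereas Definition 2.1 requires the liminf to be positive for every continuous strictly increasing \(f\). The streams and function names are hypothetical.

```python
import math

def average_gap(u, v, f, T):
    """(1/(T+1)) * sum_{t=0}^{T} (f(u(t)) - f(v(t))) for streams given as functions of t."""
    return sum(f(u(t)) - f(v(t)) for t in range(T + 1)) / (T + 1)

# Hypothetical streams: u alternates between 2 and 3, v between 1 and 2.
def u(t):
    return 2.0 + (t % 2)

def v(t):
    return 1.0 + (t % 2)

# A small, non-exhaustive family of strictly increasing test functions.
test_functions = {"identity": lambda x: x, "sqrt": math.sqrt, "log1p": math.log1p}

for name, f in test_functions.items():
    gaps = [average_gap(u, v, f, T) for T in (10, 100, 1000)]
    print(name, gaps)
```

Here \(u_t = v_t + 1\) for every \(t\), so each average gap is bounded away from zero, consistent with \({{\varvec{u}}}\succ _{fo} {{\varvec{v}}}\) for these particular streams.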

The set \(\{\ldots , -2, -1, 0, 1, 2, \ldots \}\) is denoted \({\mathbb {Z}}\). A permutation is a 1-to-1 function \(\pi :{{\mathbb {N}}_0}\rightarrow {\mathbb {Z}}\) that is onto \({{\mathbb {N}}_0}\). Given \({{\varvec{u}}}= (u_0, u_1, u_2, \ldots ) \in \ell _\infty\) and a permutation \(\pi\), define \({{\varvec{u}}}^\pi\) as \((u_{\pi ^{-1}(0)}, u_{\pi ^{-1}(1)}, u_{\pi ^{-1}(2)}, \ldots )\). In increasing order of generality, the literature has considered the following classes of permutations, and has interpreted indifference to permutations variously as “equity,” “weak anonymity,” or “intergenerational neutrality.”

  • \(\pi\) is a shift permutation if \(\pi (T) = T-F\) for some integer F.

  • \(\pi\) is a bounded permutation if, for all T, \(|\pi (T) - T| \le F\) for some integer F.

  • \(\pi\) is an asymptotic permutation if \(\lim _{T \rightarrow \infty } |\pi (T)-T|/T = 0\).Footnote 5

It can be shown that if \(\pi\) is an asymptotic permutation, then \({{\varvec{u}}}\succ _{fo} {{\varvec{v}}}\) iff \({{\varvec{u}}}^\pi \succ _{fo} {{\varvec{v}}}\) iff \({{\varvec{u}}}\succ _{fo} {{\varvec{v}}}^\pi\). Missing in these three classes of permutations is a parallel to the strong equity condition for finite sets of generations. The key observation is that the distribution of the utilities \(\{u_t: t = 0, \ldots , T\}\) first order dominates the distribution of the utilities \(\{v_t: t = 0, \ldots , T\}\) iff it dominates after any permutation of the finite set \(\{0, \ldots , T\}\), including those that take large t’s and switch them with small t’s, e.g. \(\pi (t) = T-t\).

In this paper, we study population models that are limits, as \(T_n \rightarrow \infty\), of the finite sets \({{\mathbb {I}}}_n = \{0, \ldots , T_n\}\). The equity condition that we here require is indifference with respect to all limits of permutations \(\pi _n\), where each \(\pi _n\) is a permutation on \({{\mathbb {I}}}_n\). There are two additional points to be made.

  • First, there is an “almost everywhere” qualifier to the “all” in the previous sentence. The sequence of permutations \(\pi _n\) need only apply outside a sequence of ‘exceptional sets.’ The exceptional sets, \(E_n \subset {{\mathbb {I}}}_n\), satisfy \(\frac{\# E_n}{T_n} \rightarrow 0\). With this qualifier, if \(\pi\) is either a shift, a bounded, or an asymptotic permutation, then the sequence of restrictions of \(\pi ^{-1}\) to \({{\mathbb {I}}}_n\) defines an almost everywhere sequence of permutations.

  • Second, restricting a single \(\pi\) to the sequence \({{\mathbb {I}}}_n\) cannot e.g. interchange large t’s with small t’s when \(T_n\) is large. But a sequence \(\pi _n\) can perform this sort of interchange.

This last point is the crucial difference between what we do here and what has come before. While the literature has worked with permutations on all of \({{\mathbb {N}}_0}\), we work with sequences of permutations on sequences of approximations to \({{\mathbb {N}}_0}\). Example 2.1 shows that not having invariance with respect to this richer class of permutations allows for social preferences that totally downweight the far future. All of this is most easily seen within a tractable subclass of the KS preferences.

2.2 A tractable class of preferences

The most tractable class of preferences in KS involves non-atomic, shift-invariant probabilities, Q, on \({{\mathbb {N}}_0}\) (also known as Banach-Mazur limits when the focus is on integrals as continuous linear operators on \(\ell _\infty\)). The non-atomicity captures the ‘limit of large finite populations’ aspect of the problem, and the shift-invariance captures patience. Shift-invariance of a probability Q is the requirement that if \({{\varvec{u}}}= (u_0, u_1, u_2, \ldots )\) and \({{\varvec{v}}}= (u_1, u_2, u_3, \ldots )\), then \(\int u_t\,dQ(t) = \int v_t\,dQ(t)\). If \(g:{\mathbb {R}}_+ \rightarrow {\mathbb {R}}_+\) is bounded, then the shift-invariance of Q also delivers \(\int g(u_t)\,dQ(t) = \int g(v_t)\,dQ(t)\). Of particular note is the case that \(g = 1_A\) for \(A \subset {\mathbb {R}}\).

For each \({{\varvec{u}}}\) and Q, there is an induced distribution of generational utilities given by

$$\begin{aligned} p_{{{\varvec{u}}},Q}(A) = Q(\{t \in {{\mathbb {N}}_0}: u_t \in A\}) = \textstyle \int 1_A(u_t)\,dQ(t). \end{aligned}$$
(3)

The tractable subclass of KS preferences consists of those represented by utility functions of the form

$$\begin{aligned} S_{\varphi ,Q}({{\varvec{u}}}) = \textstyle \int \varphi (r)\,dp_{{{\varvec{u}}},Q}(r) \end{aligned}$$
(4)

where \(\varphi :{\mathbb {R}}_+ \rightarrow {\mathbb {R}}_+\) is strictly increasing and concave. Integrals appear twice, in rather different ways, in this class of preferences: the function \({{\varvec{u}}}\mapsto S_{\varphi ,Q}({{\varvec{u}}})\) from \({{\varvec{W}}}\) to \({\mathbb {R}}_+\) is the integral, over \({\mathbb {R}}_+\), of \(\varphi (\cdot )\) with respect to \(p_{{{\varvec{u}}},Q}\); and preferences between probabilities \(\alpha\) and \(\beta\) on \({{\varvec{W}}}\) are determined by the integrals over \({{\varvec{W}}}\), \(\int _{{{\varvec{W}}}} S_{\varphi ,Q}({{\varvec{u}}}) \,d\alpha ({{\varvec{u}}})\) and \(\int _{{{\varvec{W}}}} S_{\varphi ,Q}({{\varvec{u}}}) \,d\beta ({{\varvec{u}}})\).
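For intuition, (3) and (4) can be evaluated on a finite-horizon stand-in for the limit model. The sketch below assumes a finite horizon \(\{0, \ldots , T\}\), a weight vector in place of \(Q\), and \(\varphi = \sqrt{\cdot }\); these choices and the function names are illustrative assumptions, not the paper's construction.

```python
import math
from collections import defaultdict

def induced_distribution(u, Q):
    """Finite analogue of p_{u,Q}: total Q-weight of the generations at each utility level."""
    p = defaultdict(float)
    for u_t, q_t in zip(u, Q):
        p[u_t] += q_t
    return dict(p)

def social_welfare(u, Q, phi):
    """Finite analogue of S_{phi,Q}(u): the integral of phi with respect to p_{u,Q}."""
    return sum(phi(r) * mass for r, mass in induced_distribution(u, Q).items())

T = 9
Q_uniform = [1 / (T + 1)] * (T + 1)      # assumed: uniform weights on {0, ..., T}
u = [4.0] * 5 + [9.0] * 5                # early generations get 4, later ones get 9

print(induced_distribution(u, Q_uniform))
print(social_welfare(u, Q_uniform, math.sqrt))
print(social_welfare(u[::-1], Q_uniform, math.sqrt))  # same value after reversing time
```

With uniform weights the welfare number depends only on the induced distribution, so reversing the order of generations leaves it unchanged.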

The goal is to find the implications of maximizing the utility functions in the class (4) in stochastic dynamic problems. A pair of results in Robinson (1964) make the solutions both interpretable and tractable.

  • Interpretability. (Robinson 1964, Theorem 5.1) gives an alternative limit formulation for the Q’s: there is a generalized sequence (filter) of large finite population models, \(\{0, 1, 2, \ldots , T_\alpha \}\) where \(T_\alpha \rightarrow \infty\), and a sequence of probabilities, \(Q_\alpha\) on those large finite models, with the property that for each \({{\varvec{u}}}\), \(p_{{{\varvec{u}}},Q} = \lim _\alpha p_{{{\varvec{u}}},Q_\alpha }\). This means that solutions will always have interpretation as the limit of large finite horizon problems.

  • Tractability. (Robinson 1964, Theorem 3.6) shows that it is possible to represent a shift-invariant probability as a distribution Q on a population model \(\{0, 1, 2, \ldots , T\}\) where T is an ‘unlimited’ or ‘infinite’ integer (from the field of mathematics known as non-standard analysis pioneered by Robinson (1966, 1996)). In particular, this means that most of the calculations can be done using techniques involving finite sums. The restriction on Q that delivers both non-atomicity and shift-invariance is that the total differences in weights given to adjoining generations must be infinitesimal, \(\sum _{t=0}^T |Q(t+1) - Q(t)| \simeq 0\). In terms of the limit formulation, this is the requirement that \(\lim _\alpha \sum _{t=0}^{T_\alpha } |Q_\alpha (t+1) - Q_\alpha (t)| = 0\).

2.3 Equity in ‘Limit’ population models

It can be shown that if Q is shift-invariant and \(\pi\) is an asymptotic permutation, then \(p_{{{\varvec{u}}},Q} = p_{{{\varvec{u}}}^\pi ,Q}\). This guarantees that the KS preferences described in (4) satisfy previous equity, weak anonymity, and intergenerational neutrality criteria. However, they do not generally satisfy the strong equity condition we study in this paper, immunity to almost-everywhere ‘limit’ permutations. We here sketch what is involved without the nonstandard analysis tools developed in the next section. Instead, we use sequences of utility vectors in \({{\varvec{W}}}\) restricted to sequences of finite horizon models while simultaneously matching them with sequences of probabilities and sequences of permutations.

Example 2.1

(Two shift invariant distributions) For a sequence \(T_n \rightarrow \infty\), let \(\Lambda _n\) denote the uniform distribution on \(I_n:= \{0, 1, \ldots , T_n\}\) so that \(\Lambda _n(t) = 1/(T_n+1)\) for \(0 \le t \le T_n\). For a sequence \(\beta _n \uparrow 1\), pick \(T_n \rightarrow \infty\) such that \(S_n:= (1-\beta _n)\sum _{t=0}^{T_n} \beta _n^t \uparrow 1\) and let \(Q_{n}\) denote the geometric distribution with parameter \(\beta _n\) conditioned to the interval \(I_n:= \{0, \ldots , T_n\}\) so that \(Q_{n}(t) = \frac{1}{S_n} (1 - \beta _n) \beta _n^t\) for \(0 \le t \le T_n\). The limit probabilities are shift invariant because \(\sum _{t \in {{\mathbb {N}}_0}} |\Lambda _n(t+1) - \Lambda _n(t)| \rightarrow 0\) and \(\sum _{t \in {{\mathbb {N}}_0}} |Q_{n}(t+1) - Q_{n}(t)| \rightarrow 0\).

A classical result tells us that the set of finitely additive probabilities is compact if we define convergence of probabilities by \(p^\alpha \rightarrow p\) iff \(p^\alpha (B) \rightarrow p(B)\) for all B in the domain of the probability. This means that the sequences of probabilities just given have accumulation points. Robinson (1964) uses nonstandard analysis tools to facilitate working with such accumulation points.
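The shift-invariance condition in Example 2.1 can be checked numerically along one admissible schedule. The sketch assumes \(T_n = n^2\) and \(\beta _n = 1 - 1/n\), a choice made here for illustration only, and treats \(Q(t)\) as zero beyond the horizon.

```python
def uniform(T):
    """Uniform distribution Lambda_n on {0, ..., T}."""
    return [1 / (T + 1)] * (T + 1)

def truncated_geometric(beta, T):
    """Geometric distribution with parameter beta conditioned on {0, ..., T}."""
    weights = [(1 - beta) * beta**t for t in range(T + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def adjacent_variation(Q):
    """sum_t |Q(t+1) - Q(t)|, with Q(t) = 0 beyond the horizon."""
    padded = list(Q) + [0.0]
    return sum(abs(padded[t + 1] - padded[t]) for t in range(len(Q)))

for n in (5, 10, 20, 40):
    T_n, beta_n = n * n, 1 - 1 / n       # assumed schedule, not from the paper
    print(n,
          adjacent_variation(uniform(T_n)),
          adjacent_variation(truncated_geometric(beta_n, T_n)))
```

For the uniform distribution the sum is \(1/(T_n+1)\); for the truncated geometric the sum telescopes to \(Q_n(0)\). Both shrink toward zero along this schedule.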

Example 2.2

(Limit permutations) On the sequences of population models \(I_n = \{0, 1, \ldots , T_n\}\), consider the sequence of allocations \({{\varvec{u}}}_{|I_n}\) that give the early generations, \(\{0, \ldots , T_n/2\}\) a boon, with additional utility 1, and give the remaining generations no additional utility. Equity requires indifference between switching the boon between the earlier and the later generations. One of the shift invariant distributions above has this property while the other completely downweights the boon if it goes to the later generations.

  (a) For the sequence \(\Lambda _n\), the limit distribution of boons associated with the allocations \({{\varvec{u}}}_{|I_n}\) is \(\frac{1}{2}\delta _1 + \frac{1}{2}\delta _0\), that is, half of the population receives the good outcome and the remaining half does not. Further, this is invariant with respect to all sequences of permutations \(\pi _n\) on \(I_n\), including those that switch the early for the later generations.

  (b) For the sequence \(Q_n\), note that \((1-\beta _n) \sum _{t=0}^{T_n} \beta _n^t = (1-\beta _n^{T_n+1}) \rightarrow 1\) iff \((1-\beta _n) \sum _{t=0}^{T_n/2} \beta _n^t = (1-\beta _n^{T_n/2+1}) \rightarrow 1\). Therefore, the limit distribution of utilities associated with the allocations \({{\varvec{u}}}_{|I_n}\) is \(\delta _1\), but switching the early generations, \(\{0, \ldots , T_n/2\}\), for the later generations, \(\{T_n/2, \ldots , T_n\}\), makes the limit distribution of utilities \(\delta _0\).
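The contrast between (a) and (b) can be seen in a small computation. As an illustrative assumption, the sketch takes \(T_n = n^2\) (with \(n\) even so that \(T_n/2\) is an integer) and \(\beta _n = 1 - 1/n\), and reports the mass each distribution assigns to the generations receiving the boon, before and after the early/late switch.

```python
def uniform(T):
    """Uniform distribution Lambda_n on {0, ..., T}."""
    return [1 / (T + 1)] * (T + 1)

def truncated_geometric(beta, T):
    """Geometric distribution with parameter beta conditioned on {0, ..., T}."""
    weights = [(1 - beta) * beta**t for t in range(T + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def early_mass(Q):
    """Mass of the early generations {0, ..., T/2}; T is assumed even."""
    T = len(Q) - 1
    return sum(Q[: T // 2 + 1])

def late_mass(Q):
    """Mass of the later generations {T/2, ..., T}; T is assumed even."""
    T = len(Q) - 1
    return sum(Q[T // 2:])

for n in (4, 10, 20):
    T_n, beta_n = n * n, 1 - 1 / n       # assumed schedule, not from the paper
    L, Q = uniform(T_n), truncated_geometric(beta_n, T_n)
    print(n, early_mass(L), late_mass(L), early_mass(Q), late_mass(Q))
```

Under the uniform weights the boon reaches about half the population either way, while under the truncated geometric weights nearly all mass sits on the early generations, so moving the boon to the later generations makes it almost invisible.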

This example shows that even the asymptotic discounting model yields a theory that is subject to Ramsey’s (1928) rather withering critique—when one considers the later enjoyments of groups of the same size, it “discount(s) later enjoyments in comparison with earlier ones, a practice which is ethically indefensible and arises merely from the weakness of the imagination.” Our response to this ethically indefensible position is the imposition of indifference to all permutations, and we will see that, in terms of the limit constructions just used, the only translation invariant Q’s that satisfy this criterion are the limits of the uniform distributions on \(\{0, 1, \ldots , T_n\}\). This is certainly intuitive—if we wish our criterion to treat generations equally, then it must weight them equally.

2.4 On patience and the Pareto criterion

The literature on intergenerational social welfare functions contains various results that have been interpreted as showing the difficulty of combining the Pareto criterion and patience. In this literature, patience is variously understood as invariance with respect to the shift, bounded, and asymptotic permutations described above. The non-atomic population perspective used in KS, and here, provides a different interpretation. It shows that the purported Pareto improvements used in this literature are best understood as boons given to null coalitions.Footnote 6

Using the shift invariance criterion from Marinacci’s (1998) development of “complete patience,” one can see how patience works with \({{\mathbb {N}}_0}\) as the model of generations. For a utility stream \({{\varvec{u}}}= (u_0, u_1, u_2, u_3, \ldots )\) and any finite F, consider the shift \({{\varvec{u}}}^F:= (u_F, u_{F+1}, u_{F+2}, u_{F+3}, \ldots )\). Patience can be understood as invariance with respect to such shifts. For example, suppose that \({{\varvec{u}}}\) is a sequence with \(4 \le u_t \le 7\) for all t. The stream \({{\varvec{u}}}\) is the F-shifted stream of, hence is indifferent to, either

$$\begin{aligned} {{\varvec{u}}}(0,F)&:= (\underbrace{0, 0, \ldots , 0}_{F\,\hbox {times}}, u_0, u_1, u_2, u_3, \ldots ) \ \hbox {or} \end{aligned}$$
(5)
$$\begin{aligned} {{\varvec{u}}}(9,F)&:= (\underbrace{9, 9, \ldots , 9}_{F\,\hbox {times}}, u_0, u_1, u_2, u_3, \ldots ) . \end{aligned}$$
(6)

The indifference between \({{\varvec{u}}}(0,F)\) and \({{\varvec{u}}}\) captures patience as a social willingness to wait for rewards. The indifference between \({{\varvec{u}}}(9,F)\) and \({{\varvec{u}}}\) captures a social willingness to ignore benefits accruing to a finite subset of an infinite population while waiting for the long-term pattern to start.
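One way to see why a patient criterion cannot separate these three streams is to compare their long-run averages, which here stand in (as an assumption) for integration against a shift-invariant \(Q\). The stream and helper names below are hypothetical.

```python
def long_run_average(stream, T):
    """Average of stream(0), ..., stream(T)."""
    return sum(stream(t) for t in range(T + 1)) / (T + 1)

def prepend(value, F, stream):
    """The stream (value, ..., value, u_0, u_1, ...) with F leading copies of value."""
    return lambda t: value if t < F else stream(t - F)

def u(t):
    """Hypothetical stream with 4 <= u_t <= 7: alternates 4, 7, 4, 7, ..."""
    return 4.0 if t % 2 == 0 else 7.0

F = 10
u_zero = prepend(0.0, F, u)   # plays the role of u(0, F)
u_nine = prepend(9.0, F, u)   # plays the role of u(9, F)

for T in (99, 9_999, 99_999):
    print(T, long_run_average(u_zero, T), long_run_average(u, T), long_run_average(u_nine, T))
```

The gap between the three averages is at most \(9F/(T+1)\), so it vanishes as the horizon grows while \(F\) stays fixed.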

Invariance with respect to finite shifts and continuity seemingly lead to a violation of the Pareto criterion. Consider \({{\varvec{r}}}= (r_0,r_1, \ldots )\) with \(r_t \downarrow 0\) and compare \({{\varvec{u}}}+ {{\varvec{r}}}\) to \({{\varvec{u}}}\) assuming that preferences are represented by a uniformly continuous \(S(\cdot )\) (and KS, Theorem C shows that most of the patient social welfare functions in the literature are Lipschitz continuous). We have

$$\begin{aligned} |S({{\varvec{u}}})&- S({{\varvec{u}}}+ {{\varvec{r}}}) | \le \underbrace{ |S({{\varvec{u}}}) - S({{\varvec{u}}}^F)|}_{= \,0} \nonumber \\&\quad + \underbrace{ |S({{\varvec{u}}}^F) - S(({{\varvec{u}}}+ {{\varvec{r}}})^F)| }_{\rightarrow \,0} + \underbrace{ |S(({{\varvec{u}}}+ {{\varvec{r}}})^F) - S({{\varvec{u}}}+ {{\varvec{r}}})| }_{= \,0} . \end{aligned}$$
(7)

Indifference to finite shifts gives both of the “\(= 0\)” conclusions, and the “\(\rightarrow 0\)” conclusion follows from uniform continuity together with \(\Vert {{\varvec{u}}}^F - ({{\varvec{u}}}+{{\varvec{r}}})^F\Vert = r_F \downarrow 0\). Thus, a preference relation represented by an \(S(\cdot )\) that is both shift invariant and uniformly continuous must be indifferent to improvements in an allocation that are modeled by \({{\varvec{r}}}\).

For any shift invariant Q, the allocational increase represented by \({{\varvec{r}}}\) has the property that for any \(\epsilon > 0\), the Q-mass of the population receiving less than \(\epsilon\) is equal to 1. Put more bluntly, \({{\varvec{r}}}\) represents a positive boon to only a null subset of the population. Since the earliest uses of non-atomic population models, e.g. Hildenbrand (1969), increasing the utility of a null subset of the population does not count as a Pareto improvement.

The limit formulations of null and substantial coalitions in KS are as follows: \(N \subset {{\mathbb {N}}_0}\) is null if \(\limsup _T \frac{1}{T+1} \sum _{t=0}^T 1_N(t) = 0\), and \(S \subset {{\mathbb {N}}_0}\) is substantial if \(\liminf _T \frac{1}{T+1} \sum _{t=0}^T 1_S(t) > 0\). From KS, Theorem A, the preferences satisfying respect for asymptotic first order dominance both ignore boons to null coalitions and respond to boons to substantial coalitions, i.e. they are Pareto responsive.
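The two definitions can be illustrated with running densities. The sketch uses two example coalitions chosen here for illustration, the perfect squares and the even generations; finite horizons only suggest the limsup and liminf and do not prove them.

```python
from math import isqrt

def density(indicator, T):
    """(1/(T+1)) * #{t in {0, ..., T} : indicator(t)}."""
    return sum(1 for t in range(T + 1) if indicator(t)) / (T + 1)

def is_square(t):
    return isqrt(t) ** 2 == t

def is_even(t):
    return t % 2 == 0

for T in (100, 10_000, 100_000):
    print(T, density(is_square, T), density(is_even, T))
```

The density of the squares falls toward zero (an infinite but null coalition), while the density of the even generations stays near one half (a substantial coalition).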

One can be very uncomfortable with the conclusion that \({{\varvec{u}}}(0,F)\), \({{\varvec{u}}}\) and \({{\varvec{u}}}(9,F)\) from (5) and (6) are indifferent, especially if one belongs to the first F generations. This is a cost of moving to a nonatomic population model, a model in which there are many null sets, even many infinite null sets. Applied to maximization problems, this indifference is a symptom of what is called “underselectiveness” in the parts of the operations research literature that study the problem of maximizing the long run average performance of a stochastic dynamic system. For such problems, the outcome \({{\varvec{u}}}\) is optimal iff both \({{\varvec{u}}}(0,F)\) and \({{\varvec{u}}}(9,F)\) are also optimal.

An alternative approach is to abandon shift invariance as a patience criterion and build in patience and respect for the Pareto criterion through other methods. Jonsson and Voorneveld (2018) study a “limit of discounted utility” ordering on \({{\varvec{W}}}\) given by \({{\varvec{u}}}\succsim _{LDU} {{\varvec{v}}}\) if

$$\begin{aligned} \textstyle \liminf _{\beta \uparrow 1} \textstyle \sum _{t=0}^\infty (u_t - v_t)\beta ^t \ge 0. \end{aligned}$$
(8)

As shown above, this is subject to Ramsey’s criticism, at least if one takes seriously the idea that the infinite models should be interpretable as limits of large finite models. However, applied to \({{\varvec{u}}}(0,F)\), \({{\varvec{u}}}\) and \({{\varvec{u}}}(9,F)\) from (5) and (6), we have \({{\varvec{u}}}(9,F) \succ _{LDU} {{\varvec{u}}}\succ _{LDU} {{\varvec{u}}}(0,F)\) which may accord better with one’s intuition about what “should be” the case.
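The strict ranking is easy to verify in the special case of a constant stream \(u_t = c\) with \(4 \le c \le 7\), which is an assumption made here to keep the differences finitely supported. Then \({{\varvec{u}}}(9,F) - {{\varvec{u}}}\) equals \(9 - c\) on the first \(F\) coordinates and zero afterwards, and similarly for \({{\varvec{u}}}- {{\varvec{u}}}(0,F)\).

```python
def discounted_gap(diff, beta):
    """sum_t diff[t] * beta**t for a finitely supported difference u - v."""
    return sum(d * beta**t for t, d in enumerate(diff))

c, F = 5.0, 3                 # hypothetical constant utility level and shift length
gap_nine = [9.0 - c] * F      # u(9, F) - u, zero after the first F terms
gap_zero = [c - 0.0] * F      # u - u(0, F), zero after the first F terms

for beta in (0.9, 0.99, 0.999999):
    print(beta, discounted_gap(gap_nine, beta), discounted_gap(gap_zero, beta))
```

As \(\beta \uparrow 1\) the two sums approach \((9-c)F\) and \(cF\), both strictly positive, so the limit-of-discounted-utility ordering ranks the three streams strictly in this case.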

However, this respect for Pareto dominance in the classical sense rather than in the KS nonatomic population sense means that its use in maximization problems can be quite difficult. For example, one needs to evaluate outcomes involving randomness, so the relation \(\succsim _{LDU}\) needs to be extended to distributions on \({{\varvec{W}}}\). But this could be problematic: if \({\varvec{U}}\) and \({\varvec{V}}\) are independent random points in \({{\varvec{W}}}\) with the \(\{u_t: t \in {{\mathbb {N}}_0}\}\) and \(\{v_t: t \in {{\mathbb {N}}_0}\}\) both i.i.d. and having non-degenerate distributions with the same mean, then the probability that \({\varvec{U}} \succsim _{LDU} {\varvec{V}}\) or that \({\varvec{V}} \succsim _{LDU} {\varvec{U}}\) is equal to 0, that is, they are non-comparable with probability 1.Footnote 7

3 The infinite population models

This section develops the basic nonstandard analysis needed to represent the ‘limit’ objects we use in our analysis. Of central interest are ‘limit’ versions of large populations, \(\{0, 1, \ldots , T_n\}\), \(T_n \rightarrow \infty\), of ‘limit’ probabilities and ‘limit’ permutations on the large population ‘limit,’ and of ‘limit’ allocations of utilities for those large populations. The general method of construction of a ‘limit’ object in a set X is as follows: it starts with the set of sequences in X; defines an equivalence relation on the set of sequences; defines the set of equivalence classes to be the ‘limit’ objects; and denotes the set of these limit objects as \({}^*\!X\), read as “star X.” For a simple example, if \(X = {\mathbb {R}}\), then \({}^*{\mathbb {R}}\) is the set of nonstandard real numbers. For a more complicated example, if \(X = {\mathcal {P}}_{Fin}({{\mathbb {N}}_0})\) is the set of finite subsets of \({{\mathbb {N}}_0}= \{0, 1, 2, \ldots \}\), then our ‘limit’ population models belong to \({}^*\!{\mathcal {P}}_{Fin}({{\mathbb {N}}_0})\), i.e. they are subsets of \({}^*{{\mathbb {N}}_0}\).

Notationally, a sequence in X can be denoted \(n \mapsto x_n\) with each \(x_n \in X\) when the focus is on a sequence as a function from \({\mathbb {N}}\) to X, or as \((x_1, x_2, x_3, \ldots )\), when the focus is on a sequence as an ordered list, or as \(\{x_n: {n \in {\mathbb {N}}}\}\) when the focus is on a sequence as an indexed set. The set of all sequences in X is denoted \(X^{\mathbb {N}}\). This section begins with the definition of the equivalence relation, and then works through the basic properties of the construction in the most familiar case, the real numbers \({\mathbb {R}}\) and their ‘limit’ or ‘nonstandard’ version, \({}^*{\mathbb {R}}\). Following this, we develop the other tools needed for our results and applications.

To be clear, we use \({\mathbb {N}}\) as an index set for the construction of limit objects, and we use \({{\mathbb {N}}_0}\) to index the generations.

3.1 The equivalence relation

We will use a finitely additive \(\{0, 1\}\)-valued probability, denoted \(\mu\), on the index set, \({\mathbb {N}}\), and define two sequences \((x_1, x_2, x_3, \ldots )\) and \((y_1, y_2, y_3, \ldots )\) to be equivalent in \(X^{\mathbb {N}}\) if \(\mu (\{{n \in {\mathbb {N}}}: x_n = y_n\}) = 1\). We will then identify each equivalence class as a point in our new “nonstandard” space, denoted \({}^*\!X\). By doing this systematically, we can extend operations such as addition, multiplication, and division, and relations such as “greater than” or “a permutation of” to these new spaces. All of this starts with an examination of the properties of \(\mu\). Let \({\mathcal {P}}(X)\) denote the class of all subsets of a set X (aka the power set of X).

Definition 3.1

A function \(\mu :{\mathcal {P}}({\mathbb {N}}) \rightarrow [0, 1]\) is a purely finitely additive, zero–one probability if

  1. (1)

    for all \(A \subset {\mathbb {N}}\), \(\mu (A) = 0\) or \(\mu (A) = 1\);

  2. (2)

    \(\mu (A \cup B) = \mu (A) + \mu (B)\) for all disjoint \(A,B \subset {\mathbb {N}}\);

  3. (3)

    \(\mu ({\mathbb {N}}) = 1\); and

  4. (4)

    \(\mu (A) = 0\) if \(A \subset {\mathbb {N}}\) is finite.Footnote 8

By induction, we can replace (2) by (2\('\)), \(\mu (\cup _{k=1}^K E_k) = \sum _{k=1}^K \mu (E_k)\) for all finite collections of disjoint sets \(\{E_k: k = 1, \ldots , K\}\). Combined with (1) and (3), if \(\{E_k: k = 1, \ldots , K\}\) is a partition of \({\mathbb {N}}\), then \(\mu (E_k) = 1\) for one and only one of the sets \(E_k\).

Definition 3.2

For any set X, two sequences \((x_1, x_2, x_3, \ldots )\) and \((y_1, y_2, y_3, \ldots )\) in \(X^{\mathbb {N}}\) are equivalent, denoted

$$\begin{aligned} (x_1, x_2, x_3, \ldots ) \sim (y_1, y_2, y_3, \ldots ),\end{aligned}$$

if they are \(\mu\)-almost everywhere equal, \(\mu (\{{n \in {\mathbb {N}}}: x_n = y_n\}) = 1\).

Since the sets \(\{{n \in {\mathbb {N}}}: x_n = y_n\}\) and \(\{{n \in {\mathbb {N}}}: x_n \ne y_n\}\) partition \({\mathbb {N}}\), one and only one of them has \(\mu\)-mass 1, and an elementary check of the properties of \(\mu\) yields the following. For completeness, the proofs of this and other results not given in the text are gathered in the appendix.

Lemma 3.1

The relation \(\sim\) is an equivalence relation on \(X^{\mathbb {N}}\).

Equivalence classes are central to the development, and they have their own notation: the equivalence class of an \((x_1, x_2, x_3, \ldots ) \in X^{\mathbb {N}}\) is defined as the set \(\{y \in X^{\mathbb {N}}: (x_1, x_2, x_3, \ldots ) \sim (y_1, y_2, y_3, \ldots )\}\), and it is denoted \(\langle x_1, x_2, x_3, \ldots \rangle\).

Definition 3.3

The nonstandard version of a set X is denoted as \({}^*\!X\) and defined as \(\{ \langle x_1, x_2, x_3, \ldots \rangle : (x_1, x_2, x_3, \ldots ) \in X^{\mathbb {N}}\}\).

We will use this construction for sets X of various degrees of complexity. When X is the set of finite subsets of \({{\mathbb {N}}_0}\), \({}^*\!X\) will contain our population models, \({{\mathbb {I}}}\). When X is the set of probabilities on finite subsets of \({{\mathbb {N}}_0}\), we will have distributions on our population model. The value of such a probability is the equivalence class of a sequence of numbers in [0, 1], that is, it is a number in \({}^*[0, 1]\). As a preview of the developments below: the Unicity Lemma (Lemma 3.3 below) shows how to go from probabilities taking values in \({}^*[0, 1]\) to probabilities taking values in [0, 1]; and Loeb’s Theorem (Theorem B below) shows how to extend these to the appropriate \(\sigma\)-field of subsets of \({{\mathbb {I}}}\).

3.2 Expansions and unicity

There is a natural embedding of X in \({}^*\!X\): every \(x \in X\) is identified as the equivalence class of the constant sequence \((x, x, x, \ldots )\). The corresponding point in \({}^*\!X\) is still denoted x, that is, \(x = \langle x, x, x, \ldots \rangle \in {}^*\!X\). This leads to the distinction between standard objects and nonstandard objects: if \(x \in X\), then the point \(x = \langle x, x, x, \ldots \rangle \in {}^*\!X\) is called standard; and if \(x = \langle x_1, x_2, x_3, \ldots \rangle \in {}^*\!X\) is not standard, then it is called nonstandard.

The next two results use the properties of \(\mu\) on partitions of the index set \({\mathbb {N}}\) in a central way. The first tells us when to expect new nonstandard objects to exist in \({}^*\!X\), and the second is a ‘unicity’ result.

Lemma 3.2

Every \(x \in {}^*\!A\) is standard iff A is finite.

Proof

Suppose that \(A = \{a_k: k = 1, \ldots , K\}\) is finite and that \(x = \langle x_1, x_2, x_3, \ldots \rangle \in {}^*\!A\). For each \(k = 1, \ldots , K\), define \(E_k:= \{{n \in {\mathbb {N}}}: x_n = a_k\}\). The \(E_k\) form a partition of \({\mathbb {N}}\). Hence one and only one of them, say \(E_{k'}\), has \(\mu (E_{k'}) = 1\), that is, \(x = a_{k'}\).

Suppose now that A is infinite. It contains a set \(\{a_n: {n \in {\mathbb {N}}}\}\) with \(a_n \ne a_m\) for \(n \ne m\). Define \(x = \langle a_1, a_2, a_3, \ldots \rangle\). We have \(x \ne a\) for any \(a \in A\) because \(\{{n \in {\mathbb {N}}}: a_n = a\}\) contains at most 1 element, hence has \(\mu\)-mass 0. \(\square\)

The next result shows that for every \(x \in {}^*[0, 1]\), there is a unique number \(h \in [0, 1]\) such that for all standard \(\epsilon > 0\), \(|h - x| < \epsilon\). Our probabilities will take values in \({}^*[0, 1]\), and this result allows us to change them to probabilities taking values in [0, 1]. This unicity result is part of the ‘genius’ of the \(\mu\)-almost everywhere construction: it shows that even if \(x_n\) is a sequence with \(\liminf _n x_n < \limsup _n x_n\), \(\langle x_1, x_2, x_3, \ldots \rangle\) is as close to a number as one could ever hope.Footnote 9 We state the result for the interval [0, 1], but the argument clearly applies to any interval [a, b].

Lemma 3.3

(Unicity) If \(\mu(\{n \in \mathbb{N}: x_n \in [0, 1]\}) = 1\) then there exists a unique \(h \in [0, 1]\) such that for all standard \(\epsilon > 0\), \(\mu(\{n \in \mathbb{N}: |x_n - h| < \epsilon\}) = 1\).

Proof

For each \(m \in {\mathbb {N}}\), the interval [0, 1] can be covered by the \(2^m + 1\) disjoint half-open dyadic intervals \((-\frac{1}{2^m}, 0]\), \((\frac{0}{2^m}, \frac{1}{2^m}]\), \((\frac{1}{2^m}, \frac{2}{2^m}]\), etc. Denote these as \(A_{m,k}\) and define \(E_{m,k} = \{{n \in {\mathbb {N}}}: x_n \in A_{m,k}\}\). This is a partition of \({\mathbb {N}}\), hence \(\mu (E_{m,k}) = 1\) for exactly one k, denoted k(m). For each m, we have \(A_{m+1,k(m+1)} \subset A_{m,k(m)}\), and as the diameter of \(A_{m,k(m)}\) decreases to 0, there is a unique h in the intersection of the closures of the \(A_{m,k(m)}\). For any standard \(\epsilon > 0\), once \(1/2^m < \epsilon\), we have \(E_{m,k(m)} \subset \{{n \in {\mathbb {N}}}: |x_n - h| < \epsilon \}\) so \(\mu (\{{n \in {\mathbb {N}}}: |x_n - h| < \epsilon \}) = 1\). \(\square\)

3.3 Monads and ‘Limit’ objects in \({\mathbb {R}}\)

The set \({}^*{\mathbb {R}}\) of all \(\sim\)-equivalence classes of sequences in \({\mathbb {R}}^{\mathbb {N}}\) is called the set of hyperreals. We directly extend the relation “<” and the operations of addition and multiplication from \({\mathbb {R}}\) to \({}^*{\mathbb {R}}\) using the “\(\mu\)-almost everywhere” idea as follows: \(\langle x_1, x_2, x_3, \ldots \rangle < \langle y_1, y_2, y_3, \ldots \rangle\) iff \(\mu (\{{n \in {\mathbb {N}}}: x_n < y_n\}) = 1\); \(\langle x_1, x_2, x_3, \ldots \rangle + \langle y_1, y_2, y_3, \ldots \rangle = \langle x_1+y_1, x_2+y_2, x_3+y_3, \ldots \rangle\); and \(\langle x_1, x_2, x_3, \ldots \rangle \cdot \langle y_1, y_2, y_3, \ldots \rangle = \langle x_1 \cdot y_1, x_2 \cdot y_2, x_3 \cdot y_3, \ldots \rangle\). In principle, the relation should be written “\(\,{}^*\!\!<\),” and the operations should be written “\(\,{}^*\!\!+\)” and “\(\,{}^*\!\,\cdot\),” but the notational burden is too high, so we continue to use “<,” “\(+\)” and “\(\,\cdot\).”

Remember that 0 and 1 are the equivalence classes of the sequences constant at 0 and 1 respectively. We define \(x \in {}^*{\mathbb {R}}\) to be strictly positive if \(0 < x\) (i.e. \(\mu (\{{n \in {\mathbb {N}}}: 0 < x_n\}) = 1\)). It is easy to check that properties of \(\mu\) deliver the following elementary facts: if x is strictly positive, then for all \(y \in {}^*{\mathbb {R}}\), \(y < y+x\); \(y + x = y\) for all \(y \in {}^*{\mathbb {R}}\) iff \(x = 0\); and \(y \cdot x = y\) for all \(y \in {}^*{\mathbb {R}}\) iff \(x = 1\).

Our first example of extending an \({\mathbb {R}}\)-valued function on \({\mathbb {R}}\) is the absolute value function, for \(x = \langle x_1, x_2, x_3, \ldots \rangle\), \(|x| = \langle |x_1|, |x_2|, |x_3|, \ldots \rangle\). It is worth bearing in mind the following examples: \(dx = \langle 1, \frac{1}{2}, \frac{1}{3}, \ldots \rangle\) which satisfies \(0< |dx| < \epsilon\) for every standard positive \(\epsilon\) as \(\{{n \in {\mathbb {N}}}: 0< |\frac{1}{n}| < \epsilon \}\) has a finite complement and \(\mu (A) = 0\) for all finite sets; \(y = \frac{1}{dx} = \langle 1, 2, 3, \ldots \rangle\) which satisfies \(|y| > B\) for every standard positive B for the same reason; and \(z = \langle z_1, z_2, z_3, \ldots \rangle\) where \(n \mapsto z_n\) is a bounded sequence, hence satisfies \(|z| \le B\) for some standard positive B. By the Unicity Lemma, there is a unique \(h \in {\mathbb {R}}\) such that \(|z-h| < \epsilon\) for all standard \(\epsilon > 0\).
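
The finite-complement step for \(dx\) can be made concrete. The following is a minimal sketch, not part of the formal development: the helper name `exceptional_indices` is hypothetical, and exact rationals are used so that no rounding enters. It lists the indices n at which \(0< \frac{1}{n} < \epsilon\) fails, which is always a finite set and therefore has \(\mu\)-mass 0.

```python
import math
from fractions import Fraction

def exceptional_indices(eps):
    """Indices n >= 1 at which 0 < 1/n < eps fails, i.e. where 1/n >= eps."""
    # 1/n >= eps  <=>  n <= 1/eps, so the failures are exactly 1, ..., floor(1/eps).
    return list(range(1, math.floor(1 / eps) + 1))

for eps in (Fraction(1, 2), Fraction(1, 10), Fraction(3, 100)):
    bad = exceptional_indices(eps)
    start = len(bad) + 1
    # Spot check: the next 1000 indices past the last failure all satisfy 0 < 1/n < eps.
    assert all(0 < Fraction(1, n) < eps for n in range(start, start + 1000))
    print(eps, "fails at", len(bad), "indices")
```

The same count with the inequalities reversed shows why \(\frac{1}{dx}\) exceeds every standard bound B: only the finitely many indices \(n \le B\) fail.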

Definition 3.4

The infinitesimals are the \(x \in {}^*{\mathbb {R}}\) that satisfy, for all standard \(\epsilon > 0\), \(|x| < \epsilon\) so that 0 is the only standard infinitesimal; the limited or finite elements are the \(x \in {}^*{\mathbb {R}}\) that satisfy, for some standard \(B > 0\), \(|x| \le B\); and the unlimited or infinite elements are the \(x \in {}^*{\mathbb {R}}\) such that for all standard \(B > 0\), \(|x| > B\).

The infinitesimals are limited because they satisfy e.g. \(|x| \le 1\). Further, the product or sum of two infinitesimals is also infinitesimal because, for arbitrary standard strictly positive \(\epsilon < 1\), if \(|x| < \epsilon\) and \(|y| < \epsilon\) in \({}^*{\mathbb {R}}\), then \(|xy| < \epsilon\) and \(|x+y| < 2 \epsilon\). In more detail, the statement about the absolute value of the sum follows from setting \(N_x = \{{n \in {\mathbb {N}}}: |x_n| < \epsilon \}\) and \(N_y = \{{n \in {\mathbb {N}}}: |y_n| < \epsilon \}\), then noting that \(\mu (N_x) = \mu (N_y) = 1\) implies that \(\mu (N_x \cap N_y) = 1\), and that \(N_x \cap N_y \subset \{{n \in {\mathbb {N}}}: |x_n+y_n| < 2 \epsilon \}\).

For \(x, y \in {}^*{\mathbb {R}}\), we write \(x \simeq y\) if \(x-y\) is infinitesimal, equivalently, if \((x-y) \simeq 0\).

Definition 3.5

For \(r \in {\mathbb {R}}\), the set of \(r' \in {}^*{\mathbb {R}}\) with \(r-r' \simeq 0\) is called the monad of r, written \(\textrm{mon}(r) = \{r' \in {}^*{\mathbb {R}}: r \simeq r'\}\). If \(r' \in \textrm{mon}(r)\) for some \(r \in {\mathbb {R}}\), then we write \(r = \textrm{st}(r')\) or \(r = {}^\circ r'\) and say that r is the standard part of \(r'\) and that \(r'\) is nearstandard.

Lemma 3.4

If it exists, then the standard part of an \(r' \in {}^*{\mathbb {R}}\) is unique.

Proof

From the triangle inequality, \(|r'-r| + |r'-s| \ge |r-s|\), so if r and s are both standard parts of \(r'\), then \(|r-s| \simeq 0\). Since \(r-s\) is standard and the only standard infinitesimal is 0, we have \(r = s\). \(\square\)

We have used the triangle inequality and the relation “\(\ge\)” here. The triangle inequality holds because \(\mu (\{{n \in {\mathbb {N}}}: |x_n - y_n| + |y_n - z_n| \ge |x_n - z_n|\}) = 1\) for any sequences \(n \mapsto x_n\), \(n \mapsto y_n\) and \(n \mapsto z_n\), and the relation \(\ge\) is defined, in what should begin to look like the “usual procedure,” by \(x \ge y\) in \({}^*{\mathbb {R}}\) if \(\mu (\{{n \in {\mathbb {N}}}: x_n \ge y_n\}) = 1\).

The condition that the standard part exists is simple.

Lemma 3.5

A hyperreal \(r' \in {}^*{\mathbb {R}}\) is limited iff \(\textrm{st}(r')\) exists iff \(r'\) is nearstandard.

The argument for this uses the logic of the Unicity Lemma above and we sketch it here: if \(r'\) is limited, then for some standard \(B > 0\), \(-B \le r' \le +B\), the interval \([-B, +B]\) can be covered by an increasingly fine sequence of half-open intervals, and \(\textrm{st}(r')\) belongs to the intersection of the closure of a nested subsequence of these intervals; if \(\textrm{st}(r') = r\), then \(r'\) is nearstandard; and if \(r'\) is nearstandard and \(\textrm{st}(r') = r\), then we can take \(B = |r|+1\) to show that \(r'\) is limited.

The idea of infinitesimals has a long and productive history (see e.g. (Robinson 1996, Ch. X)). Intuitively: a function \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is continuous at \(x \in {\mathbb {R}}\) if for every non-zero infinitesimal dx, \(f(x + dx) \simeq f(x)\), and \(f'(x) = r\) if \(\frac{f(x+dx)-f(x)}{dx} \simeq r\). To make these and related ideas precise we need to extend functions on \({\mathbb {R}}\) to functions on \({}^*{\mathbb {R}}\).
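
As a worked instance of the derivative formulation, take \(f(x) = x^2\) (our choice of example), which needs only the addition and multiplication on \({}^*{\mathbb {R}}\) defined above. For a standard x and any non-zero infinitesimal dx,

$$\begin{aligned} \frac{f(x+dx)-f(x)}{dx} = \frac{2x \cdot dx + dx \cdot dx}{dx} = 2x + dx \simeq 2x, \end{aligned}$$

so \(f'(x) = 2x\) without any limit being taken. Continuity at x follows in the same way, since \(f(x+dx) - f(x) = dx \cdot (2x + dx)\) is the product of an infinitesimal and a limited number, hence infinitesimal.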

3.4 ‘Limit’ functions and ‘Limit’ sets

For any set X, any \(A \subset {X}\) can be enlarged to \({}^*\!A \subset {}^*\!X\) by applying the “\(\mu\)-almost everywhere” construction as follows: for \((x_1, x_2, x_3, \ldots ) \in X^{\mathbb {N}}\), \(\langle x_1, x_2, x_3, \ldots \rangle \in {}^*\!A\) iff \(\mu (\{{n \in {\mathbb {N}}}: x_n \in A\}) = 1\). For example, regarding the relation “<” as a subset of \({\mathbb {R}}\times {\mathbb {R}}\), we have \((x,y) \in ({}^*\!\!<)\) iff \(\mu (\{{n \in {\mathbb {N}}}: (x_n,y_n) \in \, < \}) = 1\) iff \(\mu (\{{n \in {\mathbb {N}}}: x_n < y_n\}) = 1\), which agrees with the definition given above.

The graph of a function \(f:X \rightarrow Y\) is a subset of \(X \times Y\), that is \(gr(f) = \{(x,y): x \in X, \ y \in Y, \ \hbox {and} \ y = f(x)\}\). Taking \({}^*\!gr(f)\) as the definition of the function \({}^*\!f: {}^*\!X \rightarrow {}^*\!Y\), we have

$$\begin{aligned} {}^*\!f(\langle x_1, x_2, x_3, \ldots \rangle ) = \langle f(x_1), f(x_2), f(x_3), \ldots \rangle . \end{aligned}$$
(9)

If it is clear from context that the domain is \({}^*\!X\) rather than X and the range is \({}^*\!Y\) rather than Y, the function \({}^*\!f\) may be denoted by f.

A sequence in X can be regarded as a function \(n \mapsto x_n\) from \({\mathbb {N}}\) to X. Its extension is a function from \({}^*{\mathbb {N}}\) to \({}^*\!X\), sometimes known as a hypersequence. Limit properties of a sequence are determined by the values of the hypersequence at unlimited elements of \({}^*{\mathbb {N}}\). The leading example of this can be used to give infinitesimal formulations of continuity and differentiability.

Lemma 3.6

For a bounded sequence \(n \mapsto r_n\) in \({\mathbb {R}}\) and \(r, s, t \in {\mathbb {R}}\): \(r \le \liminf _n r_n\) iff for all unlimited \(N \in {}^*{\mathbb {N}}\), \(r \le \textrm{st}(r_N)\); \(s = \lim _n r_n\) iff for all unlimited \(N \in {}^*{\mathbb {N}}\), \(\textrm{st}(r_N) = s\); and \(\limsup _n r_n \le t\) iff for all unlimited \(N \in {}^*{\mathbb {N}}\), \(\textrm{st}(r_N) \le t\).
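
A simple illustration, with a sequence of our own choosing: let \(r_n = (-1)^n\), and consider the unlimited integers \(N = \langle 2, 4, 6, \ldots \rangle\) and \(M = \langle 1, 3, 5, \ldots \rangle\). Applying (9) to the function \(n \mapsto r_n\) gives

$$\begin{aligned} r_N = \langle r_2, r_4, r_6, \ldots \rangle = 1 \quad \hbox {and} \quad r_M = \langle r_1, r_3, r_5, \ldots \rangle = -1. \end{aligned}$$

Every unlimited integer yields a standard part in \(\{-1, 1\}\), and both values occur, so the lemma recovers \(\liminf _n r_n = -1\) and \(\limsup _n r_n = 1\), and no s can satisfy the middle condition.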

We will have bounded sequences of utilities \({{\varvec{u}}}\), a population model \({{\mathbb {I}}}= \langle I_1, I_2, I_3, \ldots \rangle\), \(I_n = \{0, 1, \ldots , T_n\}\) and probabilities P on \({{\mathbb {I}}}\). The “\(\mu\)-almost everywhere” construction gives the integral of \({{\varvec{u}}}\) against P the form of a finite sum. Let \({{\varvec{u}}}_{|I_n}\) denote the restriction of the sequence \({{\varvec{u}}}\) to the interval \(I_n\) and let \(\Lambda _n\) denote the uniform distribution on \(I_n\). The average, or integral, of \({{\varvec{u}}}_{|I_n}\) with respect to \(\Lambda _n\) over the set \(I_n\) is \(\frac{1}{T_n+1} \sum _{t=0}^{T_n} u_t\). With \(\Lambda\) denoting the ‘limit’ version of the uniform probability \(\langle \Lambda _1, \Lambda _2, \Lambda _3, \ldots \rangle\), the integral of \({{\varvec{u}}}\) with respect to \(\Lambda\) is \(\textrm{st}(\frac{1}{T+1} \sum _{t=0}^{T} u_t)\) where \(T = \langle T_1, T_2, T_3, \ldots \rangle\) and the summation is defined as an extension using, as usual, the “\(\mu\)-almost everywhere” construction. The argument used for the Unicity Lemma and for Lemma 3.5 shows that this standard part is well-defined.
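
The finite-stage averages are easy to compute, and doing so shows why a change to finitely many generations leaves the integral unchanged. The sketch below is illustrative only: the stream `u`, the helper `replace_head`, and the horizons are assumptions made for the example, not objects from the paper. When the averages converge, as they do here, their limit is the standard part for every choice of \(\mu\) and every \(T_n \rightarrow \infty\).

```python
def truncated_average(u, T):
    """Average of u(0), ..., u(T) under the uniform distribution on {0, ..., T}."""
    return sum(u(t) for t in range(T + 1)) / (T + 1)

def replace_head(u, a, F):
    """Hypothetical helper: the stream equal to a for t < F and to u(t) afterwards."""
    return lambda t: a if t < F else u(t)

# Illustrative stream (an assumption): utility 4 in even generations and 6 in
# odd generations, so the long-run average is 5.
u = lambda t: 4.0 if t % 2 == 0 else 6.0
F = 10
low, high = replace_head(u, 0.0, F), replace_head(u, 9.0, F)

for T in (10**2, 10**3, 10**5):
    # The three averages differ by at most 9 * F / (T + 1), which vanishes as T grows.
    print(T, truncated_average(low, T), truncated_average(u, T), truncated_average(high, T))
```

The gap between the three columns shrinks at rate \(1/(T+1)\), which is the finite-stage counterpart of the first F generations forming a null set.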

We hope it is becoming clear that the “\(\mu\)-almost everywhere” construction can be widely applied. Being able to see what is happening without all of the sequences will be a drastic simplification. We now begin to develop the tools that allow it.

3.5 The transfer principle and internal sets

Despite its simplicity, the transfer principle has proved to be immensely useful.

Lemma 3.7

(A Simple Transfer Principle) For a set X and \(A, B \subset X\), \(A \subset B\) iff \({}^*\!A \subset {}^*\!B\).

Proof

If \(A \subset B\) and \(a = \langle a_1, a_2, a_3, \ldots \rangle \in {}^*\!A\), then \(\mu (\{{n \in {\mathbb {N}}}: a_n \in A\}) = 1\), and since \(A \subset B\), \(\mu (\{{n \in {\mathbb {N}}}: a_n \in B\}) = 1\), so \(a \in {}^*\!B\). If there exists \(x \in A\) with \(x \not \in B\), then \(x = \langle x, x, x, \ldots \rangle \in {}^*\!A\) but \(x \not \in {}^*\!B\), so \({}^*\!A \not \subset {}^*\!B\). \(\square\)

This is very useful because a formal statement “if \({\mathbb {A}}\) holds, then \({\mathbb {B}}\) holds” can be rewritten as a subset relation, \(A \subset B\). This set theory rewrite has A denoting the set of instances in which the statement “\({\mathbb {A}}\)” holds and B the set of instances in which the statement “\({\mathbb {B}}\)” holds. Viewed this way, the transfer principle says, loosely, that, “a statement is true in the standard model iff the corresponding statement ‘with stars everywhere’ is true in the nonstandard model.”

Most often, the sets A and B will themselves be collections of sets. The subtlety arises with the need to account for the implications of having the “stars” before A and B. An example makes the point.

Example 3.1

Let \({\mathcal {A}}\) denote the class of non-empty subsets of \({\mathbb {R}}\) that are bounded below, and let \({\mathcal {B}}\) denote the class of non-empty subsets of \({\mathbb {R}}\) that have a greatest lower bound (aka \(\inf\)). The statement that a non-empty subset of \({\mathbb {R}}\) that is bounded below has a greatest lower bound is \({\mathcal {A}}\subset {\mathcal {B}}\).

Let \(A \subset {}^*{\mathbb {R}}\) denote the set of standard numbers in \({}^*{\mathbb {R}}\) that are strictly positive; it is bounded below, but it does not have a greatest lower bound in \({}^*{\mathbb {R}}\): no strictly positive standard \(\epsilon > 0\) is a lower bound, any strictly positive \(\epsilon \simeq 0\) is a lower bound, but so is \(2 \cdot \epsilon\). Hence to claim \(A \in {}^*{\mathcal {B}}\) would be a mistake.

The mistake is due to the fact that the set A in Example 3.1 does not belong to \({}^*\!{\mathcal {A}}\), i.e. it is not of the form \(\langle A_1, A_2, A_3, \ldots \rangle\) for a sequence of subsets of \({\mathbb {R}}\). There is a name for the particular kind of sets needed to avoid such mistakes. The set A in the previous example is not an internal subset of \({}^*{\mathbb {R}}\), and the transfer principle, in its statement, only concerns the internal sets.

Definition 3.6

If X is a set and \({\mathcal {P}}(X)\) is the class of all subsets of X, then the internal subsets of \({}^*\!X\) are the elements of \({}^*\!{\mathcal {P}}(X)\).

In terms of the \(\mu\)-almost everywhere construction, A is an internal subset of \({}^*\!X\) iff \(A = \langle A_1, A_2, A_3, \ldots \rangle\) where \(\mu (\{{n \in {\mathbb {N}}}: A_n \in {\mathcal {P}}(X)\}) = 1\). For us, the most useful class of internal sets is the class of hyperfinite ones.

Definition 3.7

If X is a set and \({\mathcal {P}}_{Fin}(X)\) is the class of non-empty finite subsets of X, then an internal subset of \({}^*\!X\) is hyperfinite if it belongs to \({}^*\!{\mathcal {P}}_{Fin}(X)\).

Taking \(X = {{\mathbb {N}}_0}\), our population models are the hyperfinite sets of the form \({{\mathbb {I}}}= \langle I_1, I_2, I_3, \ldots \rangle\) where \(I_n = \{0, 1, \ldots , T_n\}\) and \(T_n \rightarrow \infty\). Two properties of such hyperfinite sets play a role below. First, for any \(t \in {{\mathbb {N}}_0}\), \(\mu (\{{n \in {\mathbb {N}}}: t \in I_n\}) = 1\), that is, our model of the infinite population contains each and every \(t \in {{\mathbb {N}}_0}\). And second, despite containing every \(t \in {{\mathbb {N}}_0}\), \({{\mathbb {I}}}\) “acts like” a finite set, e.g. there is an integer \(N \in {}^*{\mathbb {N}}\) and a bijection between \(\{1, \ldots , N\}\) and \({{\mathbb {I}}}\). This follows by transfer: let \({\mathcal {A}}\) denote the subsets \(A \subset {{\mathbb {N}}_0}\) for which there is a bijection between A and some initial segment \(\{1, \ldots , N\}\) and let \({\mathcal {B}}\) denote the finite non-empty subsets of \({{\mathbb {N}}_0}\); we have \({\mathcal {A}}\subset {\mathcal {B}}\subset {\mathcal {A}}\), and by transfer, \({}^*\!{\mathcal {A}}\subset {}^*\!{\mathcal {B}}\subset {}^*\!{\mathcal {A}}\).

3.6 Probabilities on hyperfinite sets

Before developing probabilities on hyperfinite sets, we review the fundamental properties of probabilities. The domain of a probability P is the class of sets B for which the probability, P(B), is defined. The domain is always a field, and most often, it has an additional property that makes it a \(\sigma\)-field.

Definition 3.8

An \({\mathcal {F}}\subset {\mathcal {P}}(\Omega )\) is a field if it satisfies (a), (b), and (c), and if it also satisfies (d), then it is a \(\sigma\)-field.

  1. (a)

    \(\emptyset , \Omega \in {\mathcal {F}}\).

  2. (b)

    For \(A \in {\mathcal {F}}\), \(A^c \in {\mathcal {F}}\).

  3. (c)

    For any finite \(\{A_k: k = 1, \ldots , K\} \subset {\mathcal {F}}\), \(\cup _{k=1}^K A_k \in {\mathcal {F}}\).

  4. (d)

    For any countable \(\{A_k: k \in {\mathbb {N}}\} \subset {\mathcal {F}}\), \(\cup _{k \in {\mathbb {N}}} A_k \in {\mathcal {F}}\).

We know that \((\cup _i E_i)^c = \cap _i E_i^c\) and that a field is closed under complements. Hence, the assumption of closure under finite or countable unions could as well be written as closure under finite or countable intersections.
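
For a finite \(\Omega\), conditions (a) to (c) can be checked mechanically. The following is a minimal sketch under that finiteness assumption; the function name `is_field` and the two example families are ours. It tests pairwise unions only, since closure under all finite unions then follows by induction.

```python
from itertools import combinations

def is_field(family, omega):
    """Check conditions (a)-(c) for a family of subsets of a finite set omega."""
    fam = {frozenset(A) for A in family}
    omega = frozenset(omega)
    if frozenset() not in fam or omega not in fam:      # (a) empty set and omega
        return False
    if any(omega - A not in fam for A in fam):          # (b) complements
        return False
    # (c) pairwise unions; finite unions follow by induction
    return all(A | B in fam for A, B in combinations(fam, 2))

omega = {0, 1, 2, 3}
evens_odds = [set(), {0, 2}, {1, 3}, omega]
# Closed under complements, but the union of {0} and {1} is missing.
not_closed = [set(), {0}, {1}, {1, 2, 3}, {0, 2, 3}, omega]
print(is_field(evens_odds, omega), is_field(not_closed, omega))
```

On a finite \(\Omega\) every countable union is a finite union, so any family passing this check is also a \(\sigma\)-field; condition (d) only has force when \(\Omega\) is infinite.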

Lemma 3.8

If A is an internal subset of \({}^*\!X\) for some set X, then the class of internal subsets of A forms a field.

We will soon show that the class of internal sets is not a \(\sigma\)-field except when X is finite, in which case the distinction between a field and a \(\sigma\)-field is moot. The following set difference and symmetric set difference will be of use.

Definition 3.9

The set difference of sets \(A, B \subset X\) is denoted \(A {\setminus } B\) and defined as \(\{x \in X: x \in A, \ x \not \in B\}\), i.e. as \(A \cap B^c\). The symmetric difference of two sets A and B is denoted \(A \Delta B\) and defined as \((A \cap B^c) \cup (B \cap A^c) = (A {\setminus } B) \cup (B {\setminus } A)\).

It is the closure of a \(\sigma\)-field under countable unions and intersections that guarantees that limit events have probabilities. For example, the strong law of large numbers is the statement that with probability 1, the sample average of an i.i.d. sequence will converge to the theoretical average. The point is that we must assign probability to the event that the sample average converges, and this event is only expressible using countable unions and intersections.

There is a property of probabilities, countable additivity, that is complementary to the domain being a \(\sigma\)-field.

Definition 3.10

For a \(\sigma\)-field \({\mathcal {F}}\) of subsets of a set \(\Omega\), a finitely additive probability is a function \(P:{\mathcal {F}}\rightarrow [0, 1]\) that satisfies (a) and (b), and if it also satisfies (c), then it is a countably additive probability.

  1. (a)

    \(P(\Omega ) = 1\).

  2. (b)

    For finite disjoint collections \(\{A_k: k = 1, \ldots , K\} \subset {\mathcal {F}}\), \(P(\cup _{k=1}^K A_k) = \sum _{k=1}^K P(A_k)\).

  3. (c)

    For countable disjoint collections \(\{A_n: {n \in {\mathbb {N}}}\} \subset {\mathcal {F}}\), \(P(\cup _{{n \in {\mathbb {N}}}} A_n) = \sum _{{n \in {\mathbb {N}}}} P(A_n)\).

The vast majority of probability theory work uses countably additive probabilities on \(\sigma\)-fields. Without countable additivity, basic limit results such as the weak and the strong law of large numbers, or even the Borel-Cantelli lemma, do not hold. To do the work we intend to do with our model, countable additivity and \(\sigma\)-fields are needed.
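
The probability \(\mu\) of Definition 3.1 gives a compact illustration of the gap between (b) and (c). It is defined on the \(\sigma\)-field \({\mathcal {P}}({\mathbb {N}})\) and is finitely additive, but the singletons \(\{n\}\) are disjoint, each is finite, and their union is \({\mathbb {N}}\), so

$$\begin{aligned} \textstyle \sum _{{n \in {\mathbb {N}}}} \mu (\{n\}) = 0 \ne 1 = \mu ({\mathbb {N}}) = \mu (\cup _{{n \in {\mathbb {N}}}} \{n\}). \end{aligned}$$

Hence \(\mu\) satisfies (a) and (b) but not (c): it is a finitely additive probability that is not countably additive.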

3.7 Saturation and internal sets

In \({\mathbb {R}}\), the intersection of the nested sequence of sets \((0, \frac{1}{k})\), \(k \in {\mathbb {N}}\), is empty. By contrast, in \({}^*{\mathbb {R}}\), the intersection of the nested sequence of internal sets \({}^*(0, \frac{1}{k})\), \(k \in {\mathbb {N}}\), is the non-empty set of strictly positive infinitesimals. More generally, the countable intersection of a decreasing sequence of non-empty internal sets is always non-empty.

Theorem A

If \(A^1 \supset A^2 \supset \ldots \supset A^k \supset \ldots\) is a decreasing sequence of non-empty internal subsets of an internal set \({}^*\!X\), then

$$\begin{aligned} \textstyle \bigcap _{k \in {\mathbb {N}}} A^k \ne \emptyset . \end{aligned}$$
(10)

This is called the countable saturation property of internal sets. There are strong parallels between non-empty internal subsets of an internal set and non-empty compact subsets of a metric space. If the \(A^k\) were a nested sequence of non-empty compact subsets of a metric space, then the same non-empty intersection conclusion would hold. One can see a bit more of the parallel in the proofs found in the literature; references are provided in the appendix. The proof is a variant of the diagonalization argument used to show that countable products of compact metric spaces are compact.

For our purposes, the following two implications of Theorem A will be most useful.

Corollary A.1

(Spillover) Suppose that A is an internal subset of \({}^*{{\mathbb {N}}_0}\). If A contains arbitrarily large limited integers, then it contains an unlimited integer, and if A contains arbitrarily small unlimited integers, then it contains a limited integer.

The proof of the first part uses countable saturation on the sets \(B^k:= \{t \in A: t \ge k\}\); the proof of the second part uses transfer on the statement that every non-empty subset of \({{\mathbb {N}}_0}\) contains its greatest lower bound. The next Corollary gives the sense in which the field of internal subsets of \({{\mathbb {I}}}\) is as far as possible from being a \(\sigma\)-field.

Corollary A.2

If \(\{A^k: k \in {\mathbb {N}}\}\) is a nested decreasing collection of internal subsets of an internal set \({}^*\!X\), then \(\bigcap _{k \in {\mathbb {N}}} A^k\) is internal iff it is equal to \(\bigcap _{k \le K} A^k\) for some \(K \in {\mathbb {N}}\), and if \(\{A^k: k \in {\mathbb {N}}\}\) is a nested increasing collection of internal sets, then \(\bigcup _{k \in {\mathbb {N}}} A^k\) is internal iff it is equal to \(\bigcup _{k \le K} A^k\) for some \(K \in {\mathbb {N}}\).

3.8 ‘Limit’ probabilities and Loeb measures

We find ourselves in the following situation: we have an internal set \({{\mathbb {I}}}= \langle I_1, I_2, I_3, \ldots \rangle\); we have internal \({}^*[0, 1]\)-valued probabilities on \({{\mathbb {I}}}\), \(P = \langle P_1, P_2, P_3, \ldots \rangle\) where each \(P_n\) is a probability on the finite set \(I_n\); and for each internal \(A = \langle A_1, A_2, A_3, \ldots \rangle\) in the field, not \(\sigma\)-field, of internal subsets of \({{\mathbb {I}}}\), we have the [0, 1]-valued, finitely additive probability \(P(A) = {}^\circ \langle P_1(A_1), P_2(A_2), P_3(A_3), \ldots \rangle\) (well-defined by the Unicity Lemma). The finite additivity of \(P(\cdot )\) arises because each \(P_n\) defining P is a probability on a finite set. But \(P(\cdot )\) is not countably additive on a \(\sigma\)-field of sets because it is not defined for anything but internal sets, and Corollary A.2 tells us that the class of internal sets is not a \(\sigma\)-field. It was Loeb’s pioneering work, Loeb (1971) and Loeb (1975), that allowed us to take a finitely additive probability P on a field of internal sets and extend it to a countably additive probability, still denoted P, on the \(\sigma\)-field generated by the field of internal sets.Footnote 10

Theorem B

(Loeb) If X is an internal set, \(\mathcal{X}^{\,\circ}\) is the field of internal subsets of X, \({\mathcal {X}}\) is the \(\sigma\)-field generated by \(\mathcal{X}^{\,\circ}\), and \(P:\mathcal{X}^{\,\circ} \rightarrow [0, 1]\) is a finitely additive probability, then

  1. (1)

    P has a unique countably additive extension, also denoted P, from \(\mathcal{X}^{\,\circ}\) to \({\mathcal {X}}\), and

  2. (2)

    for any \(B \in {\mathcal {X}}\), there is an internal \(B^\circ \in \mathcal{X}^{\,\circ}\) with \(P(B \Delta B^\circ ) = 0\).

Corollary A.2 told us that the class of internal sets fails to be a \(\sigma\)-field by not containing any countable unions or intersections that are not also finite unions or intersections. This seems to be saying that the field of internal sets is “as far from” being a \(\sigma\)-field as possible. But the last part of Theorem B tells us that for probability theory, the difference does not matter, and the distance is “as small as possible.”

Fix a hyperfinite set \({{\mathbb {I}}}\), let \(\mathcal{I}^{\,\circ}\) denote the field of internal subsets of \({{\mathbb {I}}}\) and \({\mathcal {I}}\) the \(\sigma\)-field generated by \(\mathcal{I}^{\,\circ}\). The uniform distribution on the measure space \(({{\mathbb {I}}}, {\mathcal {I}})\) is the Loeb extension of the finitely additive probability \(\Lambda (A) = {}^\circ (\#A/\#{{\mathbb {I}}})\).Footnote 11 If \({{\varvec{u}}}:{{\mathbb {I}}}\rightarrow {\mathbb {R}}\) is a measurable function, i.e. \({{\varvec{u}}}^{-1}((-\infty , r]) \in {\mathcal {I}}\) for all \(r \in {\mathbb {R}}\), then it induces the distribution having the cdf \(F_{{{\varvec{u}}}}(r):= P(\{t \in {{\mathbb {I}}}: {{\varvec{u}}}(t) \le r\})\).
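
Two internal sets, chosen here for illustration, show how \(\Lambda\) behaves when \(I_n = \{0, 1, \ldots , T_n\}\) with \(T_n \rightarrow \infty\). Let \(E = \langle E_1, E_2, E_3, \ldots \rangle\) with \(E_n\) the even elements of \(I_n\), and, for a standard \(F \in {\mathbb {N}}\), let \(G = \langle G_1, G_2, G_3, \ldots \rangle\) with \(G_n = \{0, \ldots , F-1\} \cap I_n\). Counting elements stage by stage gives

$$\begin{aligned} \Lambda (E)&= {}^\circ \big \langle \tfrac{\lfloor T_1/2 \rfloor + 1}{T_1+1}, \tfrac{\lfloor T_2/2 \rfloor + 1}{T_2+1}, \ldots \big \rangle = \tfrac{1}{2}, \\ \Lambda (G)&= {}^\circ \big \langle \tfrac{\min \{F, T_1+1\}}{T_1+1}, \tfrac{\min \{F, T_2+1\}}{T_2+1}, \ldots \big \rangle = 0. \end{aligned}$$

Both standard parts are ordinary limits because \(T_n \rightarrow \infty\), so they do not depend on the choice of \(\mu\). The second line says that the first F generations form a \(\Lambda\)-null set, however large the standard F is.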

The following example of the \(\mu\)-almost everywhere construction is central to what we do.

Definition 3.11

An internal bijection on \({{\mathbb {I}}}= \langle I_1, I_2, I_3, \ldots \rangle\) is the equivalence class \(\pi = \langle \pi _1, \pi _2, \pi _3, \ldots \rangle\) where \(\mu (\{{n \in {\mathbb {N}}}: \pi _n:I_n \leftrightarrow I_n \ \hbox {is a bijection}\}) = 1\).

If \(\pi :{{\mathbb {I}}}\leftrightarrow {{\mathbb {I}}}\) is an internal bijection, then B and \(\pi (B)\) have the same cardinality for any internal \(B \in \mathcal{X}^{\,\circ}\), hence \(\Lambda (B) = \Lambda (\pi ^{-1}(B))\). Using Theorem B(2), this extends to \(B \in {\mathcal {X}}\), implying that the measurable \(t \mapsto {{\varvec{v}}}(t):= {{\varvec{u}}}(\pi (t))\) induces the same distribution as \({{\varvec{u}}}\). The reverse is also true.

Jerome Keisler (1984) systematically extended Anderson’s (1976) hyperfinite treatment of Brownian motion to a hyperfinite treatment of more general stochastic processes. We will borrow a representation result that he developed for the solutions to stochastic differential equations and adapt it to hyperfinite population models.

Definition 3.12

A probability space \((\Omega , {\mathcal {F}}, P)\) is homogenous if two random variables \(X, Y:\Omega \rightarrow {\mathbb {R}}\) induce the same distribution iff there is a measurable bijection \(\pi :\Omega \leftrightarrow \Omega\) such that \(X = Y \circ \pi\) P-almost everywhere.

The following is (Jerome Keisler 1984, Theorem 9.2, p. 134), and it delivers the ‘limit’ version of the homogeneity property discussed for finite sets in §1.

Theorem C

(Keisler) The probability space \(({{\mathbb {I}}}, {\mathcal {I}}, \Lambda )\) is homogenous and the bijections (in Definition 3.12) can be taken to be internal.
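For intuition only, the finite-set version of this homogeneity property can be checked directly. The sketch below is a finite analog under the uniform distribution, not the hyperfinite construction itself; the helper names `same_distribution` and `matching_bijection` are hypothetical and introduced purely for illustration.

```python
import random

def same_distribution(u, v):
    """On a finite set with the uniform measure, two allocations induce
    the same distribution iff their sorted values coincide."""
    return sorted(u) == sorted(v)

def matching_bijection(u, v):
    """Return a permutation pi (as a list) with u[t] == v[pi[t]] for all t.
    Assumes same_distribution(u, v) holds."""
    order_u = sorted(range(len(u)), key=lambda t: u[t])
    order_v = sorted(range(len(v)), key=lambda t: v[t])
    pi = [0] * len(u)
    # Match the index holding the k-th smallest u to the index holding
    # the k-th smallest v.
    for rank, t in enumerate(order_u):
        pi[t] = order_v[rank]
    return pi

rng = random.Random(0)
u = [rng.choice([0.0, 1.0, 2.5]) for _ in range(1000)]
v = u[:]
rng.shuffle(v)                     # v has the same distribution as u
pi = matching_bijection(u, v)      # bijection recovering u from v
```

In this finite setting the bijection exists exactly when the two allocations share a distribution, which is the property Theorem C carries over to \(({{\mathbb {I}}}, {\mathcal {I}}, \Lambda )\).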

The uniform distribution in this result can be changed and still have homogeneity, but the change must be infinitesimal. One can show that if P is an internal \({}^*[0, 1]\)-valued probability on the internal subsets of \({{\mathbb {I}}}\), then, with P denoting its extension to \({\mathcal {I}}\), \(({{\mathbb {I}}}, {\mathcal {I}}, P)\) is homogenous iff \(\sum _t |P(t) - \Lambda (t)| \simeq 0\). In particular, the extension must equal \(\Lambda\).

An internal \(\Lambda\)-almost everywhere bijection is an internal mapping that is a bijection except perhaps on an internal set of exceptions, \(E = \langle E_1, E_2, E_3, \ldots \rangle\), with \(\Lambda (E) = 0\). One could as easily state Keisler’s theorem using \(\Lambda\)-almost everywhere bijections instead of asking that \({{\varvec{v}}}= {{\varvec{u}}}\circ \pi\) on a set having \(\Lambda\)-mass 1.

3.9 Invariance with respect to internal bijections

Utility allocations for the population model \(({{\mathbb {I}}}, {\mathcal {I}}, \Lambda )\) are bounded, \({\mathcal {I}}\)-measurable functions \({{\varvec{u}}}:{{\mathbb {I}}}\rightarrow [0, \infty )\). The set of all bounded, measurable utility allocations is denoted \({{\varvec{W}}}_{{\mathbb {I}}}\). The \(L_\infty\)-norm is defined by \(\Vert {{\varvec{u}}}\Vert _\infty = \inf \{r \ge 0: \Lambda (\{t \in {{\mathbb {I}}}: |{{\varvec{u}}}(t)| \le r\}) = 1\}\), and the associated distance is \(d({{\varvec{u}}}, {{\varvec{v}}}) = \Vert {{\varvec{u}}}- {{\varvec{v}}}\Vert _\infty\). The domain for preferences is \({\mathcal {M}}_{{\mathbb {I}}}\), the set of countably additive Borel measures, q, on \({{\varvec{W}}}_{{\mathbb {I}}}\) that put mass 1 on norm bounded sets, \(q(\{{{\varvec{u}}}: \Vert {{\varvec{u}}}\Vert \le B\}) = 1\) for some B.

We impose the following on a preference relation \(\succsim\) on \({\mathcal {M}}_{{\mathbb {I}}}\) with strict preference \(\succ\).

  1. Postulate I.

    Weak Order. \(\succ\) is an asymmetric weak order.

  2. Postulate II.

    Independence. For all \(p,q,r \in {\mathcal {M}}_{{\mathbb {I}}}\) and all \(\alpha \in (0, 1)\), if \(p \succ q\), then \(\alpha p + (1-\alpha ) r \succ \alpha q + (1-\alpha ) r\).

  3. Postulate III.

    Continuity. For all \(q \in {\mathcal {M}}_{{\mathbb {I}}}\), the sets \(\{p \in {\mathcal {M}}_{{\mathbb {I}}}: p \succ q\}\) and \(\{p \in {\mathcal {M}}_{{\mathbb {I}}}: p \prec q\}\) are open.

  4. Postulate IV.

    Inequality aversion. For any \({{\varvec{u}}},{{\varvec{v}}}\in {{\varvec{W}}}_{{\mathbb {I}}}\) and any \(0< \alpha < 1\), the distribution putting mass 1 on \(\alpha {{\varvec{u}}}+ (1-\alpha ){{\varvec{v}}}\) is weakly preferred to the distribution putting mass \(\alpha\) on \({{\varvec{u}}}\) and \((1-\alpha )\) on \({{\varvec{v}}}\).

  5. Postulate V.

    Monotonicity. If \({{\varvec{u}}}\ge {{\varvec{v}}}\) and \(\Lambda (\{{{\varvec{u}}}> {{\varvec{v}}}\}) > 0\), then \({{\varvec{u}}}\succ {{\varvec{v}}}\).

  6. Postulate VI.

    Strong equity. For any internal bijection \(\pi :{{\mathbb {I}}}\leftrightarrow {{\mathbb {I}}}\), \({{\varvec{u}}}\sim {{\varvec{u}}}^\pi\).

The first four Postulates are directly from Fishburn’s (1982, Theorem 4, Ch. 3) work on expected utility preferences over distributions on convex subsets of vector spaces. On their own, they guarantee that preferences have a continuous, concave expected utility representation. The monotonicity assumption guarantees Pareto responsiveness (see Sect. 2.4 above).

The strong equity assumption is the essential ingredient. It guarantees that the enjoyments of later generations are equally weighted with the enjoyments of earlier ones.Footnote 12

Theorem D

A preference relation \(\succsim\) on \({\mathcal {M}}_{{\mathbb {I}}}\) satisfies Postulates I - VI if and only if there exists an \(S:{{\varvec{W}}}_{{\mathbb {I}}}\rightarrow [0, \infty )\) such that \([p \succsim q] \Leftrightarrow [\int S({{\varvec{u}}})\,dp({{\varvec{u}}}) \ge \int S({{\varvec{u}}})\,dq({{\varvec{u}}})]\) where \(S({{\varvec{u}}}) = \int \varphi (u_t)\,d\Lambda (t)\) with \(\varphi :[0,\infty ) \rightarrow [0, \infty )\) a continuous, increasing, concave function and \(\Lambda\) the uniform distribution on \(\mathbb{I}\).
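A large finite population gives a rough feel for the representation in Theorem D. The sketch below is only a finite analog: it replaces \(\Lambda\) by the plain average over a finite list, and it takes \(\varphi\) to be the square root, which is one admissible choice assumed here for illustration. The function name `social_welfare` is hypothetical.

```python
import math
import random

def social_welfare(u, phi):
    """Finite analog of S(u) = integral of phi(u_t) dLambda(t):
    the unweighted average of phi(u_t) over the population."""
    return sum(phi(x) for x in u) / len(u)

phi = math.sqrt  # continuous, increasing, concave (an illustrative assumption)

rng = random.Random(1)
u = [rng.uniform(0.0, 10.0) for _ in range(500)]
v = [rng.uniform(0.0, 10.0) for _ in range(500)]

# Postulate VI (strong equity): permuting who gets what leaves S unchanged.
u_perm = u[:]
rng.shuffle(u_perm)

# Postulate IV (inequality aversion): the sure mixture alpha*u + (1-alpha)*v
# is weakly preferred to the lottery over u and v.
alpha = 0.3
mix = [alpha * a + (1 - alpha) * b for a, b in zip(u, v)]
lottery_value = alpha * social_welfare(u, phi) + (1 - alpha) * social_welfare(v, phi)

# Postulate V (monotonicity): raising everyone's utility raises S.
u_up = [x + 1.0 for x in u]
```

The comparison between `social_welfare(mix, phi)` and `lottery_value` is Jensen's inequality applied to the concave \(\varphi\), which is how Postulate IV shows up in the representation.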

Two comments are in order.

The function \(\varphi (\cdot )\) is not uniquely determined. Rather, it captures the inequality aversion of the social welfare function.Footnote 13 Two of the applications in Sect. 4 investigate the implications of differing degrees of inequality aversion. The first finds that optimal efforts to avoid or to recover from future disasters increase as \(\varphi (\cdot )\) becomes more concave, that is, as inequality aversion increases. The second, Lemma 4.4, shows that (one form of) the repugnant conclusion holds in fewer instances when there is more inequality aversion.

The tractable class of preferences identified in KS were of the form \(S({{\varvec{u}}}) = \int \varphi (u_t)\,dQ(t)\) where \(Q = \langle Q_1, Q_2, Q_3, \ldots \rangle\) is the limit population measure associated with a sequence of probabilities \(Q_n\) having the property that \(\lim _n \sum _{t=0}^\infty |Q_n(t+1) - Q_n(t)| = 0\). Here Postulate VI imposes the restriction that the measure Q must be the limit of uniform distributions on intervals, thus what was a special case of the subclass of tractable preferences identified in KS becomes the only possible form.

3.10 Ergodicity

For many applications, the outcomes are random, but still regular in the following sense.

Definition 3.13

A stream of utilities, \({{\varvec{u}}}\), is ergodic with occupation measure \(\nu (\cdot |{{\varvec{u}}})\) if for any sequence \(T_n \rightarrow \infty\), the empirical cdfs of the utilities up till \(T_n\),

$$\begin{aligned} F_{T_n}(r|{{\varvec{u}}}):= \textstyle \frac{1}{{T_n}+1} \sum _{t=0}^{T_n} 1_{[0, r]}(u_t), \end{aligned}$$
(11)

converge weakly to \(\nu (\cdot |{{\varvec{u}}})\), i.e. for all bounded continuous f,

$$\begin{aligned} \textstyle \lim _n \textstyle \int f(r)\,dF_{T_n}(r|{{\varvec{u}}}) = \textstyle \int f(r)\,d\nu (r|{{\varvec{u}}}). \end{aligned}$$
(12)
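As a small illustration of Definition 3.13, consider the deterministic stream \(u_t = t \bmod 3\), chosen here purely as an example. Its occupation measure puts mass 1/3 on each of 0, 1 and 2, and the empirical cdfs in (11) can be computed directly. The function name `empirical_cdf` is hypothetical.

```python
def empirical_cdf(u, T, r):
    """F_T(r | u) = (1 / (T + 1)) * #{t in 0..T : u(t) in [0, r]},
    where u is given as a function of the generation index t."""
    return sum(1 for t in range(T + 1) if 0 <= u(t) <= r) / (T + 1)

def u(t):
    """Illustrative periodic utility stream 0, 1, 2, 0, 1, 2, ..."""
    return t % 3

# Empirical cdf at r = 1 along a growing sequence of horizons T_n.
values_at_1 = [empirical_cdf(u, T, 1) for T in (9, 99, 999, 9999)]
```

The entries of `values_at_1` approach 2/3, the mass that the occupation measure of this stream assigns to \([0, 1]\).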

The Hardy-Littlewood Tauberian theorem tells us that if \({{\varvec{u}}}\) is a bounded sequence of numbers, then \(\lim _{\beta \uparrow 1} (1-\beta ) \sum _{t=0}^\infty u_t \beta ^t\) exists if and only if \(\lim _{T \uparrow \infty } \frac{1}{T+1} \sum _{t=0}^T u_t\) exists, and when they exist, the limits are equal. Combined with the nonstandard characterizations of \(\liminf _n r_n\), \(\limsup _n r_n\), and \(\lim _n r_n\) given in Lemma 3.6, we have the following result. For \(\beta \in {}^*[0, 1)\), \(Q_\beta\) is the geometric distribution on \({{\mathbb {N}}_0}\) with parameter \(\beta\), i.e. \(Q_\beta (t) = (1-\beta ) \beta ^t\).

Lemma 3.9

The following are equivalent:

  1. (a)

    \({{\varvec{u}}}\) is ergodic with occupation measure \(\nu\);

  2. (b)

    for any \(Q_\beta\), \(\beta \simeq 1\) and any measurable \(E \subset {\mathbb {R}}\), \(Q_\beta (\{t: u_t \in E\}) = \nu (E)\); and

  3. (c)

    for any uniform distribution \(\Lambda\) on a hyperfinite interval and any measurable \(E \subset {\mathbb {R}}\), \(\Lambda (\{t: u_t \in E\}) = \nu (E)\).

For stochastic dynamic programming problems with outcomes that belong to \({{\textbf {Erg}}}\) with probability 1, this result tells us that maximizing limit discounted sums and maximizing limit average payoffs will yield the same policies. We will see this at work in the first two applications in Sect. 4.
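The agreement between limit discounted sums and limit averages can be seen numerically on a simple stream. The sketch below again uses the illustrative stream \(u_t = t \bmod 3\), whose long-run average is 1, and it truncates the infinite discounted sum at a finite horizon, which is an approximation assumed to be harmless once \(\beta ^{\text {horizon}}\) is negligible. The function names are hypothetical.

```python
def u(t):
    """Illustrative bounded stream with long-run average 1."""
    return t % 3

def abel_mean(u, beta, horizon):
    """(1 - beta) * sum_{t < horizon} u(t) * beta**t, a finite truncation
    of the normalized discounted sum."""
    total, weight = 0.0, 1.0
    for t in range(horizon):
        total += u(t) * weight
        weight *= beta
    return (1.0 - beta) * total

def cesaro_mean(u, T):
    """(1 / (T + 1)) * sum_{t = 0..T} u(t)."""
    return sum(u(t) for t in range(T + 1)) / (T + 1)

discounted = abel_mean(u, beta=0.999, horizon=50_000)
averaged = cesaro_mean(u, T=49_999)
```

Both `discounted` and `averaged` are close to 1 here; pushing \(\beta\) closer to 1 (with a correspondingly longer horizon) tightens the agreement, in line with the Tauberian theorem quoted above.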

3.11 Stochastic population models

Suppose now that for each \(t \in \{0, 1, \ldots , T\}\), there is a newly born population of size \(I_t\), T and each \(I_t\) an unlimited integer. Each i in the cohort \(I_t\) lives from t to \(t+A_i\), \(A_i\) random.Footnote 14 Policies affect the joint distribution of the \(I_t\), the \(A_i\), and other aspects of the quality of life for \(i \in I_t\). These determine the utility \(u_{i,t}\) for each \(i \in I_t\). To extend the social welfare functions of Theorem D to this class of population models, we replace the \({{\varvec{u}}}= (u_0, u_1, u_2, \ldots , u_T)\), T an infinite integer, with

$$\begin{aligned} {{\varvec{U}}}= ( (u_{i,0})_{i \in I_0}, (u_{i,1})_{i \in I_1}, (u_{i,2})_{i \in I_2}, \ldots , (u_{i,T})_{i \in I_T} ). \end{aligned}$$
(13)

If we take \({{\mathbb {I}}}\) to be the union of the populations \(I_t\) and P to be the uniform distribution on \({{\mathbb {I}}}\), Theorems B and D apply directly. Now however, the permutations can switch people in the same cohort, or switch people across cohorts. In this setting, equity requires invariance with respect to such permutations.Footnote 15
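A toy finite version of the cohort formulation may help fix ideas. In the sketch below, cohorts are short lists of individual utilities, the population is their union with the uniform distribution, and \(\varphi\) is taken to be the square root; all of these are illustrative assumptions, and `pooled_welfare` is a hypothetical name.

```python
import math

def pooled_welfare(cohorts, phi):
    """Average of phi(u_{i,t}) over all people in all cohorts, i.e. the
    uniform distribution on the union of the cohorts."""
    people = [x for cohort in cohorts for x in cohort]
    return sum(phi(x) for x in people) / len(people)

cohorts = [[4.0, 9.0], [1.0, 1.0, 16.0, 25.0]]
# Swap one person in cohort 0 with one person in cohort 1.
swapped = [[4.0, 16.0], [1.0, 1.0, 9.0, 25.0]]

w_original = pooled_welfare(cohorts, math.sqrt)
w_swapped = pooled_welfare(swapped, math.sqrt)

# For contrast: averaging cohort averages weights cohorts, not people.
cohort_avg = sum(
    sum(math.sqrt(x) for x in c) / len(c) for c in cohorts
) / len(cohorts)
```

The pooled criterion is unchanged by the cross-cohort swap, whereas `cohort_avg` gives each person in the smaller cohort more weight and so is not invariant to such permutations in general.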

There is a long-standing distinction between studies of equity that focus on current, intra-generational issues and those that focus on future, inter-generational issues (e.g. Tremmel’s introduction in Tremmel (2018)). In this model, invariance with respect to this class of permutations mixes these considerations into a single framework. This allows, for example, investigations of how wide expansions of access to resources in the current generation, as currently happening in Bangladesh, China and India, can lead to changes in intergenerational allocations of welfare. For the applications in the next section, we will work with the hyperfinite interval population models \({{\mathbb {I}}}= \{0, 1, 2, \ldots , T\}\), T unlimited. At the cost of adapting the models to the cohort formulations in (13), similar analyses can be done.

In stochastic models, policies affect the set of future people who are born. This leads to the interpretational question about what the permutations mean when they may apply to different people.Footnote 16 Our interpretation comes from the Dietz and Asheim (2012) ex post approach. In the end, some set of people will be born into and live in different settings. We require that, conditional on whatever the set of people being born ends up being, our welfare criterion is immune to permuting who gets what within that set. This highlights an uncomfortable aspect of all of the social welfare functions with permutation invariance: cross-generational permutations are purely fictional. Just as the choices of future generations do not lead to external effects on current generations, there is no way to permute people across generations.

4 Applications

We give two environmental and one political economy application of the social welfare functions \(S_\varphi ({{\varvec{u}}}) = \int \varphi (u_t)\,d\Lambda (t)\), \(\Lambda\) the uniform on \({{\mathbb {I}}}\). The first application studies the \(S_\varphi\)-optimal level of risk exposure to an irreversible, negative change, and finds that for our equitable preferences, indeed for all of the patient utility functions in the literature that can be applied to stochastic models, none of the early generations should ever run the risk. By contrast, with any finite level of discounting, the optimal cumulative risk guarantees that the irreversible change will happen, and that the future will be, by this measure, impoverished.

The second application studies the \(S_\varphi\)-optimal level of effort to be made to avoid long-lasting and expensive-to-reverse decisions. Here, the degree of inequality aversion encoded in the concavity of \(\varphi (\cdot )\) determines the appropriate levels of sacrifice for the future, and increases in the degree of inequality aversion increase the optimal levels. This is in sharp contrast to the optimal policies for the ‘Rawlsian’ social welfare function, but agrees with many of the other patient preferences. The ‘Rawlsian’ social welfare functions are infinitely inequality averse; they depend only on the welfare of the worst off. Policies that maximize the utility of the worst off generations may require no sacrifice at all for the benefit of future generations.

The third application studies versions of two much-discussed conclusions in population ethics, the ‘sadistic conclusion’ and the ‘mere addition paradox’. Higher levels of inequality aversion in \(\varphi (\cdot )\) mean that there are fewer instances in which \(S_\varphi\)-optimal choices are those that may lead to these counterintuitive conclusions.

4.1 Other patient/equitable preferences

We will use the tools developed here to examine the performance of several previous proposals for patient, or intergenerationally equitable, social welfare preferences. The KS preferences include inequality aversion over generational utilities. With the exception of the Rawlsian preferences, which are effectively infinitely inequality averse, the others work off of variants of limit average utilities.

Theorem C in KS shows that for the following 4 social welfare functionals, \(S_1, \ldots , S_4\), there are sets of translation invariant probabilities \(TI(1), \ldots , TI(4)\) such that \(S_k({{\varvec{u}}}) = \min _{Q \in TI(k)} \int {{\varvec{u}}}\,dQ\). This result allows us to more easily examine optimal policies for these preferences.Footnote 17

  1. (1)

    Limits of discounted utility, \(S_{1}({{\varvec{u}}}) = \liminf _{\beta \uparrow 1} (1-\beta ) \sum _{t=0}^\infty u_t \beta ^t\).

  2. (2)

    Tail patient payoffs, \(S_2({{\varvec{u}}}) = \liminf _{T \uparrow \infty } \inf _{j \ge 0} \frac{1}{T+1} \sum _{t=0}^T u_{j+t}\).

  3. (3)

    \(\epsilon\)-tail patient payoffs, \(S_3({{\varvec{u}}}) = \liminf _{\epsilon \downarrow 0} \liminf _{T \rightarrow \infty } \frac{1}{\epsilon T} \sum _{t=(1-\epsilon )T}^T u_t\).

  4. (4)

    Liminf average payoffs, \(S_4({{\varvec{u}}}) = \liminf _T \frac{1}{T+1} \sum _{t=0}^T u_t = S_{{\mathfrak {U}}}({{\varvec{u}}})\).

The preferences in (1) have been extensively used in the analysis of Folk Theorems in game theory (e.g. (Fudenberg and Tirole 1991, Ch. 5, §1)). The preferences in (4) have been extensively used in operations research applications, and (2) and (3) are variants of these preferences that look to put more weight on the far distant tail/future.Footnote 18 There are two more social welfare functions that have been used in the economic theory literature on patient or intergenerationally equitable preferences.Footnote 19

(5) ‘Rawlsian’ preferences, \(S_5({{\varvec{u}}}) = \inf _t u_t\).

(6) Long run ‘Rawlsian’ preferences, \(S_6({{\varvec{u}}}) = \liminf _t u_t\).

These social welfare functions pay no attention to generations whose utility is above \(\inf _t u_t\) or above \(\liminf _t u_t\). For applications, this will matter.Footnote 20
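A two-line finite illustration shows what this inattention means in practice. The sketch below uses finite streams as stand-ins for infinite ones, so `min` and the plain average are only finite-horizon analogs of \(S_5\) and \(S_4\); the streams and function names are hypothetical.

```python
def rawlsian(u):
    """Finite-horizon analog of S_5(u) = inf_t u_t."""
    return min(u)

def average(u):
    """Finite-horizon analog of the average payoff criterion S_4."""
    return sum(u) / len(u)

# Same worst-off generation, very different futures.
poor_future = [1.0] + [1.0] * 99
rich_future = [1.0] + [5.0] * 99
```

The Rawlsian analog ranks `poor_future` and `rich_future` as equally good because they share the same minimum, while the averaging criterion strictly prefers `rich_future`.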

4.2 Species extinction tipping points

We give a very simplified fishery model in which different generations balance current exploitation of a resource against the risks of extinction.

Example 4.1

Suppose there are two possible states, f and e, corresponding to a fishery being viable and the fish being extinct. We suppose further that the sets of available actions to each generation are \(A(f) = [0, 1]\) if the fishery is viable, and \(A(e) = \{0\}\) if the fish are extinct. Generational utilities after extinction are \(u(e,0) = 0\). In the viable state, f, the choice of \(a \in [0, 1]\) corresponds to the degree of current exploitation of the resource. We assume that \(u(f,a) > 0\) and that \(\partial u(f,a)/\partial a > 0\). But higher actions also make it more likely that the fish will become extinct. Extinction is absorbing, \(p_{e,e}(0) = 1\). Assume that the probability of extinction (moving from f to e) as a function of a is given by

$$\begin{aligned} p_{f,e}(a) = {\left\{ \begin{array}{ll} 0 &{} \hbox {if } a \le a^\circ \\ g(a-a^\circ ) &{} \hbox {if } a > a^\circ \end{array}\right. } \end{aligned}$$
(14)

where \(g(0) = 0\), \(g(\cdot )\) is positive and strictly increasing, and \(g'(0) = 0\).Footnote 21

4.2.1 The unique patient outcome avoids extinction

Extraction at the rate \(a^\circ\) keeps the fishery safe. For this sustainable policy, the population utility, as measured by \(S_\varphi (\cdot )\), is the utility associated with permanent repetition of the per generation utility \(\varphi (u(a^\circ ))\). The question is, can one do better? The answer is, “No,” and the answer is the same for any of the patient preferences given in (1)-(6) above.Footnote 22

Lemma 4.1

For the \(S_\varphi (\cdot )\) preferences, and for any of the social welfare functions (1)-(6) given above, in any optimal policy, the social welfare is bounded above by the per period welfare of the long-run sustainable policy \(a_t \equiv a^\circ\).

4.2.2 Disastrous discounting

Using any standard discount factor \(\beta < 1\) to discount future utilities leads to ‘optimal’ policies that destroy the fishery, an outcome that minimizes any of the patient/equitable social welfare functions.Footnote 23 There is a discontinuity between the occupation measures: the fish are long-run extinct with standard discounting; and the fish are long-run viable for any of the patient preferences. However, the discontinuity is a bit less sharp than this seems to imply. As \(\beta \uparrow 1\), the associated random time until extinction, \(\tau _\beta\), has the property that \(Prob(\tau _\beta > N) \rightarrow 1\) for all standard N.

Lemma 4.2

For any standard discount factor \(\beta < 1\), maximizing the welfare function \((1-\beta )\sum _{t=0}^\infty \varphi (u_t) \beta ^t\) leads to certain extinction, which minimizes any of the patient/equitable social welfare functions, but as \(\beta \uparrow 1\), \(Prob(\tau _\beta > N) \rightarrow 1\) for all standard \(N\).
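The mechanism behind Lemma 4.2 can be sketched numerically under explicit functional-form assumptions that are not in the model above: \(u(f,a) = 1 + a\), \(a^\circ = 0.5\), \(g(x) = x^2\), \(\varphi\) the identity, and attention restricted to constant actions in the viable state. Under these assumptions a constant action a with per-period extinction probability p has normalized discounted value \((1-\beta )\,u(f,a)/(1-\beta (1-p))\). All names below are hypothetical.

```python
A_SAFE = 0.5  # assumed sustainable threshold a°

def utility(a):
    """Assumed per-generation utility in the viable state, increasing in a."""
    return 1.0 + a

def extinction_prob(a):
    """Assumed p_{f,e}(a): zero up to A_SAFE, then g(x) = x**2 with g'(0) = 0."""
    x = max(0.0, a - A_SAFE)
    return x * x

def discounted_value(a, beta):
    """Normalized discounted value of playing a forever while viable:
    (1 - beta) * sum_t (beta * (1 - p))**t * u(a)."""
    p = extinction_prob(a)
    return (1.0 - beta) * utility(a) / (1.0 - beta * (1.0 - p))

def best_constant_action(beta, grid=10_001):
    """Grid search over constant actions in [0, 1]."""
    actions = [i / (grid - 1) for i in range(grid)]
    return max(actions, key=lambda a: discounted_value(a, beta))

def survival_prob(beta, n):
    """Probability the fishery survives n periods under the discounted optimum."""
    return (1.0 - extinction_prob(best_constant_action(beta))) ** n

a_090 = best_constant_action(0.90)
a_099 = best_constant_action(0.99)
```

In this parametrization both `a_090` and `a_099` exceed `A_SAFE`, so the per-period extinction probability is strictly positive and extinction eventually occurs with probability 1, while the overshoot shrinks and `survival_prob(beta, n)` rises as \(\beta\) increases. The overshoot arises because \(g'(0) = 0\): at \(a^\circ\) a small increase in exploitation has a first-order utility gain and only a second-order increase in risk.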

There is a fundamental asymmetry between present and future generations. The actions of the current generation impose risks on future generations, but there are no risks that the future generations can impose on the present generation. As Dierksmeier (2006) argues, “Rawls’ attempt to derive the notion of rights out of a conception of reciprocal arrangements to enhance the individuals’ self-interests \(\ldots\) cannot provide a satisfactory foundation for the rights of future generations.” Basic properties of geometric growth show that any standard level of discounting very strongly downweights the far future. The present optimal actions for a discounted social welfare function involve risks that no-one in the future would tolerate, if only they had an effective method to protest.

4.3 World-wide climate catastrophe

The previous application studied extinction, and extinction is forever. We now turn to the study of changes that are reversible, but only at great cost, this in a drastically simplified model in which the richness of the biosphere is a crucial ingredient to human welfare.Footnote 24

Example 4.2

The world’s ecosystem can be in a livable state, L, or a crippled state, C. In C, the seas, forests and the biota that survive are unable to produce oxygen and resources in the amounts humans have become accustomed to. In L, the seas and forests are able to produce oxygen concentrations and resources that can support life as we currently know it. Payoffs and actions capture the following tradeoffs: a generation in L can sacrifice present utility in order to lower the transition probability, r, from L to C, notationally \(u_L'(r) > 0\); and a generation in C can sacrifice present utility to increase the transition probability, s, back from C to L, notationally \(u_C'(s) < 0\). We assume that \(\min _r u_L(r) \gg \max _s u_C(s)\), that is, it is much worse to be living on a planet with a crippled ecosystem.

4.3.1 Analysis

There is no loss in examining stationary policies, those specified by the pair (r, s). Stationary policies lead to a Markov chain of outcomes with the steady state distribution spending \(\frac{s}{r+s}\) of the time in L and \(\frac{r}{r+s}\) of the time in C. Because the outcome is Markovian, it can be shown that the problems of maximizing the preferences given in (1), (2), (3), and (4) above reduce to maximizing long run average utility,Footnote 25

$$\begin{aligned} \textstyle \max _{r,s} \textstyle \left[ \frac{s}{r+s}u_L(r) + \textstyle \frac{r}{r+s} u_C(s) \right] = \textstyle \frac{1}{r+s} \left[ s u_L(r) + r u_C(s) \right] . \end{aligned}$$
(15)

In a similar fashion, the problem of maximizing \(S_\varphi (\cdot )\) reduces to

$$\begin{aligned} \textstyle \max _{r,s} \textstyle \left[ \frac{s}{r+s}\varphi (u_L(r)) + \textstyle \frac{r}{r+s} \varphi (u_C(s)) \right] . \end{aligned}$$
(16)
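The objectives in (15) and (16) are easy to evaluate once functional forms are chosen. The sketch below assumes, purely for illustration, \(u_L(r) = 10 + 5r\) and \(u_C(s) = 2 - s\), which respect \(u_L' > 0\), \(u_C' < 0\) and \(\min _r u_L(r) \gg \max _s u_C(s)\) on \([0.01, 0.5]\), and it checks the steady state share \(\frac{s}{r+s}\) by iterating the two-state transition. All names are hypothetical.

```python
import math

def share_in_L(r, s):
    """Steady state fraction of time in the livable state."""
    return s / (r + s)

def share_in_L_by_iteration(r, s, steps=500):
    """Iterate p_L <- p_L * (1 - r) + (1 - p_L) * s starting from state L."""
    p_L = 1.0
    for _ in range(steps):
        p_L = p_L * (1.0 - r) + (1.0 - p_L) * s
    return p_L

def u_L(r):
    """Assumed utility in L: taking more risk r raises current utility."""
    return 10.0 + 5.0 * r

def u_C(s):
    """Assumed utility in C: more recovery effort s lowers current utility."""
    return 2.0 - s

def long_run_welfare(r, s, phi=lambda x: x):
    """Objective (16); with phi the identity it is objective (15)."""
    w = share_in_L(r, s)
    return w * phi(u_L(r)) + (1.0 - w) * phi(u_C(s))

linear_value = long_run_welfare(0.1, 0.3)
concave_value = long_run_welfare(0.1, 0.3, phi=math.sqrt)
```

With r = 0.1 and s = 0.3 the chain spends three quarters of the time in L; comparing `long_run_welfare` across a grid of (r, s) pairs for different \(\varphi\) is one way to explore the comparative statics numerically for a given calibration.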

The FOCs for interior solutions for the first utility functions are

$$\begin{aligned} s \left[ u'_L(r) - u'_C(s) \right] = \left[ u_L(r) - u_C(s) \right] , \end{aligned}$$
(17)

and for \(S_\varphi (\cdot )\), they are

$$\begin{aligned} s \left[ \varphi (u_L(r)) u'_L(r) - \varphi (u_C(s)) u'_C(s) \right] = \left[ \varphi (u_L(r)) - \varphi (u_C(s)) \right] . \end{aligned}$$
(18)

To interpret these: on the left-hand sides, s is the probability of transitioning from C to L in a generation while \(\left[ u'_L(r) - u'_C(s) \right]\) or \(\left[ \varphi (u_L(r)) u'_L(r) - \varphi (u_C(s)) u'_C(s) \right]\) is the difference of two terms, the marginal benefit of the current activities that tip the world toward disaster, and the marginal cost of the activities that make it possible for future generations to recover; the right-hand sides are the difference in per generation utility between being in the livable state and the crippled state.

Because \(\varphi (\cdot )\) is concave, comparing (17) to (18) shows that, since the weight on the \(u'_L(r)\) term relative to the \(u'_C(s)\) term declines, we expect the optimal r to be smaller and s to be larger when the social welfare function dislikes inequality. Put simply, more inequality aversion in the intergenerational utility function means that optimal policies make more effort to avoid inequality. We suspect that in many sensibly calibrated versions of this model, if there are non-boundary limits on s and r, then the solutions will involve s being as high as possible and r as low. Our preliminary investigation of boundary solutions indicates that the same basic analysis holds, although the comparative statics increases in s or decreases in r are replaced by higher Lagrangean multipliers on the constraints. This too is useful information: the higher the multiplier on a constraint, the more important it is to loosen it.

4.3.2 Disastrous ‘Rawlsian’ preferences

In this model, the patient utility functions call for current sacrifice to avoid hurting future generations. The “maximize the welfare of the population’s worst off” criterion encompasses an extreme aversion to inequality and is often understood as a morally sound directive. In this context, the implied complete disregard of everyone who is not worst off is puzzling, and in this class of models, it can lead to disastrous policy recommendations. Disasters are recurrent if the minimal possible r and s are strictly positive, and maximizing the utility functions \(S_{5}({{\varvec{u}}}) = \inf _t u_t\) and \(S_{6}({{\varvec{u}}}) = \liminf _t u_t\) in recurrent versions of the model has the following implications.

Lemma 4.3

If the minimal possible r and s are strictly positive, then optimal policies for either \(S_{5}(\cdot )\) or \(S_{6}(\cdot )\) involve no sacrifice in state C, and any feasible choice is optimal in state L.

The intuition is simple: the crippled state will happen infinitely often with probability 1; these social welfare functions only care about the utility of generations in that state; hence any choice is optimal in state L; and to maximize the utility of the generations in state C, those generations should make no sacrifices for the future.

4.4 Some counterintuitive conclusions in population ethics

The utility functions under study are of the form \(S({{\varvec{u}}}) = \int _{{{\mathbb {I}}}} \varphi (u_t) \, d\Lambda (t)\) where \(\Lambda\) is the uniform distribution on \({{\mathbb {I}}}\). The function \(\varphi (\cdot )\) encodes inequality aversion.

4.4.1 The sadistic conclusion and the mere addition paradox

Consider a choice between two policies, both from the same starting point of a population that is very, very well off: one policy leads to a small number of truly miserable people being added to the future population; and one leads to a large number of well off, but not very, very well off, people being added to the same population.Footnote 26 There are three situations to compare: the status quo (SQ); the addition of a small number of truly miserable in the future (TM); and the addition of a larger number of well off in the future (WO).

  • The ‘sadistic conclusion’ is the observation that a social welfare function could prefer (TM), a small number of miserable lives, to (WO), the larger number of well off lives.

  • The ‘mere addition paradox’ is that a social welfare function could prefer a status quo policy over either addition of new people.

Arguing that coming to the sadistic conclusion disqualifies a social welfare function is, at its core, an argument that only a Rawlsian “choose to maximize the welfare of the worst off” criterion is acceptable.Footnote 27 On the other hand, such a criterion can, as we saw just above, lead to disastrously bad policy recommendations. It also advocates for the status quo policy, and this runs afoul of the second observation.Footnote 28

Our equitable social welfare functions, \({{\varvec{u}}}\mapsto S_\varphi ({{\varvec{u}}}) = \int _{{{\mathbb {I}}}} \varphi (u_t)\,d\Lambda (t)\), also choose the status quo policy, basically because \(\varphi (\cdot )\) is monotonic and whatever the size of the population, we normalize its mass to 1. In this sense, we have expanded the scope of long run average utility as a social welfare criterion by incorporating both inequality aversion and the ability to study long-run problems. But we have not changed the basic properties of social welfare functions that incorporate averaging.

4.4.2 When is (TM) preferred to (WO)?

We now examine conditions under which \(S_\varphi (\cdot )\) prefers (TM) to (WO) or the reverse. The result hinges on the degree of intergenerational inequality aversion, i.e. the degree of concavity of \(\varphi (\cdot )\).

Fix utilities \(0 \le u_{TM} \ll u_{WO} \ll u_{VWO}\) where \(u_{VWO}\) is the utility of the very well off. We are comparing the (TM) situation, \((\alpha , 0, 1-\alpha )\), i.e. \(\alpha\) of the population at \(u_{TM}\) and the rest at \(u_{VWO}\), to the (WO) situation, \((0, \beta , 1 - \beta )\), i.e. \(\beta\) of the population at \(u_{WO}\) and the rest at \(u_{VWO}\). The \(\alpha\) and \(\beta\) can be understood as proportions of future generations at the different utilities, or, using the stochastic population model discussed in Sect. 3.11, as the proportions of future people at the different utilities.

Associated with a concave \(\varphi :[0, \infty ) \rightarrow [0, \infty )\) is the social welfare function \(S_\varphi :{{\varvec{W}}}\rightarrow [0, \infty )\). For each such \(\varphi\), the set \(\textrm{Sit}(\varphi ) = \{(\alpha , \beta ): \alpha \varphi (u_{TM}) + (1-\alpha ) \varphi (u_{VWO}) > \beta \varphi (u_{WO}) + (1-\beta ) \varphi (u_{VWO})\}\) represents situations in which \(S_\varphi (\cdot )\) yields the sadistic conclusion above. Recall that a concave increasing function f is more concave than a concave increasing function g if there is a concave increasing h such that \(f(r) = h(g(r))\). The more concave is \(\varphi\), the more inequality averse is \(S_\varphi (\cdot )\). The following is an elementary result for concave functions. It tells us that higher inequality aversion lessens the ‘sadistic’ instances.

Lemma 4.4

If \(\varphi\) is more concave than \(\xi\), then \(\textrm{Sit}(\varphi ) \subset \textrm{Sit}(\xi )\).
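Lemma 4.4 can be checked on a grid for one concrete pair of functions. The sketch below assumes the illustrative utilities \(u_{TM} = 1\), \(u_{WO} = 50\), \(u_{VWO} = 100\), takes \(\xi\) to be the identity and \(\varphi\) the square root (which is more concave than the identity, with \(h = \sqrt{\cdot }\)), and enumerates \((\alpha , \beta )\) on a coarse grid. The names are hypothetical.

```python
import math

U_TM, U_WO, U_VWO = 1.0, 50.0, 100.0  # assumed utility levels

def in_sit(alpha, beta, phi):
    """True when S_phi strictly prefers (TM) with share alpha to (WO) with share beta."""
    tm = alpha * phi(U_TM) + (1 - alpha) * phi(U_VWO)
    wo = beta * phi(U_WO) + (1 - beta) * phi(U_VWO)
    return tm > wo

def identity(x):
    return x

grid = [k / 20 for k in range(1, 21)]  # shares 0.05, 0.10, ..., 1.00
sit_sqrt = {(a, b) for a in grid for b in grid if in_sit(a, b, math.sqrt)}
sit_id = {(a, b) for a in grid for b in grid if in_sit(a, b, identity)}
```

On this grid `sit_sqrt` is a strict subset of `sit_id`. For example, the pair \((\alpha , \beta ) = (0.4, 1.0)\) yields the sadistic conclusion under the identity but not under the square root.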

The contrast between the version of average utilitarianism that we are using and classical utilitarianism is quite striking.Footnote 29 Let \(N_{VWO}\), \(N_{TM}\), and \(N_{WO}\) denote the numbers of people in the population that are, or will be, very well off, totally miserable, and well off, respectively. The question of whether (TM) is preferred to (WO) becomes whether or not the inequality

$$\begin{aligned} N_{TM} u_{TM} + N_{VWO} u_{VWO} > N_{WO} u_{WO} + N_{VWO} u_{VWO} \end{aligned}$$
(19)

holds. After rearrangement, the question is whether or not \((N_{TM}/N_{WO}) > (u_{WO}/u_{TM})\). There are two results. First, if \(u_{TM} > 0\) shrinks, then the class of situations in which (TM) is preferred to (WO) shrinks. Second, if one uses composition with increasing concave transformations h having the property that \(h(0) = 0\) as the definition of more inequality averse, then more inequality aversion increases rather than decreases the class of situations in which this ‘sadistic’ conclusion holds.
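For completeness, the rearrangement can be spelled out as follows, under the assumptions \(u_{TM} > 0\) and \(N_{WO} > 0\) so that the final division preserves the direction of the inequality:

$$\begin{aligned} N_{TM} u_{TM} + N_{VWO} u_{VWO} > N_{WO} u_{WO} + N_{VWO} u_{VWO} \;\Longleftrightarrow \; N_{TM} u_{TM} > N_{WO} u_{WO} \;\Longleftrightarrow \; \frac{N_{TM}}{N_{WO}} > \frac{u_{WO}}{u_{TM}}. \end{aligned}$$

As a purely illustrative instance, with \(u_{TM} = 1\) and \(u_{WO} = 50\) the classical criterion prefers (TM) exactly when \(N_{TM} > 50\,N_{WO}\); lowering \(u_{TM}\) raises this threshold, which is the first of the two results.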

Classical utilitarianism cannot be invariant to positive affine transformations. If one is maximizing \(\sum _{i \in {{\mathbb {I}}}} u_i\) across policies that affect the size of \({{\mathbb {I}}}\), then making the \(u_i\) negative by subtracting a positive constant asks for policies that deliver a world without humans. Thus, the concave transformations \(h(\cdot )\) are changing the total desirability of humans as well as representing more inequality averse preferences over distributions of utilities. Dasgupta (2001, Ch. 14.2) examines the implications of classical utilitarianism for the optimal size of the population. Later in the chapter (p. 221), he ends with a question of “why the attitude towards equality influences the optimum number of lives.” The conflation of the value of humans with inequality aversion seems to provide an answer.

5 Conclusions

Before KS, the literature on intergenerationally equitable social preference orderings had documented the difficulties in operationalizing patience, understood as a form of indifference to permutations, and simultaneously satisfying the Pareto criterion. Given the centrality of the Pareto criterion to welfare analyses, this may give an impression that patient and equitable societal preferences cannot be sensibly implemented for intergenerational problems. And, by extension, that some form of discounting must be used. While there are many arguments for discounting, we do not believe that this is one of them. Viewing the set of generations as a non-atomic measure space, something implicit but not explicit in previous literature, KS showed that the purported examples of the failure of the Pareto criterion involved increasing the welfare of a null subset of the population.

KS also integrated patience/intergenerational equity and Pareto responsiveness in a class of social welfare functions that are indifferent to the broadest class of permutations that had been used in the literature, what we call the asymptotic or \({\scriptstyle {\mathcal {O}}}(T)\) permutations. But we now believe that this class of permutations is much narrower than is suitable for use in a definition of patience/intergenerational equity. As Example 2.1 shows, it allows for welfare functions that can be lowered by switching benefits to later generations from an equal number of early generations.

This paper, by contrast, asks for invariance of the intergenerational preference ordering with respect to all almost everywhere permutations. Our class of permutations can switch e.g. the first half of the generations for the second half because we take seriously the idea that our limit models should “look like” and “act like” large but finite models. The results of imposing this strong form of equity are in Theorems B and D. For strong equity to hold, the hyperfinite model \({{\mathbb {I}}}\) that replaces \({{\mathbb {N}}_0}\), the old model of the generations, must be given the uniform (or counting) distribution.

One might worry that working strong equity into social preferences would make the resulting social welfare functions intractable. But the opposite turns out to be true. In a model with potentially avoidable risks of species extinctions, the optimal policies for all of the equitable preferences given here call for sustainability. In a model with partially avoidable risks of huge downturns in human welfare, the equitable preferences call for sacrifices both to avoid the downturn and to speed up the return to a better state. It is also possible to perform comparative statics analyses. In the model with partially avoidable risks, more inequality aversion calls for more efforts to avoid downturns and speed up the return to a better state. And more inequality averse intergenerational preferences more often avoid choices that are problematic in terms of population ethics, like the sadistic conclusion or the mere addition paradox.

Going forward, there are a number of open problems, some purely theoretical in nature, and others containing a mix of theoretical and practical issues. On the purely theoretical side are issues related to what is called the problem of “underselectiveness” in the studies of maximization of the long run average performance of a system. In the context of intergenerational equity, this can arise as the weak optimality of both “present profligacy” and Chichilnisky’s (1996) “dictatorship of the future.” The profligacy of the present arises if some finite number of early generations leave the world in so bad a situation that future generations must make immense sacrifices to recover. The “dictatorship of the future” arises if a finite number of early generations must make immense sacrifices in order that future generations have a better life. For a social welfare function that is invariant with respect to reversible changes in the utility of finitely many generations, both profligacy and dictatorship are at least weakly optimal. The most extreme form of underselectiveness showed up in the case of the infinitely inequality averse ‘Rawlsian’ preferences applied to models with recurrent, even if rare, disasters. In such cases, optimal policies for the social welfare functions \(\inf _t u_t\) and \(\liminf _t u_t\) call for no effort to be spent on recovering from disasters, and they are completely mute on efforts to avoid future disasters.

We have preliminary results indicating that taking account of infinitesimal differences in payoffs may solve these problems. There is a direct way to see why our social welfare functions are underselective. They are of the form \(S({{\varvec{u}}}) = \textrm{st}(\frac{1}{T+1} \sum _{t=0}^T \varphi (u_t))\), and taking the standard part strips out the infinitesimal differences in payoffs that arise when finitely many generations have their utility changed. Maximizing without first taking the standard part results in more complicated calculations but, in some models, little change in the “basic aspects of” the solutions. An additional virtue of such an approach is that it could, potentially, allow us to find a version of social welfare functions that respect the classic Pareto criterion, as in Jonsson and Voorneveld’s (2018) “limit of discounted utility” ordering. However, we do not yet have a full characterization of problems in which the basic aspects are stable with respect to accounting for infinitesimal differences.
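A finite-horizon sketch may help show why taking the standard part strips out these differences. It is only an analogy: a large finite \(T\) stands in for the hyperfinite horizon, \(\varphi\) is taken to be the identity, and the names `average_welfare` and `gap_from_early_loss`, together with the baseline utility, the loss, and the number \(k\) of affected generations, are hypothetical. The gap between the two averages is \(k \cdot \text{loss}/(T+1)\), which vanishes as \(T\) grows and is infinitesimal when \(T\) is infinite.

```python
# Minimal sketch: the effect on average welfare of changing the utility
# of finitely many generations. Finite T stands in for a hyperfinite T;
# all numerical values are hypothetical.
from fractions import Fraction

def average_welfare(utilities, phi=lambda u: u):
    """Compute (1/(T+1)) * sum of phi(u_t) over generations t = 0..T."""
    return sum(phi(u) for u in utilities) / len(utilities)

def gap_from_early_loss(T, k=10, loss=Fraction(1), baseline=Fraction(1)):
    """Average welfare lost when the first k of T+1 generations lose `loss`."""
    unchanged = [baseline] * (T + 1)
    altered = [baseline - loss] * k + [baseline] * (T + 1 - k)
    return average_welfare(unchanged) - average_welfare(altered)

for T in (99, 9_999, 999_999):
    # Exact rational gap, equal to k * loss / (T + 1).
    print(T, gap_from_early_loss(T))
```

Exact rational arithmetic is used so that the gap is visibly \(k \cdot \text{loss}/(T+1)\) rather than a rounded float. Ranking policies by the average before taking the standard part would retain this gap, which is the idea behind accounting for infinitesimal differences.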

In a similar fashion, in Example 4.2, we saw that making \(\varphi (\cdot )\) more inequality averse led to policies calling for more efforts to avoid and recover from worldwide catastrophe, whereas the patient, infinitely inequality averse preferences, \(\liminf _t u_t\), offer little useful policy guidance. By scaling a non-standard \(\varphi (\cdot )\) so that its risk aversion coefficient, \(- \varphi ''(r)/\varphi '(r)\), is infinite, it may be possible to recover something like the \(\liminf _t u_t\) preferences while retaining the higher selectivity of our social welfare functions.

Another set of theoretical concerns relates to the ethics of demanding large sacrifices from the earlier generations (see e.g. Portney and Weyant 2013). As Fleurbaey and Tungodden (2010) show, such ethical dilemmas seem to be an intrinsic feature of aggregative models. In the context of our project, exploring the implications of adopting additional ethical postulates that prohibit “undue” hardship for any generation is a fascinating open question. There is a related set of questions regarding the implementability of such prescriptions for generational sacrifice in decentralized settings.

Problems that contain a mix of theoretical and practical issues stem from the observation that planning for even a 200-year time horizon is not possible at any serious level of granularity. Still, increasing the set of feasible choices for future generations, even if it costs the current generation, seems intuitively optimal, provided only that one gives weight to the well-being of far future generations. These include the sorts of sacrifices that make the survival of a knowledge based society more likely, and a return to it possible if it should falter.